Thesis (Selection of subject)

Universal Morphological Analysis using Reinforcement Learning

Thesis title in Czech:	Univerzální morfologická analýza s využitím reinforcement learning
Thesis title in English:	Universal Morphological Analysis using Reinforcement Learning
Key words:	morfologická analýza, reinforcement learning
English key words:	morphological analysis, reinforcement learning
Academic year of topic announcement:	2018/2019
Thesis type:	diploma thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	doc. RNDr. Daniel Zeman, Ph.D.
Author:	Mgr. Ronald Ahmed Cardenas Acosta - assigned and confirmed by the Study Dept.
Date of registration:	25.02.2019
Date of assignment:	07.03.2019
Confirmed by Study dept. on:	25.04.2019
Date and time of defence:	04.02.2020 09:00
Date of electronic submission:	04.01.2020
Date of submission of printed version:	06.01.2020
Date of proceeded defence:	04.02.2020
Opponents:	RNDr. David Mareček, Ph.D.

Guidelines

In this thesis we take a universal approach to morphological analysis in context. The approach jointly simulates word formation steps and morphological label assignment, one step at a time. This mechanism is modeled as a neural weighted finite-state automaton (WFSA; Schwartz et al., 2018), in an effort to add interpretability to an otherwise ‘black-box’ architecture. The problem is then formulated as a multi-armed bandit problem in which each arm captures a specific kind of word formation process. Each arm can thus learn how word formation processes are carried out in different languages. Moreover, the model has the potential to learn how to combine processes from different arms, i.e., to model how a language can combine different kinds of processes in the same derivation (e.g., German exhibits circumfixation, affixation, and compounding).
Our model leverages paradigm annotations and morphologically labeled sentences in a varied sample of high-resource languages made available by the CoNLL-SIGMORPHON shared tasks. We evaluate the effectiveness of our approach in high- and low-resource scenarios against strong neural baselines for English, Spanish, German, Czech, Turkish, and Shipibo-Konibo.
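
The multi-armed bandit formulation mentioned above can be illustrated with a minimal sketch. The guidelines do not say which bandit algorithm the thesis uses, so this example assumes a plain epsilon-greedy strategy with Bernoulli rewards. The arm names, reward probabilities, and the function name run_epsilon_greedy are hypothetical and chosen only for illustration; in the thesis each arm would correspond to a learned model of a word formation process rather than a fixed probability.

```python
import random


def run_epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with epsilon-greedy; return (estimates, counts)."""
    rng = random.Random(seed)  # local generator keeps the run reproducible
    n_arms = len(reward_probs)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated reward.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical arms, one per word formation process (illustrative only).
ARMS = ["affixation", "circumfixation", "compounding"]
# Assumed success probabilities; not taken from the thesis.
REWARD_PROBS = [0.2, 0.5, 0.8]

estimates, counts = run_epsilon_greedy(REWARD_PROBS)
best_arm = ARMS[max(range(len(ARMS)), key=lambda a: estimates[a])]
print(best_arm, counts)
```

In this toy setting the agent gradually concentrates its pulls on the arm with the highest assumed reward probability, while the exploration rate epsilon keeps the other arms sampled often enough for their estimates to stay meaningful.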

References

Ramy Eskander, Owen Rambow, and Smaranda Muresan. 2018. Automatically tailoring
unsupervised morphological segmentation to the language. In Proceedings of the Fifteenth
Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages
78-83.

Ramy Eskander, Owen Rambow, and Tianchun Yang. 2016. Extending the use of adaptor
grammars for unsupervised morphological segmentation of unseen languages. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 900-910.

Mark Johnson. 2008. Unsupervised word segmentation for Sesotho using adaptor grammars.
In Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational
Morphology and Phonology, pages 20-27. Association for Computational Linguistics.

Hao Peng, Roy Schwartz, Sam Thomson, and Noah A. Smith. 2018. Rational recurrences. In
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing,
pages 1203-1214.

Roy Schwartz, Sam Thomson, and Noah A. Smith. 2018. SoPa: Bridging CNNs, RNNs, and
Weighted Finite-State Machines.

Kairit Sirts and Sharon Goldwater. 2013. Minimally-supervised morphological segmentation
using adaptor grammars. Transactions of the Association for Computational Linguistics,
1:255-266.