Universal Morphological Analysis using Reinforcement Learning
Thesis title in Czech: | Univerzální morfologická analýza s využitím reinforcement learning |
---|---|
Thesis title in English: | Universal Morphological Analysis using Reinforcement Learning |
Key words: | morfologická analýza, reinforcement learning |
English key words: | morphological analysis, reinforcement learning |
Academic year of topic announcement: | 2018/2019 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Daniel Zeman, Ph.D. |
Author: | Mgr. Ronald Ahmed Cardenas Acosta - assigned and confirmed by the Study Dept. |
Date of registration: | 25.02.2019 |
Date of assignment: | 07.03.2019 |
Confirmed by Study dept. on: | 25.04.2019 |
Date and time of defence: | 04.02.2020 09:00 |
Date of electronic submission: | 04.01.2020 |
Date of submission of printed version: | 06.01.2020 |
Date the defence took place: | 04.02.2020 |
Opponents: | RNDr. David Mareček, Ph.D. |
Guidelines |
In this thesis we take a universal approach to morphological analysis in context. The approach jointly simulates word formation steps and morphological label assignment, one step at a time. This mechanism is modeled as a neural weighted finite-state automaton (WFSA; Schwartz et al., 2018), in an effort to add interpretability to an otherwise ‘black-box’ architecture. The problem is then formulated as a multi-armed bandit in which each arm captures a specific kind of word formation process. Each arm can thus learn how its word formation process is carried out in different languages. Moreover, the model has the potential to learn how to combine processes from different arms, i.e. to model how a language can combine different kinds of processes in the same derivation (e.g. German exhibits circumfixation, affixation, and compounding).
Our model leverages paradigm annotations and morphologically labeled sentences from a varied sample of high-resource languages made available by the CoNLL-SIGMORPHON shared tasks. We evaluate the effectiveness of our approach in high- and low-resource scenarios against strong neural baselines for English, Spanish, German, Czech, Turkish, and Shipibo-Konibo. |
References |
Ramy Eskander, Owen Rambow, and Smaranda Muresan. 2018. Automatically tailoring unsupervised morphological segmentation to the language. In Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 78-83.
Ramy Eskander, Owen Rambow, and Tianchun Yang. 2016. Extending the use of adaptor grammars for unsupervised morphological segmentation of unseen languages. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 900-910.
Mark Johnson. 2008. Unsupervised word segmentation for Sesotho using adaptor grammars. In Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology, pages 20-27. Association for Computational Linguistics.
Hao Peng, Roy Schwartz, Sam Thomson, and Noah A. Smith. 2018. Rational recurrences. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1203-1214.
Roy Schwartz, Sam Thomson, and Noah A. Smith. 2018. SoPa: Bridging CNNs, RNNs, and weighted finite-state machines.
Kairit Sirts and Sharon Goldwater. 2013. Minimally-supervised morphological segmentation using adaptor grammars. Transactions of the Association for Computational Linguistics, 1:255-266. |