Thesis (Selection of subject)


Parsing of Texts with Code-Switching

Thesis title in Czech:	Syntaktická analýza textů se střídáním kódů
Thesis title in English:	Parsing of Texts with Code-Switching
Czech key words:	syntaktická analýza, závislostní analýza, treebank, universal dependencies, střídání kódů
English key words:	parsing, dependency parsing, treebank, universal dependencies, code switching
Academic year of topic announcement:	2017/2018
Thesis type:	diploma thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	doc. RNDr. Daniel Zeman, Ph.D.
Author:	hidden - assigned and confirmed by the Study Dept.
Date of registration:	08.03.2018
Date of assignment:	11.03.2018
Confirmed by Study dept. on:	07.08.2018
Date and time of defence:	11.09.2018 09:00
Date of electronic submission:	20.07.2018
Date of submission of printed version:	20.07.2018
Date the defence took place:	11.09.2018
Opponent (reviewer):	RNDr. David Mareček, Ph.D.

Guidelines

The aim of this thesis is to create and evaluate systems for dependency parsing of code-switched language data, i.e., utterances in which speakers use two languages and switch between them freely. This involves several tasks. Besides selecting and training existing dependency parsers, it will also be necessary to adapt them to the domain of the task, since code-switching is often tied to informal domains such as social media. Some attention should be paid to tokenization and preprocessing so that the parser can operate on raw text. The main task is then model selection (i.e., language identification, so that the appropriate model can be chosen) and/or training a joint model for the two languages. The parsing system will be evaluated on at least one language pair, depending on data availability. Code-switched corpora are being developed for several language pairs, but their manual syntactic annotation may not be available in time for this thesis. If gold-standard data cannot be obtained from other sources, a small evaluation dataset will be manually annotated as part of this thesis project.
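The model-selection step can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not the method developed in the thesis. The toy lexicons in `LEXICONS` are hypothetical and stand in for a trained token-level language identifier. `dummy_parser` is a hypothetical placeholder for trained parsers. The routing policy is also an assumption: use a monolingual model when one language covers at least `threshold` of the identified tokens, and otherwise fall back to a joint model. Turkish and German serve only as an example pair.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL toy lexicons standing in for a trained token-level language identifier.
LEXICONS: Dict[str, set] = {
    "tr": {"ben", "bugün", "çok", "yorgunum", "ve"},
    "de": {"ich", "heute", "sehr", "müde", "und", "bin"},
}

# A parser maps a token list to one (head, deprel) pair per token; head 0 is the root.
Parser = Callable[[List[str]], List[Tuple[int, str]]]


def tag_languages(tokens: List[str]) -> List[str]:
    """Label each token with a language code, or 'other' if unknown or ambiguous."""
    labels = []
    for tok in tokens:
        hits = [lang for lang, lex in LEXICONS.items() if tok.lower() in lex]
        labels.append(hits[0] if len(hits) == 1 else "other")
    return labels


def select_model(tokens: List[str], threshold: float = 0.9) -> str:
    """ASSUMED policy: a monolingual model if one language dominates, else the joint model."""
    counts = Counter(lab for lab in tag_languages(tokens) if lab != "other")
    total = sum(counts.values())
    if total == 0:
        return "joint"  # no identifiable tokens: the joint model is the safe default
    lang, n = counts.most_common(1)[0]
    return lang if n / total >= threshold else "joint"


def parse(tokens: List[str], parsers: Dict[str, Parser]) -> Tuple[str, List[Tuple[int, str]]]:
    """Route the sentence to the selected parser and return (model key, analysis)."""
    key = select_model(tokens)
    return key, parsers[key](tokens)


def dummy_parser(tokens: List[str]) -> List[Tuple[int, str]]:
    """HYPOTHETICAL placeholder: first token is the root, all others attach to it."""
    if not tokens:
        return []
    return [(0, "root")] + [(1, "dep")] * (len(tokens) - 1)


if __name__ == "__main__":
    parsers = {"tr": dummy_parser, "de": dummy_parser, "joint": dummy_parser}
    print(parse(["Ich", "bin", "heute", "sehr", "müde"], parsers)[0])  # de
    print(parse(["Ben", "bugün", "sehr", "müde", "bin"], parsers)[0])  # joint
```

A per-sentence switch like this is easy to build from existing monolingual parsers. However, it still sends genuinely mixed sentences to a single model, and the joint-model fallback exists to cover that case.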

References

* Bhat, Irshad; Bhat, Riyaz; Shrivastava, Manish (2018). Universal Dependency Parsing for Hindi-English Code-Switching.

* Çetinoğlu, Özlem; Çöltekin, Çağrı (2016). Part of Speech Annotation of a Turkish-German Code-Switching Corpus. In Proceedings of the 10th Linguistic Annotation Workshop (LAW-X), Berlin, Germany, August 2016.