Thesis (Selection of subject)


Tackling misinformation in Serbo-Croatian: corpora and experiments

Thesis title in Czech:	Řešení dezinformací v srbochorvatštině: korpusy a experimenty
Thesis title in English:	Tackling misinformation in Serbo-Croatian: corpora and experiments
Key words:	NLP | dezinformace | srbochorvatština | korpus | klasifikace
English key words:	NLP | misinformation | fake news | Serbo-Croatian | corpora | classification
Academic year of topic announcement:	2022/2023
Thesis type:	diploma thesis
Thesis language:
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	RNDr. Jiří Hana, Ph.D.
Author:	hidden - assigned and confirmed by the Study Dept.
Date of registration:	29.03.2023
Date of assignment:	30.03.2023
Confirmed by Study dept. on:	08.03.2024

Guidelines

Explore misinformation in news written in Serbo-Croatian (Serbian, Croatian, Bosnian, and Montenegrin, a group of closely related South Slavic languages).

- Create a news corpus with metadata describing whether each article is trustworthy and, if not, in which respect
- Evaluate the possibilities of automatic processing, for example:
  - Classification of news articles: binary (truthful vs. misinformation) or multi-label (fake news, pseudoscience, conspiracy theory, etc.)
  - Claim detection: extraction of claims from articles
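To illustrate how one set of corpus metadata could serve both classification settings listed above, here is a minimal Python sketch. It assumes a hypothetical record format: the class names, field names, and language codes are illustrative only and are not prescribed by the assignment, and the label set is limited to the three example categories the guidelines name.

```python
from dataclasses import dataclass, field
from enum import Enum


class Misinfo(Enum):
    """Hypothetical misinformation categories (the examples named in the guidelines)."""
    FAKE_NEWS = "fake_news"
    PSEUDOSCIENCE = "pseudoscience"
    CONSPIRACY_THEORY = "conspiracy_theory"


# Fixed category order, so multi-label vectors line up across articles.
CATEGORIES = list(Misinfo)


@dataclass
class Article:
    """One corpus entry; an illustrative schema, not a prescribed format."""
    text: str
    language: str  # assumed ISO 639 codes, e.g. "sr", "hr", "bs", "cnr"
    # Empty set = trustworthy; otherwise the respects in which it is not.
    misinfo: set[Misinfo] = field(default_factory=set)

    def binary_label(self) -> int:
        """Binary task: 0 = truthful, 1 = misinformation of any kind."""
        return int(bool(self.misinfo))

    def multilabel_vector(self) -> list[int]:
        """Multi-label task: one 0/1 slot per category, in CATEGORIES order."""
        return [int(c in self.misinfo) for c in CATEGORIES]


if __name__ == "__main__":
    flagged = Article("...", "hr", {Misinfo.PSEUDOSCIENCE, Misinfo.CONSPIRACY_THEORY})
    truthful = Article("...", "sr")
    print(flagged.binary_label(), flagged.multilabel_vector())    # 1 [0, 1, 1]
    print(truthful.binary_label(), truthful.multilabel_vector())  # 0 [0, 0, 0]
```

Encoding "trustworthy" as an empty category set means the binary label is derived from the multi-label annotation rather than annotated separately, so a single annotation pass could in principle support both tasks.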

References

- Max Glockner, Yufang Hou, and Iryna Gurevych. 2022. Missing Counter-Evidence Renders NLP Fact-Checking Unrealistic for Misinformation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5916–5936, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Zhijiang Guo, Michael Schlichtkrull, and Andreas Vlachos. 2022. A Survey on Automated Fact-Checking. Transactions of the Association for Computational Linguistics, 10:178–206.
- Isabelle Augenstein. 2021. Towards Explainable Fact Checking. arXiv preprint arXiv:2108.10274.
- Nikola Ljubešić and Davor Lauc. 2021. BERTić - The Transformer Language Model for Bosnian, Croatian, Montenegrin and Serbian. In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, pages 37–42, Kyiv, Ukraine. Association for Computational Linguistics.
- James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. 2018. FEVER: a Large-scale Dataset for Fact Extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 809–819, New Orleans, Louisiana. Association for Computational Linguistics.