Thesis (Selection of subject) (version: 368)
Thesis details
Thesis title in Czech: Vícejazyčné víceoborové generování školních testů, které se obtížně řeší automaticky
Thesis title in English: Multilingual multidomain generation of school tests that are hard to solve automatically
Key words: generování přirozeného jazyka|velké jazykové modely
English key words: natural language generation|large language models
Academic year of topic announcement: 2023/2024
Thesis type: diploma thesis
Thesis language:
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: Mgr. Rudolf Rosa, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 22.04.2024
Date of assignment: 22.04.2024
Confirmed by Study dept. on: 22.04.2024
Guidelines
With the advance of Large Language Models (LLMs), it has been found that many tests commonly used at schools and universities can be solved automatically, to a certain degree, by providing the test question as the input prompt to an LLM and using the generated output as the answer.

The goal of the thesis is to explore ways of automatically generating test questions that are hard to solve automatically in this way.

The suggested approach is to iteratively generate test questions with an LLM and to verify whether the LLM can generate a satisfactory answer; questions that the LLM answers satisfactorily are rejected or revised, since they are not hard to solve automatically (a sketch of one possible loop follows below).

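The following is a minimal sketch of one possible form of this generate-and-verify loop. It assumes a single placeholder function `ask_llm` standing in for whatever LLM API is eventually used; the prompts, the keep/reject criterion, and the crude string-match check are illustrative assumptions made for this sketch, not part of the assignment.

```python
from typing import Callable


def generate_hard_questions(
    ask_llm: Callable[[str], str],  # placeholder wrapping whatever LLM API is used
    topic: str,
    language: str = "Czech",
    n_candidates: int = 10,
    solver_attempts: int = 3,
) -> list[dict]:
    """Iteratively generate test questions and keep those the LLM fails to answer."""
    kept = []
    for _ in range(n_candidates):
        # 1. Ask the LLM to propose a question together with its reference answer.
        proposal = ask_llm(
            f"Write one school test question in {language} about {topic}. "
            "Put the question on the first line and the correct answer on the second line."
        )
        lines = [line.strip() for line in proposal.splitlines() if line.strip()]
        if len(lines) < 2:
            continue  # malformed proposal, skip it
        question, reference = lines[0], lines[1]

        # 2. Let an LLM try to solve the question (closed-book: only the question is given).
        solved = False
        for _ in range(solver_attempts):
            attempt = ask_llm(f"Answer the following test question:\n{question}")
            # 3. Crude correctness check; a real system would use a proper evaluation
            #    (exact match, human grading, or an LLM-as-judge).
            if reference.lower() in attempt.lower():
                solved = True
                break

        # 4. Keep only questions the model could not solve automatically.
        if not solved:
            kept.append({"question": question, "answer": reference})
    return kept
```

In a real pipeline, `ask_llm` would wrap the chosen model's API, and separate models (or prompting setups) could be used for the question generator and the solver.
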
The work on the thesis is expected to include the following components:
- reviewing existing literature on employing LLMs for test generation and test answer generation
- gathering and processing data representing various school test questions and answers
- designing a setup utilizing LLMs to generate test questions
- designing a setup utilizing LLMs to generate answers to the test questions
- devising and implementing methods to evaluate the quality of the generated questions
- devising and implementing methods to evaluate the correctness of the generated answers (a possible starting point is sketched after this list)
- comparing the performance on open-book and closed-book questions and answers
- comparing the performance across multiple domains (e.g. mathematics, geography, history)
- comparing the performance across multiple levels (e.g. primary school, secondary school, university)
- comparing the performance across multiple languages
- comparing the performance across multiple available LLMs
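
As one illustration of the answer-correctness evaluation mentioned above, the sketch below computes a simple accuracy over reference answers, with an optional LLM-as-judge hook for free-form answers. The accent-stripping normalisation, the lenient substring match, and the `llm_judge` callback are assumptions made for this example, not a prescribed method.

```python
from typing import Callable, Optional
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and surrounding whitespace for a lenient string match."""
    text = unicodedata.normalize("NFD", text.lower().strip())
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def answer_accuracy(
    predictions: list[str],
    references: list[str],
    llm_judge: Optional[Callable[[str, str], bool]] = None,  # optional LLM-as-judge hook
) -> float:
    """Fraction of predicted answers judged correct against the reference answers."""
    assert len(predictions) == len(references)
    correct = 0
    for pred, ref in zip(predictions, references):
        if normalize(ref) and normalize(ref) in normalize(pred):
            correct += 1  # lenient string match covers short factual answers
        elif llm_judge is not None and llm_judge(pred, ref):
            correct += 1  # free-form answers fall back to an LLM judge
    return correct / len(predictions) if predictions else 0.0


# Example: one of two answers matches its reference, giving accuracy 0.5.
print(answer_accuracy(["Praha je hlavní město.", "1918"], ["Praha", "1945"]))
```

A real evaluation would likely combine such automatic checks with human grading, especially for open-ended questions across domains and languages.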
References
- Denny, P., Khosravi, H., Hellas, A., Leinonen, J., & Sarsa, S. (2023). Can We Trust AI-Generated Educational Content? Comparative Analysis of Human and AI-Generated Learning Resources (arXiv:2306.10509). arXiv. https://doi.org/10.48550/arXiv.2306.10509
- Doughty, J., Wan, Z., Bompelli, A., Qayum, J., Wang, T., Zhang, J., Zheng, Y., Doyle, A., Sridhar, P., Agarwal, A., Bogart, C., Keylor, E., Kultur, C., Savelka, J., & Sakr, M. (2024). A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education. Proceedings of the 26th Australasian Computing Education Conference, 114–123. https://doi.org/10.1145/3636243.3636256
- Elkins, S., Kochmar, E., Cheung, J. C. K., & Serban, I. (2023). How Useful are Educational Questions Generated by Large Language Models? (arXiv:2304.06638). arXiv. https://doi.org/10.48550/arXiv.2304.06638
- Perkoff, E. M., Bhattacharyya, A., Cai, J., & Cao, J. (2023). Comparing Neural Question Generation Architectures for Reading Comprehension. In E. Kochmar, J. Burstein, A. Horbach, R. Laarmann-Quante, N. Madnani, A. Tack, V. Yaneva, Z. Yuan, & T. Zesch (Eds.), Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023) (pp. 556–566). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.bea-1.47
- Savelka, J., Agarwal, A., Bogart, C., & Sakr, M. (2023). Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code (arXiv:2303.08033). arXiv. https://doi.org/10.48550/arXiv.2303.08033
 