Thesis (Selection of subject) (version: 385)
Thesis details
Content classification in legal documents
Thesis title in Czech: Klasifikace obsahu právních dokumentů
Thesis title in English: Content classification in legal documents
Key words: NLP, klasifikace obsahu, právní doména
English key words: NLP, content classification, legal domain
Academic year of topic announcement: 2016/2017
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: prof. Ing. Zdeněk Žabokrtský, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 20.02.2017
Date of assignment: 20.02.2017
Confirmed by Study dept. on: 24.02.2017
Date and time of defence: 07.06.2017 09:00
Date of electronic submission: 12.05.2017
Date of submission of printed version: 12.05.2017
Date defence was held: 07.06.2017
Opponents: RNDr. Martin Holub, Ph.D.
 
 
 
Guidelines
The amount of text-based data held by businesses and individual users is growing very fast. People cannot process such volumes manually, which creates room for NLP to handle them. This work focuses on processing official documents such as contracts, leases, deeds, invoices and orders. The main goal is to design, implement and evaluate a software module capable of finding and labeling the paragraphs in a given document that contain specific information, such as contract parties, lease terms or clauses. The system will be able to process documents in English and in Czech. The thesis requires at least a basic understanding of 'legal language' in both languages, knowledge of the most common machine learning and/or rule-based approaches to text classification, and implementation skills.
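The paragraph-labeling task described above can be illustrated with a minimal rule-based sketch. It is not the method of the thesis: the label names, the keyword patterns in `RULES`, and the function names are all hypothetical, chosen only to show what "finding and labeling paragraphs" might look like for English and Czech input.

```python
import re

# Hypothetical label -> keyword patterns (illustrative only, not from the thesis).
# Each label has a few English and Czech cue phrases.
RULES = {
    "contract_parties": [r"\bbetween\b", r"\bhereinafter\b", r"\bsmluvní strany\b"],
    "lease_terms": [r"\blease term\b", r"\bmonthly rent\b", r"\bnájemné\b"],
}


def split_paragraphs(text):
    """Split a document into non-empty paragraphs separated by blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def label_paragraph(paragraph):
    """Return the sorted labels whose cue phrases occur in the paragraph."""
    found = []
    for label, patterns in RULES.items():
        if any(re.search(p, paragraph, re.IGNORECASE) for p in patterns):
            found.append(label)
    return sorted(found)


def label_document(text):
    """Pair every paragraph of the document with its (possibly empty) label list."""
    return [(p, label_paragraph(p)) for p in split_paragraphs(text)]


if __name__ == "__main__":
    sample = (
        "This agreement is made between Alpha Ltd. and Beta s.r.o.\n\n"
        "The monthly rent is payable in advance.\n\n"
        "Miscellaneous provisions follow."
    )
    for paragraph, labels in label_document(sample):
        print(labels, "|", paragraph)
```

A keyword baseline of this kind is easy to inspect and could serve as a point of comparison for a machine learning classifier trained on annotated paragraphs, which is the other family of approaches the guidelines mention.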
References
Aggarwal, Charu C. and Zhai, ChengXiang. 2012. Mining Text Data. Boston : Springer, 2012. ISBN 978-1-4614-3223-4.
Alpaydin, Ethem. 2014. Introduction to Machine Learning. s.l. : The MIT Press, 2014. ISBN 0262028182.
Curtotti, Michael and McCreath, Eric. Corpus Based Classification of Text in Australian Contracts.
Duda, Richard O., Hart, Peter E. and Stork, David G. 2000. Pattern Classification. s.l. : Wiley-Interscience, 2000. ISBN 0471056693.
Jurafsky, Daniel and Martin, James H. 2015. Speech and Language Processing. 2015.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html