Content classification in legal documents
Thesis title in Czech: | Klasifikace obsahu právních dokumentů |
---|---|
Thesis title in English: | Content classification in legal documents |
Key words: | NLP, klasifikace obsahu, právní doména |
English key words: | NLP, content classification, legal domain |
Academic year of topic announcement: | 2016/2017 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | prof. Ing. Zdeněk Žabokrtský, Ph.D. |
Author: | hidden |
Date of registration: | 20.02.2017 |
Date of assignment: | 20.02.2017 |
Confirmed by Study dept. on: | 24.02.2017 |
Date and time of defence: | 07.06.2017 09:00 |
Date of electronic submission: | 12.05.2017 |
Date of submission of printed version: | 12.05.2017 |
Date of proceeded defence: | 07.06.2017 |
Opponents: | RNDr. Martin Holub, Ph.D. |
Guidelines |
Today, the amount of text data held by businesses and individual users is growing very fast. People cannot process such volumes of data manually, which creates an opportunity for NLP. This work focuses on processing official documents such as contracts, leases, deeds, invoices and orders. The main goal is to design, implement and evaluate a software module capable of finding and labeling paragraphs in a given document that contain specific information, such as contract parties, lease terms or clauses. The system will be able to process documents in English and in Czech. The thesis requires at least a basic understanding of ‘legal language’ in both languages, knowledge of the most common machine learning and/or rule-based approaches to text classification, and implementation skills. |
References |
Aggarwal, Charu C. and Zhai, ChengXiang. 2012. Mining Text Data. Boston : Springer, 2012. 978-1-4614-3223-4.
Alpaydin, Ethem. 2014. Introduction to Machine Learning. s.l. : The MIT Press, 2014. 0262028182.
Curtotti, Michael and McCreath, Eric. Corpus Based Classification of Text in Australian Contracts.
Duda, Richard O., Hart, Peter E. and Stork, David G. 2000. Pattern Classification. s.l. : Wiley-Interscience, 2000. 0471056693.
Jurafsky, Daniel and Martin, James H. 2015. Speech and Language Processing. 2015. |