Course, academic year 2023/2024
Unsupervised Machine Learning in NLP - NPFL097
Title: Neřízené strojové učení v NLP
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Valid from: 2020
Semester: winter
E-Credits: 3
Hours per week, examination: winter s.:1/1, C [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Additional information: http://ufal.mff.cuni.cz/courses/npfl097
Guarantor: RNDr. David Mareček, Ph.D.
Class: Informatics, Master's - elective
Classification: Informatics > Computer and Formal Linguistics
Annotation -
Last update: doc. Mgr. Barbora Vidová Hladká, Ph.D. (25.04.2019)
The course introduces basic methods of unsupervised machine learning and their applications in natural language processing. Topics include Bayesian inference, Expectation-Maximization, cluster analysis, neural-network-based methods, and other methods in current use. Selected applications are discussed in detail and implemented in the lab sessions.
Course completion requirements -
Last update: RNDr. David Mareček, Ph.D. (05.05.2022)

To earn the credit, students must implement and submit on time the programming assignments (usually three). Missing points can be made up in the final test.

Literature -
Last update: RNDr. David Mareček, Ph.D. (24.04.2019)

Christopher Bishop: Pattern Recognition and Machine Learning, Springer-Verlag New York, 2006

Kevin P. Murphy: Machine Learning: A Probabilistic Perspective, The MIT Press, Cambridge, Massachusetts, 2012

Kar Wai Lim, Wray Buntine, Changyou Chen, Lan Du: Nonparametric Bayesian topic modelling with the hierarchical Pitman-Yor processes, International Journal of Approximate Reasoning 78, Elsevier, 2016

Kevin Knight: Bayesian Inference with Tears, 2009, http://www.isi.edu/natural-language/people/bayes-with-tears.pdf

Syllabus -
Last update: RNDr. David Mareček, Ph.D. (05.05.2022)

1. Introduction

2. Beta-Bernoulli and Dirichlet-Categorical models

3. Modeling document collections, Categorical Mixture models, Expectation-Maximization

4. Gibbs Sampling, Latent Dirichlet allocation

5. Unsupervised Text Segmentation

6. Unsupervised tagging, Word alignment, Unsupervised parsing

7. K-means, Mixture of Gaussians, Hierarchical clustering, evaluation

8. t-SNE, Principal Component Analysis, Independent Component Analysis

9. Linguistic Interpretation of Neural Networks

 