Course, academic year 2023/2024
Introduction to Machine Learning with Python - NPFL129
Title (in Czech): Úvod do strojového učení v Pythonu
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Actual: from 2023
Semester: winter
E-Credits: 5
Hours per week, examination: winter s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Additional information: http://ufal.mff.cuni.cz/courses/npfl129
Guarantor: Mgr. Jindřich Libovický, Ph.D.
Incompatibility: NPFL054
Interchangeability: NPFL054
Is incompatible with: NPFL054
Is interchangeable with: NPFL054
Annotation -
Last update: doc. Mgr. Barbora Vidová Hladká, Ph.D. (15.05.2019)
Machine learning has achieved notable success in solving complex tasks in many fields. This course serves as an introduction to basic machine learning concepts and techniques, focusing both on the theoretical foundations and on the implementation and use of machine learning algorithms in the Python programming language. Special attention is paid to the ability to apply machine learning techniques to practical tasks, in which the students try to devise a solution with the highest possible performance.
Aim of the course -
Last update: Mgr. Jindřich Libovický, Ph.D. (12.03.2024)

After this course, students should…

  • Be able to reason about tasks/problems suitable for ML
  • Know when to use classification, regression, and clustering
  • Be able to choose a suitable method from the following: Linear and Logistic Regression, Multilayer Perceptron, Nearest Neighbors, Naive Bayes, Gradient Boosted Decision Trees, k-means clustering
  • Think about learning as (mostly probabilistic) optimization on training data
  • Know how the ML methods learn, including the theoretical explanation
  • Know how to properly evaluate ML models
  • Think about generalization (and avoiding overfitting)
  • Be able to choose a suitable evaluation metric
  • Responsibly decide which model is better
  • Be able to implement ML algorithms on a conceptual level
  • Be able to use Scikit-learn to solve ML problems in Python
Course completion requirements -
Last update: RNDr. Milan Straka, Ph.D. (10.05.2020)

Students pass the practicals by submitting a sufficient number of assignments. The assignments are announced regularly throughout the semester (usually two per lecture) and are due several weeks after being announced. Given the rules for completing the practicals, it is not possible to retake them. Passing the practicals is not a requirement for taking the exam.

Literature -
Last update: RNDr. Milan Straka, Ph.D. (10.05.2020)
  • Christopher M. Bishop: Pattern Recognition and Machine Learning. Springer Verlag. 2006.
  • John Platt: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. 1998.
  • Tianqi Chen, Carlos Guestrin: XGBoost: A Scalable Tree Boosting System. 2016.
  • https://scikit-learn.org/
Requirements to the exam -
Last update: RNDr. Milan Straka, Ph.D. (15.06.2020)

The exam is written and consists of questions randomly chosen from a publicly known list. The exam requirements correspond to the course syllabus, at the level of detail presented in the lectures.

Syllabus -
Last update: Mgr. Jindřich Libovický, Ph.D. (12.03.2024)

Basic machine learning concepts

  • supervised learning, unsupervised learning, reinforcement learning
  • fitting, generalization, overfitting, regularization
  • data generating distribution, train/development/test set

Linear regression

  • analytical solution
  • a solution based on stochastic gradient descent (SGD)
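
As an illustration of the two approaches above, a minimal numpy sketch (synthetic data and illustrative variable names, not code from the course) might compare the closed-form least-squares solution with stochastic gradient descent:

    import numpy as np

    rng = np.random.default_rng(42)
    X = np.hstack([rng.normal(size=(100, 3)), np.ones((100, 1))])  # features plus a bias column
    true_w = np.array([2.0, -1.0, 0.5, 3.0])
    y = X @ true_w + rng.normal(scale=0.1, size=100)

    # Analytical solution: minimize ||Xw - y||^2 directly via least squares.
    w_analytic = np.linalg.lstsq(X, y, rcond=None)[0]

    # SGD solution: repeatedly update w with the gradient of the squared error
    # on a single randomly chosen training example.
    w_sgd = np.zeros(4)
    learning_rate = 0.05
    for _ in range(2000):
        i = rng.integers(len(X))
        gradient = 2 * (X[i] @ w_sgd - y[i]) * X[i]
        w_sgd -= learning_rate * gradient

    print(w_analytic, w_sgd)  # both should roughly recover true_w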

Classification

  • binary classification via perceptron
  • binary classification using logistic regression
  • multiclass classification using logistic regression
  • deriving the sigmoid and softmax functions from the maximum entropy principle (see the sketch after this list)
  • classification with a multilayer perceptron (MLP)
  • naive Bayes classifier
  • maximum margin binary classifiers
  • Support vector machines (SVM)
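
For the sigmoid and softmax functions mentioned in this list, a possible numpy sketch (the function names and the numerical stabilization are illustrative choices, not prescribed by the course) is:

    import numpy as np

    def sigmoid(z):
        # Maps a real-valued score to a probability in (0, 1); used in binary logistic regression.
        return 1 / (1 + np.exp(-z))

    def softmax(z):
        # Maps a vector of scores to a probability distribution; used in multiclass logistic regression.
        # Subtracting the maximum does not change the result but avoids overflow in exp.
        z = z - np.max(z, axis=-1, keepdims=True)
        exp_z = np.exp(z)
        return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

    print(sigmoid(0.0))                        # 0.5
    print(softmax(np.array([1.0, 2.0, 3.0])))  # sums to 1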

Text representation

  • TF-IDF
  • Word embeddings
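
As a hedged illustration of the TF-IDF representation above, one could use scikit-learn's TfidfVectorizer; the documents below are made up for the example:

    from sklearn.feature_extraction.text import TfidfVectorizer

    documents = [
        "machine learning with python",
        "linear regression in python",
        "clustering with k-means",
    ]

    # Each document becomes a sparse vector of term frequencies reweighted by
    # inverse document frequency, so terms common to all documents get lower weight.
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(documents)

    print(vectorizer.get_feature_names_out())
    print(tfidf_matrix.toarray().round(2))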

Decision trees

  • classification and regression trees (CART)
  • random forests
  • gradient boosting decision trees (GBDT)
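
A minimal, purely illustrative use of gradient boosted decision trees in scikit-learn (dataset and hyperparameters chosen only for this sketch) could look as follows:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

    # An ensemble of shallow regression trees, each fitted to the gradient of the loss
    # of the ensemble built so far.
    model = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
    model.fit(X_train, y_train)
    print("Test accuracy:", model.score(X_test, y_test))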

Clustering

  • K-Means algorithm
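
The K-Means algorithm above is also available in scikit-learn; a small sketch on synthetic data (all parameters illustrative):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Two synthetic clusters around (0, 0) and (5, 5).
    X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(5, 1, size=(50, 2))])

    # K-Means alternates between assigning points to the nearest centroid
    # and recomputing each centroid as the mean of its assigned points.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.cluster_centers_)
    print(kmeans.labels_[:10])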

Dimensionality reduction

  • singular value decomposition
  • principal component analysis (PCA)
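
As a sketch of how the two topics above are related, PCA can be computed from the singular value decomposition of the centered data matrix (numpy only; all names illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))

    # Center the data, then take the SVD; the right singular vectors are the
    # principal components and the squared singular values give the explained variance.
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

    components = Vt[:2]                      # first two principal components
    X_projected = X_centered @ components.T  # data in the 2-dimensional PCA space
    explained_variance = S[:2] ** 2 / (len(X) - 1)
    print(X_projected.shape, explained_variance)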

Training

  • dataset preparation, classification features design
  • constructing loss functions according to the maximum likelihood estimation principle
  • first-order gradient methods (SGD) and second-order methods
  • regularization

Statistical testing

  • Student t-test
  • Chi-squared test
  • correlation coefficients
  • paired bootstrap test
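
For the paired bootstrap test listed above, a simplified numpy sketch comparing the per-example correctness of two hypothetical models (the data and the number of resamples are made up for illustration) might be:

    import numpy as np

    rng = np.random.default_rng(0)
    # Per-example correctness (1 = correct) of two hypothetical models on the same test set.
    model_a = rng.binomial(1, 0.80, size=1000)
    model_b = rng.binomial(1, 0.76, size=1000)

    # Resample the test examples with replacement and count how often model A
    # is not better than model B on the resampled test sets.
    resamples = 10000
    indices = rng.integers(len(model_a), size=(resamples, len(model_a)))
    diffs = model_a[indices].mean(axis=1) - model_b[indices].mean(axis=1)
    p_value = np.mean(diffs <= 0)
    print("Estimated p-value that A is not better than B:", p_value)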

Used Python libraries

  • numpy (n-dimensional array representation and their manipulation)
  • scikit-learn (construction of machine learning pipelines; see the sketch after this list)
  • matplotlib (visualization)
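
How these libraries typically interact can be hinted at with a short, purely illustrative scikit-learn pipeline (dataset and model chosen only for this sketch):

    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Load a small built-in dataset; the features and targets are numpy arrays.
    X, y = load_digits(return_X_y=True)
    print("Class counts:", np.bincount(y))

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

    # A pipeline chains preprocessing and a model into a single estimator.
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("classifier", LogisticRegression(max_iter=1000)),
    ])
    pipeline.fit(X_train, y_train)
    print("Test accuracy:", pipeline.score(X_test, y_test))

    # Visualize one of the test digits with matplotlib.
    plt.imshow(X_test[0].reshape(8, 8), cmap="gray")
    plt.title(f"Predicted: {pipeline.predict(X_test[:1])[0]}")
    plt.show()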

This course is also part of the inter-university programme prg.ai Minor. It pools the best of AI education in Prague to provide students with a deeper and broader insight into the field of artificial intelligence. More information is available at prg.ai/minor.

Entry requirements -
Last update: RNDr. Milan Straka, Ph.D. (08.10.2021)

Basic programming skills in Python and basic knowledge of differential calculus and linear algebra (working with vectors and matrices) are required; knowledge of probability and statistics is recommended.

 