Course, academic year 2023/2024
Algorithms in Speech Recognition - NPFX079
Title: Algoritmy rozpoznávání mluvené řeči
Guaranteed by: Student Affairs Department (32-STUD)
Faculty: Faculty of Mathematics and Physics
Actual: from 2022
Semester: summer
E-Credits: 6
Hours per week, examination: summer s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: taught
Language: Czech
Teaching methods: full-time
Is provided by: NPFL079
Guarantor: Mgr. Nino Peterek, Ph.D.
Class: DS, Mathematical Linguistics
Informatics Mgr. - elective
Classification: Informatics > Computer and Formal Linguistics
Pre-requisite : {NXXX011, NXXX012, NXXX013, NXXX070, NXXX071}
Incompatibility : NPFL079
Interchangeability : NPFL079
Annotation -
Last update: doc. Mgr. Barbora Vidová Hladká, Ph.D. (25.04.2019)
The course presents recent methodologies and software toolkits for speech recognition. Students will learn how to develop systems for automatic speech recognition and transcription, computer dialogue systems and speaker identification. The course covers the principles, construction and decoding algorithms of statistical acoustic and language models (HMM, n-gram and structured language models, finite-state transducers, graphical models, Viterbi dynamic programming, heuristic hypothesis search strategies, stack decoder, neural networks).
Course completion requirements -
Last update: Mgr. Nino Peterek, Ph.D. (10.06.2019)

To complete the course, students must program three small projects (speech library functions and a small speech application) and pass an oral exam.

Literature -
Last update: Mgr. Nino Peterek, Ph.D. (11.05.2022)

[JEL] F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1998

[PSU] J. Psutka, L. Müller, J. Matoušek, V. Radová, Mluvíme s počítačem česky, Academia, 2006

[SPO] X. Huang, A. Acero, H. Hon, Spoken Language Processing, Prentice-Hall, 2001

[DLA] D. Yu, L. Deng, Automatic Speech Recognition: A Deep Learning Approach, Springer, 2015

[KLW] U. Kamath, J. Liu, J. Whitaker, Deep Learning for NLP and Speech Recognition, Springer, 2019

Requirements to the exam -
Last update: Mgr. Nino Peterek, Ph.D. (10.06.2019)

The exam covers the topics presented in the course and is oral only.

The practical part does not have to be completed before the exam.

Syllabus -
Last update: Mgr. Nino Peterek, Ph.D. (11.06.2019)

Overview of speech technologies

  • wonders of speech recognition,
  • main applications and their architectures,
  • theories and models overview,
  • software toolkits and libraries,
  • speech processing books and magazines.

Acoustic Modelling (SPO C8-C9 | JEL C2-C3 | PSU C5.3 | DLA C3+C6, partly a repetition of NPFL038)

  • definition and parameters of the hidden Markov model (HMM),
  • evaluation of an HMM (Forward algorithm),
  • training of an HMM (Baum-Welch algorithm),
  • speech feature extraction, scoring of acoustic features (MFCC, Gaussian mixtures, parameter clustering),
  • adaptive techniques (MAP, MLLR),
  • confidence measures,
  • software toolkits for speech recognition.
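
As an illustration of the HMM evaluation topic listed above, the following is a minimal sketch of the Forward algorithm for a discrete-output HMM. It is not taken from the course materials: the function name forward_likelihood, the parameter names pi, A, B and the two-state toy model are assumptions chosen for the example. Practical recognizers work with log probabilities or scaling to avoid underflow, which this sketch omits.

```python
def forward_likelihood(pi, A, B, observations):
    """Return P(observations | model) for a discrete HMM.

    pi[i]    initial probability of state i
    A[i][j]  probability of moving from state i to state j
    B[i][k]  probability that state i emits symbol k
    observations  non-empty list of symbol indices
    """
    n = len(pi)
    # Initialisation: alpha[i] = P(first symbol, state i at time 0)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    # Induction: sum over all predecessor states, then emit the next symbol
    for o in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: sum over all final states
    return sum(alpha)


# Hypothetical two-state, two-symbol toy model
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
likelihood = forward_likelihood(pi, A, B, [0, 1])  # approximately 0.2156
```

Summing the likelihood over every possible observation sequence of a fixed length gives 1, which is a convenient sanity check for an implementation.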

Language Modelling (NPFL067 | JEL C4 | SPO C11 | PSU 5.4)

  • methods of language modelling,
  • n-gram models, smoothing (Good-Turing, Katz), adaptive language models,
  • structured language models (PCFG),
  • specifics of spoken and written language modelling,
  • transducers and software tools for language modelling.
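
To make the n-gram and smoothing topics above more concrete, here is a small sketch of a bigram model. Note that it uses additive (add-k) smoothing, which is deliberately simpler than the Good-Turing and Katz methods named in the syllabus. The function names train_bigram and bigram_prob, the boundary markers and the two-sentence corpus are all assumptions made for this example.

```python
from collections import Counter


def train_bigram(sentences):
    """Collect history counts, bigram counts and the predicted vocabulary."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])             # every token that can be predicted
        history_counts.update(tokens[:-1])   # every token that acts as a history
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts, vocab


def bigram_prob(prev, word, history_counts, bigram_counts, vocab, k=1.0):
    """P(word | prev) with add-k smoothing over the predicted vocabulary."""
    return (bigram_counts[(prev, word)] + k) / (history_counts[prev] + k * len(vocab))


# Hypothetical toy corpus
corpus = ["the cat sat", "the dog sat"]
history_counts, bigram_counts, vocab = train_bigram(corpus)
p_seen = bigram_prob("the", "cat", history_counts, bigram_counts, vocab)
p_unseen = bigram_prob("the", "sat", history_counts, bigram_counts, vocab)
```

The unseen bigram receives a small non-zero probability, and the probabilities of all vocabulary words after a given history still sum to 1. Good-Turing and Katz smoothing redistribute probability mass in a more principled way, based on counts of counts and back-off to lower-order models.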

Basic decoding techniques (SPO C12 | JEL C5-C6 | PSU C6)

  • search algorithms (search space and heuristics, A*),
  • combining acoustic and language models (uni-, bi-, trigrams),
  • time-synchronous search (Viterbi, beam, tree lexicon),
  • state-synchronous search.
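
The time-synchronous Viterbi search listed above can be sketched for a small discrete HMM as follows. This is an illustrative example under stated assumptions, not the decoder used in the course: the function name viterbi, the parameter names and the toy model are hypothetical, there is no beam pruning or tree lexicon, and plain probabilities are used instead of the log scores a practical decoder would use.

```python
def viterbi(pi, A, B, observations):
    """Return (best state path, its joint probability) for a discrete HMM.

    pi[i]    initial probability of state i
    A[i][j]  probability of moving from state i to state j
    B[i][k]  probability that state i emits symbol k
    observations  non-empty list of symbol indices
    """
    n = len(pi)
    # delta[i]: probability of the best partial path ending in state i
    delta = [pi[i] * B[i][observations[0]] for i in range(n)]
    backpointers = []
    for o in observations[1:]:
        step, new_delta = [], []
        for j in range(n):
            # Best predecessor of state j at this time frame
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backpointers.append(step)
        delta = new_delta
    # Trace the best path backwards from the best final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical two-state, two-symbol toy model
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
best_path, best_prob = viterbi(pi, A, B, [0, 1, 1])
```

A beam search would differ only in discarding, at each time frame, the states whose score falls too far below the best one before extending them.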

Large vocabulary search algorithms (SPO C13 | JEL C5-C6 | PSU 6.7.3, 6.7.5, 6.10)

  • efficient handling of the tree lexicon,
  • N-best and multipass search strategies.

Automatic dialogue systems (SPO C17 | PSU C11)

  • characteristics of spontaneous dialogues,
  • prosody and structure of dialogues,
  • semantic representation,
  • dialogue management, emotion detection,
  • VoiceXML.

Speaker identification (PSU C9)

  • identification systems overview,
  • selected speech features for speaker identification,
  • basic methods.

This course can be preceded by NPFL038 and combined with NPFL067, NPFL068, NPFL123.

The software tools and libraries will be introduced and practised in the practical part of the course.

 