Thesis detail
Speech Reconstruction
Thesis title in Czech: Rekonstrukce mluvené řeči
Thesis title in English: Speech Reconstruction
Keywords (Czech): Automatická editace a korekce textu, transkripce, rozpoznávání řeči, strojové učení
Keywords (English): Automatic editing and text correction, transcription, speech recognition, machine learning
Academic year of topic announcement: 2022/2023
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: prof. RNDr. Jan Hajič, Dr.
Author:
Guidelines
Speech reconstruction is an area of speech processing in which standard speech is extracted
from a spontaneous speech database. Spontaneous speech differs, both acoustically and
linguistically, from speech produced from written text in that it contains extraneous
material in the form of pauses, hesitations, repetitions, partial words, and disfluencies.
Robust acoustic and language models are therefore required to handle these discrepancies.
From this perspective, speech reconstruction involves two different tasks:

i) Automatic speech recognition (ASR)
ii) Creation of standard speech from the recognized speech delivered by (i)

The main focus of this thesis will be on task (ii), which requires flexible
models that can handle the discrepancies remaining after speech recognition. For this task,
spontaneous speech databases are available for English and Czech from different sources,
together with their manually annotated standardized versions. The data contain dialogues from
everyday speech and therefore include many of the discrepancies described above. The task is to
develop a language and translation model, using deep learning methods, that eliminates
these discrepancies and makes the speech available for further processing by standard
language tools.

This task can be viewed as a machine translation task in which the output of automatic
speech recognition (ASR) is treated as text in the source language and the goal is to
convert it into the target language (standard text).
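
To make the source/target framing concrete, the following is a minimal sketch of one parallel training pair together with a naive rule-based cleanup that a learned model might be compared against. The filler list, the trailing-hyphen convention for partial words, and the example utterance are all illustrative assumptions, not part of the assignment or of the available data.

```python
# Hypothetical filler inventory; a real system would learn such patterns from data.
FILLERS = {"uh", "um", "er", "hmm"}

def rule_based_cleanup(asr_tokens):
    """Drop filled pauses, partial words, and immediate repetitions.

    Assumes (for illustration only) that partial words are transcribed
    with a trailing hyphen, e.g. "the-".
    """
    cleaned = []
    for tok in asr_tokens:
        low = tok.lower()
        if low in FILLERS:
            continue  # filled pause
        if tok.endswith("-"):
            continue  # partial word
        if cleaned and cleaned[-1].lower() == low:
            continue  # immediate repetition of the previous kept token
        cleaned.append(tok)
    return cleaned

# One invented "source language" / "target language" pair.
source = "uh i i want to to go to the the- the store".split()
target = "i want to go to the store".split()

reconstructed = rule_based_cleanup(source)
```

Rules of this kind cannot handle restarts or rephrasings, which is one reason to treat the problem as sequence-to-sequence translation instead.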

Technical Aspects:

Using a deep learning system of choice, develop a sequence-to-sequence "translation" system as
defined above. Experimentally test various DNN architectures and
hyperparameter settings.
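
One simple way to organize such experiments is to enumerate every combination of architecture and hyperparameter values up front. The sketch below does this with the standard library only; the parameter names and values are hypothetical placeholders and do not come from the assignment.

```python
from itertools import product

# Hypothetical search space; names and values are illustrative only.
search_space = {
    "architecture": ["lstm", "transformer"],
    "hidden_size": [256, 512],
    "learning_rate": [1e-3, 1e-4],
}

def iter_configs(space):
    """Yield one dict per combination of values in the search space."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(search_space))
```

Each resulting dictionary can then be passed to a training routine in the chosen toolkit and its result logged alongside the configuration.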
References
Manning, Schuetze: Foundations of Statistical NLP. MIT Press, 2000.
PIRE: Investigation of Meaning Representations in Language Understanding for Speech
Reconstruction and Machine Translation Systems: http://www.clsp.jhu.edu/research/pire/
DNN toolkits, e.g. TensorFlow, and their documentation.
Deep Learning course(s), such as Milan Straka's NPFL114 (http://ufal.mff.cuni.cz/courses/npfl114/1718-summer),
or online (from 2017/8) at https://slideslive.com/s/milan-straka-10654
Preliminary scope of work
Speech reconstruction is the problem of converting the output of an automatic speech recognizer into standard written form. The task admits many possible approaches; the goal of the thesis is to find at least one approach, based on the machine translation paradigm, that improves on the current baseline accuracy.
Preliminary scope of work in English
Speech reconstruction is an area of speech processing in which standard speech is extracted from a spontaneous speech database. Spontaneous speech differs, both acoustically and linguistically, from speech produced from written text in that it contains extraneous material in the form of pauses, hesitations, repetitions, partial words, and disfluencies. Robust acoustic and language models are therefore required to handle these discrepancies.
 
Charles University | Charles University Information System