Speech Reconstruction
Thesis title in Czech: | Rekonstrukce mluvené řeči |
---|---|
Title in English: | Speech Reconstruction |
Keywords in Czech: | Automatická editace a korekce textu, transkripce, rozpoznávání řeči, strojové učení |
Keywords in English: | Automatic editing and text correction, transcription, speech recognition, machine learning |
Academic year of topic announcement: | 2022/2023 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | prof. RNDr. Jan Hajič, Dr. |
Author: |
Guidelines |
Speech reconstruction is an area of speech processing in which standard speech is derived from a database of spontaneous speech. Spontaneous speech differs, both acoustically and linguistically, from speech read from written text: it contains superfluous material in the form of pauses, hesitations, repetitions, partial words, and other disfluencies. Robust acoustic and language models are therefore required to handle these discrepancies. From this perspective, speech reconstruction involves two tasks: (i) automatic speech recognition (ASR) and (ii) creation of standard speech from the recognized speech delivered by (i). This thesis focuses on task (ii), which needs flexible models that can handle the discrepancies remaining after speech recognition. For this task, spontaneous speech databases for English and Czech are available from several sources, together with their manually annotated, standardized versions. The data consist of dialogues from everyday speech and therefore contain many of the discrepancies described above. The task is to develop a language and translation model, using deep learning methods, that removes these discrepancies and makes the speech available for further processing by standard language tools. The task can be viewed as machine translation: the ASR output is treated as text in the source language, and the goal is to convert it into the target language (standard text). Technical aspects: using a deep learning system of choice, develop a sequence-to-sequence "translation" system as defined above; experimentally test various DNN architectures and hyperparameter settings. |
References |
Manning, Schuetze: Foundations of Statistical NLP. MIT Press, 2000.
PIRE: Investigation of Meaning Representations in Language Understanding for Speech Reconstruction and Machine Translation Systems: http://www.clsp.jhu.edu/research/pire/
DNN toolkits, e.g. TensorFlow, and their documentation.
Deep Learning course(s), such as Milan Straka's NPFL114 (http://ufal.mff.cuni.cz/courses/npfl114/1718-summer), or online (from 2017/8) at https://slideslive.com/s/milan-straka-10654 |
Preliminary scope of work |
Speech reconstruction is the problem of converting the output of an automatic speech recognizer into standard written form. The task admits many possible approaches; the goal of the thesis is to find at least one approach, based on the machine translation paradigm, that improves on the current baseline accuracy. |
Preliminary scope of work in English |
Speech reconstruction is an area of speech processing in which standard speech is derived from a database of spontaneous speech. Spontaneous speech differs, both acoustically and linguistically, from speech read from written text: it contains superfluous material in the form of pauses, hesitations, repetitions, partial words, and other disfluencies. Robust acoustic and language models are therefore required to handle these discrepancies. |
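
To make the "translation" framing concrete, the following minimal Python sketch shows what a source/target pair might look like and how a naive rule-based cleaner could handle three of the disfluency types named above (hesitations, partial words, repetitions). It is an illustration only, not the method proposed for the thesis. The filler list, the convention that partial words end in a hyphen, and the function names `naive_reconstruct` and `make_pair` are all assumptions introduced here; the actual data may mark disfluencies differently.

```python
# Hypothetical inventory of filled pauses (assumption, not taken from the data).
FILLERS = {"uh", "um", "er", "hmm"}


def naive_reconstruct(tokens):
    """Drop filled pauses, partial words, and immediate word repetitions.

    Assumes partial words are transcribed with a trailing hyphen (e.g. "wan-").
    """
    cleaned = []
    for tok in tokens:
        low = tok.lower()
        if low in FILLERS:
            continue  # hesitation / filled pause
        if tok.endswith("-"):
            continue  # partial word (assumed transcription convention)
        if cleaned and cleaned[-1].lower() == low:
            continue  # immediate repetition of the previously kept word
        cleaned.append(tok)
    return cleaned


def make_pair(asr_text):
    """Return a (source, target) token pair in the machine-translation sense."""
    source = asr_text.split()
    target = naive_reconstruct(source)
    return source, target


if __name__ == "__main__":
    src, tgt = make_pair("I uh I wan- want to to go home")
    print("source:", " ".join(src))
    print("target:", " ".join(tgt))  # target: I want to go home
```

Rules like these break quickly: a legitimate repetition such as "had had" would be deleted, and longer restarts or rephrasings are not detected at all. That limitation is one motivation for learning the source-to-target mapping from the manually annotated data with a sequence-to-sequence model instead.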