Thesis (Selection of subject)

Deep learning based approaches for shot transition detection and known-item search in video

Thesis title in Czech:	Detekce střihů a vyhledávání známých scén ve videu s pomocí metod hlubokého učení
Thesis title in English:	Deep learning based approaches for shot transition detection and known-item search in video
Key words:	hluboké učení, detekce střihů, hledání známé scény, učení reprezentací
English key words:	deep learning, shot transition detection, known-item search, representation learning
Academic year of topic announcement:	2019/2020
Thesis type:	diploma thesis
Thesis language:	English
Department:	Department of Software Engineering (32-KSI)
Supervisor:	doc. RNDr. Jakub Lokoč, Ph.D.
Author:	hidden - assigned and confirmed by the Study Dept.
Date of registration:	02.05.2020
Date of assignment:	02.05.2020
Confirmed by Study dept. on:	05.08.2020
Date and time of defence:	14.09.2020 09:00
Date of electronic submission:	30.07.2020
Date of submission of printed version:	30.07.2020
Date the defence took place:	14.09.2020
Opponents:	Mgr. Ladislav Peška, Ph.D.

Guidelines

Video retrieval remains a challenging problem: video analysis and ranking models need further advances to improve search effectiveness.
At the same time, deep learning has become a well-established technology for addressing these issues. The goal of this thesis is to design effective
deep learning-based approaches for automatic shot transition detection and text-image search. The approaches will be experimentally evaluated on benchmark datasets.

References

Jakub Lokoč, Werner Bailer, Klaus Schoeffmann, Bernd Münzer, George Awad: On Influential Trends in Interactive Video Retrieval: Video Browser Showdown 2015-2017. IEEE Trans. Multimedia 20(12): 3361-3376 (2018)

Michael Gygli: Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks. CBMI 2018: 1-4

Xirong Li, Chaoxi Xu, Gang Yang, Zhineng Chen, Jianfeng Dong: W2VV++: Fully Deep Learning for Ad-hoc Video Search. ACM Multimedia 2019: 1786-1794

Ian J. Goodfellow, Yoshua Bengio, Aaron C. Courville: Deep Learning. Adaptive computation and machine learning, MIT Press 2016, ISBN 978-0-262-03561-3, pp. 1-775

Preliminary scope of work

Vyhledávání ve videu představuje náročný problém s mnoha záludnostmi a dílčími problémy. Tato práce se zaměřuje na dva z těchto podproblémů, konkrétně na detekci střihů a textové vyhledávání. V případě detekce střihů bylo v posledních desetiletích navrženo mnoho řešení. Nedávné přístupy založené na hlubokém učení zlepšily přesnost detekce pomocí 3D konvolučních architektur a uměle vytvořených trénovacích dat, ale stoprocentní přesnost je stále nedosažitelným ideálem. V této práci představujeme TransNet V2, hlubokou síť pro detekci střihů, která dosahuje nejlepších výsledků v porovnání s konkurenčními metodami na respektovaných datasetech. V případě druhého námi řešeného problému textového vyhledávání se ukázaly jako efektivní řešení hluboké neuronové sítě promítající textové dotazy a snímky videa do společného prostoru. V této práci zkoumáme použití těchto sítí pro případ hledání známého objektu ve videu a navrhujeme vylepšení způsobu, jakým lze zakódovat textový dotaz.

Preliminary scope of work in English

Video retrieval is a challenging problem with many pitfalls and sub-problems. This thesis focuses on two of them: shot transition detection and text-based search. Many solutions to shot transition detection have been proposed over the last decades. Recent deep learning-based approaches have improved detection accuracy using 3D convolutional architectures and artificially created training data, but one hundred percent accuracy remains an unreachable ideal. In this thesis, we present TransNet V2, a deep network for shot transition detection that reaches state-of-the-art performance on respected benchmarks. For the second problem, text-based search, deep learning models that project textual queries and video frames into a joint space have proved effective. We investigate these query representation learning models in a known-item search setting and propose improvements to the text encoding part of the model.
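
The joint-space idea mentioned above can be illustrated with a minimal sketch. It assumes the text query and the video frames have already been embedded into the same vector space by some trained model; the embedding step itself is not shown. The function names `cosine`, `rank_frames`, and `known_item_rank`, the use of cosine similarity, and the toy vectors are all illustrative assumptions, not the method or code of the thesis.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_frames(query_vec, frame_vecs):
    """Return frame indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, f) for f in frame_vecs]
    return sorted(range(len(frame_vecs)), key=lambda i: scores[i], reverse=True)

def known_item_rank(order, target_index):
    """1-based position of the searched (known) frame in the ranked list."""
    return order.index(target_index) + 1

# Toy 2-D embeddings (hypothetical values, for illustration only).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.1]

order = rank_frames(query, frames)
print(order, known_item_rank(order, 2))
```

In a known-item search setting, the position of the target frame in the ranked list is a natural quantity to track: the lower the rank, the sooner a user would find the scene being searched for.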