Thesis (Selection of subject)

Semantic disambiguation using Distributional Semantics

Thesis title in Czech:	Semantic disambiguation using Distributional Semantics
Thesis title in English:	Semantic disambiguation using Distributional Semantics
Key words:	-
English key words:	WORD SENSE DISAMBIGUATION, VECTOR SPACE MODEL, PRAGUE DEPENDENCY TREEBANK
Academic year of topic announcement:	2010/2011
Thesis type:	diploma thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	RNDr. Jiří Hana, Ph.D.
Author:	hidden - assigned and confirmed by the Study Dept.
Date of registration:	10.12.2010
Date of assignment:	14.01.2011
Date and time of defence:	10.05.2012 00:00
Date of electronic submission:	13.04.2012
Date of submission of printed version:	13.04.2012
Date of proceeded defence:	10.05.2012
Opponents:	doc. Mgr. Barbora Vidová Hladká, Ph.D.

Guidelines

The goal of this thesis is to combine Distributional Semantics as used in Natural Language Processing (e.g. Schütze 1998) with traditional propositional semantics, as suggested for example by E. Hovy (2010), and to apply the combination to an automatic categorization task (for example, lemma disambiguation on the Prague Dependency Treebank).
E. Hovy's semantics combines traditional propositional semantics, based on symbolic logic, with the statistical word-distribution information of Distributional Semantics as used in Natural Language Processing (e.g. Schütze 1998). The core resource is a single lexico-semantic lexicon in which concepts are organized as tensors encoding the strength of their relations to other concepts. Using these relation strengths, the appropriateness of a term in a particular context can be determined and used for a variety of tasks, including term disambiguation. Distributional Semantics has strong cognitive plausibility, as shown for example by its ability to predict human brain activity associated with the meanings of nouns (Mitchell et al. 2008).
The result of this thesis should be a system that performs automatic categorization using Hovy's semantics, for example a system for lexical disambiguation tested on the Prague Dependency Treebank. Lexical disambiguation is the process of determining the correct meaning of a word from its context (e.g. determining whether 'bank' refers to a financial institution or to a river bank).
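The disambiguation step described above can be illustrated with a minimal sketch. It is not the system the thesis is expected to produce: the relation strengths, the sense labels, and the function names `cosine` and `disambiguate` are all hypothetical, and the tensor-based lexicon is simplified here to one sparse vector per sense. Each sense of 'bank' is scored by the cosine similarity between its relation strengths and a bag-of-words context.

```python
import math
from collections import Counter

# Hypothetical relation strengths between each sense of "bank" and other
# terms. In a real system these would be estimated from a corpus.
SENSE_RELATIONS = {
    "bank/institution": {"money": 0.9, "loan": 0.8, "account": 0.7, "water": 0.05},
    "bank/river": {"water": 0.9, "river": 0.8, "fishing": 0.6, "money": 0.05},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context_words, sense_relations=SENSE_RELATIONS):
    """Return the best-scoring sense and the score of every sense."""
    context = Counter(context_words)  # bag-of-words context vector
    scores = {
        sense: cosine(context, relations)
        for sense, relations in sense_relations.items()
    }
    best_sense = max(scores, key=scores.get)
    return best_sense, scores


if __name__ == "__main__":
    sense, scores = disambiguate(["she", "opened", "an", "account", "to", "get", "a", "loan"])
    print(sense, scores)
```

A context containing 'account' and 'loan' overlaps only with the institution sense, so that sense is selected. A context that shares no terms with any sense yields a score of zero for all senses, a case a fuller system would need to handle explicitly.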

References

Hovy, Eduard (2010). Distributional Semantics and the Lexicon. Keynote speech at COLING 2010.

Landauer, Thomas K. and Dumais, Susan T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2), 211-240.

Mitchell, Tom M.; Shinkareva, Svetlana V.; Carlson, Andrew; Chang, Kai-Min; Malave, Vicente L.; Mason, Robert A.; Just, Marcel Adam (2008). Predicting human brain activity associated with the meanings of nouns. Science, 320, 1191-1195.

Schütze, Hinrich (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97-123.

Evert, Stefan and Lenci, Alessandro (2009). Distributional Semantic Models. Course at ESSLLI 2009, Bordeaux, July 27-31, 2009.

Lin, Dekang (1998). Automatic retrieval and clustering of similar words. In Proceedings of the 17th International Conference on Computational Linguistics (COLING-ACL 1998), pages 768-774, Montreal, Canada.