A study of applying copulas in data mining

Title in Czech:	Dobývání znalostí z dat pomocí kopulí
Title in English:	A study of applying copulas in data mining
Keywords:	data mining, relationships between attributes, probabilistic relationships, copulas, kinds of copulas
Academic year announced:	2012/2013
Thesis type:	master's thesis
Language of thesis:	English
Department:	Department of Theoretical Computer Science and Mathematical Logic (32-KTIML)
Supervisor:	prof. RNDr. Ing. Martin Holeňa, CSc.
Author:	hidden (assigned and confirmed by the Study Department)
Date of registration:	13.03.2013
Date of assignment:	13.03.2013
Date of confirmation by the Study Department:	15.03.2013
Date and time of defence:	15.05.2013 10:00
Electronic version submitted:	11.04.2013
Printed version submitted:	11.04.2013
Date of defence held:	15.05.2013
Reviewer:	Mgr. David Hauzar, Ph.D.

Guidelines

First, the student will familiarize himself with copula theory, with emphasis on the copula families used in existing applications. Next, he will study methods for fitting copulas to data, as well as measures used to assess how well a copula fits the data. Based on the studied literature, he will choose several copula families and implement standard methods for fitting them to data, including assessment of the quality of the fit by the chosen measures. The implementation will be done in the Matlab environment. Using the implemented methods, he will test how suitable the selected copula families are for fitting data. The student will use at least two datasets used in publications and one dataset provided by his supervisor.
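The plan above combines two steps: fitting a copula family to data and scoring the fit. The thesis specifies Matlab. Purely as an illustration, here is a minimal dependency-free Python sketch of both steps for one family, the bivariate Clayton copula. This is not the thesis's implementation, and the family was chosen only as an example.

The parameter is estimated by inverting Kendall's tau, using the Clayton relation θ = 2τ/(1 − τ). The fit is then scored with a Cramér-von Mises-type distance between the empirical copula of rank-based pseudo-observations and the fitted copula. All function names are hypothetical, and maximum-likelihood fitting would be a common alternative estimator.

```python
import random


def kendall_tau(x, y):
    """Kendall's tau-a (no tie correction) by direct O(n^2) pair counting."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = (x[i] - x[j]) * (y[i] - y[j])
            if d > 0:
                s += 1  # concordant pair
            elif d < 0:
                s -= 1  # discordant pair
    return 2.0 * s / (n * (n - 1))


def pseudo_observations(x):
    """Map a sample to rank / (n + 1), so all values lie strictly inside (0, 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def fit_clayton_tau(x, y):
    """Estimate theta by inverting Kendall's tau: tau = theta / (theta + 2)."""
    tau = kendall_tau(x, y)
    if tau <= 0.0 or tau >= 1.0:
        raise ValueError("this sketch needs 0 < tau < 1 for Clayton with theta > 0")
    return 2.0 * tau / (1.0 - tau)


def cramer_von_mises(u, v, theta):
    """Sum over points of (empirical copula - fitted Clayton copula)^2."""
    n = len(u)
    stat = 0.0
    for i in range(n):
        emp = sum(1 for j in range(n) if u[j] <= u[i] and v[j] <= v[i]) / n
        stat += (emp - clayton_cdf(u[i], v[i], theta)) ** 2
    return stat


def sample_clayton(n, theta, seed=0):
    """Draw n pairs from a Clayton copula by inverting the conditional C(v | u)."""
    rng = random.Random(seed)
    us, vs = [], []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], so u ** (-theta) is defined
        w = 1.0 - rng.random()
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        us.append(u)
        vs.append(v)
    return us, vs
```

For example, `sample_clayton(400, 2.0)` draws data whose true parameter is θ = 2 (τ = 0.5). Running `fit_clayton_tau` on that data should return an estimate near 2, and the fitted θ should yield a smaller `cramer_von_mises` distance than a near-independence value such as θ = 0.1. The O(n²) loops keep the sketch free of dependencies but are practical only for a few thousand points.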

Recommended literature

See http://www.cs.cas.cz/~martin/diplomka48.htm

Preliminary scope of the thesis (in English)

Copulas are functions that have been used in probability theory since the early 1950s to describe the relationship between a multivariate cumulative distribution function and the distributions of its marginals. With the growing importance of probabilistic approaches in computer science, copulas have found applications in this field as well. During the last decade, they have been applied in genetic algorithms based on estimating probability distributions (estimation of distribution algorithms, EDAs) and in the rapidly growing area of data mining. There, copulas provide ways to find interesting relationships between attributes that cannot be obtained with traditional methods. So far, the practical use of copulas is found only in finance, where models created in the process of data mining are used for prediction. There are, however, many different kinds of copulas. This is partly because we often require copulas to have particular properties (e.g., Archimedean copulas) and partly because new copulas are easy to create by parametrization. Dozens of copula families obtained by parametrization have already been described in the literature. So far, almost no attention has been paid to the differences between copula families from the data mining point of view. The proposed master's thesis should contribute to such research.
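The relationship between a joint distribution and its marginals mentioned above is given by Sklar's theorem. The bivariate form is shown below, together with the Archimedean construction and one parametrized Archimedean family. The Clayton family is used here only as an illustration of how a parameter θ generates a whole family; the text itself does not name specific families.

```latex
% Sklar's theorem (bivariate): H is a joint CDF with marginals F and G
H(x, y) = C\bigl(F(x), G(y)\bigr), \qquad
C(u, v) = H\bigl(F^{-1}(u), G^{-1}(v)\bigr) \text{ if } F, G \text{ are continuous (then } C \text{ is unique).}

% Archimedean copula with a strict generator \varphi:
% continuous, strictly decreasing, convex, \varphi(1) = 0, \varphi(0) = \infty
C(u, v) = \varphi^{-1}\bigl(\varphi(u) + \varphi(v)\bigr)

% Clayton family, parametrized by \theta > 0
\varphi_\theta(t) = \frac{t^{-\theta} - 1}{\theta}, \qquad
C_\theta(u, v) = \bigl(u^{-\theta} + v^{-\theta} - 1\bigr)^{-1/\theta}, \qquad
\tau = \frac{\theta}{\theta + 2}
```

Substituting φ_θ(u) + φ_θ(v) = (u^{-θ} + v^{-θ} − 2)/θ into the inverse φ_θ^{-1}(s) = (1 + θs)^{-1/θ} gives the Clayton formula directly. Each value of θ yields a different member of the same family.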