The student will first become familiar with copula theory, with emphasis on the copula families used in existing applications. He will then study methods for fitting copulas to data and the measures used to assess the quality of the fit. Based on the literature studied, he will choose several copula families and implement standard methods for fitting them to data, including assessment of the fit by the chosen measures. The implementation will be done in the Matlab environment. Using the implemented methods, he will test the suitability of the selected copula families for fitting data. The student will use at least two datasets that have been used in publications and one dataset provided by his supervisor.
List of literature
See http://www.cs.cas.cz/~martin/diplomka48.htm
Preliminary description of the thesis in English
Copulas are functions that have been used in probability theory since the early 1950s to describe the relationship between a multivariate cumulative distribution function and its marginal distributions. With the increasing significance of probabilistic approaches in computer science, copulas have found applications in this field as well. During the last decade they have been applied in genetic algorithms that estimate probability distributions (EDA algorithms) and also in the quickly growing area of data mining. There, copulas provide ways to find interesting relationships between attributes that cannot be obtained using traditional methods. So far, copulas have found practical use only in finance, where models created in the process of data mining are used for prediction. There are, however, many different kinds of copulas. This is partly because particular properties are often required of copulas (as in the case of Archimedean copulas), and partly because new copulas are easy to create by parametrization. Dozens of copula families obtained by parametrization have already been described in the literature. So far, almost no attention has been paid to the differences between copula families from the data mining point of view. The proposed master's thesis should contribute to such research.