This course introduces students to data science methods and their application in the R language environment. It builds on the knowledge of statistical methods acquired during bachelor's studies or through self-study.
Data science combines several fields, including mathematics, statistics, computer science, information science, machine learning and artificial intelligence. An article in the Harvard Business Review describes the data scientist as "The Sexiest Job of the 21st Century" (Davenport & Patil, 2012). The most commonly used tools in this area are Python, SQL and R.
R is a programming language and environment designed for the statistical analysis of data and its graphical display. It is an implementation of the S programming language under a free license. Because it is free, R has already overtaken commercial software such as SPSS in number of users. At the same time, it offers a number of features beyond those of other free software such as JASP or jamovi. The functionality of the R environment can be extended with libraries called packages, of which more than 15,000 are available in the CRAN repository. R is thus very versatile and can be used for many different tasks.
Davenport, T. H., & Patil, D. J. (2012). Data scientist: The sexiest job of the 21st century. Harvard Business Review, 90(10), 70–76.
Rodriguez Salgado, J. J. (2021, December 9). What does a data scientist do? Breaking down the responsibilities of data scientists. DataCamp Community. Retrieved December 19, 2021, from https://www.datacamp.com/community/blog/what-does-a-data-scientist-do
Literature
Last update: Mgr. Jana Dlouhá (13.01.2022)
Required:
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org
Wickham, H., & Grolemund, G. (2017). R for data science: import, tidy, transform, visualize and model data. O'Reilly. https://r4ds.had.co.nz/index.html
R package documentation: https://www.rdocumentation.org/ (for the packages used in the course)
R manuals: https://cran.r-project.org/manuals.html
Recommended:
Field, A. P., Miles, J., & Field, Z. (2014). Discovering statistics using R. Sage.
Mair, P. (2018). Modern Psychometrics with R. In Use R! Springer International Publishing. https://doi.org/10.1007/978-3-319-93177-7
Grolemund, G., & Wickham, H. (2017). R for Data Science. O'Reilly Media. https://r4ds.had.co.nz/
Zamora Saiz, A., Quesada González, C., Hurtado Gil, L., & Mondéjar Ruiz, D. (2020). An Introduction to Data Analysis in R. In Use R! Springer International Publishing. https://doi.org/10.1007/978-3-030-48997-7 (selected chapters)
R Document Collections, Journals and Proceedings: https://www.r-project.org/other-docs.html (includes a list of books and other publications related to R)
Exam requirements
Last update: Mgr. Jana Dlouhá (13.01.2022)
Attendance is not mandatory, but highly recommended.
Credit will be awarded to students who actively participate in a reasonable number of lectures and exercises. Missed attendance can be compensated for by completing assigned tasks and reading.
The exam will take place on an agreed date at the end of the semester. Each student will be assigned a case study based on the material covered during the course. Students will perform an analysis and briefly (10 min) present their procedure and conclusions to their classmates.
Exam evaluation
preparation and cleaning of data for analysis
performing a basic exploratory analysis
choice of methods suitable for answering questions from the case study
analysis using selected methods
interpretation of results and answers to the case study questions
presentation of results
Syllabus
Last update: Mgr. Jana Dlouhá (21.02.2023)
Introduction
a. the R environment and the software available for working with it, installing packages and troubleshooting (different operating systems, missing libraries, R versions, Stack Overflow)
b. data types, base R functions
c. saving and loading data, RData files, the working environment
d. R documentation and CRAN, creating a project and its structure
e. DataCamp courses
Introduction II
a. R syntax, loops, conditionals, the apply family of functions, writing functions, "OOP" in R
b. Git, installing packages from GitHub
c. best practices, defensive programming
Statistics in R – correlation, regression, t-test, ANOVA, chi-square test, probability and distributions
Data visualization in R (ggplot2, plotly, lattice, gganimate)
Visualization best practices
R Markdown – slides, HTML pages, PDF files, DOCX documents, LaTeX, BibTeX and CSS basics
R Shiny
Psychometrics in R – lavaan, psych, psychometrics, mirt, mirtCAT
Missing data – types of missing data, consequences for parametric statistics, imputation methods, multiple imputation
Text analysis and text mining – quanteda, word2vec; basic steps (); text statistics and summaries, readability indices, word frequency, similarity
Basics of unsupervised and supervised machine learning
Preparation for the exam – selection of suitable methods of analysis and visualization for different types of data, communication of your findings
Entry requirements
Last update: Mgr. Jana Dlouhá (13.01.2022)
Students will need their own laptop, running any operating system (Windows, Linux, macOS), during the course. Downloading and installing the necessary software will be part of the introductory lesson.