Course, academic year 2023/2024
Web Knowledge Mining - NSWI107
Czech title: Dobývání informací z webu
Guaranteed by: Department of Software Engineering (32-KSI)
Faculty: Faculty of Mathematics and Physics
Actual: from 2007
Semester: summer
E-Credits: 6
Hours per week, examination: summer semester: 2/2 (lectures/practicals), C+Ex (course credit and examination) [hours per week]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: cancelled
Language: Czech
Teaching methods: full-time
Guarantor: RNDr. Leo Galamboš, Ph.D.
Class: Informatika Mgr. (Informatics, Master's programme), elective
Classification: Informatics > Informatics, Software Applications, Computer Graphics and Geometry, Database Systems, Didactics of Informatics, Discrete Mathematics, External Subjects, General Subjects, Computer and Formal Linguistics, Optimalization, Programming, Software Engineering, Theoretical Computer Science
Pre-requisite: NDBI010, NPRG013
Annotation
Last update: T_KSI (29.03.2005)
This course introduces the fundamental concepts and advanced techniques behind text-based information systems on the Web. It covers efficient Web indexing, searching, and crawling, as well as clustering, classification, and text mining. Students implement a project on one of a range of topics in Web information retrieval.
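To make "indexing" and "searching" concrete, here is a minimal Python sketch of an inverted index that answers conjunctive (Boolean AND) queries. It is an illustration only, not course material: whitespace tokenization, lowercasing, and the helper names `build_inverted_index` and `search_and` are assumptions, and a real Web index would also deal with stopwords, stemming, posting-list compression, and ranking.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Hypothetical helper: map each term to the sorted list of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Assumed tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_and(index, query):
    """Hypothetical helper: return IDs of documents containing every query term."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    if not postings:
        return []  # an empty query matches nothing
    # Intersect the posting lists of all query terms.
    return sorted(set.intersection(*postings))


docs = [
    "web crawlers fetch pages",
    "inverted index maps terms to pages",
    "pagerank ranks web pages",
]
index = build_inverted_index(docs)
print(index["pages"])                  # [0, 1, 2]
print(search_and(index, "web pages"))  # [0, 2]
```

Keeping posting lists sorted is what lets production systems intersect them by merging instead of building sets; the set intersection here just keeps the sketch short.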
Literature - Czech
Last update: T_KSI (29.03.2005)

Soumen Chakrabarti: Mining the Web: Discovering Knowledge from Hypertext Data. Amsterdam: Morgan Kaufmann, 2003.

Ricardo Baeza-Yates, Berthier Ribeiro-Neto: Modern Information Retrieval. Addison Wesley, 1999.

Ian H. Witten, Alistair Moffat, Timothy C. Bell: Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, 1994.

Syllabus
Last update: T_KSI (29.03.2005)

Engineering Large-Scale Crawlers.

The Vector-Space Model, Inverted Index, Recall, Precision.

Stopwords, stemming, lemmatization, soundex.

Handling "Find-Similar" Queries, Eliminating Near Duplicates.

Clustering: Bottom-Up/Top-Down; The k-Means Algorithm, Self-Organizing Maps, Multidimensional Scaling, Latent Semantic Indexing, Collaborative Filtering.

(Semi)supervised Learning.

PageRank, HITS.

Measuring and Modeling the Web.

Resource Discovery, Communities.
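The link-analysis topic (PageRank) can be illustrated with a short power-iteration sketch in Python. It is a hypothetical example, not the course's reference implementation: the function name `pagerank`, the damping factor 0.85, the convergence tolerance, and the choice to redistribute the rank of pages without out-links uniformly over all pages are common conventions assumed here.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Hypothetical power-iteration PageRank sketch.

    links: dict mapping each page to the list of pages it links to.
    Pages with no out-links (dangling pages) spread their rank
    uniformly over all pages (an assumed convention).
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Total rank currently held by dangling pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Teleportation term plus the uniformly redistributed dangling rank.
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes damping * rank equally to the pages it links to.
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
        # Stop once the L1 change between iterations falls below tol.
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(links)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.4f}")
```

In this small graph, page `c` is linked from both `a` and `b` and ends up with the highest score. The scores always sum to 1, because every unit of rank is either passed along a link, redistributed from a dangling page, or returned through the teleportation term.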

 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html