Thesis (Selection of subject) (version: 390)
Thesis details
Active learning in E-Commerce Merchant Classification using Website Information
Thesis title in Czech: Aktivní učení pro klasifikaci
Thesis title in English: Active learning in E-Commerce Merchant Classification using Website Information
Key words: aktivní učení|klasifikace|e-komerce
English key words: Active learning|Web mining|Classification|E-commerce
Academic year of topic announcement: 2022/2023
Thesis type: diploma thesis
Thesis language: English
Department: Department of Theoretical Computer Science and Mathematical Logic (32-KTIML)
Supervisor: Mgr. Marta Vomlelová, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 25.05.2022
Date of assignment: 27.06.2022
Confirmed by Study dept. on: 23.11.2022
Date and time of defence: 12.06.2023 09:00
Date of electronic submission: 03.05.2023
Date of submission of printed version: 09.05.2023
Date of proceeded defence: 12.06.2023
Opponents: doc. Mgr. Martin Pilát, Ph.D.
 
 
 
Guidelines
The aim of this Master's thesis is to design a machine learning model that classifies a merchant, given as a URL, into a main category and the corresponding subcategories.

One of the challenges is to have a complete categorization of the whole available market, where each merchant can be classified into a main category based on what they offer, and then into more specific sub-categories, e.g., Eco, Zero Waste or Bike Sharing.

Although the available database contains categorised merchants mainly from the CEE region, the goal is global coverage, so the demand for accurate automated categorization is increasing. The collected data make it possible to apply modern machine learning approaches.

The work assumes an existing dataset with a small number of manually categorised merchants. The student should build a merchant dataset from website information. On this partially labelled dataset, the student will test selection strategies for active learning: one based on misclassification error and one based on a Bayesian approach.
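As a rough illustration of what a selection strategy based on misclassification error could look like, the sketch below picks the unlabelled merchant whose predicted category is least certain. It is not the method of the thesis or of the cited papers. The function names, the example URLs, the per-category label counts and the symmetric Dirichlet prior with parameter `alpha` are all assumptions made for this example.

```python
from typing import Dict, List


def dirichlet_posterior_mean(counts: List[float], alpha: float = 1.0) -> List[float]:
    """Posterior mean class probabilities under a symmetric Dirichlet(alpha) prior (assumed prior)."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def expected_error(probs: List[float]) -> float:
    """Expected misclassification error when predicting the most probable class."""
    return 1.0 - max(probs)


def select_query(pool: Dict[str, List[float]], alpha: float = 1.0) -> str:
    """Return the merchant URL whose predicted category has the highest expected error."""
    return max(
        pool,
        key=lambda url: expected_error(dirichlet_posterior_mean(pool[url], alpha)),
    )


# Hypothetical evidence: for each unlabelled merchant, how many similar,
# already labelled merchants fall into each of three main categories.
pool = {
    "https://shop-a.example": [8, 1, 0],
    "https://shop-b.example": [2, 2, 1],
    "https://shop-c.example": [0, 0, 0],
}

# The merchant selected here would be sent to a human annotator next.
query = select_query(pool)
```

With these made-up counts, the merchant with no supporting evidence is selected, because the prior alone leaves all three categories equally likely. In an active learning loop, the selected merchant would be labelled manually, added to the training set, and the model retrained before the next selection.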
References
Galuh Tunggadewi Sahid, Rahmad Mahendra, and Indra Budi. 2019. E-Commerce Merchant Classification using Website Information. In Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics (WIMS2019). Association for Computing Machinery, New York, NY, USA, Article 5, 1–10. https://doi.org/10.1145/3326467.3326486

Kottke, D., Herde, M., Sandrock, C. et al. Toward optimal probabilistic active learning using a Bayesian approach. Mach Learn 110, 1199–1231 (2021). https://doi.org/10.1007/s10994-021-05986-9

Michael Färber, Benjamin Scheer, and Frederic Bartscherer. 2020. Who’s Behind That Website? Classifying Websites by the Degree of Commercial Intent. In Web Engineering: 20th International Conference, ICWE 2020, Helsinki, Finland, June 9–12, 2020, Proceedings. Springer-Verlag, Berlin, Heidelberg, 130–145. https://doi.org/10.1007/978-3-030-50578-3_10
Preliminary scope of work in English
This topic is created for a specific student.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html