Active learning in E-Commerce Merchant Classification using Website Information
Thesis title in Czech: | Aktivní učení pro klasifikaci (English: Active learning for classification) |
---|---|
Thesis title in English: | Active learning in E-Commerce Merchant Classification using Website Information |
Key words (Czech): | aktivní učení, klasifikace, e-komerce |
English key words: | Active learning, Web mining, Classification, E-commerce |
Academic year of topic announcement: | 2022/2023 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Department of Theoretical Computer Science and Mathematical Logic (32-KTIML) |
Supervisor: | Mgr. Marta Vomlelová, Ph.D. |
Author: | hidden |
Date of registration: | 25.05.2022 |
Date of assignment: | 27.06.2022 |
Confirmed by Study dept. on: | 23.11.2022 |
Date and time of defence: | 12.06.2023 09:00 |
Date of electronic submission: | 03.05.2023 |
Date of submission of printed version: | 09.05.2023 |
Date of completed defence: | 12.06.2023 |
Opponents: | doc. Mgr. Martin Pilát, Ph.D. |
Guidelines
The aim of this Master’s thesis is to design a machine learning model that can classify a merchant, given as a URL, into its main category and the corresponding subcategories.
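Because the model's input is a merchant URL, the classifier first needs text derived from the merchant's website. The sketch below shows one possible way to obtain it with Python's standard `html.parser` module, assuming the page HTML has already been downloaded. The class `MerchantPageParser` and the function `page_to_document` are hypothetical names, and using the page title, meta description and visible text as input is an illustrative assumption, not a feature set prescribed by the assignment.

```python
from html.parser import HTMLParser
from typing import List


class MerchantPageParser(HTMLParser):
    """Collects the <title>, the meta description and the visible text of an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self.text: List[str] = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        data = data.strip()
        if not data or self._skip_depth > 0:
            return  # ignore whitespace-only chunks and script/style contents
        if self._in_title:
            self.title += data
        else:
            self.text.append(data)


def page_to_document(html: str) -> str:
    """Turn a downloaded merchant homepage into a single text document for a classifier."""
    parser = MerchantPageParser()
    parser.feed(html)
    parser.close()
    parts = [parser.title, parser.description, *parser.text]
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    # Hypothetical merchant homepage used only for illustration.
    sample = (
        "<html><head><title>Green Bikes</title>"
        '<meta name="description" content="Bike sharing in Prague">'
        "<style>body { color: green; }</style></head>"
        "<body><h1>Rent a bike</h1><script>var x = 1;</script>"
        "<p>Zero waste delivery</p></body></html>"
    )
    print(page_to_document(sample))
```

For the sample page this prints `Green Bikes Bike sharing in Prague Rent a bike Zero waste delivery`; such a string could then be vectorized (for example with TF-IDF) before training a classifier.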
One of the challenges is to achieve a complete categorization of the whole available market, in which each merchant is classified into a main category based on what it offers and then into more specific subcategories, e.g., Eco, Zero Waste or Bike Sharing. The available database contains categorized merchants mainly from the Central and Eastern European (CEE) region, but the goal is global coverage, which increases the demand for accurate automated categorization. The collected data make it possible to apply modern machine learning approaches.

The work assumes an existing dataset with a small number of manually categorized merchants. The student should build a merchant dataset from information on the merchants' websites. On this partially labelled dataset, the student will test active learning selection strategies based on misclassification error and on a Bayesian approach.
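A selection strategy based on misclassification error can be illustrated with least-confidence sampling: under the current model, the probability of misclassifying an unlabelled merchant x when predicting its most likely class is `1 - max_c p(c | x)`, and the merchants with the largest value are sent to an annotator. The sketch assumes the model already outputs class probabilities; the functions `expected_misclassification`, `select_queries` and `active_learning_round`, the `oracle` callback and all probabilities are hypothetical, and the Bayesian strategy (e.g., the approach of Kottke et al.) is not implemented here.

```python
from typing import Callable, Dict, List, Sequence


def expected_misclassification(probs: Sequence[float]) -> float:
    """Model's own probability of being wrong when it predicts its top class: 1 - max_c p(c | x)."""
    return 1.0 - max(probs)


def select_queries(pool_probs: Dict[str, Sequence[float]], batch_size: int) -> List[str]:
    """Return the batch_size unlabelled merchants with the highest expected misclassification error."""
    ranked = sorted(
        pool_probs,
        key=lambda merchant: expected_misclassification(pool_probs[merchant]),
        reverse=True,
    )
    return ranked[:batch_size]


def active_learning_round(
    pool_probs: Dict[str, Sequence[float]],
    labelled: Dict[str, str],
    oracle: Callable[[str], str],
    batch_size: int,
) -> List[str]:
    """One pool-based round: query labels for the selected merchants and move them to the labelled set.

    Retraining the model on `labelled` and recomputing `pool_probs` would follow (not shown).
    """
    queried = select_queries(pool_probs, batch_size)
    for merchant in queried:
        labelled[merchant] = oracle(merchant)  # e.g. a human annotator assigns the category
        del pool_probs[merchant]  # the merchant leaves the unlabelled pool
    return queried


if __name__ == "__main__":
    # Hypothetical predicted main-category probabilities for three unlabelled merchant URLs.
    pool = {
        "shop-a.example": [0.90, 0.05, 0.05],
        "shop-b.example": [0.40, 0.35, 0.25],
        "shop-c.example": [0.55, 0.30, 0.15],
    }
    labelled: Dict[str, str] = {}
    print(active_learning_round(pool, labelled, oracle=lambda m: "hypothetical-category", batch_size=2))
```

Running the script prints `['shop-b.example', 'shop-c.example']`, the two merchants whose top-class probability is lowest. A Bayesian strategy would replace `expected_misclassification` with a score that also accounts for uncertainty about the model's probability estimates themselves.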
References
Galuh Tunggadewi Sahid, Rahmad Mahendra, and Indra Budi. 2019. E-Commerce Merchant Classification using Website Information. In Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics (WIMS2019). Association for Computing Machinery, New York, NY, USA, Article 5, 1–10. https://doi.org/10.1145/3326467.3326486
D. Kottke, M. Herde, C. Sandrock, et al. 2021. Toward optimal probabilistic active learning using a Bayesian approach. Machine Learning 110, 1199–1231. https://doi.org/10.1007/s10994-021-05986-9
Michael Färber, Benjamin Scheer, and Frederic Bartscherer. 2020. Who’s Behind That Website? Classifying Websites by the Degree of Commercial Intent. In Web Engineering: 20th International Conference, ICWE 2020, Helsinki, Finland, June 9–12, 2020, Proceedings. Springer-Verlag, Berlin, Heidelberg, 130–145. https://doi.org/10.1007/978-3-030-50578-3_10
Preliminary scope of work in English
This topic was created for a specific student.