Title:
Investigating reliability of machine learning results depending on a method and the feature pre-processing - the case of e-customer session classification
Authors:
- Jacek Iwanski
- Grazyna Suchacka
Published in:
(2024). ECMS 2024, 38th Proceedings
Edited by: Daniel Grzonka, Natalia Rylko, Grazyna Suchacka, Vladimir Mityushev, European Council for Modelling and Simulation.
DOI: http://doi.org/10.7148/2024
ISSN: 2522-2422 (ONLINE)
ISSN: 2522-2414 (PRINT)
ISSN: 2522-2430 (CD-ROM)
ISBN: 978-3-937436-84-5
ISBN: 978-3-937436-83-8 (CD)
Communications of the ECMS Volume 38, Issue 1, June 2024, Cracow, Poland, June 4th – June 7th, 2024
DOI:
https://doi.org/10.7148/2024-0528
Citation format:
Jacek Iwanski, Grazyna Suchacka (2024). Investigating reliability of machine learning results depending on a method and the feature pre-processing - the case of e-customer session classification, ECMS 2024, Proceedings Edited by: Daniel Grzonka, Natalia Rylko, Grazyna Suchacka, Vladimir Mityushev, European Council for Modelling and Simulation. doi:10.7148/2024-0528
Abstract:
Numerous studies in recent years have aimed at developing machine learning (ML) models for various tasks, especially using supervised learning (classification) methods. Very little attention, however, has been paid to evaluating the variability and reliability of classification results. In this paper we address this problem by assessing the dispersion of classification results across multiple independent training/validation and test set splits, as well as multiple bootstrap resamplings of the test set data. A dataset of real e-customer sessions was pre-processed after an in-depth investigation of the features in the context of the probability of making a purchase in the current session. Classification was performed with two state-of-the-art ML methods, artificial neural networks and random forests, on two session datasets: one with raw and one with pre-processed feature values. Our findings on the variability of the obtained results provide important methodological guidelines for the practical use of supervised learning methods.
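Illustrative sketch (not taken from the paper): the bootstrap part of the evaluation described in the abstract can be approximated as follows. The function names, the choice of accuracy as the metric, the number of resamples, and the toy labels are all assumptions made for this example; the paper's actual metrics and settings are not stated in the abstract. The idea is to draw the test set with replacement many times, score each resample, and report the spread of the scores rather than a single number.

```python
import random
import statistics


def accuracy(y_true, y_pred):
    """Fraction of sessions whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_scores(y_true, y_pred, n_resamples=1000, seed=0):
    """Score n_resamples test sets drawn with replacement (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample session indices with replacement, keeping the test set size.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(accuracy([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    return scores


def summarize(scores):
    """Mean, sample standard deviation, and a simple percentile interval."""
    ordered = sorted(scores)
    lo = ordered[int(0.025 * (len(ordered) - 1))]
    hi = ordered[int(0.975 * (len(ordered) - 1))]
    return {"mean": statistics.mean(scores),
            "std": statistics.stdev(scores),
            "ci95": (lo, hi)}


# Toy labels (made up for illustration): 1 = purchase session, 0 = no purchase.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] * 20
y_pred = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0] * 20

scores = bootstrap_scores(y_true, y_pred)
summary = summarize(scores)
print(summary)
```

To mirror the first part of the procedure, the same summary could be computed once per independent training/validation and test split, with a model retrained for each split, so that dispersion due to the split and dispersion due to test set sampling can be compared.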