
Digital Library of the European Council for Modelling and Simulation

Title:

Time series clustering with different distance measures to tell Web bots and humans apart

Authors:

Grazyna Suchacka

Published in:

 

 

(2022). ECMS 2022, 36th Proceedings
Edited by: Ibrahim A. Hameed, Agus Hasan, Saleh Abdel-Afou Alaliyat, European Council for Modelling and Simulation.

 

DOI: http://doi.org/10.7148/2022

ISSN: 2522-2422 (ONLINE)

ISSN: 2522-2414 (PRINT)

ISSN: 2522-2430 (CD-ROM)

 

ISBN: 978-3-937436-77-7
ISBN: 978-3-937436-76-0 (CD)

Communications of the ECMS, Volume 36, Issue 1, June 2022

Ålesund, Norway, May 30th - June 3rd, 2022

Citation format:

Grazyna Suchacka (2022). Time series clustering with different distance measures to tell Web bots and humans apart, ECMS 2022 Proceedings. Edited by: Ibrahim A. Hameed, Agus Hasan, Saleh Abdel-Afou Alaliyat, European Council for Modelling and Simulation.

doi:10.7148/2022-0303

DOI:

https://doi.org/10.7148/2022-0303

Abstract:

The paper deals with the problem of differentiating Web sessions of bots and human users by observing characteristics of their traffic at the Web server input. We propose an approach to clustering bots’ and humans’ sessions represented as time series. First, sessions are expressed as sequences of HTTP requests arriving at the server at specific timestamps; then, they are preprocessed to form time series of limited length. The time series are clustered, and clustering performance is evaluated in terms of the ability to partition bots and humans into separate clusters. The proposed approach is applied to real server log data and validated with different time series distance measures and clustering algorithms. Results show that the choice of distance measure and clustering method significantly affects clustering performance. In the considered scenario, the best results were achieved with distance measures based on nonparametric spectral estimators and with the Euclidean distance with a complexity correction factor.
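The abstract names the Euclidean distance with a complexity correction factor among the best-performing measures. The sketch below assumes this refers to the complexity-invariant distance (CID) of Batista et al., where the Euclidean distance ED(x, y) is multiplied by CF = max(CE(x), CE(y)) / min(CE(x), CE(y)), and CE(x) is the square root of the sum of squared successive differences of x. The preprocessing function `session_to_series` (inter-arrival times, truncated or zero-padded to a fixed length) is a hypothetical stand-in: the abstract does not say how the paper turns request timestamps into limited-length series.

```python
import math
from typing import Sequence


def session_to_series(timestamps: Sequence[float], length: int) -> list[float]:
    """HYPOTHETICAL preprocessing, not the paper's method: convert a session's
    request timestamps (seconds) into a fixed-length series of inter-arrival
    times, truncated to `length` or zero-padded up to it."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    gaps = gaps[:length]
    return gaps + [0.0] * (length - len(gaps))


def complexity_estimate(x: Sequence[float]) -> float:
    """CE(x) = sqrt(sum_i (x[i+1] - x[i])^2)."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, x[1:])))


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cid(x: Sequence[float], y: Sequence[float]) -> float:
    """Complexity-invariant distance: ED(x, y) * max(CE) / min(CE)."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    ce_x, ce_y = complexity_estimate(x), complexity_estimate(y)
    lo, hi = min(ce_x, ce_y), max(ce_x, ce_y)
    # ASSUMPTION: for flat series (CE = 0) the correction factor is undefined;
    # here two flat series get CF = 1 and flat vs. non-flat gets CF = inf.
    if lo > 0:
        cf = hi / lo
    elif hi == 0:
        cf = 1.0
    else:
        cf = math.inf
    return euclidean(x, y) * cf


# Illustrative usage with made-up timestamps (not data from the paper).
bot = session_to_series([0, 1, 2, 4, 5, 6], length=5)
human = session_to_series([0, 4, 5, 19, 21, 40], length=5)
print(cid(bot, human))
```

Under this sketch, a strictly periodic session (constant gaps) has CE = 0, so its distance to any non-flat series comes out as infinity; how the paper handles that case is not stated in the abstract. A pairwise CID matrix built this way could then be passed to any clustering algorithm that accepts precomputed distances.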
