Digital Library of the European Council for Modelling and Simulation |
Title: |
Time series clustering with different distance measures to tell Web bots and humans apart |
Authors: |
Grazyna Suchacka |
Published in: |
(2022). ECMS 2022, 36th Proceedings
DOI: http://doi.org/10.7148/2022
ISSN: 2522-2422 (ONLINE)
ISSN: 2522-2414 (PRINT)
ISSN: 2522-2430 (CD-ROM)
ISBN: 978-3-937436-77-7
Communications of the ECMS, Volume 36, Issue 1, June 2022, Ålesund, Norway, May 30th - June 3rd, 2022 |
Citation format: |
Grazyna Suchacka (2022). Time series clustering with different distance measures to tell Web bots and humans apart, ECMS 2022 Proceedings Edited By: Ibrahim A. Hameed, Agus Hasan, Saleh Abdel-Afou Alaliyat, European Council for Modeling and Simulation. doi:10.7148/2022-0303 |
DOI: |
https://doi.org/10.7148/2022-0303 |
Abstract: |
The paper deals with the problem of differentiating Web sessions of bots and human users by observing some characteristics of their traffic at the Web server input. We propose an approach to cluster bots’ and humans’ sessions represented as time series. First, sessions are expressed as sequences of HTTP requests arriving at the server at specific timestamps; then, they are preprocessed to form time series of limited length. The time series are clustered, and clustering performance is evaluated in terms of the ability to partition bots and humans into separate clusters. The proposed approach is applied to real server log data and validated using different time series distance measures and clustering algorithms. Results show that the choice of distance measure and clustering method significantly affects clustering efficiency. The best results for the considered scenario were achieved for distance measures based on nonparametric spectral estimators and the Euclidean distance with a complexity correction factor. |
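The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. Two things are assumptions made here: that a session is turned into a time series by counting requests in fixed-width time bins (`to_time_series`, `bin_width` and `length` are hypothetical names), and that "the Euclidean distance with a complexity correction factor" refers to the complexity-invariant distance (CID), in which the Euclidean distance is multiplied by the ratio of the larger to the smaller complexity estimate of the two series. The `eps` floor for flat series is also an illustrative choice.

```python
import math
from typing import List, Sequence


def to_time_series(timestamps: Sequence[float], bin_width: float, length: int) -> List[int]:
    """Count requests per fixed-width bin, keeping only the first `length` bins.

    Assumption: the abstract only says sessions become time series of
    limited length; binning request counts is one plausible way to do that.
    """
    series = [0] * length
    if not timestamps:
        return series
    start = min(timestamps)
    for t in timestamps:
        idx = int((t - start) // bin_width)
        if idx < length:  # requests beyond the length limit are dropped
            series[idx] += 1
    return series


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def complexity(x: Sequence[float]) -> float:
    """Complexity estimate: root of the summed squared successive differences."""
    return math.sqrt(sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1)))


def cid_distance(x: Sequence[float], y: Sequence[float], eps: float = 1e-12) -> float:
    """Euclidean distance scaled by a complexity correction factor (CID).

    Assumption: `eps` floors the complexity estimates so that a perfectly
    flat series does not cause a division by zero.
    """
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    ce_x = max(complexity(x), eps)
    ce_y = max(complexity(y), eps)
    factor = max(ce_x, ce_y) / min(ce_x, ce_y)  # always >= 1
    return euclidean(x, y) * factor


# Hypothetical sessions: request timestamps in seconds.
regular = to_time_series([0, 10, 20, 30, 40, 50], bin_width=10, length=6)
bursty = to_time_series([0, 1, 2, 3, 45, 46], bin_width=10, length=6)
distance = cid_distance(regular, bursty)
```

A matrix of such pairwise distances could then be passed to any clustering algorithm that accepts precomputed dissimilarities. Because the correction factor is never smaller than 1, the corrected distance is never smaller than the plain Euclidean distance, so series with very different complexity are pushed further apart.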