
Digital Library

of the European Council for Modelling and Simulation

 

Title:

Predicting Performance of Heterogeneous AI Systems with Discrete-Event Simulations

Authors:

Vyacheslav Zhdanovskiy, Lev Teplyakov, Anton Grigoryev

Published in:

(2022). ECMS 2022, 36th Proceedings
Edited by: Ibrahim A. Hameed, Agus Hasan, Saleh Abdel-Afou Alaliyat, European Council for Modelling and Simulation.

 

DOI: http://doi.org/10.7148/2022

ISSN: 2522-2422 (ONLINE)

ISSN: 2522-2414 (PRINT)

ISSN: 2522-2430 (CD-ROM)

 

ISBN: 978-3-937436-77-7
ISBN: 978-3-937436-76-0(CD)

 

Communications of the ECMS, Volume 36, Issue 1, June 2022,

Ålesund, Norway, May 30th - June 3rd, 2022

 

Citation format:

Vyacheslav Zhdanovskiy, Lev Teplyakov, Anton Grigoryev (2022). Predicting Performance of Heterogeneous AI Systems with Discrete-Event Simulations, ECMS 2022 Proceedings Edited By: Ibrahim A. Hameed, Agus Hasan, Saleh Abdel-Afou Alaliyat, European Council for Modeling and Simulation.

doi:10.7148/2022-0278

DOI:

https://doi.org/10.7148/2022-0278

Abstract:

In recent years, artificial intelligence (AI) technologies have found industrial applications in various fields. AI systems typically combine complex software with a heterogeneous CPU/GPU hardware architecture, making it difficult to answer basic questions concerning performance evaluation and software optimization. Where is the bottleneck impeding the system? How does the performance scale with the workload? How would the speed-up of a specific module contribute to the whole system? Finding the answers to these questions through experiments on the real system can require substantial computational, human, financial, and time resources. One way to cut these costs is to use a fast and accurate simulation model before implementing anything in the real system. In this paper, we propose a discrete-event simulation model of a high-load heterogeneous AI system in the context of video analytics. Using the proposed model, we estimate: 1) the performance scalability with the increasing number of cameras; 2) the performance impact of integrating a new module; 3) the performance gain from optimizing a single module. We show that the performance estimation accuracy of the proposed model is higher than 90%. We also demonstrate that the considered system exhibits a counter-intuitive relationship between workload and performance, which the proposed simulation model nevertheless infers correctly.
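
For readers unfamiliar with the technique, the following is a minimal sketch of a discrete-event simulation. It is not the authors' model. It assumes a single shared processing module with a fixed service time, fed by cameras that emit frames at a fixed interval. The function name `simulate` and all parameter values are hypothetical and chosen only for illustration.

```python
import heapq


def simulate(num_cameras, frame_interval_ms=40, service_ms=10, horizon_ms=10_000):
    """Return processed frames per simulated second.

    Illustrative assumptions (not from the paper): one shared server,
    an unbounded FIFO queue, deterministic arrivals and service times.
    Times are integer milliseconds to avoid floating-point drift.
    """
    events = []  # min-heap of (time_ms, sequence_number, kind)
    seq = 0      # tie-breaker so simultaneous events pop in insertion order

    # Every camera emits its first frame at time 0.
    for _ in range(num_cameras):
        heapq.heappush(events, (0, seq, "arrival"))
        seq += 1

    waiting = 0   # frames queued behind the one in service
    busy = False  # whether the server is currently processing a frame
    done = 0      # frames fully processed within the horizon

    while events:
        now, _, kind = heapq.heappop(events)
        if now > horizon_ms:
            break
        if kind == "arrival":
            # Schedule the next frame from the same camera.
            heapq.heappush(events, (now + frame_interval_ms, seq, "arrival"))
            seq += 1
            if busy:
                waiting += 1
            else:
                busy = True
                heapq.heappush(events, (now + service_ms, seq, "departure"))
                seq += 1
        else:  # departure: a frame finished processing
            done += 1
            if waiting > 0:
                waiting -= 1
                heapq.heappush(events, (now + service_ms, seq, "departure"))
                seq += 1
            else:
                busy = False

    return done / (horizon_ms / 1000)


if __name__ == "__main__":
    for cameras in (1, 2, 4, 8):
        print(cameras, "cameras:", simulate(cameras), "frames/s")
```

Under these assumptions the server can process at most 1000 / `service_ms` frames per second, so throughput grows with the number of cameras until the offered load reaches that capacity and then stays flat. A realistic model of a heterogeneous system would need several interacting CPU and GPU stages, which this sketch deliberately omits.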

Full text: