Digital Library

of the European Council for Modelling and Simulation

Title:

VGGish for Music/Speech Classification in Radio Broadcasting

Authors:
  • Salvatore Serrano
  • Marco Lucio Scarpa
  • Omar Serghini
Published in:

(2024). ECMS 2024, 38th Proceedings
Edited by: Daniel Grzonka, Natalia Rylko, Grazyna Suchacka, Vladimir Mityushev, European Council for Modelling and Simulation.
DOI: http://doi.org/10.7148/2024
ISSN: 2522-2422 (ONLINE)
ISSN: 2522-2414 (PRINT)
ISSN: 2522-2430 (CD-ROM)
ISBN: 978-3-937436-84-5
ISBN: 978-3-937436-83-8 (CD)
Communications of the ECMS, Volume 38, Issue 1, June 2024, Cracow, Poland, June 4th – June 7th, 2024

DOI:

https://doi.org/10.7148/2024-0550

Citation format:

Salvatore Serrano, Marco Lucio Scarpa, Omar Serghini (2024). VGGish for Music/Speech Classification in Radio Broadcasting, ECMS 2024, Proceedings Edited by: Daniel Grzonka, Natalia Rylko, Grazyna Suchacka, Vladimir Mityushev, European Council for Modelling and Simulation. doi:10.7148/2024-0550

Abstract:

In audio signal processing, distinguishing between music and speech is a significant challenge because of the nuanced similarities and complexities inherent in both domains. This study addresses the challenge by employing deep learning techniques to classify audio segments as either music or speech. Our approach uses the VGGish architecture with Mel-spectrograms as input, which provide rich representations of audio signals. These representations serve as inputs to our classification models, enabling us to discern intricate patterns characteristic of music and speech. We explore the efficacy of our models in this classification task, focusing in particular on their performance on audio segments of various window lengths. Through rigorous experimentation and evaluation, we observe notable results: the models achieve accuracy exceeding 96% in distinguishing between music and speech. These findings underscore the effectiveness of deep learning models for this task. This work contributes to the understanding of deep learning applications in audio signal processing.
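
The abstract does not spell out the pre-processing, so the following is only a minimal sketch of how windowed log-Mel-spectrogram inputs of the kind it mentions could be prepared. All parameter values are assumptions borrowed from the publicly released VGGish front end (16 kHz audio, 25 ms frames with a 10 ms hop, 64 Mel bands between 125 Hz and 7500 Hz, patches of 96 frames), not settings confirmed by the paper. The function names `hz_to_mel`, `mel_filterbank`, and `log_mel_patches` are hypothetical.

```python
import numpy as np


def hz_to_mel(freq_hz):
    """Convert frequencies in Hz to the (HTK-style) Mel scale."""
    return 1127.0 * np.log1p(np.asarray(freq_hz, dtype=float) / 700.0)


def mel_filterbank(n_mels, n_fft, sr, fmin, fmax):
    """Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_fft//2 + 1)."""
    fft_mels = hz_to_mel(np.linspace(0.0, sr / 2.0, n_fft // 2 + 1))
    edges = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bank = np.zeros((n_mels, fft_mels.size))
    for m in range(n_mels):
        lo, ctr, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_mels - lo) / (ctr - lo)
        falling = (hi - fft_mels) / (hi - ctr)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_patches(signal, sr=16000, win_s=0.025, hop_s=0.010,
                    n_mels=64, patch_frames=96):
    """Split a mono signal into non-overlapping log-Mel patches.

    Returns an array of shape (n_patches, patch_frames, n_mels).
    The defaults are assumed values, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    win = int(round(sr * win_s))          # samples per analysis frame
    hop = int(round(sr * hop_s))          # samples between frame starts
    n_fft = 1 << (win - 1).bit_length()   # next power of two >= win
    if len(signal) < win:
        return np.empty((0, patch_frames, n_mels))

    # Frame the signal and apply a Hann window to each frame.
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(win)

    # Magnitude spectrum -> Mel energies -> log compression.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    mel = magnitude @ mel_filterbank(n_mels, n_fft, sr, 125.0, 7500.0).T
    log_mel = np.log(mel + 0.01)          # small offset avoids log(0)

    # Group consecutive frames into fixed-size patches, dropping the remainder.
    n_patches = n_frames // patch_frames
    return log_mel[: n_patches * patch_frames].reshape(
        n_patches, patch_frames, n_mels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    two_seconds = rng.standard_normal(2 * 16000)  # synthetic stand-in for audio
    print(log_mel_patches(two_seconds).shape)
```

Each patch would then be passed to the classifier as one input example. Longer decision windows could be formed by aggregating the predictions of several consecutive patches, although how the paper builds its windows is not stated here.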
