Title:
VGGish for Music/Speech Classification in Radio Broadcasting
Authors:
- Salvatore Serrano
- Marco Lucio Scarpa
- Omar Serghini
Published in:
(2024). ECMS 2024, 38th Proceedings
Edited by: Daniel Grzonka, Natalia Rylko, Grazyna Suchacka, Vladimir Mityushev, European Council for Modelling and Simulation.
DOI: http://doi.org/10.7148/2024
ISSN: 2522-2422 (ONLINE)
ISSN: 2522-2414 (PRINT)
ISSN: 2522-2430 (CD-ROM)
ISBN: 978-3-937436-84-5
ISBN: 978-3-937436-83-8 (CD)
Communications of the ECMS Volume 38, Issue 1, June 2024, Cracow, Poland, June 4th – June 7th, 2024
DOI:
https://doi.org/10.7148/2024-0550
Citation format:
Salvatore Serrano, Marco Lucio Scarpa, Omar Serghini (2024). VGGish for Music/Speech Classification in Radio Broadcasting, ECMS 2024, Proceedings Edited by: Daniel Grzonka, Natalia Rylko, Grazyna Suchacka, Vladimir Mityushev, European Council for Modelling and Simulation. doi:10.7148/2024-0550
Abstract:
In audio signal processing, distinguishing between music and speech is a significant challenge because of the nuanced similarities and complexities inherent in both domains. This study addresses the challenge by employing deep learning techniques to classify audio segments as either music or speech. Our approach uses the VGGish architecture with Mel-spectrograms as input, which provide rich representations of the audio signals and enable the classification models to discern the intricate patterns characteristic of music and speech. We explore the efficacy of our models in this classification task, focusing in particular on their performance across various windowed audio segments. In our experiments, the models exceed $96\%$ accuracy in distinguishing between music and speech. These findings underscore the effectiveness of deep learning models for this task and contribute to the understanding of deep learning applications in audio signal processing.
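As an illustration of the kind of input the abstract describes, the following is a minimal sketch of turning a waveform into fixed-size log-Mel-spectrogram patches. The abstract does not state the front-end settings used in the paper, so every parameter here is an assumption: the defaults follow the publicly released VGGish reference front end (16 kHz mono audio, 25 ms window, 10 ms hop, 64 Mel bands between 125 and 7500 Hz, patches of 96 frames), and the function names `mel_filterbank` and `log_mel_patches` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the (HTK-style) Mel scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr, fmin, fmax):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_hz = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_hz.size))
    for i in range(n_mels):
        lo, mid, hi = edges_hz[i:i + 3]
        rising = (fft_hz - lo) / (mid - lo)
        falling = (hi - fft_hz) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_patches(wave, sr=16000, win_s=0.025, hop_s=0.010,
                    n_mels=64, fmin=125.0, fmax=7500.0, patch_frames=96):
    """Return an array of shape (n_patches, patch_frames, n_mels).

    All default values are assumptions taken from the public VGGish
    reference front end, not from the paper itself.
    """
    wave = np.asarray(wave, dtype=float)
    win = int(round(win_s * sr))            # samples per analysis window
    hop = int(round(hop_s * sr))            # samples between window starts
    n_fft = 1 << (win - 1).bit_length()     # next power of two >= win
    n_frames = max(0, 1 + (len(wave) - win) // hop)

    # Slice the waveform into overlapping frames and apply a Hann window.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(win)

    # Magnitude spectrum -> Mel bands -> log compression with a small offset.
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    fb = mel_filterbank(n_mels, n_fft, sr, fmin, fmax)
    log_mel = np.log(mag @ fb.T + 0.01)

    # Group consecutive frames into non-overlapping patches; drop the remainder.
    n_patches = log_mel.shape[0] // patch_frames
    return log_mel[:n_patches * patch_frames].reshape(n_patches, patch_frames, n_mels)


if __name__ == "__main__":
    t = np.arange(2 * 16000) / 16000.0
    tone = np.sin(2 * np.pi * 440.0 * t)    # two seconds of a 440 Hz tone
    print(log_mel_patches(tone).shape)
```

Each patch would then be passed to the network as a single input example. Changing `patch_frames` is one simple way to experiment with different segment lengths, although whether this matches the windowing studied in the paper is not stated in the abstract.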