ISCA - International Speech
Communication Association



ISCApad #314

Friday, August 09, 2024 by Chris Wellekens

5-2-29 AVID (Aalto Vocal Intensity Database): An open speech/electroglottography repository for machine learning-based studies on vocal intensity
  


AVID is an open database of speech and electroglottography (EGG) signals produced by 50 speakers (25 males, 25 females). The speakers varied their vocal intensity across four categories (soft, normal, loud, and very loud). Each speaker produced 25 isolated sentences in English and read two paragraphs of text in each of the four intensity modes. These speaking tasks were repeated twice in two sessions. Recordings were made at a constant mouth-to-microphone distance, and a sound pressure level (SPL) calibration tone was recorded. The speech data is labeled sentence-wise with a total of 19 labels (1 categorical intensity label and 18 continuous SPL labels). By launching the open AVID repository, the authors would like to raise awareness in the speech and voice research communities of machine learning (ML)-based studies of vocal intensity. We particularly advocate the use of ML in scenarios where the original intensity information of speech is lost because the signal was recorded without SPL calibration and is therefore presented on an arbitrary amplitude scale. For a demonstration of how ML can be used together with the AVID database for these kinds of research problems, the interested reader is referred to our article (Alku, Kodali, Laaksonen, Kadiri, “AVID: A speech database for machine learning studies on vocal intensity”, Speech Communication, Vol. 157, Article 103039, 2024).
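To illustrate why a calibration tone matters, the following is a minimal Python sketch of how a sentence-level SPL value could in principle be derived from a recording made alongside a calibration tone of known level. It is not taken from the AVID article or repository: the function names, the synthetic sine signals, and the 94 dB calibration level are all assumptions chosen for illustration. The sketch also shows that rescaling the speech signal alone shifts the estimate, which is the situation in which the original intensity information is lost.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def spl_from_calibration(speech, cal_tone, cal_spl_db):
    """Estimate the SPL (in dB) of `speech`, given a calibration tone that
    was recorded with the same gain and has a known SPL `cal_spl_db`."""
    return cal_spl_db + 20.0 * math.log10(rms(speech) / rms(cal_tone))


def sine(amplitude, freq_hz, sample_rate, duration_s):
    """Synthetic sine wave used here as a stand-in for real audio."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


SAMPLE_RATE = 16000   # assumed sampling rate for this sketch
CAL_SPL_DB = 94.0     # assumed calibration level, not a value from AVID

cal_tone = sine(0.1, 1000.0, SAMPLE_RATE, 1.0)      # hypothetical calibration tone
speech_like = sine(0.05, 200.0, SAMPLE_RATE, 1.0)   # stand-in for a sentence

spl = spl_from_calibration(speech_like, cal_tone, CAL_SPL_DB)

# Without calibration, the amplitude scale is arbitrary: doubling the
# speech amplitude alone raises the estimate by 20*log10(2) dB.
rescaled = [2.0 * x for x in speech_like]
spl_rescaled = spl_from_calibration(rescaled, cal_tone, CAL_SPL_DB)

print(f"estimated SPL: {spl:.2f} dB")
print(f"after arbitrary rescaling: {spl_rescaled:.2f} dB")
```

With real data, the two synthetic signals would be replaced by the recorded calibration tone and a sentence segment from the same session. The estimate is only meaningful when both were captured with identical gain and microphone distance.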

The AVID database is freely available at:

https://zenodo.org/records/10524873




