ISCA - International Speech Communication Association



ISCApad #310

Tuesday, April 09, 2024 by Chris Wellekens

5 Resources
5-1 Books
5-1-1 Proceedings of SLTU-CCURL2020

Dear all,

 
We are very happy to announce that the SLTU-CCURL2020 Proceedings are available online: https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/SLTUCCURLbook.pdf
 
This year, LREC2020 would have featured an extraordinary event: the first joint SLTU-CCURL2020 Workshop, which was planned as a two-day workshop with 54 papers accepted as either oral or poster presentations.
The workshop program was enriched by two tutorials and two keynote speeches.
 
We will deeply miss the presentations, the discussions, and the overall stimulating environment.
 
We are thankful to ELRA and ISCA for their support of the workshop, to our sponsor Google, and to the 60 experts of the Program Committee, who worked tirelessly to help us select the best papers, representing a wide perspective on NLP, speech, and computational linguistics for less-resourced languages.
 
Looking forward to better times when we will be able to meet in person again, we hope that you will find these workshop proceedings relevant and stimulating for your own research.
 
With our best wishes,

Claudia Soria, Laurent Besacier, Dorothee Beermann, and Sakriani Sakti

5-1-2 J. Blauert, J. Braasch, 'The Technology of Binaural Understanding', Springer and ASA-Press

 My name is Jens Blauert, and you may recall me as an ESCA founder and ISCA Gold Medalist. Although I have been a professor emeritus for many years, I am still active in science.
     Recently, I published a book edited together with Jonas Braasch (Professor at RPI, Troy, NY) with both SPRINGER and ASA-Press. The book's title is 'The Technology of Binaural Understanding'. It attempts to bridge between audition and cognition and thus, as far as speech communication is concerned, between speech and language. The book's motto is:

     People, when exposed to acoustic stimuli, do not react directly to what they hear,
     but rather to what what-they-hear means to them.


     Obviously, this is an issue that is also relevant for speech communication at large. Thus, I kindly ask you to consider whether this book would qualify for a book announcement in the ISCApad. If a full book review were also feasible, I can send you an eBook link, and after a reviewer has been assigned, SPRINGER will deliver a printed copy to him/her.
     Thank you in advance for your efforts!
     With best regards, Jens Blauert


PS: The preface and table of contents of the book are available under the following link:
      https://www.springer.com/gp/book/9783030003852
      


5-1-3 Weiss B., Trouvain J., Barkat-Defradas M., Ohala J.J., 'Voice Attractiveness', Springer 2021

Voice Attractiveness

Studies on Sexy, Likable, and Charismatic Speakers

Editors: Benjamin Weiss, Jürgen Trouvain, Melissa Barkat-Defradas, John J. Ohala

  • Describes the multifaceted aspects of voice that may seduce, irritate or euphorize, and determine the way people perceive us

  • Provides approaches to, and exemplary studies on various methods of addressing voice attractiveness

  • Discusses voice attractiveness in passive listening scenarios and in conversation

  • Presents studies on normal, pathological and professional speakers

 

5-1-4 Proceedings of Speech Prosody 2022

Dear Speech Prosody SIG Members,

 

As many of you already know, Speech Prosody 2022 was a great technical and social success! Our thanks to Sonia Frota, Marina Vigario, and all the organizers.

 

The proceedings are already available, at https://www.isca-speech.org/archive/speechprosody_2022/ .

 

Also, to share a link announced at the closing ceremony, there are now digitized, searchable versions of all past ICPhS proceedings, at https://www.coli.uni-saarland.de/groups/BM/phonetics/resources.html#icphs .

 

 

Nigel Ward, SProSIG Chair,

Professor of Computer Science, University of Texas at El Paso

nigel@utep.edu    https://www.cs.utep.edu/nigel/  


5-1-5 Diana Sidtis, Foundations of Familiar Language, Wiley



5-1-6 Call for contributions to a book on interactions in the production, acoustics, and perception of speech and music, De Gruyter Mouton Publishers
Dear colleagues!
 
Our special session on the relationships between speech and music at ICPhS in Prague aroused great interest and received a very positive, lasting response. We are therefore very pleased that we can now build on this positive feedback and invite scholars to submit chapter proposals for an edited collection on the intersections and interactions in the production, acoustics, and perception of speech and music, to be published with De Gruyter Mouton in early 2025.
 
Please find the complete Call for Papers with all details in topics and timeline under the following link: https://cloud.newsletter.degruyter.com/Speech%20Music  – or directly via EasyChair: https://easychair.org/cfp/SLM2023 
 
Submission Guidelines
 
1.    Abstract submission
Please submit a 500-word abstract by February 4th, 2024. In your abstract, please clearly state how your work relates to one or more of the above areas of interest as this will help us structure the volume and invite matching reviewers. All abstracts must be in English. Notification of acceptance of your abstract will be sent by February 11th, 2024.
 
2.    Full paper submission
Upon acceptance of your abstract, you are required to submit your full paper by June 30th, 2024 (approx. 8000 words, excluding references). To ensure the scientific quality of the volume, all submitted papers will undergo a thorough peer review process. Each manuscript will be reviewed by one of the volume editors and an external reviewer, likely chosen from the pool of contributing authors. The review will focus on assessing relevance, originality, clarity, adherence to the thematic scope, scientific rigor, contribution to the field, methodology, and overall scientific quality. Authors will be given the opportunity to revise their papers in response to the reviewers’ feedback.
 
We look forward to receiving your contributions, and in the meantime we wish you a happy and healthy pre-Christmas time,
 
Jianjing Kuang, University of Pennsylvania
Oliver Niebuhr, University of Southern Denmark
(Co-editors)
***************************************

5-2 Database
5-2-1 Linguistic Data Consortium (LDC) update (March 2024)

In this newsletter:
LDC data and commercial technology development

New publications:
RATS Low Speech Density
BabyEars Affective Vocalizations


LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a pre-requisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing page for further information.


New publications:
RATS Low Speech Density was developed by LDC and comprises 87 hours of English, Levantine Arabic, Farsi, Pashto, and Urdu speech, and non-speech samples. The recordings were assembled by concatenating a randomized selection of speech, communications systems sounds, and silence. This corpus was created to measure false alarm performance in RATS speech activity detection systems.

The source audio was extracted from RATS development and progress sets and consists of conversational telephone speech recordings collected by LDC. Non-speech samples were selected from communications systems sounds, including telephone network special information tones, radio selective calling signals, HF/VHF/UHF digital mode radio traffic, radio network control channel signals, two-way radio traffic containing roger beeps, and short duration shift-key modulated handset data transmissions.

The goal of the RATS (Robust Automatic Transcription of Speech) program was to develop human language technology systems capable of performing speech detection, language identification, speaker identification, and keyword spotting on the severely degraded audio signals that are typical of various radio communication channels, especially those employing various types of handheld portable transceiver systems.

2024 members can access this corpus through their LDC accounts. Non-members may license this data for $2000.

*

BabyEars Affective Vocalizations contains 22 minutes of spontaneous English speech by 12 adults interacting with their infant children, for a total of 509 infant-directed utterances and 185 adult-directed or neutral utterances. Speech data was collected in a quiet room during a one-hour session in which each parent was asked to play and otherwise interact normally with their infant (aged 10-18 months). A trained research assistant then extracted discrete utterances and classified them into three categories: approval, attention, and prohibition.

2024 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for $250.


To unsubscribe from this newsletter, log in to your LDC account and uncheck the box next to “Receive Newsletter” under Account Options, or contact LDC for assistance.

Membership Coordinator

Linguistic Data Consortium

University of Pennsylvania

T: +1-215-573-1275

E: ldc@ldc.upenn.edu

M: 3600 Market St. Suite 810

      Philadelphia, PA 19104


5-2-2 ELRA - Language Resources Catalogue - Update (November and December 2023)


We are happy to announce that 1 new written corpus, 1 new monolingual lexicon and 2 new speech resources are now available in our catalogue.

Corpus for fine-grained analysis and automatic detection of irony on Twitter
ISLRN: 478-366-550-085-8

This corpus was annotated by trained annotators (Master’s students in Linguistics) using a detailed annotation scheme for irony categorization, which describes four labels: ‘ironic by means of a polarity contrast’, ‘situational irony’, ‘other verbal irony’ and ‘not ironic’. It consists of 4791 instances with an irony label and a tweet ID.

Bitext Synonym Data - General Language
ISLRN: 470-885-612-363-1

The Bitext Synonym Data - General Language includes 31,723 entries and more than 100,000 synonyms for the English language. This dataset is a set of synonyms developed to augment the English version of WordNet, a powerful open-source lexical database, released in 2005. All synonyms can be linked to Bitext Lexical Data - English (see ELRA-L0140) for lemmatization, POS and morphological information.

Corpus of Spontaneous Japanese (CSJ)
ISLRN: 280-594-494-328-0

The 'Corpus of Spontaneous Japanese' (or CSJ) contains about 650 hours of spontaneous speech, corresponding to about 7 million words. All these speech materials were recorded using head-worn close-talking microphones and DAT, and down-sampled to 16 kHz, 16-bit accuracy. The speech material is transcribed at both orthographic and phonetic levels. In addition, segment labels, intonation labels, and other miscellaneous annotations are provided for a subset of CSJ, called the Core, which contains about 500k words or 45 hours of speech.

EWA-DB – Early Warning of Alzheimer speech database
ISLRN: 730-022-142-264-9

EWA-DB is a speech database that contains data from 3 clinical groups (Alzheimer's disease, Parkinson's disease, mild cognitive impairment) and a control group of healthy subjects. Speech samples from each clinical group were obtained using the EWA smartphone application, which contains 4 different language tasks: sustained vowel phonation, diadochokinesis, object and action naming (30 objects and 30 actions), and picture description (two single pictures and three complex pictures). The total number of speakers in the database is 1649. Of these, there are 87 people with Alzheimer's disease, 175 people with Parkinson's disease, 62 people with mild cognitive impairment, 2 people with a mixed diagnosis of Alzheimer's + Parkinson's disease, and 1323 healthy controls.


For more information on the catalogue or if you would like to enquire about having your resources distributed by ELRA, please contact us.
_________________________________________

Visit the ELRA Catalogue of Language Resources
Visit the Universal Catalogue

Archives of  ELRA Language Resources Catalogue Updates

***************************************************************

We are happy to announce that 3 new monolingual lexicons are now available in our catalogue.

DiaLEX – Egyptian (DiaLEX-EA)
ISLRN: 697-328-151-668-9
A comprehensive full-form lexicon of Egyptian Arabic general vocabulary (DiaLEX-EA) including 78 million entries for 31,000 lemmas with all inflected forms, enclitics, proclitics, case endings, declensions, and conjugated forms.
Each entry is accompanied by a full and accurate diacriticization (vocalization) as well as an extensive coverage of variants. The lexicon is ideally suited to support natural language processing applications for Egyptian Arabic, especially
morphological analysis and speech technology.
Quantity and size: 75,204,644 lines / 11,217 MB (11.0 GB)

DiaLEX – Emirati (DiaLEX-UA)
ISLRN: 836-793-503-213-8
A comprehensive full-form lexicon of Emirati Arabic general vocabulary (DiaLEX-UA) including 28 million entries for 29,000 lemmas with all inflected forms, enclitics, proclitics, case endings, declensions, and conjugated forms.
Each entry is accompanied by a full and accurate diacriticization (vocalization) as well as an extensive coverage of variants. The lexicon is ideally suited to support natural language processing applications for Emirati Arabic, especially
morphological analysis and speech technology.
Quantity and size: 24,976,871 lines / 3,841 MB (3.8 GB)

DiaLEX – Saudi Arabian Hijazi (DiaLEX-HA)
ISLRN: 849-157-479-216-3
A comprehensive full-form lexicon of Hijazi Arabic general vocabulary (DiaLEX-HA) including 21 million entries for 30,000 lemmas with all inflected forms, enclitics, proclitics, case endings, declensions, and conjugated forms.
Each entry is accompanied by a full and accurate diacriticization (vocalization) as well as an extensive coverage of variants. The lexicon is ideally suited to support natural language processing applications for Hijazi Arabic, especially
morphological analysis and speech technology.
Quantity and size: 20,247,655 lines / 2,835 MB (2.8 GB)


For more information on the catalogue or if you would like to enquire about having your resources distributed by ELRA, please contact us.

_________________________________________

Visit the ELRA Catalogue of Language Resources
Visit the Universal Catalogue
Archives of ELRA Language Resources Catalogue Updates 

 

 

 
 

5-2-3 Speechocean – update (August 2019)

 

English Speech Recognition Corpus - Speechocean

 

At present, Speechocean has produced more than 24,000 hours of English speech recognition corpora, including some rare corpora recorded by children. These corpora were recorded by 23,000 speakers in total. Please see the table below:

 

Name                            Speakers    Hours
American English                8,441       8,029
Indian English                  2,394       3,540
British English                 2,381       3,029
Australian English              1,286       1,954
Chinese (Mainland) English      3,478       1,513
Canadian English                1,607       1,309
Japanese English                1,005       902
Singapore English               404         710
Russian English                 230         492
Romanian English                201         389
French English                  225         378
Chinese (Hong Kong) English     200         378
Italian English                 213         366
Portugal English                201         341
Spanish English                 200         326
German English                  196         306
Korean English                  116         207
Indonesian English              402         126

 

 

If you have any further inquiries, please do not hesitate to contact us.

Web: en.speechocean.com

Email: marketing@speechocean.com


5-2-4 Google's Language Model benchmark
 Here is a brief description of the project.

'The purpose of the project is to make available a standard training and test setup for language modeling experiments.

The training/held-out data was produced from a download at statmt.org using a combination of Bash shell and Perl scripts distributed here.

This also means that your results on this data set are reproducible by the research community at large.

Besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the following baseline models:

  • unpruned Katz (1.1B n-grams),
  • pruned Katz (~15M n-grams),
  • unpruned Interpolated Kneser-Ney (1.1B n-grams),
  • pruned Interpolated Kneser-Ney (~15M n-grams)

 

Happy benchmarking!'
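
To illustrate how such per-word log-probability files can be used, here is a minimal Python sketch that turns one file into a perplexity figure. It assumes a hypothetical plain-text format with one log10 probability per line; the benchmark's actual file layout may differ.

    def perplexity_from_logprobs(path):
        # Accumulate per-word log10 probabilities and count the words.
        total_log10, n_words = 0.0, 0
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line:
                    total_log10 += float(line)
                    n_words += 1
        # Perplexity = 10 ** (negative mean log10 probability per word).
        return 10 ** (-total_log10 / n_words)

    # Hypothetical file name for one of the ten held-out sets.
    print(perplexity_from_logprobs('heldout-000.logprobs'))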


5-2-5 Forensic database of voice recordings of 500+ Australian English speakers


We are pleased to announce that the forensic database of voice recordings of 500+ Australian English speakers is now published.

The database was collected by the Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales, as part of the Australian Research Council-funded Linkage Project on making demonstrably valid and reliable forensic voice comparison a practical everyday reality in Australia. The project was conducted in partnership with: Australian Federal Police, New South Wales Police, Queensland Police, National Institute of Forensic Sciences, Australasian Speech Sciences and Technology Association, Guardia Civil, and Universidad Autónoma de Madrid.

The database includes multiple non-contemporaneous recordings of most speakers. Each speaker is recorded in three different speaking styles representative of some common styles found in forensic casework. Recordings were made under high-quality conditions, and extraneous noises and crosstalk have been manually removed. The high-quality audio can be processed to reflect recording conditions found in forensic casework.

The database can be accessed at: http://databases.forensic-voice-comparison.net/


5-2-6 Audio and Electroglottographic speech recordings

 

Audio and Electroglottographic speech recordings from several languages

We are happy to announce the public availability of speech recordings made as part of the UCLA project 'Production and Perception of Linguistic Voice Quality'.

http://www.phonetics.ucla.edu/voiceproject/voice.html

Audio and EGG recordings are available for Bo, Gujarati, Hmong, Mandarin, Black Miao, Southern Yi, Santiago Matatlan/ San Juan Guelavia Zapotec; audio recordings (no EGG) are available for English and Mandarin. Recordings of Jalapa Mazatec extracted from the UCLA Phonetic Archive are also posted. All recordings are accompanied by explanatory notes and wordlists, and most are accompanied by Praat textgrids that locate target segments of interest to our project.

Analysis software developed as part of the project – VoiceSauce for audio analysis and EggWorks for EGG analysis – and all project publications are also available from this site. All preliminary analyses of the recordings using these tools (i.e. acoustic and EGG parameter values extracted from the recordings) are posted on the site in large data spreadsheets.

All of these materials are made freely available under a Creative Commons Attribution-NonCommercial-ShareAlike-3.0 Unported License.

This project was funded by NSF grant BCS-0720304 to Pat Keating, Abeer Alwan and Jody Kreiman of UCLA, and Christina Esposito of Macalester College.

Pat Keating (UCLA)


5-2-7 EEG, face-tracking, and audio 24 GB dataset Kara One, Toronto, Canada

We are making 24 GB of a new dataset, called Kara One, freely available. This database combines 3 modalities (EEG, face tracking, and audio) during imagined and articulated speech using phonologically-relevant phonemic and single-word prompts. It is the result of a collaboration between the Toronto Rehabilitation Institute (in the University Health Network) and the Department of Computer Science at the University of Toronto.

 

In the associated paper (abstract below), we show how to accurately classify imagined phonological categories solely from EEG data. Specifically, we obtain up to 90% accuracy in classifying imagined consonants from imagined vowels and up to 95% accuracy in classifying stimulus from active imagination states using advanced deep-belief networks.

 

Data from 14 participants are available here: http://www.cs.toronto.edu/~complingweb/data/karaOne/karaOne.html.

 

If you have any questions, please contact Frank Rudzicz at frank@cs.toronto.edu.

 

Best regards,

Frank

 

 

PAPER Shunan Zhao and Frank Rudzicz (2015) Classifying phonological categories in imagined and articulated speech. In Proceedings of ICASSP 2015, Brisbane Australia

ABSTRACT This paper presents a new dataset combining 3 modalities (EEG, facial, and audio) during imagined and vocalized phonemic and single-word prompts. We pre-process the EEG data, compute features for all 3 modalities, and perform binary classification of phonological categories using a combination of these modalities. For example, a deep-belief network obtains accuracies over 90% on identifying consonants, which is significantly more accurate than two baseline support vector machines. We also classify between the different states (resting, stimuli, active thinking) of the recording, achieving accuracies of 95%. These data may be used to learn multimodal relationships, and to develop silent-speech and brain-computer interfaces.

 


5-2-8 TORGO database free for academic use

In the spirit of the season, I would like to announce the immediate availability of the TORGO database, free in perpetuity for academic use. This database combines acoustics and electromagnetic articulography from 8 individuals with speech disorders and 7 without, and totals over 18 GB. These data can be used for multimodal models (e.g., for acoustic-articulatory inversion), models of pathology, and augmented speech recognition, for example. More information (and the database itself) can be found here: http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html.


5-2-9 Datatang

Datatang is a leading global data provider specializing in customized data solutions, focusing on a variety of speech, image, and text data collection, annotation, and crowdsourcing services.

 

Summary of the new datasets (2018) and a brief plan for 2019.

 

 

 

• Speech data (with annotation) that we finished in 2018

 

Language                        Dataset Length (Hours)
French                          794
British English                 800
Spanish                         435
Italian                         1,440
German                          1,800
Spanish (Mexico/Colombia)       700
Brazilian Portuguese            1,000
European Portuguese             1,000
Russian                         1,000

 

• 2019 ongoing speech projects

 

Type                            Project Name
Europeans speak English         1000 Hours - Spanish Speak English
                                1000 Hours - French Speak English
                                1000 Hours - German Speak English
Call Center Speech              1000 Hours - Call Center Speech
Off-the-shelf data expansion    1000 Hours - Chinese Speak English
                                1500 Hours - Mixed Chinese and English Speech Data

 

 

 

On top of the above, there are more planned speech data collections, such as Japanese speech data, children's speech data, dialect speech data, and so on.

 

What is more, we will continue to provide these data at a competitive price while maintaining a high accuracy rate.

 

 

 

If you have any questions or need more details, do not hesitate to contact us at jessy@datatang.com.

 

We would be happy to send you a sample or a specification of the data.

 

 

 



5-2-10 SIWIS French Speech Synthesis Database
The SIWIS French Speech Synthesis Database includes high-quality French speech recordings and associated text files, aimed at building TTS systems and investigating multiple styles and emphasis. A total of 9750 utterances from various sources, such as parliament debates and novels, were uttered by a professional French voice talent. A subset of the database contains emphasised words in many different contexts. The database includes more than ten hours of speech data and is freely available.
 

5-2-11 JLCorpus - Emotional Speech corpus with primary and secondary emotions
 

To further the understanding of the wide array of emotions embedded in human speech, we introduce an emotional speech corpus. In contrast to existing speech corpora, this corpus was constructed by maintaining an equal distribution of 4 long vowels in New Zealand English. This balance is intended to facilitate emotion-related formant and glottal source feature comparison studies. The corpus also has 5 secondary emotions along with 5 primary emotions. Secondary emotions are important in Human-Robot Interaction (HRI), where the aim is to model natural conversations among humans and robots, but there are very few existing speech resources for studying them; this work adds a speech corpus containing some secondary emotions.

Please use the corpus for emotional speech related studies. When you use it, please include the citation:

Jesin James, Li Tian, Catherine Watson, 'An Open Source Emotional Speech Corpus for Human Robot Interaction Applications', in Proc. Interspeech, 2018.

To access the whole corpus including the recording supporting files, click the following link: https://www.kaggle.com/tli725/jl-corpus, (if you have already installed the Kaggle API, you can type the following command to download: kaggle datasets download -d tli725/jl-corpus)

Or if you simply want the raw audio+txt files, click the following link: https://www.kaggle.com/tli725/jl-corpus/downloads/Raw%20JL%20corpus%20(unchecked%20and%20unannotated).rar/4

The corpus was evaluated in a large-scale human perception test with 120 participants. The links to the surveys are below. For the primary emotion corpus: https://auckland.au1.qualtrics.com/jfe/form/SV_8ewmOCgOFCHpAj3

For the secondary emotion corpus: https://auckland.au1.qualtrics.com/jfe/form/SV_eVDINp8WkKpsPsh

These surveys will give an overall idea about the type of recordings in the corpus.

The perceptually verified and annotated JL corpus will be given public access soon.


5-2-12 OPENGLOT – An open environment for the evaluation of glottal inverse filtering


 

OPENGLOT is a publicly available database designed primarily for the evaluation of glottal inverse filtering algorithms. In addition, the database can be used for evaluating formant estimation methods. OPENGLOT consists of four repositories. Repository I contains synthetic glottal flow waveforms and speech signals generated by using the Liljencrants–Fant (LF) waveform as an excitation and an all-pole vocal tract model. Repository II contains glottal flow and speech pressure signals generated using physical modelling of human speech production. Repository III contains pairs of glottal excitation and speech pressure signals generated by exciting 3D-printed plastic vocal tract replicas with LF excitations via a loudspeaker. Finally, Repository IV contains multichannel recordings (speech pressure signal, EGG, high-speed video of the vocal folds) of natural speech production.

 

OPENGLOT is available at:

http://research.spa.aalto.fi/projects/openglot/


5-2-13 Corpus Rhapsodie

We are pleased to announce the publication of a book devoted to the Rhapsodie treebank, a 33,000-word corpus of spoken French finely annotated for prosody and syntax.

Access to the publication: https://benjamins.com/catalog/scl.89 (see attached flyer)

Access to the treebank: https://www.projet-rhapsodie.fr/
The freely accessible data are distributed under a Creative Commons licence.
The site also provides access to the annotation guides.


5-2-14 The My Science Tutor Children's Conversational Speech Corpus (MyST Corpus), Boulder Learning Inc.

The My Science Tutor Children's Conversational Speech Corpus (MyST Corpus) is the world's largest English children's speech corpus.  It is freely available to the research community for research use.  Companies can acquire the corpus for $10,000.  The MyST Corpus was collected over a 10-year period, with support from over $9 million in grants from the US National Science Foundation and Department of Education, awarded to Boulder Learning Inc. (Wayne Ward, Principal Investigator).

The MyST corpus contains speech collected from 1,374 third, fourth and fifth grade students. The students engaged in spoken dialogs with a virtual science tutor in 8 areas of science. A total of 11,398 student sessions of 15 to 20 minutes produced a total of 244,069 utterances. 42% of the utterances have been transcribed at the word level. The corpus is partitioned into training and test sets to support comparison of research results across labs. All parents and students signed consent forms, approved by the University of Colorado's Institutional Review Board, that authorize distribution of the corpus for research and commercial use.

The MyST children's speech corpus contains approximately ten times as many spoken utterances as all other English children's speech corpora combined (see https://en.wikipedia.org/wiki/List_of_children%27s_speech_corpora).

Additional information about the corpus, and instructions for how to acquire the corpus (and samples of the speech data) can be found on the Boulder Learning Web site at http://boulderlearning.com/request-the-myst-corpus/.   


5-2-15 HARVARD speech corpus - native British English speaker
  • HARVARD speech corpus - native British English speaker, digital re-recording
 

5-2-16 Magic Data Technology Kid Voice TTS Corpus in Mandarin Chinese (November 2019)


 

Magic Data Technology is one of the leading artificial intelligence data service providers in the world. The company is committed to providing a wide range of customized data services in the fields of speech recognition, intelligent imaging and natural language understanding.

 

This corpus was recorded by a four-year-old girl born in Beijing, China. We are publishing 15 minutes of speech data from the corpus for non-commercial use.

 

The contents and the corresponding descriptions of the corpus:

  • The corpus contains 15 minutes of speech data, recorded in an NC-20 acoustic studio.

  • The speaker is 4 years old and was born in Beijing.

  • Detailed information, such as speech data coding and speaker information, is preserved in the metadata file.

  • The speaking style is natural kid style.

  • Annotation includes four parts: pronunciation proofreading, prosody labeling, phone boundary labeling and POS tagging.

  • The annotation accuracy is higher than 99%.

  • For phone labeling, the database contains annotation not only of the boundaries of phonemes, but also of the boundaries of the silence parts.

 

The corpus aims to help researchers in the TTS field. It is part of a much bigger dataset (the 2.3-hour MAGICDATA Kid Voice TTS Corpus in Mandarin Chinese) recorded in the same environment. This is the first release of this voice!

 

Please note that this corpus has been released with the authorization of the speaker and her parents.

 

Samples are available.

Do not hesitate to contact us for any questions.

Website: http://www.imagicdatatech.com/index.php/home/dataopensource/data_info/id/360

E-mail: business@magicdatatech.com


5-2-17 FlauBERT: a French LM
Here is FlauBERT: a French LM trained (with the CNRS Jean-Zay supercomputer) on a large and heterogeneous corpus. Along with it comes FLUE (an evaluation setup for French NLP). FlauBERT has been successfully applied to complex tasks (NLI, WSD, parsing). More on https://github.com/getalp/Flaubert
More details on this online paper: https://arxiv.org/abs/1912.05372 
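For readers who want to try it, below is a minimal sketch of loading FlauBERT with the Hugging Face transformers library. The checkpoint name 'flaubert/flaubert_base_cased' is an assumption; check the GitHub page above for the actually released models.

    # Encode a French sentence with FlauBERT (requires transformers + torch).
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained('flaubert/flaubert_base_cased')
    model = AutoModel.from_pretrained('flaubert/flaubert_base_cased')

    inputs = tokenizer('Le chat dort sur le canapé.', return_tensors='pt')
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # (batch, tokens, hidden_size)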

5-2-18 ELRA-S0408 SpeechTera Pronunciation Dictionary

ELRA-S0408 Speechtera Pronunciation Dictionary

ISLRN: 645-563-102-594-8
The SpeechTera Pronunciation Dictionary is a machine-readable pronunciation dictionary for Brazilian Portuguese comprising 737,347 entries. Its phonetic transcription is based on 13 linguistic varieties spoken in Brazil and contains the pronunciation of the frequent word forms found in the transcription data of SpeechTera's speech and text database (literary, newspaper, movies, miscellaneous). Each of the thirteen dialects comprises 56,719 entries.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-S0408/

For more information on the catalogue, please contact Valérie Mapelli mailto:mapelli@elda.org

If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements


5-2-19 Resources of the ELRC Network

Paris, France, April 23, 2020

ELRA is happy to announce that Language Resources collected within the ELRC Network, funded by the European Commission, are now available from the ELRA Catalogue of Language Resources.

In total, 180 Written Corpora, 5 Multilingual Lexicons and 2 Terminological Resources are freely available under open licences and can be downloaded directly from the catalogue. Type 'ELRC' in the catalogue search engine (http://catalog.elra.info/en-us/repository/search/?q=ELRC) to access and download resources.

All these Language Resources can be used to support the development of your Machine Translation solutions. They cover the official languages of the European Union and the CEF-associated countries.

More LRs coming from ELRC will be added as they become available.

*****
About ELRC
The ELRC (European Language Resources Coordination) Network raises awareness and promotes the acquisition and the continued identification and collection of language resources in all official languages of the EU and the CEF-associated countries. These activities aim to help improve the quality, coverage and performance of automated translation solutions in the context of current and future CEF digital services.

To find out more about ELRC, please visit the website: http://lr-coordination.eu


About ELRA
The European Language Resources Association (ELRA) is a non-profit-making organisation founded by the European Commission in 1995, with the mission of providing a clearing house for Language Resources and promoting Human Language Technologies. Language Resources covering various fields of HLT (including Multimodal, Speech, Written, Terminology) and a great number of languages are available from the ELRA catalogue. ELRA's strong involvement in the fields of Language Resources  and Language Technologies is also emphasized at the LREC conference, organized every other year since 1998.

To find out more about ELRA, please visit the website: http://www.elra.info

For more information on the catalogue, please contact Valérie Mapelli
If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements


5-2-20 ELRA announces that MEDIA data are now available for free for academic research


Further to the request of the HLT French community to foster evaluation activities for man-machine dialogue systems for French language, ELRA has decided to provide a free access to the MEDIA speech corpora and evaluation package for academic research purposes.

The MEDIA data can be found in the ELRA Catalogue under the following references:

Data available from the ELRA Catalogue can be obtained easily by contacting ELRA.  

The MEDIA project was carried out within the framework of Technolangue, the French national research programme funded by the French Ministry of Research and New Technologies (MRNT) with the objective of running a campaign for the evaluation of man-machine dialogue systems for French. The campaign was distributed over two actions: an evaluation taking into account the dialogue context and an evaluation not taking into account the dialogue context.

PortMedia was a follow up project supported by the French Research Agency (ANR). The French and Italian corpus was produced by ELDA, with the same paradigm and specifications as the MEDIA speech database but on a different domain.

For more information and/or questions, please write to contact@elda.org.

 *** About ELRA ***
The European Language Resources Association (ELRA) is a non-profit making organisation founded by the European Commission in 1995, with the mission of providing a clearing house for language resources and promoting Human Language Technologies (HLT).

To find out more about ELRA and its respective catalogue, please visit: http://www.elra.info and http://catalogue.elra.info


5-2-21 ELRA/ELDA Communication: LT4All

Out of the 7,000+ languages spoken around the world, only a few have associated Language Technologies. The majority of languages can be considered 'under-resourced' or 'not supported'. This situation, very detrimental to many language speakers, and specifically to indigenous language speakers, creates a digital divide and places many languages in danger of digital extinction, if not complete extinction.

Organized as part of the 2019 International Year of Indigenous Languages, the 1st edition of LT4All (Language Technologies for All: Enabling Linguistic Diversity and Multilingualism Worldwide) took place in Paris at the UNESCO Headquarters on December 4-6, 2019 and gathered 400 participants from various backgrounds (including language science and technology researchers, linguists, industry representatives, indigenous peoples, and language policy and decision makers) from all over the world.

The LT4All Programme and Editorial Committees are very happy to announce that the set of Research Papers and Posters collected at the occasion of LT4All is now available online at : https://lt4all.elra.info/proceedings/lt4all2019/

****
LT4All has been made possible thanks to the close cooperation between UNESCO, the Government of the Khanty-Mansiysk Autonomous Okrug-Ugra (Russian Federation), the European Language Resources Association (ELRA) and its Special Interest Group on Under-resourced Languages (SIGUL), in partnership with the UNESCO Intergovernmental Information for All Programme (IFAP) and the Interregional Library Cooperation Centre, as well as with the support of other public organizations and sponsors.

More information, including the list of all the sponsors and supporters, is available at https://en.unesco.org/LT4All


5-2-22 Search and Find ELRA LRs on Google Dataset Search and ELG LT Platform


ELRA is happy to announce that all the Language Resources from its Catalogue can now be searched and found on Google Dataset Search and on the ELG Language Technology platform developed within the European Language Grid project.

In order to allow the indexing by Google Dataset Search, ELRA has updated the code generating the catalogue pages. The code developed follows the schema.org standard and is publicly available in JSON format so that it can be used for other harvesting purposes.
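For readers unfamiliar with schema.org dataset markup, the Python sketch below builds an illustrative record of the general kind that Google Dataset Search harvests. All field values are hypothetical and are not taken from an actual ELRA catalogue page.

    import json

    # Illustrative schema.org 'Dataset' record (hypothetical values); web
    # pages typically embed it as <script type='application/ld+json'>.
    record = {
        '@context': 'https://schema.org/',
        '@type': 'Dataset',
        'name': 'Example Speech Corpus',
        'description': 'An illustrative catalogue entry for a speech corpus.',
        'url': 'http://catalogue.elra.info/en-us/repository/browse/EXAMPLE/',
        'identifier': 'ISLRN: 000-000-000-000-0',
        'license': 'https://example.org/licence-terms',
    }

    print(json.dumps(record, indent=2, ensure_ascii=False))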

The ELRA Catalogue is already indexed and harvested by well-known repositories and archives such as OLAC (Open Language Archives Community), the CLARIN Virtual Language Observatory and META-SHARE.

For 25 years now, ELRA has been distributing Language Resources to support research and development in various fields of Human Language Technology. The indexing on both Google Dataset Search and the ELG LT Platform is increasing the ELRA Catalogue's visibility, making the LRs known to new visitors from the Human Language Technologies, Artificial Intelligence and other related fields.


*** About ELRA ***

The European Language Resources Association (ELRA) is a non-profit making organisation founded by the European Commission in 1995, with the mission of providing a clearing house for language resources and promoting Human Language Technologies (HLT).

ELRA Catalogue of Language Resources: http://catalogue.elra.info

More about ELRA, please visit: http://www.elra.info.


*** About Google Dataset Search ***

Google Dataset Search is a search engine for datasets that enables users to search through a list of data repositories indexed through a standardised schema.

More about Google Dataset Search: https://datasetsearch.research.google.com/


*** About European Language Grid ***

The European Language Grid (ELG) is a project funded by the European Union through the Horizon 2020 research and innovation programme. It aims to be a primary platform for Language Technology in Europe.

More about the European Language Grid project: https://www.european-language-grid.eu/


5-2-23 Sharing language resources (ELRA)

ELRA recognises the importance of sharing Language Resources (LRs) and making them available to the community.

Since the 2014 edition of LREC, the Language Resources and Evaluation Conference, participants have been offered the possibility to share their LRs (data, tools, web-services, etc.) when submitting a paper, uploading them in a special LREC repository set up by ELRA.

This effort of sharing LRs, linked to the LRE Map initiative (https://lremap.elra.info) for their description, contributes to creating a common repository where everyone can deposit and share data.

Despite the cancellation of LREC 2020 in Marseille, a large number of contributions were submitted, and the LREC initiative 'Share your LRs' was successfully carried through to the end.

Repositories corresponding to each edition of the conference are available here:

For more info and questions, please write to contact@elda.org.
 

5-2-24 The UCLA Variability Speaker Database

With NSF support, our interdisciplinary team of voice researchers at UCLA recently put together a public database that we believe will be of interest to many members of the ISCA community. On behalf of my co-authors (Patricia Keating, Jody Kreiman, Abeer Alwan, Adam Chong), I'm writing to ask if we could advertise our database in the ISCA newsletter. We'd really appreciate your help with this.

The database, the UCLA Variability Speaker Database, is freely available through UCLA's Box cloud, which can be accessed from our lab website: http://www.seas.ucla.edu/spapl/shareware.html#Data
I should mention that the database will also be available from the Linguistic Data Consortium (LDC) as of October 2021.
 
Here's a brief description of the database.
The UCLA Variability Speaker Database comprises high-quality audio recordings from 202 speakers, 101 men and 101 women, performing 12 brief speech tasks in English over three recording sessions (total amount of speech: 300-450 sec per speaker). This public database was designed to sample variability in speaking within individual speakers and across a large number of speakers. The large set of speakers (similar in age), sampled from the current university community, is gender-balanced and has a variety of language backgrounds. The database can serve as a testing ground for research questions involving between-speaker variability, within-speaker variability, and text-dependent variability. More details about the database are available in a readme file that can be sent on request.

--Cynthia Yoonjeong Lee
Postdoctoral Scholar, Department of Linguistics, UCLA

5-2-25 Free databases in Catalan, Spanish and Arabic (ELRA and UPC Spain)

We are pleased to announce that Language Resources entrusted to ELRA for distribution and shared by the Universitat Politecnica de Catalunya (UPC), in Spain, are now available for free for academic research purposes (for ELRA institutional members) and at substantially decreased costs for commercial purposes. All data have been developed to enhance Speech technologies in Catalan, Spanish and Arabic.

 

The Language Resources can be found in the ELRA Catalogue under the following references:

ELRA-S0101 Spanish SpeechDat(II) FDB-1000 (ISLRN: 415-072-153-167-5)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0101/
ELRA-S0102 Spanish SpeechDat(II) FDB-4000 (ISLRN: 295-399-069-106-4)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0102/
ELRA-S0140 Spanish SpeechDat-Car database (ISLRN: 937-459-364-430-3)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0140/
ELRA-S0141 SALA Spanish Venezuelan Database (ISLRN: 894-744-522-508-8)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0141/
ELRA-S0173 SALA Spanish Mexican Database (ISLRN: 077-043-759-782-3)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0173/
ELRA-S0183 OrienTel Morocco MCA (Modern Colloquial Arabic) database (ISLRN: 613-578-868-832-2)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0183/
ELRA-S0184 OrienTel Morocco MSA (Modern Standard Arabic) database (ISLRN: 978-839-138-181-8)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0184/
ELRA-S0185 OrienTel French as spoken in Morocco database (ISLRN: 299-422-451-969-8)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0185/
ELRA-S0186 OrienTel Tunisia MCA (Modern Colloquial Arabic) database (ISLRN: 297-705-745-294-4)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0186/
ELRA-S0187 OrienTel Tunisia MSA (Modern Standard Arabic) database (ISLRN: 926-401-827-806-5)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0187/
ELRA-S0188 OrienTel French as spoken in Tunisia database (ISLRN: 085-972-271-578-3)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0188/
ELRA-S0207 LC-STAR Catalan phonetic lexicon (ISLRN: 102-856-174-704-7)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0207/
ELRA-S0208 LC-STAR Spanish phonetic lexicon (ISLRN: 826-939-678-247-5)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0208/
ELRA-S0243 SpeechDat Catalan FDB database (ISLRN: 373-541-490-506-3)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0243/
ELRA-S0306 TC-STAR Transcriptions of Spanish Parliamentary Speech (ISLRN: 972-398-693-247-4 )
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0306/
ELRA-S0309 TC-STAR Spanish Baseline Female Speech Database (ISLRN: 682-113-241-701-0)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0309/
ELRA-S0310 TC-STAR Spanish Baseline Male Speech Database (ISLRN: 736-021-086-598-0)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0310/
ELRA-S0311 TC-STAR Bilingual Voice-Conversion Spanish Speech Database (ISLRN: 254-311-004-570-0)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0311/
ELRA-S0312 TC-STAR Bilingual Voice-Conversion English Speech Database (ISLRN: 522-613-023-181-1)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0312/
ELRA-S0313 TC-STAR Bilingual Expressive Speech Database (ISLRN: 088-656-828-489-3)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0313/
ELRA-S0336 Spanish Festival voice male (ISLRN: 868-352-143-949-9)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0336/
ELRA-S0337 Spanish Festival voice female (ISLRN: 396-262-481-019-0)
For more information, see: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0337/



For more information on the catalogue, please contact Valérie Mapelli mailto:mapelli@elda.org

If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements


5-2-26 USC 75-Speaker Speech MRI Database: multispeaker speech production articulatory datasets of vocal tract MRI video

The 'USC 75-Speaker Speech MRI Database' is a new, freely available multispeaker speech production dataset of vocal tract MRI video, with accompanying software tools:
https://www.nature.com/articles/s41597-021-00976-x

These data contain 2D sagittal-view RT-MRI videos along with synchronized audio for 75 subjects performing linguistically motivated speech tasks, alongside the corresponding raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each subject.

Other freely available articulatory speech production datasets include a TIMIT articulatory data corpus and emotional speech production data, all available from: https://sail.usc.edu/span/resources.html


5-2-27 Annotated tweet corpus, ELDA and INSA Rouen

ELDA and INSA Rouen Normandie partner to release the Annotated tweet corpus in Arabizi, French and English

ELDA and INSA Rouen Normandie are pleased to announce the release of the Annotated tweet corpus in Arabizi, French and English. This corpus was built by ELDA on behalf of INSA Rouen Normandie (Normandie Université, LITIS team), in the framework of the SAPhIRS project (System for the Analysis of Information Propagation in Social Networks), funded by the DGE (Direction Générale des Entreprises, France) through the RAPID programme (2017-2020). This project aimed at studying the mechanisms of information and opinion propagation within social networks: identifying influential leaders, detecting channels for disseminating information and opinion. The purpose of the corpus constitution, completed in 2020, was to collect and annotate tweets in 3 languages (Arabizi, French and English) for 3 predefined themes (Hooliganism, Racism, Terrorism).

The annotated tweet corpus in Arabizi, French and English can be found in the ELRA Catalogue under the following links and references:

 For more information and/or questions, please write to contact@elda.org.

 

About INSA Rouen Normandie

As a leading regional institute for research and higher education in engineering and among the main French establishments, INSA Rouen Normandie holds a major place in the landscape of engineering education in France. Its mission includes education (11 engineering courses, including 4 apprenticeship programs, 2 master's specializations and 7 masters), research (8 laboratories) and the spreading of scientific culture in the following fields of expertise: information systems, big data, mathematics, chemistry and processes, risk management and industrial site recovery, energy, propulsion systems, mechanics, industrial performance, civil engineering, and urban design and planning. INSA Rouen Normandie graduates almost 400 engineers and is a member of the INSA group. It is closely tied to the world of industry and has established a large number of partnerships with international organisations.

To find out more about INSA Rouen Normandie, please visit: https://www.insa-rouen.fr/

 


5-2-28 LR Agreement between ELRA and Datatang

Paris, France, October 24, 2022

*LR Agreement with Datatang*


ELRA and Datatang signed a Language Resources distribution agreement to release a total
of 67 Speech Resources distributed by ELRA. With this agreement, ELRA is strengthening
its position as the leading worldwide distribution centre and Datatang is getting more
visibility on the European market.

Those resources were designed and collected to boost Speech Recognition in particular.
They cover the following languages:


 * Cantonese,
 * Chinese Mandarin,
 * Various dialects from China: Changsha, Kunming, Shanghai, Sichuan,
   Wuhan,
 * Several variants of English (English from Australia, Canada, China,
   France, Germany, India, Italy, Japan, Korea, Latin America,
   Portugal, Russia, Singapore, Spain, United Kingdom, USA),
 * French,
 * German,
 * Hindi,
 * Indonesian,
 * Italian,
 * Japanese,
 * Korean,
 * Malay,
 * Portuguese (Brazilian),
 * Russian,
 * Spanish (including non-hispanic Spanish),
 * Thai,
 * Uyghur,
 * Vietnamese.

The detailed list of all 67 Language Resources from Datatang can be found here:
http://www.elra.info/en/catalogues/language-resources-announcements/#Oct22

*About Datatang*

Founded in 2011, Datatang (stock code: 831428) is a global AI data asset and data solution provider. Datatang offers solutions for R&D needs with over 500 prepared AI datasets covering ASR, TTS, CV and NLP. Relying on its own data resources, technical advantages and intensive data processing experience, Datatang provides data services to 1,000+ companies and institutions worldwide.

To find out more about Datatang, please visit the website: https://www.datatang.ai/

*About ELRA*

The European Language Resources Association (ELRA) is a non-profit-making organisation
founded by the European Commission in 1995, with the mission of providing a clearing
house for Language Resources and promoting Human Language Technologies. Language
Resources covering various fields of HLT (including Multimodal, Speech, Written,
Terminology) and a great number of languages are available from the ELRA catalogue.
ELRA's strong involvement in the fields of Language Resources and Language Technologies
is also emphasized at the LREC conference, organized every other year since 1998.

To find out more about ELRA, please visit the website: http://www.elra.info

For more information on the catalogue or if you would like to enquire about having your resources distributed by ELRA, please contact us at contact@elda.org.


5-2-29 Biomedical language model DrBERT
We are proud to announce our first biomedical language model for French, called DrBERT. It is now available on Hugging Face, and the paper is on arXiv (https://arxiv.org/abs/2304.00958).
 
You can now use the model on your own documents and get state-of-the-art performance in only 3 lines of code (see the sketch below).
 
Check out the:
- Hugging Face models: https://huggingface.co/Dr-BERT
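
As an illustration of the '3 lines' usage mentioned above, here is a minimal sketch using the Hugging Face transformers library. The checkpoint name 'Dr-BERT/DrBERT-7GB' is an assumption; check the Hugging Face page above for the actually released models.

    # Minimal sketch: load DrBERT for masked-language-model inference.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained('Dr-BERT/DrBERT-7GB')
    model = AutoModelForMaskedLM.from_pretrained('Dr-BERT/DrBERT-7GB')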
 
 
Our model was trained on 128 GPUs of the Jean-Zay supercomputer and assessed on 11 distinct practical biomedical tasks for French, drawn from public and private data. These tasks include: Named Entity Recognition (NER), Part-Of-Speech tagging (POS), binary/multi-class/multi-label classification, and multiple-choice question answering. The outcomes revealed that DrBERT enhanced the performance of most tasks compared to prior techniques, indicating that a from-scratch pre-training strategy is still the most effective for BERT language models on French biomedical text.
 
Tutorials about biomedical natural language processing are coming soon, stay tuned !!
 
With Yanis Labrak (LIA / Zenidoc), Adrien Bazoge (LS2N), Richard Dufour (LS2N), Mickael Rouvier (LIA), Emmanuel Morin (LS2N), Béatrice Daille (LS2N) and Pierre-Antoine Gourraud (Nantes University / CHU Nantes).
 

5-2-30 A respiratory sounds and symptoms dataset: Coswara
We are happy to share information on the release of an open-access dataset of respiratory sound samples with the community. We hope the dataset finds use in understanding respiratory sounds (breathing, cough, vowel phonation, and speech) and in building solutions for healthcare.
 
It will be great if this information can be shared with the ISCA community!
 

5-2-31 ELDA releases air-traffic control data from ATCO2 Project

Press Release - Immediate
Paris, France, September 29, 2023

 

ELDA releases air-traffic control data from ATCO2 Project

ELDA is pleased to announce the release of the ATCO2 Project Data. The ATCO2 project collected real-time voice communication between air-traffic controllers and pilots, available either directly through publicly accessible radio frequency channels or indirectly from air-navigation service providers (ANSPs). In addition to the voice communication data, contextual information is available in the form of metadata (i.e. surveillance data).

The dataset consists of two subsets:

 

  • a corpus of more than 4,000 hours of untranscribed data
  • a corpus of 4 hours of transcribed air-traffic control speech collected across different airports (Sion, Bern, Zurich, etc.); about 1 hour of the annotations has been re-checked by humans.

 

The dataset can be found in the ELRA Catalogue of Language Resources under the following reference:

ATCO2 Project Data, ISLRN: 589-403-577-685-7

For more information and/or questions, please write to contact@elda.org.

 

About ATCO2 project

The ATCO2 project (automated data collection and semi-supervised processing framework for deep learning) aims at developing a unique platform to collect, organize and pre-process air-traffic control (voice communication) data from the airspace. It has received funding from the Clean Sky 2 Joint Undertaking (JU) under grant agreement No 864702. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from the Clean Sky 2 JU members other than the Union.

The following partners collaborated on this project:  IDIAP Research Institute (Switzerland), Opensky Network (Switzerland), ELDA (France), ReplayWell (Czech Republic), Romagnatech (Italy), Universität des Saarlandes (Germany), Brno University of Technology (Czech Republic).

To find out more about ATCO2 project, please visit: https://www.atco2.org/

 

About ELDA

The Evaluations and Language resources Distribution Agency (ELDA) identifies, collects, markets and distributes language resources, and disseminates general information in the field of Human Language Technologies (HLT). ELDA has considerable knowledge and skills in HLT applications and takes part in major French, European and international projects in the field.

To find out more about ELDA, please visit our web site: http://www.elda.org/

 

Back  Top

5-2-32AVID (Aalto Vocal Intensity Database): An open speech/electroglottography repository for machine learning-based studies on vocal intensity

AVID (Aalto Vocal Intensity Database): An open speech/electroglottography repository for machine learning-based studies on vocal intensity

 

AVID is an open database comprising speech and electroglottography (EGG) signals produced by 50 speakers (25 male, 25 female). The speakers varied their vocal intensity in four categories (soft, normal, loud and very loud). Each speaker produced 25 isolated sentences in English and read two paragraphs of text using the four intensity modes. These speaking tasks were repeated twice, in two separate sessions. Recordings were conducted with a constant mouth-to-microphone distance, and a sound pressure level (SPL) calibration tone was recorded. The speech data is labeled sentence-wise with a total of 19 labels (1 categorical intensity label and 18 continuous SPL labels).

By launching the open AVID repository, the authors would like to raise awareness within the speech and voice research communities of machine learning (ML)-based studies of vocal intensity. We particularly advocate the use of ML in scenarios where the original intensity information of speech has been lost because the signal was recorded without SPL calibration and is therefore presented on an arbitrary amplitude scale. To see how ML can be used together with the AVID database on these kinds of research problems, the interested reader is referred to our article (Alku, Kodali, Laaksonen, Kadiri, “AVID: A speech database for machine learning studies on vocal intensity”, Speech Communication, Vol. 157, Article 103039, 2024).

The AVID database is freely available at:

https://zenodo.org/records/10524873
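
As a minimal sketch of the use case described above (not the authors' method; file paths are hypothetical): peak-normalize a recording so that absolute level information is discarded, then extract simple intensity-related features from which a model could be trained to predict the calibrated SPL labels.

import numpy as np
import soundfile as sf

def normalized_rms_features(path, frame=2048):
    """Peak-normalize the waveform (arbitrary amplitude scale), then
    compute frame-wise RMS energies as simple intensity-related features."""
    x, sr = sf.read(path)                        # hypothetical AVID file path
    x = x / (np.max(np.abs(x)) + 1e-9)           # discard absolute level
    frames = [x[i:i + frame] for i in range(0, len(x) - frame, frame)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])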



Back  Top

5-3 Software
5-3-1Cantor Digitalis, an open-source real-time singing synthesizer controlled by hand gestures.

We are glad to announce the public release of Cantor Digitalis, an open-source real-time singing synthesizer controlled by hand gestures.


It can be used e.g. for making music or for singing voice pedagogy.

A wide variety of voices is available, from the classic vocal quartet (soprano, alto, tenor, bass) to the extreme colors of childish, breathy, roaring, etc. voices. All the features of vocal sounds are entirely under control, as the synthesis method is based on a mathematical model of voice production, without prerecorded segments.
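
As general background (our gloss, not a description of the exact Cantor Digitalis equations): parametric synthesizers of this kind typically rest on the source-filter model of voice production, in which the radiated spectrum factors into independently controllable terms:

% Source-filter model of voice production (general background):
% radiated spectrum = glottal source x vocal-tract filter x lip radiation
S(f) = G(f) \cdot V(f) \cdot L(f)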

The instrument is controlled using chironomy, i.e. hand gestures, with interfaces such as a stylus or fingers on a graphic tablet, or a computer mouse. Vocal dimensions such as melody, vocal effort, vowel, voice tension, vocal tract size, breathiness, etc. can be controlled easily and continuously during performance, and special voices can be prepared in advance or loaded from presets.

Check out the capabilities of Cantor Digitalis through performance extracts from the ensemble Chorus Digitalis:
http://youtu.be/_LTjM3Lihis?t=13s.

In practice, this release provides:
  • the synthesizer application
  • the source code in the form of a Max package (GPL-like license)
  • documentation for both musicians and developers
What do you need?
  • a computer running Mac OS X
  • ideally a Wacom graphic tablet, although the synthesizer also works with a computer mouse
  • for developers, the Max software
Interested?
  • To download Cantor Digitalis, click here
  • To subscribe to the Cantor Digitalis newsletter and/or the forum list, or to contact the developers, click here
  • To learn about Chorus Digitalis, the Cantor Digitalis ensemble, and watch videos of performances, click here
  • For more details about Cantor Digitalis, click here
 
Regards,
 
The Cantor Digitalis team (who loves feedback — cantordigitalis@limsi.fr)
Christophe d'Alessandro, Lionel Feugère, Olivier Perrotin
http://cantordigitalis.limsi.fr/
Back  Top

5-3-2MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP

 

We are happy to announce the release of our new toolkit “MultiVec” for computing continuous representations of text at different granularity levels (words or sequences of words). MultiVec includes Mikolov et al. [2013b]’s word2vec features, Le and Mikolov [2014]’s paragraph vector (batch and online) and Luong et al. [2015]’s model for bilingual distributed representations. MultiVec also includes different distance measures between words and sequences of words. The toolkit is written in C++ and aims to be fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and cross-lingual document classification. The toolkit also includes C++ and Python libraries that can be used to query bilingual and monolingual models.

 

The project is fully open to future contributions. The code is provided on the project webpage (https://github.com/eske/multivec) with installation instructions and command-line usage examples.
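
By way of illustration, a sketch of querying a trained monolingual model from the Python binding might look as follows; the names used below (MonolingualModel, load, word_vec, closest) are assumptions from memory and should be checked against the README at https://github.com/eske/multivec.

# A hedged sketch of the MultiVec Python binding; names are assumptions.
from multivec import MonolingualModel

model = MonolingualModel()
model.load("news.bin")              # hypothetical pre-trained model file
vec = model.word_vec("speech")      # continuous word representation
print(model.closest("speech", 5))   # nearest neighbours in embedding space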

 

When you use this toolkit, please cite:

 

@InProceedings{MultiVecLREC2016,
  title     = {{MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP}},
  author    = {Alexandre Bérard and Christophe Servan and Olivier Pietquin and Laurent Besacier},
  booktitle = {The 10th edition of the Language Resources and Evaluation Conference (LREC 2016)},
  year      = {2016},
  month     = {May}
}

 

The paper is available here: https://github.com/eske/multivec/raw/master/docs/Berard_and_al-MultiVec_a_Multilingual_and_Multilevel_Representation_Learning_Toolkit_for_NLP-LREC2016.pdf

 

Best regards,

 

Alexandre Bérard, Christophe Servan, Olivier Pietquin and Laurent Besacier

Back  Top

5-3-3An android application for speech data collection LIG_AIKUMA
We are pleased to announce the release of LIG_AIKUMA, an Android application for speech data collection, specially dedicated to language documentation. LIG_AIKUMA is an improved version of the Android application AIKUMA, initially developed by Steven Bird and colleagues. Features were added to the app in order to facilitate the collection of parallel speech data, in line with the requirements of the French-German project BULB (ANR/DFG, Breaking the Unwritten Language Barrier).
 
The resulting app, called LIG-AIKUMA, runs on various mobile phones and tablets and proposes a range of different speech collection modes (recording, respeaking, translation and elicitation). It was used for field data collections in Congo-Brazzaville resulting in a total of over 80 hours of speech.
 
Users who just want to use the app without access to the code can download it directly from the forge: https://forge.imag.fr/frs/download.php/706/MainActivity.apk
The code is also available on demand (contact elodie.gauthier@imag.fr and laurent.besacier@imag.fr).
 
More details on LIG_AIKUMA can be found in the following paper: http://www.sciencedirect.com/science/article/pii/S1877050916300448
Back  Top

5-3-4Web services via ALL GO from IRISA-CNRS

It is our pleasure to introduce A||GO (https://allgo.inria.fr/ or http://allgo.irisa.fr/), a platform providing a collection of web services for the automatic analysis of various data, including multimedia content across modalities. The platform builds on the back-end web-service deployment infrastructure developed and maintained by Inria's Service for Experimentation and Development (SED). Originally dedicated to multimedia content, A||GO has progressively broadened to other fields such as computational biology, networks and telecommunications, computational graphics and computational physics.

As part of the CNRS PlaSciDo initiative [1], the Linkmedia team at IRISA / Inria Rennes is making available via A||GO a number of web services devoted to multimedia content analysis across modalities (language, audio, image, video). The web services currently provided include research results from the Linkmedia team as well as contributions from a number of partners. A list of the services available to date is given below, and the current state is available at https://www-linkmedia.irisa.fr/software along with demo videos. Most web services are interoperable, facilitating the implementation of a multimedia content analysis processing chain, and are free to use for trial, prototyping or lab work. A brief and free account creation step will allow you to execute the web services using either the graphical interface or a command line via a dedicated API.
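
By way of illustration only, a programmatic call through such a job-based API might look like the sketch below; the endpoint, authentication scheme and field names are hypothetical and must be checked against the A||GO documentation.

# A purely illustrative sketch; endpoint and fields are hypothetical.
import requests

API_URL = "https://allgo.inria.fr/api/v1/jobs"  # hypothetical endpoint
TOKEN = "your-api-token"                        # issued after account creation

response = requests.post(
    API_URL,
    headers={"Authorization": f"Token {TOKEN}"},
    data={"app": "samusa"},                     # e.g. SaMuSa music/speech segmentation
    files={"file": open("recording.wav", "rb")},
)
print(response.json())                          # job identifier and status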

We expect the number of web services to grow over time and invite interested parties to contact us should they wish to contribute to the multimedia web service offer of A||GO.

List of multimedia content analysis tools currently available on A||GO:
- Audio Processing
        SaMuSa: music/speech segmentation
        SilAD: silence detection
        Radi.sh: repeated audio motif discovery
        LORIA STS v2: speech transcription for the French language from LORIA
        Multi channel BSS locate: audio source localization toolbox from IRISA-PANAMA
        A-spade: audio declipper from IRISA-PANAMA
        Transvox: voice faker from LORIA
- Natural Language Processing
        NERO: named entity recognition
        TermEx: keywords/indexing terms detection
        Otis!: topic segmentation
        Hi-tost: hierarchical topic structuring
- Video Processing
        Vidseg: video shot segmentation
        HUFA: face detection and tracking
Shortcuts to Linkmedia services are also available here:
https://www-linkmedia.irisa.fr/software/
 
For more information, don't hesitate to contact us (contact-multimedia-allgo@irisa.fr).

 
Gabriel Sargent and Guillaume Gravier
--
Linkmedia
IRISA - CNRS
Rennes, France

Back  Top

5-3-5Clickable map - Illustrations of the IPA

Clickable map - Illustrations of the IPA


We have produced a clickable map showing the Illustrations of the International Phonetic
Alphabet.

The map is being updated with each new issue of the Journal of the International Phonetic
Association.

https://richardbeare.github.io/marijatabain/ipa_illustrations_all.html

Marija Tabain - La Trobe University, Australia
Richard Beare - Monash University & MCRI, Australia

Back  Top

5-3-6LIG-Aikuma running on mobile phones and tablets

 


Dear all,

LIG is pleased to inform you that the website for the Lig-Aikuma app is online: https://lig-aikuma.imag.fr/
At the same time, an update of Lig-Aikuma (V3) has been made available (see the website).

LIG-AIKUMA is a free Android app running on various mobile phones and tablets. The app offers a range of speech collection modes (recording, respeaking, translation and elicitation) as well as the possibility to share recordings between users. LIG-AIKUMA is built upon the initial AIKUMA app developed by S. Bird & F. Hanke (see https://en.wikipedia.org/wiki/Aikuma for more information).

Improvements of the app:

  • Visual upgrade:
    + Waveform visualizer on the Respeaking and Translation modes (possibility to zoom in/out the audio signal)
    + File explorer included in all modes, to facilitate the navigation between files
    + New Share mode to share recordings between devices (by Bluetooth, Mail, NFC if available)
    + French and German languages available. In addition to English, the application now supports French and German; Lig-Aikuma uses the language of the phone/tablet by default.
    + New icons, more consistent, to distinguish all types of files (audio, text, image, video)
  • Conceptual upgrade:
    + New name for the root directory: ligaikuma. Henceforth, all data will be stored in this directory instead of 'aikuma' (used in previous versions of the app). This change does not raise compatibility issues: in the file explorer of each mode, the default position is this root directory; just go back once with the left grey arrow (on the lower left of the screen) and select the 'aikuma' directory to access your old recordings.
    + Generation of a PDF consent form (from the information filled in the metadata form) that can be signed by the linguist and the speaker using a PDF annotation tool (such as the Adobe Fill & Sign mobile app)
    + Generation of a CSV file that can be imported into the ELAN software: it automatically creates the segmented tiers produced during a respeaking or translation session, and marks segments containing no speech with a 'non-speech' label
    + Geolocation of the recordings
    + Respeaking of an elicited file: it is now possible to use, in Respeaking or Translation mode, an audio file initially recorded in Elicitation mode
  • Structural upgrade:
    + Undo button in Elicitation mode to erase/redo the current recording
    + Improved session backup in Elicitation mode
    + 'Non-speech' button in Respeaking and Translation modes to indicate by a comment that the segment does not contain speech (noise or silence, for instance)
    + Automatic speaker profile creation to quickly fill in the metadata when several sessions are recorded with the same speaker
Best regards,

Elodie Gauthier & Laurent Besacier
Back  Top

5-3-7Python Library
We are pleased to announce the public release of the first Python library for converting numbers written out in French into their digit representation.
 
The parser is robust and can segment and substitute number expressions within a stream of words, such as a conversation. It recognizes the regional variants of the language (quatre-vingt-dix / nonante…) and handles ordinals as well as integers, decimal numbers and formal sequences (phone numbers, credit card numbers…).
 
We hope this tool will be useful to those who, like us, work on natural language processing for French.
 
The library is distributed under the MIT license, which allows very liberal use.
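 
The announcement does not name the package; assuming it is the text2num library published by the same author (pip install text2num), a minimal usage sketch might look as follows (the alpha2digit entry point is our assumption and should be checked against the library's documentation):

# Hedged sketch: converts French number words embedded in running text.
from text_to_num import alpha2digit

print(alpha2digit("quatre-vingt-quinze virgule trois", "fr"))  # expected: "95,3"
print(alpha2digit("nonante-cinq", "fr"))  # Belgian/Swiss variant, expected: "95"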
 
 
-- 
Romuald Texier-Marcadé
Back  Top

5-3-8Evaluation des troubles moteurs de la parole MONPAGE version 2.0.s

Dear colleagues,

After several years of work, we are pleased to announce the online release of MonPaGe, version 2.0.s, a protocol for the assessment of motor speech disorders, now normed and validated.

This tool, made freely available to the community, is intended for the clinical assessment of mild to moderate motor speech disorders (dysarthrias and apraxia of speech) in French-speaking adults. It was developed by a group of Belgian, Swiss, French and Quebecois researchers and clinicians. It is a speech assessment battery comprising computerized test administration (with recording of the patients' productions) and semi-automatic perceptual and acoustic analyses. Getting started with it requires a minimum of skills in acoustic phonetics.

You will find the presentation, references, resources and download links for MonPaGe-2.0.s on the website: https://lpp.in2p3.fr/monpage/
We recommend starting by reading the user manual.

Feel free to spread the word!
For the MonPaGe team,
Véronique


Prof. Véronique Delvaux, PhD
FNRS Research Associate at UMONS
Lecturer at UMONS & ULB
Service de Métrologie et Sciences du Langage
Office -1.7, Place du Parc, 18, 7000 Mons
+3265373140

Back  Top

5-3-9VocalTractLab3D: articulatory synthesis software.

Hello everyone,

 

I would like to inform you that the articulatory synthesis software VocalTractLab3D is now online and freely available to all:

 

https://vocaltractlab.de/index.php?page=vocaltractlab-download

 

VocalTractLab is an articulatory synthesis software package developed mainly by Peter Birkholz at the Chair of Speech Technology and Cognitive Systems of TU Dresden.

 

During my postdoc, I worked on developing a special version, VocalTractLab3D, which includes efficient 3D acoustic simulations driven by a graphical interface.

Unlike the acoustic simulations commonly used in speech research, which rely on a 1D approach based on the area function, 3D simulations describe the acoustic field in all spatial dimensions and take into account the precise 3D shape of the vocal tract.

They are therefore more accurate, particularly at high frequencies (above about 2-3 kHz).
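
As a brief technical aside (not part of the original announcement): the 1D approach mentioned above amounts to solving Webster's horn equation for the sound pressure p along the vocal-tract midline, using only the area function A(x), whereas 3D methods solve the full wave equation, which also captures the transverse modes that appear at high frequencies:

% 1D plane-wave model over the area function A(x) (Webster's horn equation)
\frac{\partial^2 p}{\partial x^2} + \frac{1}{A(x)}\frac{\mathrm{d}A}{\mathrm{d}x}\frac{\partial p}{\partial x} = \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2}

% full 3D wave equation solved by finite-element / finite-difference methods
\nabla^2 p = \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2}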

Their limitation, however, is computation time. In our project we worked on pushing back this limit, and our software can run this type of simulation in a reasonable time (about 1 hour for a static geometry with an accurate solution).

 

Another common limitation of 3D simulations is the need to master rather technical simulation methods such as finite elements or finite differences, which often means using a programming language.

We also worked on removing this barrier, to make this type of simulation accessible to as many people as possible: in VocalTractLab3D, the simulations are driven by a graphical interface, and it is not necessary to understand exactly how the method works in order to compute transfer functions or acoustic fields.

 

If enough people are interested, I can give an online presentation of the software to explain in more detail what it consists of, what it can be used for and how to use it.

Write to me if you are interested.

 

Also, do not hesitate to contact me if you have any questions about this software.

 

Best regards,

 

Rémi Blandin

Back  Top


