ISCA - International Speech
Communication Association

ISCApad #164

Saturday, February 11, 2012 by Chris Wellekens

5-2 Database
5-2-1 Nominations for the Antonio Zampolli Prize (ELRA)

The ELRA Board has created a prize to honour the memory of its first President, Professor Antonio Zampolli, a pioneer and visionary scientist
who was internationally recognized in the field of computational linguistics and Human Language Technologies (HLT). He also contributed
much through the establishment of ELRA and the LREC conference.

To reflect Antonio Zampolli's specific interest in our field, the Prize will be awarded to individuals whose work lies within the areas of
Language Resources and Language Technology Evaluation with acknowledged contributions to their advancements.

The Prize will be awarded for the fifth time in May 2012 at the LREC 2012 conference in Istanbul (21-27 May 2012). Completed nominations should
be sent to the ELRA President, Stelios Piperidis, no later than February 1st, 2012.

On behalf of the ELRA Board
Stelios Piperidis

Please visit the ELRA web site for the Antonio Zampolli Prize Statutes and the nomination procedure:

5-2-2 ELRA - Language Resources Catalogue - Update (2012-01)

ELRA is happy to announce that 14 new Speech Resources are now available in its catalogue.

ELRA-S0324 Catalan-SpeechDat For the Fixed Telephone Network Database
This speech database contains the recordings of 2000 Catalan speakers who called from fixed telephones and were recorded over the fixed PSTN using an ISDN-BRI interface. Each speaker uttered around 50 read and spontaneous items. The speech database follows the specifications made within the SpeechDat (II) project. The database was validated by UVIGO. The Catalan-SpeechDat for the Fixed Telephone Network Database was funded by the Catalan Government.
For more information, see:

ELRA-S0325 Catalan-SpeechDat for the Mobile Telephone Network Database
This speech database contains the recordings of 2000 Catalan speakers who called from GSM telephones and were recorded over the fixed PSTN using an ISDN-BRI interface. Each speaker uttered around 50 read and spontaneous items. The speech database follows the specifications made within the SpeechDat (II) project. The database was validated by UVIGO. The Catalan-SpeechDat for the Mobile Telephone Network Database was funded by the Catalan Government.
For more information, see:

ELRA-S0326 Catalan SpeechDat-Car database
The Catalan SpeechDat-Car database contains the in-car recordings of 300 speakers who uttered around 120 read and spontaneous items. Each speaker recorded two sessions. Recordings were made through 4 different channels, via in-car microphones (1 close-talk microphone, 3 far-talk microphones). The 300 Catalan speakers were selected from 5 different dialectal regions and are balanced in gender and age groups. The database was validated by UVIGO. The Catalan-SpeechDat-Car Database was funded by the Catalan Government.
For more information, see:

ELRA-S0327 Catalan Speecon database
The Catalan Speecon database comprises the recordings of 550 adult Catalan speakers who uttered over 290 items (read and spontaneous). The data were recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place). The speech database follows the specifications made within the EU-funded Speecon project. The database was validated by UVIGO. The Catalan-Speecon Database was funded by the Catalan Government.
For more information, see:

ELRA-S0328 Spanish EUROM.1
EUROM1 is a multilingual European speech database. It contains over 60 speakers per language who pronounced numbers, sentences, isolated words, etc. using a close-talking microphone in an anechoic room. Equivalent corpora already exist for each of the European languages, with the same number of speakers selected in the same way and recorded in the same conditions with common file formats.
For more information, see:

ELRA-S0329 Emotional speech synthesis database
This database contains the recordings of one male and one female professional Spanish speaker recorded in a noise-reduced room. It consists of recordings and annotations of read text material in neutral style plus six MPEG expressions, all in fast, slow, soft and loud speech styles. The text material is composed of 184 items including phonetically balanced sentences, digits and isolated words. The text material was the same for all the modes and styles, giving a total of 3h 59min of recorded speech for the male speaker and 3h 53min for the female speaker. The Emotional speech synthesis database was created within the scope of the EU-funded Interface project.
For more information, see:

ELRA-S0330 FESTCAT Catalan TTS baseline male speech database
This database contains the recordings of one male Catalan professional speaker recorded in a noise-reduced room simultaneously through a close talk microphone, a mid distance microphone and a laryngograph signal. It consists of the recordings and annotations of read text material of approximately 10 hours of speech for baseline applications (Text-to-Speech systems). The FESTCAT Catalan TTS Baseline Male Speech Database was created within the scope of the FESTCAT project, funded by the Catalan Government.
For more information, see:

ELRA-S0331 FESTCAT Catalan TTS baseline female speech database
This database contains the recordings of one female Catalan professional speaker recorded in a noise-reduced room simultaneously through a close talk microphone, a mid distance microphone and a laryngograph signal. It consists of the recordings and annotations of read text material of approximately 10 hours of speech for baseline applications (Text-to-Speech systems). The FESTCAT Catalan TTS Baseline Female Speech Database was created within the scope of the FESTCAT project, funded by the Catalan Government.
For more information, see:

ELRA-S0332 FESTCAT Catalan TTS baseline speech database - 8 speakers
This database contains the recordings of four female and four male Catalan professional speakers recorded in a noise-reduced room simultaneously through a close talk microphone, a mid distance microphone and a laryngograph signal. It consists of the recordings and annotations of read text material of approximately 1 hour of speech per speaker for baseline applications (Text-to-Speech systems). The FESTCAT Catalan TTS baseline speech database - 8 speakers was created within the scope of the FESTCAT project funded by the Catalan Government.
For more information, see:

ELRA-S0333 Spanish Festival HTS models - male speech
This database contains the Festival HTS models trained with 10h of speech from the TC-STAR Spanish Baseline Male Speech Database (ELRA-S0310).
For more information, see:

ELRA-S0334 Spanish Festival HTS models - female speech
This database contains the Festival HTS models trained with 10h of speech from the TC-STAR Spanish Baseline Female Speech Database (ELRA-S0309).
For more information, see:

ELRA-S0335 Bilingual (Spanish-English) Speech synthesis HTS models
This database contains Bilingual (English and Spanish) Festival HTS models. Models were trained with 9h of speech from 2 female bilingual speakers and 2 male bilingual speakers. Each speaker recorded 2h 15 min per language. The speech data can be found in the TC-STAR Bilingual Voice-Conversion Spanish Speech Database (ELRA-S0311) and in the TC-STAR Bilingual Expressive Spanish Speech Database (ELRA-S0313).
For more information, see:

ELRA-S0336 Spanish Festival voice male
This database contains the recordings of one male Spanish speaker recorded in a noise-reduced room simultaneously through a close talk microphone, a mid distance microphone and a laryngograph signal. It comprises read text material of approximately 10 hours of speech for baseline applications (Text-to-Speech systems). The database includes Festival-compatible annotations. The recordings can also be found under TC-STAR Spanish Baseline Male Speech Database (ELRA-S0310).
For more information, see:

ELRA-S0337 Spanish Festival voice female
This database contains the recordings of one female Spanish speaker recorded in a noise-reduced room simultaneously through a close talk microphone, a mid distance microphone and a laryngograph signal. It comprises read text material of approximately 10 hours of speech for baseline applications (Text-to-Speech systems). The database includes Festival-compatible annotations. The recordings can also be found under TC-STAR Spanish Baseline Female Speech Database (ELRA-S0309).
For more information, see:

For more information on the catalogue, please contact Valérie Mapelli

Visit our On-line Catalogue:
Visit the Universal Catalogue:
Archives of ELRA Language Resources Catalogue Updates:

5-2-3 LDC Newsletter (January 2012)

In this newsletter:

-  LDC Celebrates its 20th Anniversary!
-  2012 LDC Survey – Be on the Lookout!
-  Membership Discounts for MY 2012 Still Available

New publications:

-  2006 NIST Speaker Recognition Evaluation Test Set Part 2
-  TORGO Database of Dysarthric Articulation


LDC Celebrates its 20th Anniversary!

2012 marks LDC’s 20th Anniversary year – officially on April 15 – but this is cause for a yearlong celebration! Since our founding in 1992 as a data repository and language resource distribution center, our online catalog has grown to include over 500 databases in 60 languages that have been licensed by over 3000 organizations from 80 different nations. This data has been made available through donations, funded projects at LDC or elsewhere, community initiatives, and from LDC resources, an indication of the collective strength of this consortium. LDC has also evolved from an organization that shares language resources to one that is at the forefront of language technology research, including the development of new data resources, software tools, and standards and best practices.

As we celebrate throughout the year, look for announcements and special features in our newsletter and on our Facebook page.


2012 LDC Survey – Be on the Lookout!

It’s been four years since our last survey of LDC members and data licensees and we would like to again ask you to share your views on LDC and its language resources as well as your thoughts about data distribution in general and the impact of social media on language-related research and technology development. These topics are particularly timely as LDC enters its 20th anniversary year.

The 2012 LDC Survey will be sent to every person and organization that licensed LDC data and/or joined LDC as a Member during the period from 2009 through 2011. Those who complete the survey on or before February 7, 2012 will make their organization eligible for a $500 benefit to be applied to any corpus or membership purchase in 2012. LDC will conduct a blind drawing and one lucky winner will be chosen from the pool of respondents.

Many thanks for your continued support and for your participation in the 2012 Survey!

Membership Discounts for MY 2012 Still Available

If you are considering joining for Membership Year 2012 (MY2012), there is still time to save on membership fees.   Any organization which joins or renews membership for 2012 through Thursday, March 1, 2012, is entitled to a 5% discount on membership fees.  Organizations which held membership for MY2011 can receive a 10% discount on fees provided they renew prior to March 1, 2012.  For further information on pricing, please consult our Announcements page or contact LDC.

New Publications

(1) 2006 NIST Speaker Recognition Evaluation Test Set Part 2 was developed by LDC and the National Institute of Standards and Technology (NIST). It contains 568 hours of conversational telephone and microphone speech in English, Arabic, Bengali, Chinese, Farsi, Hindi, Korean, Russian, Spanish, Thai and Urdu and associated English transcripts used as test data in the NIST-sponsored 2006 Speaker Recognition Evaluation (SRE).

The task of the 2006 SRE evaluation was speaker detection, that is, to determine whether a specified speaker is speaking during a given segment of conversational telephone speech. The task was divided into 15 distinct and separate tests involving one of five training conditions and one of four test conditions. Further information about the test conditions and additional documentation is available at the NIST web site for the 2006 SRE and within the 2006 SRE Evaluation Plan.

LDC has previously published 2006 NIST Speaker Recognition Evaluation Training Set and 2006 NIST Speaker Recognition Evaluation Test Set Part 1.

The speech data in this release was collected by LDC as part of the Mixer project, in particular Mixer Phases 1, 2 and 3. The Mixer project supports the development of robust speaker recognition technology by providing carefully collected and audited speech from a large pool of speakers recorded simultaneously across numerous microphones and in different communicative situations and/or in multiple languages. The data is mostly English speech, but includes some speech in Arabic, Bengali, Chinese, Farsi, Hindi, Korean, Russian, Spanish, Thai and Urdu.

The telephone speech segments are multi-channel data collected simultaneously from a number of auxiliary microphones. The files are organized into four types: two-channel excerpts of approximately 10 seconds, two-channel conversations of approximately 5 minutes, summed-channel conversations also of approximately 5 minutes and a two-channel conversation with the usual telephone speech replaced by auxiliary microphone data in the putative target speaker channel. The auxiliary microphone conversations are also of approximately five minutes in length.  English language transcripts in .ctm format were produced using an automatic speech recognition (ASR) system.
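As an illustration of working with the .ctm transcripts, the following is a minimal Python sketch of a parser. It assumes the conventional CTM line layout (recording ID, channel, start time, duration, word, optional confidence score) and treats lines beginning with ";;" as comments. The exact fields used in this corpus are not stated here, so check the corpus documentation before relying on this layout. The names `CtmToken` and `parse_ctm` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class CtmToken(NamedTuple):
    """One word entry from a CTM transcript (assumed field layout)."""
    recording: str
    channel: str
    start: float       # start time in seconds
    duration: float    # word duration in seconds
    word: str
    confidence: Optional[float]


def parse_ctm(text: str) -> List[CtmToken]:
    """Parse CTM text into tokens, skipping blank and ';;' comment lines."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"malformed CTM line: {line!r}")
        # The sixth field (confidence) is optional in this assumed layout.
        confidence = float(fields[5]) if len(fields) > 5 else None
        tokens.append(
            CtmToken(fields[0], fields[1], float(fields[2]),
                     float(fields[3]), fields[4], confidence)
        )
    return tokens
```

Because these transcripts were produced by an ASR system, a confidence field, where present, could be used to filter out low-confidence words before further processing.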

2006 NIST Speaker Recognition Evaluation Test Set Part 2 is distributed on seven DVD-ROMs.

2012 Subscription Members will automatically receive two copies of this corpus. 2012 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$2000.


(2) TORGO Database of Dysarthric Articulation was developed by the University of Toronto's departments of Computer Science and Speech Language Pathology in collaboration with the Holland-Bloorview Kids Rehabilitation Hospital in Toronto, Canada. It contains approximately 23 hours of English speech data, accompanying transcripts and documentation from 8 speakers (5 males, 3 females) with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS) and from 7 speakers (4 males, 3 females) from a non-dysarthric control group.

CP and ALS are examples of dysarthria, which is caused by disruptions in the neuro-motor interface that distort motor commands to the vocal articulators, resulting in atypical and relatively unintelligible speech in most cases. The TORGO database is primarily a resource for developing advanced automatic speech recognition (ASR) models suited to the needs of people with dysarthria, but it is also applicable to non-dysarthric speech. The inability of modern ASR to effectively understand dysarthric speech is a problem since the more general physical disabilities often associated with the condition can make other forms of computer input, such as computer keyboards or touch screens, difficult to use.

The data consists of aligned acoustics and measured 3D articulatory features from the speakers, recorded using the 3D AG500 electro-magnetic articulograph (EMA) system (Carstens Medizinelektronik GmbH, Lenglern, Germany) with fully-automated calibration. This system allows for 3D recordings of articulatory movements inside and outside the vocal tract, thus providing a detailed window on the nature and direction of speech-related activity.

All subjects read text consisting of non-words, short words and restricted sentences from a 19-inch LCD screen. The restricted sentences included 162 sentences from the sentence intelligibility section of Assessment of intelligibility of dysarthric speech (Yorkston & Beukelman, 1981) and 460 sentences derived from the TIMIT database. The unrestricted sentences were elicited by asking participants to spontaneously describe 30 images in interesting situations taken randomly from Webber Photo Cards - Story Starters (Webber, 2005), designed to prompt students to tell or write a story.

Data is organized by speaker and by the session in which each speaker recorded data. Each speaker's directory contains 'Session' directories which encapsulate data recorded in the respective visit and occasionally, a 'Notes' directory which can include Frenchay assessments (test for the measurement, description and diagnosis of dysarthria), notes about sessions (e.g., sensor errors), and other relevant notes.
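The layout described above can be indexed with a short script. The following is a minimal Python sketch, assuming that each speaker has one directory directly under the corpus root, that session directory names begin with "Session", and that the optional notes directory is named exactly "Notes". These naming details and the function name `index_corpus` are assumptions for illustration, not confirmed by the corpus documentation.

```python
from pathlib import Path
from typing import Dict


def index_corpus(root: Path) -> Dict[str, dict]:
    """Map each speaker directory to its session names and a Notes flag."""
    index = {}
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Assumption: session directories are named 'Session...'.
        sessions = sorted(
            d.name for d in speaker_dir.iterdir()
            if d.is_dir() and d.name.startswith("Session")
        )
        index[speaker_dir.name] = {
            "sessions": sessions,
            # The Notes directory is only present for some speakers.
            "has_notes": (speaker_dir / "Notes").is_dir(),
        }
    return index
```

Calling `index_corpus(Path("/path/to/corpus"))` returns one entry per speaker, which makes it easy to spot speakers without a 'Notes' directory or with an unexpected number of sessions.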

TORGO Database of Dysarthric Articulation is distributed on 4 DVD-ROMs.

2012 Subscription Members will automatically receive two copies of this corpus. 2012 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$1200.

5-2-4 Speechocean January 2012 update

Speechocean - Language Resource Catalogue - New Releases (01-2012)

Speechocean, as a global provider of language resources and data services, has more than 200 large-scale databases available in 80+ languages and accents covering the fields of Text to Speech, Automatic Speech Recognition, Text, Machine Translation, Web Search, Videos, Images etc.


Speechocean is glad to announce that more Speech Resources have been released:


Chinese and English Mixing Speech Synthesis Database (Female)

The Chinese Mandarin TTS Speech Corpus contains the read speech of a native Chinese female professional broadcaster recorded in a studio with high SNR (>35dB) over two channels (AKG C4000B microphone and Electroglottography (EGG) sensor).
The Corpus includes the following categories:
1.    Basic Mandarin sub-corpus: 5,000 utterances carefully designed to cover all kinds of linguistic phenomena. All sentences are declarative and were extracted from the news channels of People's Daily, China Daily, etc. Prompts with negative words were carefully excluded. Only sentences of suitable length were accepted (7~20 words, 14 words on average). This sub-corpus can be used for R&D of HMM-based TTS, limited-domain TTS and small-scale concatenative TTS;
2.    Complementary Mandarin sub-corpus: 10,000 utterances carefully designed to cover all kinds of linguistic phenomena. All sentences are declarative and were extracted from the news channels of People's Daily, China Daily, etc. Prompts with negative words were carefully excluded. Only sentences of suitable length were accepted (7~20 words, 14 words on average). This sub-corpus complements the Basic Mandarin sub-corpus and can be used for R&D of large-scale concatenative TTS;
3.    Mandarin Neutral sub-corpus: 380 Chinese bi-syllable words embedded in carrier sentences;
4.    Mandarin ERHUA sub-corpus: 290 Chinese Erhua syllables embedded in carrier sentences;
5.    Mandarin Digit-String sub-corpus: 1,250 utterances of 3-digit length that cover the different pronunciations of 1, i.e. “yi1” and “yao1”;
6.    Mandarin Question sub-corpus: 300 question sentences with commonly used question markers, for example “吗”, “么”, “呢”, etc.;
7.    Mandarin exclamatory sub-corpus: 200 exclamatory sentences with commonly used exclamatory markers, for example “呀”, “啊”, “吧”, “啦”, etc.;
8.    Chinese English sentence sub-corpus: 1,000 sentences carefully designed for bi-phone coverage. All sentences were extracted from news channels such as Voice of America (VOA). Prompts with negative words were carefully excluded. Only sentences of suitable length were accepted (7~20 words, 12 words on average), and they are phonetically annotated with SAMPA. This sub-corpus can be used for R&D of HMM-based TTS, limited-domain TTS and small-scale concatenative TTS;
9.    Chinese English words sub-corpus: about 6,000 commonly used English words embedded in carrier sentences;
10.    Chinese English Abbreviation sub-corpus: about 200 utterances that cover not only the alphabet but also combinations of characters and digits, such as “MP4”;
11.    Chinese English Letter sub-corpus: 26 carrier utterances with each letter embedded at the beginning, middle and end;
12.    Chinese Greek Letter sub-corpus: 24 carrier utterances with each letter embedded at the beginning, middle and end.

All speech data are segmented and labeled at the phone level. A pronunciation lexicon and pitch extracted from the EGG signal can also be provided on request.


France French Speech Recognition Corpus (desktop) – 50 speakers

This France French desktop speech recognition database was collected by SpeechOcean in France. It is one of the databases of our Speech Data - Desktop Project (SDD), which currently contains database collections for 30 languages.

It contains the voices of 50 different native speakers, balanced by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (28 males, 22 females) and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored in 44.1 kHz, 16-bit uncompressed PCM format and is accompanied by an ASCII SAM label file containing the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
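Because the audio is stored as uncompressed PCM at 44.1 kHz with 16 bits per sample, the duration of an utterance follows directly from its size in bytes. The Python sketch below illustrates the arithmetic. It assumes headerless (raw) audio and one channel per file by default, neither of which is stated in this announcement. The function name `pcm_duration_seconds` is hypothetical.

```python
SAMPLE_RATE_HZ = 44_100   # 44.1 kHz, as stated for this corpus
BYTES_PER_SAMPLE = 2      # 16-bit samples


def pcm_duration_seconds(num_bytes: int, channels: int = 1) -> float:
    """Duration of raw PCM audio, given its size in bytes.

    Assumes no file header; 'channels' defaults to mono (an assumption).
    """
    frame_size = BYTES_PER_SAMPLE * channels
    if num_bytes % frame_size != 0:
        raise ValueError("byte count is not a whole number of sample frames")
    return num_bytes / frame_size / SAMPLE_RATE_HZ
```

For example, one second of mono audio in this format occupies 44,100 × 2 = 88,200 bytes, so a 5-second utterance takes about 441 KB before any header or label file is counted.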


UK English Speech Recognition Corpus (desktop) – 50 speakers

This UK English desktop speech recognition database was collected by SpeechOcean in England. It is one of the databases of our Speech Data - Desktop Project (SDD), which currently contains database collections for 30 languages.

It contains the voices of 50 different native speakers, balanced by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (28 males, 22 females) and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored in 44.1 kHz, 16-bit uncompressed PCM format and is accompanied by an ASCII SAM label file containing the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.


US English Speech Recognition Corpus (desktop) – 50 speakers

This US English desktop speech recognition database was collected by SpeechOcean in America. It is one of the databases of our Speech Data - Desktop Project (SDD), which currently contains database collections for 30 languages.

It contains the voices of 50 different native speakers, balanced by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (25 males, 25 females) and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored in 44.1 kHz, 16-bit uncompressed PCM format and is accompanied by an ASCII SAM label file containing the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.


Italian Speech Recognition Corpus (desktop) – 50 speakers

This Italian desktop speech recognition database was collected by SpeechOcean in Italy. It is one of the databases of our Speech Data - Desktop Project (SDD), which currently contains database collections for 30 languages.

It contains the voices of 50 different native speakers, balanced by age (mainly 16 – 30, 31 – 45, 46 – 60), gender (23 males, 27 females) and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognition applications. Each speaker recorded 500 utterances in a quiet office environment through two professional microphones. Each utterance is stored in 44.1 kHz, 16-bit uncompressed PCM format and is accompanied by an ASCII SAM label file containing the relevant descriptive information.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.


For more information about our Database and Services please visit our website or visit our on-line Catalogue at

If you have any inquiries regarding our databases and services, please feel free to contact us:

Xianfeng Cheng

Marta Gherardi



5-2-5 ELDA Distribution Campaign 2011



ELDA is launching a special distribution campaign offering very favorable conditions, including discounts on public prices,
for the acquisition of language resources from the ELRA Catalogue of Language Resources.

This offer will be open until the end of December 2011.
For more information on this offer, please contact Valérie Mapelli.

Visit our On-line Catalogue:
Visit the Universal Catalogue:
Archives of ELRA Language Resources Catalogue Updates:
