ISCA - International Speech
Communication Association

ISCApad #160

Saturday, October 08, 2011 by Chris Wellekens

5-2-4 Speechocean October 2011 update

SpeechOcean China also has more than 200 large language resources, and some of its databases can be used free of charge by our members for academic research purposes. As an ISCA member, we are also glad to share these databases with other ISCA members.

www.speechocean.com

Speechocean - Language Resource Catalogue - New Releases (2011-10)

Speechocean, as a global provider of language resources and data services, has more than 200 large-scale databases available in 80+ languages and accents, covering the fields of Text to Speech, Automatic Speech Recognition, Text, Machine Translation, Web Search, Videos, Images, etc.

Speechocean is glad to announce that more speech resources have been released:

Turkish speech recognition Database (Desktop) --- 201 speakers

This Turkish desktop speech recognition database was collected by Speechocean’s project team in Turkey. It is one of the databases of our Speech Data - Desktop (SDD) Project, which currently contains database collections in 30 languages.
It contains the voices of 201 different native speakers (104 males, 97 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment, and 300 phonetically rich sentences were randomly selected from a specially designed pool of sentences.

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/789.html

Turkish speech recognition Database (In-car) --- 316 speakers

This Turkish in-car speech recognition database was collected by Speechocean’s project team in Turkey. It is one of the databases of our Speech Data - Car (SDC) Project, which currently contains database collections in more than 30 languages.
It contains the voices of 316 different native speakers, balanced by age (mainly 16-30, 31-45, 46-60), gender (156 males, 160 females) and regional accent.

The script was specially designed to provide material for both training and testing of many classes of speech recognizers, and contains 320 utterances covering 15 categories and 35 sub-categories for each speaker. Each speaker was recorded under two environments in three variations (parked, city driving and highway driving) with various recording conditions such as motor running, fan on/off, window up/down, etc. A total of 320 utterances were recorded for each speaker across the two environments (160 utterances and spontaneous sentences per environment).

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/793.html

France French speech recognition Database (Desktop) --- 200 speakers

This France French desktop speech recognition database was collected by Speechocean’s project team in France. It is one of the databases of our Speech Data - Desktop (SDD) Project, which currently contains database collections in 30 languages.
It contains the voices of 200 different native speakers (100 males, 100 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment. For each speaker, 500 utterances and spontaneous sentences were recorded, covering 13 categories and about 40 sub-categories such as contact names, directory assistant names, application words, album titles, query words, etc.

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/796.html

Spain Spanish speech recognition Database (Desktop) --- 210 speakers

This Spain Spanish desktop speech recognition database was collected by Speechocean’s project team in Spain. It is one of the databases of our Speech Data - Desktop (SDD) Project, which currently contains database collections in 30 languages.
It contains the voices of 210 different native speakers (102 males, 108 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment. For each speaker, 500 utterances and spontaneous sentences were recorded, covering 13 categories and about 40 sub-categories such as contact names, directory assistant names, application words, album titles, query words, etc.

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/795.html

UK English speech recognition Database (Desktop) --- 200 speakers

This UK English desktop speech recognition database was collected by Speechocean’s project team in the UK. It is one of the databases of our Speech Data - Desktop (SDD) Project, which currently contains database collections in 30 languages.
It contains the voices of 200 different native speakers (106 males, 94 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment, and 300 phonetically rich sentences were randomly selected from a specially designed pool of sentences.

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/792.html

Portugal Portuguese speech recognition Database (Desktop) --- 200 speakers

This Portugal Portuguese desktop speech recognition database was collected by Speechocean’s project team in Portugal. It is one of the databases of our Speech Data - Desktop (SDD) Project, which currently contains database collections in 30 languages.
It contains the voices of 200 different native speakers (101 males, 99 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment, and 300 phonetically rich sentences were randomly selected from a specially designed pool of sentences.

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/791.html

Swedish speech recognition Database (Desktop) --- 200 speakers

This Swedish desktop speech recognition database was collected by Speechocean’s project team in Sweden. It is one of the databases of our Speech Data - Desktop (SDD) Project, which currently contains database collections in 30 languages.
It contains the voices of 200 different native speakers (118 males, 82 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment, and 300 phonetically rich sentences were randomly selected from a specially designed pool of sentences.

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/790.html

Canadian French Desktop speech recognition Corpus (200 speakers) was launched in Canada

Based on our client's urgent demands, the Canadian French desktop speech recognition database (200 speakers) was collected by Speechocean’s project team in Canada. This database belongs to Speechocean's Desktop Speech Data Project.
It contains the voices of 200 different native speakers (100 males, 100 females), balanced by age, gender and regional accent. The script was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment, and 500 utterances and spontaneous sentences were recorded for each speaker.

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/733.html

Chinese Mandarin In-car Speech Recognition Database was Successfully Released!

The Chinese Mandarin In-car Speech Recognition Database was successfully released with the catalogue serial number King-ASR-122 in our Catalogue. This database was made for tuning and testing in-car speech recognition systems. It belongs to SPC’s Multi-language In-car Speech Data Project.
The database, which was collected in mainland China, contains the voices of 100 different native speakers (50 males, 50 females), balanced by age (mainly 18-30 (62), 31-45 (28), 46-60 (10)), gender (male 50%, female 50%) and regional accent (Northern 60%, Wu 10%, Xiang 5%, Gan 5%, Kejia 5%, Min 5%, Cantonese 10%).

The script was specially designed to provide material for both training and testing of many classes of speech recognizers, and contains 320 utterances covering 15 categories and 35 sub-categories for each speaker.
Each speaker was recorded under two environments in three variations (parked, city driving and highway driving) with various recording conditions such as motor running, fan on/off, window up/down, etc. In total, 320 utterances were recorded for each speaker across the two environments (160 utterances and spontaneous sentences per environment).
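
As a quick consistency check on the figures quoted above for the Mandarin in-car database, the following minimal Python sketch recomputes the per-environment and corpus-wide utterance counts and converts the accent percentages into speaker counts. The variable and function names are illustrative only and are not part of any Speechocean tooling, and the corpus-wide total is simple arithmetic derived here under the assumption that every speaker contributed the full 320 utterances; it is not a figure stated in the announcement.

```python
# Figures quoted in the announcement for the Mandarin in-car database.
SPEAKERS = 100
UTTERANCES_PER_SPEAKER = 320
ENVIRONMENTS = 2

# Regional accent distribution, in percent of speakers.
ACCENT_SHARE = {
    "Northern": 60, "Wu": 10, "Xiang": 5, "Gan": 5,
    "Kejia": 5, "Min": 5, "Cantonese": 10,
}

# Speakers per age band, as listed in the announcement.
AGE_COUNTS = {"18-30": 62, "31-45": 28, "46-60": 10}


def accent_counts(speakers, shares):
    """Convert percentage shares into whole-speaker counts (hypothetical helper)."""
    return {name: speakers * pct // 100 for name, pct in shares.items()}


# The shares and age bands should each account for every speaker.
assert sum(ACCENT_SHARE.values()) == 100
assert sum(AGE_COUNTS.values()) == SPEAKERS

per_environment = UTTERANCES_PER_SPEAKER // ENVIRONMENTS
# Derived total, assuming every speaker recorded all 320 utterances.
total_utterances = SPEAKERS * UTTERANCES_PER_SPEAKER

print("Utterances per environment:", per_environment)
print("Derived total utterances:", total_utterances)
print("Speakers per accent:", accent_counts(SPEAKERS, ACCENT_SHARE))
```

The actual delivered counts may differ from the derived total; the technical document linked below is the authoritative source.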

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/781.html

The American Spanish Mobile Speech Recognition Database was Successfully Released!

The American Spanish Mobile Speech Recognition Database was successfully released with the catalogue serial number King-ASR-119. This database was made for tuning and testing speech recognition systems for IVR / mobile use. It belongs to SPC’s Multi-language Mobile Speech Data Project.
The database, which was collected in America, contains the voices of 40 different native speakers (21 males, 19 females), balanced by age (mainly 16-30, 31-45, 46-60), gender and regional accent.

All audio files are manually transcribed and labelled. A pronunciation lexicon with a phonetic transcription in SAMPA is also included.

For more information, please see the technical document at the following link:

http://www.speechocean.com/en-ASR-Corpora/779.html

Visit our on-line Catalogue: http://www.speechocean.com/en-Product-Catalogue/Index.html

For more information about our databases and services, please visit our website www.speechocean.com

If you have any inquiries regarding our databases and services, please feel free to contact us:

XiangFeng Cheng: Chengxianfeng@speechocean.com

Marta Gherardi: Marta@speechocean.com

© Copyright 2026 - ISCA International Speech Communication Association - All right reserved.
