ISCApad #190
Thursday, April 10, 2014 by Chris Wellekens
Speechocean March 2014 update:
Speechocean, a global supplier of language resources and data services, has over 500 large-scale databases available in 110+ languages and accents, covering desktop, in-car, telephony and tablet PC platforms. Our large and diverse data repository includes ASR databases, TTS databases, lexica, text corpora, etc.
Speechocean is glad to announce that more resources have been released:
ASR Databases
Speechocean provides corpora in 110+ regional languages, available in a variety of formats, situational styles, scene environments and platform systems, covering in-car, mobile phone, fixed-line and desktop speech recognition corpora, etc. This month we released more Asian-language databases, made for tuning and testing speech recognition systems for ASR applications.
Chinese Mandarin Speech Recognition Database ---- (In-Car) - 100 Speakers
ID: King-ASR-122
This database was collected in mainland China. It contains the voices of 100 different native speakers (50 males, 50 females) who were balanced according to age (mainly 18–30 (62), 31–45 (28), 46–60 (10)), gender (male 50%, female 50%) and regional accent (Northern 60%, Wu 10%, Xiang 5%, Gan 5%, Kejia 5%, Min 5%, Cantonese 10%).
Each utterance is stored in a separate file, and each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information. A pronunciation lexicon with a phonemic transcription in SAMPA is also included. All the data was transcribed and labeled.
Japanese Speech Recognition Database ---- (In-Car) - 800 Speakers
ID: King-ASR-125
This Japanese in-car speech database was collected in Japan and contains the voices of 800 different native speakers who were demographically balanced according to age (16–30, 31–45 and 46–60), gender (400±5% males, 400±5% females) and dialectal region. The script, which contains 16 general categories and more than 50 specific sub-categories, was specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded under three driving environments (parked, city driving and highway driving) with recording conditions such as fan on/off and window up/down. A total of 300 utterances were recorded for each speaker in two of three driving environments (150 utterances and 10 spontaneous utterances per environment).
Japanese Speech Recognition Database ---- Conversation (Telephony) - 201 Speakers
This Japanese speech recognition database was collected in Japan and contains the voices of 201 different native speakers who were demographically balanced according to age distribution (16–28, 29–60), gender and dialectal region. The corpus contains 100 pairs of spontaneous dialog speech data from 201 speakers. Each pair consists of 3 audio files: one for each of the two speakers separately and one for the mixed channel. The three files were recorded simultaneously. The pure recording time of the mixed channel is about 104.8 hours. The database covers 33 topics.
There are 7,009 audio files, saved as uncompressed PCM files. All the speech data was transcribed and labeled.
Mobile
Korean Speech Recognition Database ---- (Mobile) - 1023 Speakers
ID: King-ASR-137
This Korean mobile speech recognition database, which was collected in Korea, contains the voices of 1023 different native speakers (510±5% males, 513±5% females) who were balanced according to age (mainly 16–30, 31–45, 46–60), gender and regional accent (for details, please see the technical document).
Chinese Mandarin Speech Recognition Database ---- Sentences (Mobile) - 5048 Speakers
ID: King-ASR-216
This speech database was collected by Speechocean in a quiet environment in China. It is one of the databases of our Speech Data ---- Mobile Project (SDM), which currently contains database collections in 30 languages.
The script was specially designed to provide material for both training and testing of many classes of speech recognizers. The script of each speaker contains 300 sentences which were randomly selected from a specially designed pool of sentences. Each speaker was recorded as naturally as possible in a quiet environment using popular mobile phones such as iPhones, HTC, Samsung, MOTO, etc., which cover the iOS, Android and Windows Mobile platforms. The speech data are stored in 16 kHz, 16-bit uncompressed PCM format. All the speech was manually transcribed and labeled. A pronunciation lexicon with a phonemic transcription in Pinyin is also included.
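For readers planning to work with audio in the format described above, the following is a minimal Python sketch of how 16 kHz, 16-bit uncompressed PCM data could be decoded and its duration estimated. It is an illustration only, not a Speechocean tool: the function names are hypothetical, and the mono channel count, the little-endian byte order and the absence of a file header are assumptions that the announcement does not confirm.

```python
import array
import sys

SAMPLE_RATE_HZ = 16_000   # from the database description
BYTES_PER_SAMPLE = 2      # 16-bit samples, from the database description
NUM_CHANNELS = 1          # assumption: mono; not stated in the announcement


def pcm_duration_seconds(num_bytes: int) -> float:
    """Duration in seconds of a headerless PCM buffer of the given size."""
    frame_bytes = BYTES_PER_SAMPLE * NUM_CHANNELS
    return num_bytes / (frame_bytes * SAMPLE_RATE_HZ)


def decode_pcm16(raw: bytes, little_endian: bool = True) -> array.array:
    """Decode raw signed 16-bit PCM bytes into an array of integers.

    The byte order is an assumption; pass little_endian=False if the
    files turn out to be big-endian.
    """
    samples = array.array("h")  # signed 16-bit integers on common platforms
    samples.frombytes(raw)
    # array uses the host byte order, so swap when the data differs from it.
    if little_endian != (sys.byteorder == "little"):
        samples.byteswap()
    return samples
```

Under these assumptions, one second of audio occupies 32,000 bytes, so the size of a file gives a quick check on its expected duration before any decoding is done.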
Indonesian Speech Recognition Database ---- Sentences (Desktop) - 200 Speakers
ID: King-ASR-061
This Indonesian speech recognition database was collected in Indonesia and contains the voices of 200 different native speakers who were demographically balanced according to age distribution (16–30, 31–45, 46–60) and gender. It contains 239,267 audio files with about 460.94 hours of recording. Each speaker uttered 300 sentences in a quiet office room. All the data has been manually proofread and precisely labeled.
Urdu Speech Recognition Database ---- Sentences (Desktop) - 200 Speakers
ID: King-ASR-063
This Urdu speech recognition database, which was collected in Pakistan, contains the voices of 200 native speakers who were demographically balanced according to age distribution (16–60), gender and dialectal region. There are 241,354 audio files, saved as uncompressed PCM files. All the speech data was transcribed and labeled.
Vietnamese Speech Recognition Database ---- Sentences (Desktop) - 200 Speakers
ID: King-ASR-074
This Vietnamese speech recognition database, which was collected in Vietnam, contains the voices of 200 native speakers who were demographically balanced according to age distribution (16–60), gender and dialectal region. There are 263,204 audio files, saved as uncompressed PCM files. All the speech data was transcribed and labeled.
Speechocean licenses a variety of databases in more than 40 languages for speech synthesis, broadcast speech, emotional speech, etc., which can be used with different algorithms.
European Portuguese Speech Corpus for TTS (Female)
ID: King-TTS-017
The European Portuguese (pt-PT) Speech Corpus consists of recordings of a native Portuguese professional broadcaster (female, 32 years old), made in a studio with high SNR (>35 dB) over two channels (Shure SM15 microphone and electroglottography (EGG) sensor).
The Corpus includes the following sub-corpora:
All reading prompts were manually revised, and prosody annotations were made according to the real speech. All speech data are segmented and labeled at the phone level. A pronunciation lexicon and pitch extracted from the EGG signal can also be provided on request.
Speechocean licenses many kinds of text corpora in many languages, which are well suited to language model training.
Speechocean builds pronunciation lexica in many languages which can be licensed to customers.
Contact Information
Xianfeng Cheng
Business Manager of Commercial Department
Tel: +86-10-62660928; +86-10-62660053 ext. 8080
Cell phone: +86 13681432590
Skype: xianfeng.cheng1
Email: chengxianfeng@speechocean.com; cxfxy0cxfxy0@gmail.com
Website: www.speechocean.com