Speechocean – update (July 2014):
Speechocean: A global language resources and data services supplier
Speechocean has over 500 large-scale databases available in 110+ languages and accents, covering desktop, in-car, telephony and tablet PC platforms. Our large and diverse data repository includes ASR databases, TTS databases, lexica, text corpora, etc.
Speechocean is glad to announce the release of more resources:
ASR Databases
Speechocean provides corpora in 110+ regional languages, available in a variety of formats, speaking styles, recording environments and platforms, covering in-car, mobile phone, fixed-line and desktop speech recognition corpora, etc. This month we released more European-language databases (Part One), built for tuning and testing speech recognition systems in ASR applications.
In-Car
Serial Number | Kingline Data Names | Sound Parameter | Utterances
King-ASR-147 | Italian Speech Recognition Corpus (In-Car), 300 Speakers | 16 kHz, 16-bit, four channels | 360230
King-ASR-153 | Russian Speech Recognition Corpus (In-Car), 308 Speakers | 16 kHz, 16-bit, four channels | 392200
King-ASR-157 | Polish Speech Recognition Corpus (In-Car), 300 Speakers | 16 kHz, 16-bit, four channels | 356000
King-ASR-162 | Dutch Speech Recognition Corpus (In-Car), 300 Speakers | 16 kHz, 16-bit, four channels | 360030
King-ASR-170 | Danish Speech Recognition Corpus (In-Car), 300 Speakers | 16 kHz, 16-bit, four channels | 360058
King-ASR-172 | Brazilian Portuguese Speech Recognition Corpus (In-Car), 300 Speakers | 16 kHz, 16-bit, four channels | 360020
Telephony
Serial Number | Kingline Data Names | Sound Parameter | Utterances
King-ASR-219 | Spanish Speech Recognition Corpus (Telephone), Conversational, 1000 Speakers | 8 kHz, 16-bit, one channel | 300000
Mobile
Serial Number | Kingline Data Names | Sound Parameter | Utterances
King-ASR-149 | Finnish Speech Recognition Corpus (Mobile), 200 Speakers | 8 kHz, 16-bit, one channel | 60000
King-ASR-154 | Brazilian Portuguese Speech Recognition Corpus (Mobile), Sentence, 301 Speakers | 16 kHz, 16-bit, one channel | 90266
King-ASR-155 | European Portuguese Speech Recognition Corpus (Mobile), 300 Speakers | 16 kHz, 16-bit, one channel | 90000
King-ASR-205 | Turkish Speech Recognition Corpus (Mobile), Sentence, 302 Speakers | 16 kHz, 16-bit, one channel | 99471
King-ASR-206 | Greek Speech Recognition Corpus (Mobile), 300 Speakers | 22 kHz, 16-bit, one channel | 45000
King-ASR-209 | Dutch Speech Recognition Corpus (Mobile), 200 Speakers | 16 kHz, 16-bit, one channel | 60000
Desktop
Serial Number | Kingline Data Names | Sound Parameter | Utterances
King-ASR-181 | Italian Speech Recognition Corpus (Desktop), Sentence, 201 Speakers | 44.1 kHz, 16-bit, four channels | 240880
King-ASR-212 | Polish Speech Recognition Corpus (Desktop), Sentence, 200 Speakers | 44.1 kHz, 16-bit, two channels | 119948
King-ASR-084 | Russian Speech Recognition Corpus (Desktop), Comprehensive Utterances, 200 Speakers | 44.1 kHz, 16-bit, four channels | 239936
King-ASR-158 | Swedish Speech Recognition Corpus (Desktop), Sentence, 200 Speakers | 44.1 kHz, 16-bit, two channels | 119938
King-ASR-159 | Turkish Speech Recognition Corpus (Desktop), Sentence, 201 Speakers | 44.1 kHz, 16-bit, two channels | 120578
TTS Databases
Speechocean licenses a variety of databases in more than 40 languages for speech synthesis, broadcast speech, emotional speech, etc., which can be used with different algorithms.
Serial No. | Kingline Data Names | Sound Parameter | Utterances | Recording Hours
King-TTS-016 | Italian Speech Synthesis Database (Male) | 44.1 kHz, 20-bit, two channels | 7211 | 9.09
King-TTS-017 | European Portuguese Speech Synthesis Database (Female) | 44.1 kHz, 21-bit, two channels | 9900 | 12.6
King-TTS-018 | European Portuguese Speech Synthesis Database (Male) | 44.1 kHz, 22-bit, two channels | 5000 | Under construction
King-TTS-019 | Russian Speech Synthesis Database (Female) | 44.1 kHz, 23-bit, two channels | 8143 | 14.59
King-TTS-020 | Russian Speech Synthesis Database (Male) | 44.1 kHz, 24-bit, two channels | 8216 | 12.32
King-TTS-021 | Polish Speech Synthesis Database (Female) | 44.1 kHz, 25-bit, two channels | 5000 | Under construction
King-TTS-022 | Turkish Speech Synthesis Database (Female) | 44.1 kHz, 26-bit, two channels | 5000 | Under construction
King-TTS-029 | Ukrainian Speech Synthesis Database (Female) | 44.1 kHz, 27-bit, two channels | 5000 | Under construction
Text Corpora
Speechocean licenses many kinds of text corpora in many languages, which are well suited to language model training.
ID | Kingline Data Names | Languages | Size
King-NLP-027 | Database of Arab Names | Arabic | 7000000 words
King-NLP-028 | Database of Arab Names in Arabic | Arabic | 222000 words
King-NLP-029 | Italian Personal Names Corpus | Italian | Under construction
King-NLP-030 | Italian Spanish Address Corpus | Italian | Under construction
King-NLP-031 | Portuguese Personal Names Corpus | Portuguese | Under construction
King-NLP-032 | Portuguese Address Corpus | Portuguese | Under construction
King-NLP-033 | Polish Personal Names Corpus | Polish | Under construction
Lexica
Speechocean builds pronunciation lexica in many languages which can be licensed to customers.
No. | Name | Phoneme Set
King-Lexicon-027 | Czech Pronunciation Lexicon | SAMPA
King-Lexicon-028 | Greek Pronunciation Lexicon | SAMPA
King-Lexicon-029 | Hungarian Pronunciation Lexicon | SAMPA
King-Lexicon-031 | Catalan Pronunciation Lexicon | UPC
King-Lexicon-036 | Arabic Pronunciation Lexicon | Under construction
King-Lexicon-037 | Brazilian Portuguese Pronunciation Lexicon | SAMPA
King-Lexicon-043 | Norwegian Pronunciation Lexicon | X-SAMPA
Contact Information
Xianfeng Cheng
Business Manager of Commercial Department
Tel: +86-10-62660928; +86-10-62660053 ext.8080
Mobile: +86 13681432590
Skype: xianfeng.cheng1
Email: chengxianfeng@speechocean.com; cxfxy0cxfxy0@gmail.com
Website: www.speechocean.com