ISCA - International Speech Communication Association



ISCApad #197

Wednesday, November 12, 2014 by Chris Wellekens

5-2-14 Speechocean – update (October 2014)
  


Speechocean: A global language resources and data services supplier

Speechocean offers over 500 large-scale databases in 110+ languages and accents, covering desktop, in-car, telephony and tablet PC platforms. Our large and diverse data repository includes ASR databases, TTS databases, lexica, text corpora, and more.

 

Speechocean is glad to announce the release of further resources:

ASR Databases

Speechocean provides corpora in 110+ regional languages, available in a variety of formats, speaking styles, recording environments and platforms, including in-car, mobile phone, fixed-line and desktop speech recognition corpora. This month we are glad to introduce our most popular Asian language databases, which were built for tuning and testing speech recognition systems in ASR applications.

  • In-Car

 

| Serial Number | Kingline Data Names | Sound Parameter | Utterances |
|---|---|---|---|
| King-ASR-122 | Chinese Mandarin Speech Recognition Database (In-Car), 100 Speakers | 48k, 16bit, Four Channels | 200,796 |
| King-ASR-120 | Chinese Mandarin Speech Recognition Database (In-Car), 160 Speakers | 16k, 16bit, Four Channels | 1,434,028 |

 

  • Mobile

 

| Serial Number | Kingline Data Names | Sound Parameter | Utterances |
|---|---|---|---|
| King-ASR-216 | Chinese Mandarin Speech Recognition Database, Sentences (Mobile), 5048 Speakers | 16k, 16bit, One Channel | 1,514,028 |
| King-ASR-137 | Korean Speech Recognition Database (Mobile), 1023 Speakers | 16k, 16bit, One Channel | 306,692 |
| King-ASR-058 | Cantonese Speech Recognition Database (Mobile), 3020 Speakers | 16k, 16bit, One Channel | 965,128 |

 

  • Telephony

 

| Serial Number | Kingline Data Names | Sound Parameter | Utterances |
|---|---|---|---|
| King-ASR-222 | Japanese Speech Recognition Database, Spontaneous Dialog (Telephony), 200 Speakers | 8k, 16bit | 300 |
| King-ASR-027 | Chinese Mandarin Speech Recognition Database, Spontaneous Speech (Telephony), 649 Speakers | 8k, 16bit | 1,556 |

 

  • Desktop

 

| Serial Number | Kingline Data Names | Sound Parameter | Utterances |
|---|---|---|---|
| King-ASR-062 | Thai Speech Recognition Database, Sentences (Desktop), 200 Speakers | 44.1k, 16bit | 327,754 |
| King-ASR-074 | Vietnamese Speech Recognition Database, Sentences (Desktop), 200 Speakers | 44.1k, 16bit | 263,204 |
| King-ASR-086 | Cantonese Speech Recognition Database, Sentences (Desktop), 200 Speakers | 44.1k, 16bit | 119,994 |
| King-ASR-175 | Japanese Speech Recognition Database, Sentences (Desktop), 505 Speakers | 44.1k, 16bit | 605,812 |

TTS Databases

 

Speechocean licenses a variety of speech synthesis databases in more than 40 languages, including broadcast speech and emotional speech, which can be used with different synthesis algorithms.

| Serial No. | Kingline Data Names | Sound Parameter | Utterances | Recording Hours |
|---|---|---|---|---|
| King-TTS-017 | Portugal Portuguese Speech Synthesis Database (Female) | 44.1K, 21bit, Two Channels | 9,900 | 12.6 |
| King-TTS-018 | Portugal Portuguese Speech Synthesis Database (Male) | 44.1K, 22bit, Two Channels | 5,000 | Under construction |
| King-TTS-019 | Russian Speech Synthesis Database (Female) | 44.1K, 23bit, Two Channels | 8,143 | 14.59 |
| King-TTS-020 | Russian Speech Synthesis Database (Male) | 44.1K, 24bit, Two Channels | 8,216 | 12.32 |

 

 

Text Corpora

 

Speechocean licenses many kinds of text corpora in many languages, which are well suited to language model training.

| ID | Kingline Data Names | Languages | Size |
|---|---|---|---|
| King-NLP-027 | Database of Arab Names | Arabic | 7,000,000 words |
| King-NLP-028 | Database of Arab Names in Arabic | Arabic | 222,000 words |
| King-NLP-029 | Italian Personal Names Corpus | Italian | Under construction |
| King-NLP-030 | Italian Spanish Address Corpus | Italian | Under construction |

 

Lexica

 

Speechocean builds pronunciation lexica in many languages which can be licensed to customers.

| No | Name | Phoneset |
|---|---|---|
| King-Lexicon-027 | Czech Pronunciation Lexicon | SAMPA |
| King-Lexicon-028 | Greek Pronunciation Lexicon | SAMPA |
| King-Lexicon-029 | Hungarian Pronunciation Lexicon | SAMPA |

 

 

Contact Information

Xianfeng Cheng

Business Manager of Commercial Department

Tel: +86-10-62660928; +86-10-62660053 ext.8080

Mobile: +86 13681432590

Skype: xianfeng.cheng1

Email: chengxianfeng@speechocean.com; cxfxy0cxfxy0@gmail.com

Website: www.speechocean.com




 

 

 

 

 

 

