ISCA - International Speech
Communication Association



ISCApad #192

Thursday, June 12, 2014 by Chris Wellekens

5-2-15 Speechocean – update (June 2014)
  


 

Speechocean: A global language resources and data services supplier

 

Speechocean has over 500 large-scale databases available in 110+ languages and accents, recorded on desktop, in-car, telephony and tablet PC platforms. Our large and diverse data repository includes ASR databases, TTS databases, lexica, text corpora, etc.

 

Speechocean is glad to announce more resources that have been released:

1. ASR Databases

Speechocean provides corpora in 110+ regional languages, available in a variety of formats, speaking styles, recording environments and platforms, covering in-car, mobile phone, fixed-line and desktop speech recognition corpora, etc. This month we released more European language databases (Part One), made for tuning and testing speech recognition systems.

1.1 In-Car

| Serial Number | Kingline Data Names | Sound Parameter | Utterances |
|---|---|---|---|
| King-ASR-129 | Canadian French Speech Recognition Corpus (In-car), Sentence (328 Speakers) | 16 kHz, 16 bit, four channels | 361,560 |
| King-ASR-132 | France French Speech Recognition Corpus (In-car), 300 Speakers | 16 kHz, 16 bit, four channels | 360,000 |
| King-ASR-134 | Turkish Speech Recognition Corpus (In-car), Sentence (316 Speakers) | 16 kHz, 16 bit, four channels | 398,692 |
| King-ASR-141 | Spain Spanish Speech Recognition Corpus (In-car), 300 Speakers | 16 kHz, 16 bit, four channels | 360,000 |



1.2 Telephony

| Serial Number | Kingline Data Names | Sound Parameter | Utterances |
|---|---|---|---|
| King-ASR-220 | German Speech Recognition Corpus (Telephone), Conversational, 1000 Speakers | 8 kHz, 16 bit, one channel | 150,000 |

 

1.3 Mobile

| Serial Number | Kingline Data Names | Sound Parameter | Utterances |
|---|---|---|---|
| King-ASR-106 | Catalan Speech Recognition Corpus (Mobile), 200 Speakers | 16 kHz, 16 bit, one channel | 60,000 |
| King-ASR-116 | Polish Speech Recognition Corpus (Mobile), 600 Speakers | 16 kHz, 16 bit, one channel | 180,000 |
| King-ASR-124 | Russian Speech Recognition Corpus (Mobile), Sentence (604 Speakers) | 16 kHz, 16 bit, one channel | 180,542 |
| King-ASR-128 | Romanian Speech Recognition Corpus (Mobile), 600 Speakers | 16 kHz, 16 bit, one channel | 180,000 |
| King-ASR-133 | Swedish Speech Recognition Corpus (Mobile), 300 Speakers | 16 kHz, 16 bit, one channel | 45,000 |

1.4 Desktop

| Serial Number | Kingline Data Names | Sound Parameter | Utterances |
|---|---|---|---|
| King-ASR-207 | Brazilian Portuguese Speech Recognition Corpus (Desktop), 203 Speakers | 44.1 kHz, 16 bit, two channels | 121,780 |
| King-ASR-075 | European Portuguese Speech Recognition Corpus (Desktop), 200 Speakers | 44.1 kHz, 16 bit, four channels | 319,908 |
| King-ASR-171 | France French Speech Recognition Corpus (Desktop), Sentence (203 Speakers) | 44.1 kHz, 16 bit, two channels | 121,642 |
| King-ASR-182 | German Speech Recognition Corpus (Desktop), Sentence (200 Speakers) | 44.1 kHz, 16 bit, four channels | 239,940 |

2. TTS Databases

Speechocean licenses a variety of speech synthesis databases in more than 40 languages, covering broadcast speech, emotional speech, etc., which can be used with different algorithms.

| Serial No. | Kingline Data Names | Sound Parameter | Utterances | Recording Hours |
|---|---|---|---|---|
| King-TTS-004 | Arabic Speech Synthesis Database 1 (Male) | 16 kHz, 16 bit, two channels | 8,055 | 11.7 |
| King-TTS-005 | Arabic Speech Synthesis Database 2 (Male) | 16 kHz, 16 bit, two channels | 8,039 | 12.01 |
| King-TTS-008 | Spain Spanish Speech Synthesis Database (Female) | 44.1 kHz, 16 bit, two channels | 5,000 | Under construction |
| King-TTS-009 | France French Speech Synthesis Database (Female) | 44.1 kHz, 17 bit, two channels | 5,000 | Under construction |
| King-TTS-010 | German Speech Synthesis Database (Female) | 44.1 kHz, 18 bit, two channels | 5,000 | Under construction |
| King-TTS-015 | Italian Speech Synthesis Database (Female) | 44.1 kHz, 19 bit, two channels | 10,300 | 13.13 |

 

 

3. Text Corpora

Speechocean licenses many kinds of text corpora in many languages, which are well suited to language model training.

| ID | Kingline Data Names | Languages | Size |
|---|---|---|---|
| King-NLP-017 | Spain Spanish Personal Names Corpus | Spain Spanish | Under construction |
| King-NLP-018 | Spain Spanish Address Corpus | Spain Spanish | Under construction |
| King-NLP-021 | Polish Address Corpus | Polish | Under construction |
| King-NLP-025 | Turkish Personal Names Corpus | Turkish | Under construction |
| King-NLP-026 | Turkish Address Corpus | Turkish | Under construction |

 

4. Lexica

Speechocean builds pronunciation lexica in many languages which can be licensed to customers.

| No. | Name | Phoneme Set |
|---|---|---|
| King-Lexicon-019 | Italian Pronunciation Lexicon | SAMPA |
| King-Lexicon-020 | Polish Pronunciation Lexicon | SAMPA |
| King-Lexicon-021 | Dutch Pronunciation Lexicon | SAMPA |
| King-Lexicon-022 | Swedish Pronunciation Lexicon | X-SAMPA |
| King-Lexicon-024 | Finnish Pronunciation Lexicon | Under construction |
| King-Lexicon-025 | Romanian Pronunciation Lexicon | Under construction |

 

 

Contact Information

Xianfeng Cheng

Business Manager of Commercial Department

Tel: +86-10-62660928; +86-10-62660053 ext.8080

Mobile: +86 13681432590

Skype: xianfeng.cheng1

Email: chengxianfeng@speechocean.com; cxfxy0cxfxy0@gmail.com

Website: www.speechocean.com

 

 

