ISCA - International Speech
Communication Association



ISCApad #193

Friday, July 11, 2014 by Chris Wellekens

5-2-15 Speechocean – update (July 2014)
  


 

Speechocean: A global language resources and data services supplier

Speechocean has over 500 large-scale databases available in 110+ languages and accents, recorded on desktop, in-car, telephony and tablet PC platforms. Our large and diverse data repository includes ASR databases, TTS databases, lexica, text corpora, and more.

Speechocean is pleased to announce the release of the following additional resources:

1. ASR Databases

Speechocean provides corpora in 110+ regional languages, available in a variety of formats, speaking styles, recording environments and platforms, including in-car, mobile phone, fixed-line and desktop speech recognition corpora. This month we released more European language databases (Part One), intended for tuning and testing speech recognition systems in ASR applications.

1.1 In-Car

Serial Number | Kingline Data Names | Sound Parameter | Utterances
King-ASR-147 | Italian Speech Recognition Corpus (in car), 300 Speakers | 16 kHz, 16 bit, four channels | 360230
King-ASR-153 | Russian Speech Recognition Corpus (in car), 308 Speakers | 16 kHz, 16 bit, four channels | 392200
King-ASR-157 | Polish Speech Recognition Corpus (in car), 300 Speakers | 16 kHz, 16 bit, four channels | 356000
King-ASR-162 | Dutch Speech Recognition Corpus (in car), 300 Speakers | 16 kHz, 16 bit, four channels | 360030
King-ASR-170 | Danish Speech Recognition Corpus (in car), 300 Speakers | 16 kHz, 16 bit, four channels | 360058
King-ASR-172 | Brazilian Portuguese Speech Recognition Corpus (in car), 300 Speakers | 16 kHz, 16 bit, four channels | 360020


1.2 Telephony

Serial Number | Kingline Data Names | Sound Parameter | Utterances
King-ASR-219 | Spanish Speech Recognition Corpus (Telephone), Conversational, 1000 Speakers | 8 kHz, 16 bit, one channel | 300000

 

1.3 Mobile

Serial Number | Kingline Data Names | Sound Parameter | Utterances
King-ASR-149 | Finnish Speech Recognition Corpus (Mobile), 200 Speakers | 8 kHz, 16 bit, one channel | 60000
King-ASR-154 | Brazilian Portuguese Speech Recognition Corpus (Mobile), Sentence (301 Speakers) | 16 kHz, 16 bit, one channel | 90266
King-ASR-155 | European Portuguese Speech Recognition Corpus (Mobile), 300 Speakers | 16 kHz, 16 bit, one channel | 90000
King-ASR-205 | Turkish Speech Recognition Corpus (Mobile), Sentence (302 Speakers) | 16 kHz, 16 bit, one channel | 99471
King-ASR-206 | Greek Speech Recognition Corpus (Mobile), 300 Speakers | 22 kHz, 16 bit, one channel | 45000
King-ASR-209 | Dutch Speech Recognition Corpus (Mobile), 200 Speakers | 16 kHz, 16 bit, one channel | 60000

1.4 Desktop

Serial Number | Kingline Data Names | Sound Parameter | Utterances
King-ASR-181 | Italian Speech Recognition Corpus (Desktop) - Sentence (201 Speakers) | 44.1 kHz, 16 bit, four channels | 240880
King-ASR-212 | Polish Speech Recognition Corpus (Desktop) - Sentence (200 Speakers) | 44.1 kHz, 16 bit, two channels | 119948
King-ASR-084 | Russian Speech Recognition Corpus (Desktop) - Comprehensive utterances (200 Speakers) | 44.1 kHz, 16 bit, four channels | 239936
King-ASR-158 | Swedish Speech Recognition Corpus (Desktop) - Sentence (200 Speakers) | 44.1 kHz, 16 bit, two channels | 119938
King-ASR-159 | Turkish Speech Recognition Corpus (Desktop) - Sentence (201 Speakers) | 44.1 kHz, 16 bit, two channels | 120578

2. TTS Databases

Speechocean licenses a variety of speech synthesis databases in more than 40 languages, covering broadcast speech, emotional speech, and other styles, which can be used with different synthesis algorithms.

Serial No. | Kingline Data Names | Sound Parameter | Utterances | Recording Hours
King-TTS-016 | Italian Speech Synthesis Database (Male) | 44.1 kHz, 20 bit, two channels | 7211 | 9.09
King-TTS-017 | Portugal Portuguese Speech Synthesis Database (Female) | 44.1 kHz, 21 bit, two channels | 9900 | 12.6
King-TTS-018 | Portugal Portuguese Speech Synthesis Database (Male) | 44.1 kHz, 22 bit, two channels | 5000 | Under construction
King-TTS-019 | Russian Speech Synthesis Database (Female) | 44.1 kHz, 23 bit, two channels | 8143 | 14.59
King-TTS-020 | Russian Speech Synthesis Database (Male) | 44.1 kHz, 24 bit, two channels | 8216 | 12.32
King-TTS-021 | Polish Speech Synthesis Database (Female) | 44.1 kHz, 25 bit, two channels | 5000 | Under construction
King-TTS-022 | Turkish Speech Synthesis Database (Female) | 44.1 kHz, 26 bit, two channels | 5000 | Under construction
King-TTS-029 | Ukrainian Speech Synthesis Database (Female) | 44.1 kHz, 27 bit, two channels | 5000 | Under construction

 

3. Text Corpora

Speechocean licenses many kinds of text corpora in many languages, which are well suited to language model training.

ID | Kingline Data Names | Languages | Size
King-NLP-027 | Database of Arab Names | Arabic | 7000000 words
King-NLP-028 | Database of Arab Names in Arabic | Arabic | 222000 words
King-NLP-029 | Italian Personal Names Corpus | Italian | Under construction
King-NLP-030 | Italian Spanish Address Corpus | Italian | Under construction
King-NLP-031 | Portuguese Personal Names Corpus | Portuguese | Under construction
King-NLP-032 | Portuguese Address Corpus | Portuguese | Under construction
King-NLP-033 | Polish Personal Names Corpus | Polish | Under construction

 

4. Lexica

Speechocean builds pronunciation lexica in many languages, which can be licensed to customers.

No. | Name | Phoneme Set
King-Lexicon-027 | Czech Pronunciation Lexicon | SAMPA
King-Lexicon-028 | Greek Pronunciation Lexicon | SAMPA
King-Lexicon-029 | Hungarian Pronunciation Lexicon | SAMPA
King-Lexicon-031 | Catalan Pronunciation Lexicon | UPC
King-Lexicon-036 | Arabic Pronunciation Lexicon | Under construction
King-Lexicon-037 | Brazilian Portuguese Pronunciation Lexicon | SAMPA
King-Lexicon-043 | Norwegian Pronunciation Lexicon | X-SAMPA

 

 

Contact Information

Xianfeng Cheng

Business Manager of Commercial Department

Tel: +86-10-62660928; +86-10-62660053 ext.8080

Mobile: +86 13681432590

Skype: xianfeng.cheng1

Email: chengxianfeng@speechocean.com; cxfxy0cxfxy0@gmail.com

Website: www.speechocean.com

 

 

 

 

 

