ISCA - International Speech
Communication Association



ISCApad #191

Monday, May 12, 2014 by Chris Wellekens

5-2-14 Speechocean May 2014 update
  

 

Speechocean – update (May 2014):

 

 

 

Speechocean: A global language resources and data services supplier

 

 

 

Speechocean has over 500 large-scale databases available in 110+ languages and accents, covering desktop, in-car, telephony and tablet PC platforms. Our large and diverse data repository includes ASR databases, TTS databases, lexica, text corpora, etc.

 

 

 

Speechocean is glad to announce the release of further resources:

 

1. ASR Databases

 

Speechocean provides corpora in 110+ regional languages, available in a variety of formats, situational styles, scene environments and platform systems, covering in-car, mobile phone, fixed-line and desktop speech recognition corpora, etc. This month we released more European language databases, made for tuning and testing speech recognition systems for ASR applications.

 

1.1 In-Car

 

| Serial Number | Kingline Data Name | Sound Parameters | Utterances |
|---|---|---|---|
| King-ASR-129 | Canadian French Speech Recognition Corpus (In-Car), Sentence (328 Speakers) | 16 kHz, 16 bit, four channels | 361,560 |
| King-ASR-132 | France French Speech Recognition Corpus (In-Car), 300 Speakers | 16 kHz, 16 bit, four channels | 360,000 |
| King-ASR-134 | Turkish Speech Recognition Corpus (In-Car), Sentence (316 Speakers) | 16 kHz, 16 bit, four channels | 398,692 |
| King-ASR-141 | Spain Spanish Speech Recognition Corpus (In-Car), 300 Speakers | 16 kHz, 16 bit, four channels | 360,000 |
| King-ASR-147 | Italian Speech Recognition Corpus (In-Car), 300 Speakers | 16 kHz, 16 bit, four channels | 360,230 |
| King-ASR-153 | Russian Speech Recognition Corpus (In-Car), 308 Speakers | 16 kHz, 16 bit, four channels | 392,200 |
| King-ASR-157 | Polish Speech Recognition Corpus (In-Car), 300 Speakers | 16 kHz, 16 bit, four channels | 356,000 |
| King-ASR-162 | Dutch Speech Recognition Corpus (In-Car), 300 Speakers | 16 kHz, 16 bit, four channels | 360,030 |
| King-ASR-170 | Danish Speech Recognition Corpus (In-Car), 300 Speakers | 16 kHz, 16 bit, four channels | 360,058 |
| King-ASR-172 | Brazilian Portuguese Speech Recognition Corpus (In-Car), 300 Speakers | 16 kHz, 16 bit, four channels | 360,020 |

 



 

1.2 Telephony

 

| Serial Number | Kingline Data Name | Sound Parameters | Utterances |
|---|---|---|---|
| King-ASR-220 | German Speech Recognition Corpus (Telephone), Conversational, 1000 Speakers | 8 kHz, 16 bit, one channel | 150,000 |
| King-ASR-219 | Spanish Speech Recognition Corpus (Telephone), Conversational, 1000 Speakers | 8 kHz, 16 bit, one channel | 300,000 |

 

 

 

1.3 Mobile

 

| Serial Number | Kingline Data Name | Sound Parameters | Utterances |
|---|---|---|---|
| King-ASR-106 | Catalan Speech Recognition Corpus (Mobile), 200 Speakers | 16 kHz, 16 bit, one channel | 60,000 |
| King-ASR-116 | Polish Speech Recognition Corpus (Mobile), 600 Speakers | 16 kHz, 16 bit, one channel | 180,000 |
| King-ASR-124 | Russian Speech Recognition Corpus (Mobile), Sentence (604 Speakers) | 16 kHz, 16 bit, one channel | 180,542 |
| King-ASR-128 | Romanian Speech Recognition Corpus (Mobile), 600 Speakers | 16 kHz, 16 bit, one channel | 180,000 |
| King-ASR-133 | Swedish Speech Recognition Corpus (Mobile), 300 Speakers | 16 kHz, 16 bit, one channel | 45,000 |
| King-ASR-149 | Finnish Speech Recognition Corpus (Mobile), 200 Speakers | 8 kHz, 16 bit, one channel | 60,000 |
| King-ASR-154 | Brazilian Portuguese Speech Recognition Corpus (Mobile), Sentence (301 Speakers) | 16 kHz, 16 bit, one channel | 90,266 |
| King-ASR-155 | European Portuguese Speech Recognition Corpus (Mobile), 300 Speakers | 16 kHz, 16 bit, one channel | 90,000 |
| King-ASR-205 | Turkish Speech Recognition Corpus (Mobile), Sentence (302 Speakers) | 16 kHz, 16 bit, one channel | 99,471 |
| King-ASR-206 | Greek Speech Recognition Corpus (Mobile), 300 Speakers | 22 kHz, 16 bit, one channel | 45,000 |
| King-ASR-209 | Dutch Speech Recognition Corpus (Mobile), 200 Speakers | 16 kHz, 16 bit, one channel | 60,000 |

 

1.4 Desktop

| Serial Number | Kingline Data Name | Sound Parameters | Utterances |
|---|---|---|---|
| King-ASR-207 | Brazilian Portuguese Speech Recognition Corpus (Desktop), 203 Speakers | 44.1 kHz, 16 bit, two channels | 121,780 |
| King-ASR-075 | European Portuguese Speech Recognition Corpus (Desktop), 200 Speakers | 44.1 kHz, 16 bit, four channels | 319,908 |
| King-ASR-171 | France French Speech Recognition Corpus (Desktop), Sentence (203 Speakers) | 44.1 kHz, 16 bit, two channels | 121,642 |
| King-ASR-182 | German Speech Recognition Corpus (Desktop), Sentence (200 Speakers) | 44.1 kHz, 16 bit, four channels | 239,940 |
| King-ASR-181 | Italian Speech Recognition Corpus (Desktop), Sentence (201 Speakers) | 44.1 kHz, 16 bit, four channels | 240,880 |
| King-ASR-212 | Polish Speech Recognition Corpus (Desktop), Sentence (200 Speakers) | 44.1 kHz, 16 bit, two channels | 119,948 |
| King-ASR-084 | Russian Speech Recognition Corpus (Desktop), Comprehensive Utterances (200 Speakers) | 44.1 kHz, 16 bit, four channels | 239,936 |
| King-ASR-158 | Swedish Speech Recognition Corpus (Desktop), Sentence (200 Speakers) | 44.1 kHz, 16 bit, two channels | 119,938 |
| King-ASR-159 | Turkish Speech Recognition Corpus (Desktop), Sentence (201 Speakers) | 44.1 kHz, 16 bit, two channels | 120,578 |

2. TTS Databases

 

Speechocean licenses a variety of speech synthesis databases in more than 40 languages, including broadcast speech, emotional speech, etc., which can be used with different algorithms.

 

| Serial No. | Kingline Data Name | Sound Parameters | Utterances | Recording Hours |
|---|---|---|---|---|
| King-TTS-004 | Arabic Speech Synthesis Database 1 (Male) | 16 kHz, 16 bit, two channels | 8,055 | 11.7 |
| King-TTS-005 | Arabic Speech Synthesis Database 2 (Male) | 16 kHz, 16 bit, two channels | 8,039 | 12.01 |
| King-TTS-008 | Spain Spanish Speech Synthesis Database (Female) | 44.1 kHz, 16 bit, two channels | 5,000 | Under construction |
| King-TTS-009 | France French Speech Synthesis Database (Female) | 44.1 kHz, 16 bit, two channels | 5,000 | Under construction |
| King-TTS-010 | German Speech Synthesis Database (Female) | 44.1 kHz, 16 bit, two channels | 5,000 | Under construction |
| King-TTS-015 | Italian Speech Synthesis Database (Female) | 44.1 kHz, 16 bit, two channels | 10,300 | 13.13 |
| King-TTS-016 | Italian Speech Synthesis Database (Male) | 44.1 kHz, 16 bit, two channels | 7,211 | 9.09 |
| King-TTS-017 | Portugal Portuguese Speech Synthesis Database (Female) | 44.1 kHz, 16 bit, two channels | 9,900 | 12.6 |
| King-TTS-018 | Portugal Portuguese Speech Synthesis Database (Male) | 44.1 kHz, 16 bit, two channels | 5,000 | Under construction |
| King-TTS-019 | Russian Speech Synthesis Database (Female) | 44.1 kHz, 16 bit, two channels | 8,143 | 14.59 |
| King-TTS-020 | Russian Speech Synthesis Database (Male) | 44.1 kHz, 16 bit, two channels | 8,216 | 12.32 |
| King-TTS-021 | Polish Speech Synthesis Database (Female) | 44.1 kHz, 16 bit, two channels | 5,000 | Under construction |
| King-TTS-022 | Turkish Speech Synthesis Database (Female) | 44.1 kHz, 16 bit, two channels | 5,000 | Under construction |
| King-TTS-029 | Ukrainian Speech Synthesis Database (Female) | 44.1 kHz, 16 bit, two channels | 5,000 | Under construction |

 

 

 

 

 

3. Text Corpora

 

Speechocean licenses many kinds of text corpora in many languages, which are well suited to language model training.

 

ID

Kingline Data Names

 Languages

Size

King-NLP-017

