ISCA - International Speech Communication Association



ISCApad #218

Wednesday, August 10, 2016 by Chris Wellekens

5-2-13 Speechocean – update (May 2016)
  
Speechocean: A global language resources and data services supplier

 
About Speechocean

 

Speechocean is one of the world's well-known providers of language resources and services in the fields of Human-Computer Interaction and Human Language Technology. At present, we provide data services in more than 110 languages and dialects from around the world.

 
KingLine Data Center: Data Sharing Platform

 

KingLine Data Center is operated and supervised by Speechocean and focuses mainly on creating and providing language resources for research and development in human language technology.

 

Its diversified corpora are widely used for research and development in Speech Recognition, Speech Synthesis, Natural Language Processing, Machine Translation, Web Search and related fields. All corpora are openly accessible to users worldwide, including scientific research institutions, enterprises and individuals.

 

For more detailed information, please visit our website: http://kingline.speechocean.com

 
Newly released corpora:

 

  1. Taiwanese Mandarin and English Speech Recognition Database-Sentences (Mobile)-(1026 speakers)

 

ID: King-ASR-360

 

This is a 1-channel Taiwanese Mandarin and English mixed-language mobile phone speech database, collected using Samsung mobile phones. This database is owned by Beijing Haitian Ruisheng Science Technology Ltd (SpeechOcean, www.speechocean.com). In total, 1,026 speakers were recorded, and each speaker recorded one session in one of three environments: office, restaurant or street. After unqualified utterances were discarded, the corpus contains 321,890 utterances of mixed Taiwanese Mandarin and English speech from all speakers. The pure recording time is about 514 hours, including leading silence (about 500 ms) and trailing silence (about 500 ms). The total size of the database is 55.2 GB.

 

  2. Italian Speech Recognition Database (Mobile)-300 Speakers

 

ID: King-ASR-148

 

This is a 3-channel Italian mobile phone speech database, collected simultaneously on three different mobile operating systems: iOS, Android and Windows Phone. This database is owned by Beijing Haitian Ruisheng Science Technology Ltd (SpeechOcean, www.speechocean.com). The recordings were made in three different environments: office, restaurant and street. The corpus contains 377,535 utterances of Italian speech from 300 speakers. The pure recording time is about 499.7 hours (3 channels), including leading silence (about 500 ms) and trailing silence (about 500 ms). The total size of the database is 53.7 GB. A pronunciation lexicon with phonemic transcriptions is also included. All the data has been transcribed and labeled.

 
Contact Information

Xianfeng Cheng
VP
Tel: +86-10-62660928; +86-10-62660053 ext.8080
Mobile: +86 13681432590
Skype: xianfeng.cheng1
Email: chengxianfeng@speechocean.com; cxfxy0cxfxy0@gmail.com
Website: www.speechocean.com

© Copyright 2024 - ISCA International Speech Communication Association - All rights reserved.
