ISCApad #212
Friday, February 05, 2016, by Chris Wellekens
Speechocean – update (Feb 2016):
Speechocean: A global language resources and data services supplier
About Speechocean
Speechocean is one of the world's best-known providers of language resources and services in the fields of Human-Computer Interaction and Human Language Technology. At present, we provide data services covering more than 110 languages and dialects worldwide.
KingLine Data Center: Data Sharing Platform
The KingLine Data Center is operated and supervised by Speechocean and focuses mainly on creating and providing language resources for research and development in human language technology.
These diverse corpora are widely used for research and development in Speech Recognition, Speech Synthesis, Natural Language Processing, Machine Translation, Web Search, and related fields. All corpora are openly accessible to users worldwide, including scientific research institutions, enterprises, and individuals.
For more detailed information, please visit our website: http://kingline.speechocean.com
Newly released corpora:
ID: King-ASR-358
This is an in-vehicle, 4-channel Chinese speech database collected and owned by Beijing Haitian Ruisheng Science Technology Ltd.
A total of 400 speakers were recorded; each speaker recorded one session in four different environments, with 220 utterances per speaker. After some unqualified utterances were discarded, the corpus contains 351,820 recorded utterances of Mandarin Chinese speech from all speakers. The pure recording time for the whole corpus is about 307 hours (4 channels), including leading and trailing silence of about 500 ms. The total size of the database is about 32.9 GB.
ID: King-ASR-218
This is a 3-channel British English mobile speech database, collected in Britain simultaneously over three mobile phones (an Android phone, an iPhone, and a Windows phone). The database is owned by Beijing Haitian Ruisheng Science Technology Ltd (SpeechOcean, www.speechocean.com), and the recordings were made in a quiet environment. The corpus contains spontaneous dialog speech from 212 speakers, covering 30 topics. The pure recording time is about 105 hours (1 channel). The total size of the database is 37.6 GB.
A pronunciation lexicon with phonemic transcriptions in OALD notation, covering all words in the transcription files, was carefully generated and is included as an appendix.
ID: King-ASR-269
This is a Japanese speech database collected by Beijing Haitian Ruisheng Science Technology Ltd. (SpeechOcean, www.speechocean.com) over Android mobile phones. A total of 2,562 speakers were recorded, each recording 320 sentences. The whole corpus contains 819,029 utterances of Japanese speech from all speakers. The pure recording time for the whole corpus is about 940 hours, including leading and trailing silence. The total size of the database is about 100 GB.
A pronunciation lexicon with phonemic transcriptions in Hepburn romanization, covering all words in the transcription files, was carefully compiled. All data has been transcribed and labeled.
Contact Information
Xianfeng Cheng
VP
Tel: +86-10-62660928; +86-10-62660053 ext.8080
Mobile: +86 13681432590
Skype: xianfeng.cheng1
Email: chengxianfeng@speechocean.com; cxfxy0cxfxy0@gmail.com
Website: www.speechocean.com