ISCApad #213 |
Saturday, March 12, 2016 by Chris Wellekens |
Speechocean – update (March 2016):
Speechocean: A global language resources and data services supplier
About Speechocean
Speechocean is a well-known provider of language resources and services in the fields of Human Computer Interaction and Human Language Technology. At present, we provide data services for 110+ languages and dialects across the world.
KingLine Data Center: Data Sharing Platform
KingLine Data Center is operated and supervised by Speechocean and focuses on creating and providing language resources for research and development in human language technology. Its diverse corpora are widely used in research and development in Speech Recognition, Speech Synthesis, Natural Language Processing, Machine Translation, Web Search, etc. All corpora are openly accessible to users all over the world, including scientific research institutions, enterprises and individuals. For more detailed information, please visit our website: http://kingline.speechocean.com
Newly released corpora:
ID: King-ASR-054
This is a 3-channel British English mobile speech database, collected simultaneously over three mobile phones (Android phones, iPhones and Windows phones) in Britain. The database is owned by Beijing Haitian Ruisheng Science Technology Ltd (SpeechOcean, www.speechocean.com) and was recorded in a quiet environment. The corpus contains speech from 302 speakers. The pure recording time is about 429 hours (3-channel), including the leading silence (about 500 ms) and the trailing silence (about 500 ms). The total size of this database is 46 GB.
ID: King-ASR-381
This is a 4-channel Romanian desktop speech database, collected simultaneously over 4 different microphones. The database is owned by Beijing Haitian Ruisheng Science Technology Ltd (SpeechOcean, www.speechocean.com) and was recorded in a quiet office environment. The corpus contains 474,340 utterances from 299 speakers. The pure recording time is about 589 hours (4-channel), including the leading silence (about 500 ms) and the trailing silence (about 500 ms). The total size of this database is 174 GB. A pronunciation lexicon with phonemic transcriptions in SAMPA was carefully generated to cover all the words in the transcription files. All the data was transcribed and labeled.
Contact Information
Xianfeng Cheng, VP
Tel: +86-10-62660928; +86-10-62660053 ext.8080
Mobile: +86 13681432590
Skype: xianfeng.cheng1
Email: chengxianfeng@speechocean.com; cxfxy0cxfxy0@gmail.com
Website: www.speechocean.com