ISCA - International Speech
Communication Association



ISCApad #230

Thursday, August 10, 2017 by Chris Wellekens

5-2-1 ELRA - Language Resources Catalogue - Update (June 2017)
  
ELRA - Language Resources Catalogue - Update
-------------------------------------------------------

We are happy to announce that 3 new Written Corpora and 1 new Desktop/Microphone Speech Resource are now available in our catalogue.

ELRA-W0118 English-Persian parallel corpus
ISLRN: 074-825-114-781-7
The English-Persian parallel corpus contains more than 200,000 aligned sentences across a variety of text types from the domains of art, law, culture, science, religion, literature, medicine, idioms, politics and others. It is an extension of the English-Persian parallel corpus already distributed by ELRA (Catalogue Reference: ELRA-W0051). This new version of the corpus is distributed with a concordance program.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1306
 
ELRA-W0119 Helsinki Corpus of Swahili
ISLRN: 941-187-059-145-7
This is a 25-million-word text corpus of Swahili, annotated for part-of-speech, morphology and syntax. The corpus contains prose text from domains such as fiction, news media and government documents, dating from 1953 to 2016.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1308
 
ELRA-W0120 NUM 5M Mongolian written corpus
This is a corpus of Mongolian text drawn mostly from online and printed daily newspapers, literature, and laws. Part of this corpus, about 2,800 sentences totalling 100,000 words, has been manually POS-tagged and stored in TEI format.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1309

ELRA-S0393 Persian Speech Corpus
This speech corpus was recorded in a professional studio through a 'Blubbery' model microphone by one male speaker of Persian (Tehrani accent). Speech synthesized from this corpus has produced a high-quality, natural voice. The corpus consists of 399 utterances for a total of about 2.5 hours, with orthographic and phonetic transcriptions.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1307

For more information on the catalogue, please contact Valérie Mapelli (mapelli@elda.org).

If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements/



 
 

 




