ELRA - Language Resources Catalogue - Update (March 2016)
ISCApad #213
Saturday, March 12, 2016 by Chris Wellekens
We are happy to announce that 1 new Written Corpus and 1 new Speech Resource are now available in our catalogue.
The corpus consists of 10 million German-English parallel sentences that were crawled from the internet between 10/2013 and 04/2015. Web pages have been automatically categorized for subject area. The corpus is available in TMX and Moses format (encoding UTF-8).
For more information, see: http://catalog.elra.info/product_info.php?products_id=1262
Large Farsdat (L-FARSDAT) is a Persian (Farsi) Speech Database containing about 73 hours of read speech from formal Farsi texts (newspapers) recorded by 100 speakers. The sampling rate is 22050 Hz for the whole corpus and the average SNR is about 28 dB. The corpus has been segmented and labelled at word and sentence levels and each word has been annotated according to the 29 standard Persian phonemes.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1263
For more information on the catalogue, please contact Valérie Mapelli (mapelli@elda.org).
If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.