ISCA - International Speech Communication Association


ISCApad #214

Monday, April 11, 2016 by Chris Wellekens

5-2-1 ELRA - Language Resources Catalogue - Update (March 2016)


We are happy to announce that 1 new Written Corpus and 1 new Speech Resource are now available in our catalogue.

ELRA-W0091 Linguatools Webcrawl Parallel Corpus German-English 2015
ISLRN: 800-190-274-236-9
The corpus consists of 10 million German-English parallel sentences crawled from the internet between October 2013 and April 2015. Web pages have been automatically categorized by subject area. The corpus is available in TMX and Moses formats, both UTF-8 encoded.
For more information, see:

ELRA-S0380 Large Farsdat
ISLRN: 067-486-870-902-0
Large Farsdat (L-FARSDAT) is a Persian (Farsi) speech database containing about 73 hours of read speech from formal Farsi texts (newspapers), recorded by 100 speakers. The sampling rate is 22050 Hz for the whole corpus, and the average SNR is about 28 dB. The corpus has been segmented and labelled at the word and sentence levels, and each word has been annotated according to the 29 standard Persian phonemes.
For more information, see:

For more information on the catalogue, please contact Valérie Mapelli.

If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.



