ISCA - International Speech Communication Association



ISCApad #221

Friday, November 11, 2016 by Chris Wellekens

5-2-1 ELRA - Language Resources Catalogue - Update (October 2016)
  

 ELRA - Language Resources Catalogue - Update
*****************************************************************

 ELRA-S0386 SecuVoice
ISLRN: 583-080-936-563-9
SecuVoice consists of single-channel utterances in Spanish containing sequences of isolated digits from zero to nine. SecuVoice contains a total of 7,098 utterances (169 speakers x 42 utt./speaker) with 34,476 digits (204 digits/speaker). Along with the WAV files containing the speech utterances, XML annotation files containing detailed information about the speakers and the recorded sequences of digits are provided.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1293
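The SecuVoice totals quoted above follow directly from the per-speaker figures. As a quick sanity check, the sketch below recomputes them in Python; the function and variable names are illustrative only and are not part of the ELRA distribution.

```python
# Per-speaker figures as stated in the SecuVoice description.
SPEAKERS = 169
UTTERANCES_PER_SPEAKER = 42
DIGITS_PER_SPEAKER = 204


def corpus_totals(speakers, utterances_per_speaker, digits_per_speaker):
    """Return (total utterances, total digits, mean digits per utterance)."""
    total_utterances = speakers * utterances_per_speaker
    total_digits = speakers * digits_per_speaker
    # The mean is the same whether computed per speaker or over the corpus.
    mean_digits_per_utterance = digits_per_speaker / utterances_per_speaker
    return total_utterances, total_digits, mean_digits_per_utterance


if __name__ == "__main__":
    utterances, digits, mean_digits = corpus_totals(
        SPEAKERS, UTTERANCES_PER_SPEAKER, DIGITS_PER_SPEAKER
    )
    print(f"utterances: {utterances}")
    print(f"digits: {digits}")
    print(f"mean digits per utterance: {mean_digits:.2f}")
```

Both products match the announced totals (7,098 utterances and 34,476 digits). The mean of roughly 4.86 digits per utterance is not an integer, which suggests that the digit sequences vary in length; the announcement itself does not state the individual sequence lengths.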

We are happy to announce that a new set of 15 Written Corpora is now available in our catalogue.


Arabic-English, Arabic-French, Chinese-English and Chinese-French Written Parallel Corpora:
This set of 15 written corpora was produced by ELDA within PEA TRAD, a project supported by the French Ministry of Defence (DGA). Available resources are listed below (click on the links for further details).


ELRA-W0098 TRAD Arabic-French Newspaper Parallel corpus - Test set 1
ISLRN: 922-732-502-473-8
This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are articles collected in 2012 from the Arabic version of Le Monde Diplomatique.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1278

ELRA-W0099 TRAD Arabic-English Newspaper Parallel corpus - Test set 1
ISLRN: 764-187-795-074-0
This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are articles collected in 2012 from the Arabic version of Le Monde Diplomatique.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1279

ELRA-W0100 TRAD Arabic-French Newspaper Parallel corpus - Test set 2
This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in French. The source texts are articles collected in May 2013 from the Arabic version of Le Monde Diplomatique.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1280

ELRA-W0101 TRAD Arabic-French Parallel corpus of transcribed Broadcast News Speech
This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are transcriptions of broadcast news in Arabic recorded on France 24.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1281

ELRA-W0102 TRAD Arabic-English Parallel corpus of transcribed Broadcast News Speech
This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are transcriptions of broadcast news in Arabic recorded on France 24.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1282

ELRA-W0103 TRAD Arabic-French Web domain (blogs) Parallel corpus
This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are blog articles from 2008 to 2013.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1283

ELRA-W0104 TRAD Arabic-English Web domain (blogs) Parallel corpus
This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are blog articles from 2008 to 2013.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1284

ELRA-W0105 TRAD Arabic-French Mailing lists Parallel corpus - Test set
This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. Emails are dated from 2010 to 2012.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1285

ELRA-W0106 TRAD Arabic-English Mailing lists Parallel corpus - Test set
This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. Emails are dated from 2010 to 2012.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1286

ELRA-W0107 TRAD Arabic-French Mailing lists Parallel corpus - Development set
This is a parallel corpus of 10,000 words in Arabic and a reference translation in French. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. The collected emails are dated from 2004 to 2007.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1287

ELRA-W0108 TRAD Arabic-English Mailing lists Parallel corpus - Development set
This is a parallel corpus of 10,000 words in Arabic and a reference translation in English. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. The collected emails are dated from 2004 to 2007.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1288

ELRA-W0109 TRAD Chinese-French Web domain (blogs) Parallel corpus
This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in French. The source texts are blog articles dealing with various subjects such as economy, environment, society, technologies, etc. Articles are dated from June 2013.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1289

ELRA-W0110 TRAD Chinese-English Web domain (blogs) Parallel corpus
This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in English. The source texts are blog articles dealing with various subjects such as economy, environment, society, technologies, etc. Articles are dated from June 2013.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1290

ELRA-W0111 TRAD Chinese-French News Articles Parallel corpus
This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in French. The source texts are newspaper articles from the Chinese version of Voice of America. Articles are dated from 2011 and 2012.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1291

ELRA-W0112 TRAD Chinese-English News Articles Parallel corpus
This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in English. The source texts are newspaper articles from the Chinese version of Voice of America. Articles are dated from 2011 and 2012.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1292

___________________________________________
Previous releases from the same project covered the Pashto language and are listed below:

ELRA-S0381 TRAD Pashto Broadcast News Speech Corpus
ISLRN: 918-508-885-913-7
For more information, see: http://catalog.elra.info/product_info.php?products_id=1265

ELRA-W0092 TRAD Pashto Monolingual text Corpus
ISLRN: 394-903-293-388-0
For more information, see: http://catalog.elra.info/product_info.php?products_id=1266

ELRA-W0093 TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data
ISLRN: 802-643-297-429-4
For more information, see: http://catalog.elra.info/product_info.php?products_id=1267

ELRA-W0094 TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Test data
ISLRN: 547-897-479-723-3
For more information, see: http://catalog.elra.info/product_info.php?products_id=1268

ELRA-W0095 TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data
ISLRN: 006-102-605-738-4
For more information, see: http://catalog.elra.info/product_info.php?products_id=1269

ELRA-W0096 TRAD Pashto-French News Articles Parallel corpus
ISLRN: 649-628-149-051-7
For more information, see: http://catalog.elra.info/product_info.php?products_id=1270

ELRA-W0097 TRAD Pashto-English News Articles Parallel corpus
ISLRN: 612-936-517-010-2
For more information, see: http://catalog.elra.info/product_info.php?products_id=1271


For more information on the catalogue, please contact Valérie Mapelli: mapelli@elda.org

If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements/