ISCA - International Speech
Communication Association

ISCApad #217

Sunday, July 10, 2016 by Chris Wellekens

5-2-1 ELRA - Language Resources Catalogue - Update (April 2016)

We are happy to announce that a set of Pashto Language Resources (1 Broadcast Speech Resource and 6 Written Corpora) and 1 new Multimodal Resource are now available in our catalogue.

Pashto Language Resources: This set of Pashto Language Resources was produced by ELDA within the PEA TRAD project supported by the French Ministry of Defence (DGA). It consists of 1 Broadcast Speech Resource and 6 Written Corpora.
Available resources are listed below (click on the links for further details):

ELRA-S0381 TRAD Pashto Broadcast News Speech Corpus
ISLRN: 918-508-885-913-7

This corpus contains 108 hours of transcribed broadcast news recordings, covering more than 1,000 speakers. Transcriptions are provided together with the audio files and include about 46,000 segments and 1.1M words.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1265

ELRA-W0092 TRAD Pashto Monolingual text Corpus
ISLRN: 394-903-293-388-0

This is a monolingual text corpus in Pashto. The corpus contains about 112,000,000 tokens collected from 46 different blogs and websites.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1266

ELRA-W0093 TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data
ISLRN: 802-643-297-429-4

This corpus consists of the transcriptions of 106 hours of Pashto recordings from the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381), translated into French. It contains about 832,000 source words and 747,000 target words.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1267

ELRA-W0094 TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Test data
ISLRN: 547-897-479-723-3

This is a parallel corpus, which contains 10,000 Pashto words translated into French. The source texts come from 3 broadcast news transcriptions of the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381).
For more information, see: http://catalog.elra.info/product_info.php?products_id=1268

ELRA-W0095 TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data
ISLRN: 006-102-605-738-4

This is a parallel corpus, which contains 10,000 Pashto words translated into English. The source texts come from 3 broadcast news transcriptions of the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381).
For more information, see: http://catalog.elra.info/product_info.php?products_id=1269

ELRA-W0096 TRAD Pashto-French News Articles Parallel corpus
ISLRN: 649-628-149-051-7

This is a parallel corpus, which contains 10,000 Pashto words translated into French by two different translators. The source texts have been collected from the following news websites: Azadiradio, Mashaal and Voice of America Pashto.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1270

ELRA-W0097 TRAD Pashto-English News Articles Parallel corpus
ISLRN: 612-936-517-010-2

This is a parallel corpus, which contains 10,000 Pashto words translated into English by two different translators. The source texts have been collected from the following news websites: Azadiradio, Mashaal and Voice of America Pashto.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1271

ELRA-S0374 FoxPersonTracks: a Benchmark for Person Re-Identification from TV Broadcast Shows
ISLRN: 168-132-570-218-1

FoxPersonTracks is a person track dataset dedicated to person re-identification. The dataset is built from a set of real-life TV shows broadcast on the French TV channels BFMTV and LCP, provided during the REPERE challenge. It contains a total of 4,604 persontracks (short video sequences featuring an individual with no background) from 266 persons. The dataset also provides re-identification results using space-time histograms as a baseline, together with an evaluation tool to ease comparison with other re-identification methods.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1264

For more information on the catalogue, please contact Valérie Mapelli: mapelli@elda.org

If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements/
