ISCA - International Speech Communication Association

ISCApad #281

Monday, November 08, 2021 by Chris Wellekens

5-2-2 ELRA - Language Resources Catalogue - Update (October 2021)

We are happy to announce that 1 new written corpus, 1 new monolingual lexicon and 4 new bilingual dictionaries are now available in our catalogue.
Moreover, ELRA announces that the CINTIL Corpus - International Corpus of Portuguese is now available for free for academic research.

1) New Language Resources

ELRA-W0316 Ema-lon Manipuri Corpus (including word embedding and language model)

ISLRN: 588-170-827-016-7
The Ema-lon Manipuri Corpus consists of a set of resources for the Manipuri language (locally known as Meiteilon) for the purpose of machine translation. The main source for these resources is the Sangai Express news website. The resources that constitute the present corpus comprise monolingual and parallel data in Manipuri and English (EM Corpus), as well as a FastText word embedding (EM-FT) and an ALBERT model (EM-ALBERT) available for the Manipuri language.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-W0316/

ELRA-L0130 NRC Emotion Lexicon - Revised version
ISLRN: 007-544-786-822-8
The NRC Emotion Lexicon was originally built by Saif M. Mohammad and Peter D. Turney through crowdsourcing. It was created to assist with emotion analysis, as other emotion lexicons were smaller at the time. Close inspection of the lexicon identified a large number of troubling entries, in which words that should in most contexts be emotionally neutral, with no affect (e.g. lesbian, stone, mountain), are associated with emotional labels that are inaccurate, nonsensical, pejorative or, at best, highly contingent and context-dependent (e.g. lesbian labeled as DISGUST and SADNESS, stone as ANGER, or mountain as ANTICIPATION). The revised NRC Emotion Lexicon consists of 5,916 entries that result from the work referenced in Zad et al. (2021), 'Hell Hath No Fury? Correcting Bias in the NRC Emotion Lexicon', published at WOAH, the 5th Workshop on Online Abuse and Harms.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-L0130/

ELRA-M0086 French-Vietnamese Dictionary
ISLRN: 143-538-116-557-6
The French-Vietnamese Dictionary consists of 82,768 entries containing the following information: phonetics (using IPA), morphology, grammar, semantics, pragmatics and examples.
All headwords come with audio recordings of their pronunciation by native speakers. The dictionary is provided in XML format.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0086/

ELRA-M0087 Vietnamese-French Dictionary
ISLRN: 652-215-232-618-2
The Vietnamese-French Dictionary consists of 43,296 entries containing the following information: phonetics (using IPA), morphology, grammar, semantics, pragmatics and examples for source language only. The dictionary is provided in XML format.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0087/

ELRA-M0088 German-Vietnamese Dictionary
ISLRN: 750-377-806-677-8
The German-Vietnamese Dictionary consists of 32,511 entries containing the following information: phonetics (using IPA), morphology, grammar, semantics, pragmatics and examples available only for the source language.
Headwords (in Vietnamese) come with audio recordings of their pronunciation by native speakers.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0088/

ELRA-M0089 Vietnamese-German Dictionary
ISLRN: 993-568-466-563-7
The Vietnamese-German Dictionary consists of 42,793 entries containing the following information: phonetics (using IPA), morphology, grammar, semantics, pragmatics and examples available only for the source language.
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-M0089/

2) ELRA announces that the CINTIL Corpus - International Corpus of Portuguese is now available for free for academic research

CINTIL-Corpus Internacional do Português is a linguistically interpreted written and spoken corpus of European Portuguese. It is composed of one million annotated tokens, each of which has been verified by human expert annotators. The annotation comprises information on part-of-speech, open class lemma and inflection, multi-word expressions pertaining to the class of adverbs and to the closed POS classes, and multi-word proper names (for named entity recognition). The corpus is built from raw textual materials of several types, of which 30% are spoken materials.

The CINTIL Corpus is now available for free for academic research and can be found in the ELRA Catalogue under the following reference:
ELRA-W0050 The CINTIL Corpus - International Corpus of Portuguese
ISLRN: 176-775-844-396-0
For more information, see: http://catalog.elra.info/en-us/repository/browse/ELRA-W0050/

For more information on the catalogue, please contact Valérie Mapelli (mapelli@elda.org).

If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements
