ISCA - International Speech
Communication Association



ISCApad #275

Thursday, May 13, 2021 by Chris Wellekens

5-2-1 Linguistic Data Consortium (LDC) update (April 2021)
  

In this newsletter:
New Publications:

X-SRL: Parallel Cross-lingual Semantic Role Labeling
TAC KBP English Sentiment Slot Filling – Comprehensive Training and Evaluation Data 2013-2014


New publications:
(1) X-SRL: Parallel Cross-lingual Semantic Role Labeling was developed by the Department of Computational Linguistics at Heidelberg University and the Leibniz Institute for the German Language (IDS). It consists of approximately three million words of German, French, and Spanish annotated for semantic role labeling. The texts are translations of the English portion of 2009 CoNLL Shared Task Part 2 (LDC2012T04). All sentences have annotations for verbal predicates and share the original English PropBank label set across the four languages.

The 2009 CoNLL Shared Task developed syntactic dependency annotations together with semantic dependencies that model the roles of both verbal and nominal predicates. The following English data was used in the shared task:

For X-SRL, the English source data was automatically translated using DeepL. Automatic tokenization, lemmatization, part-of-speech tagging, and syntactic parsing were then applied to the text. The data was divided into train, development, and test partitions. Semantic labels were transferred for the train and development sections, and the test sentences were validated for translation quality, alignment, label transfer, and filtering.

X-SRL: Parallel Cross-lingual Semantic Role Labeling is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus. 2021 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $500.

*

(2) TAC KBP English Sentiment Slot Filling – Comprehensive Training and Evaluation Data 2013-2014 was developed by LDC and contains training and evaluation data produced in support of the 2013 and 2014 TAC KBP Sentiment Slot Filling tracks. The data in this release includes queries, manual runs (human-produced query responses), and assessment results for human- and system-produced query responses. Source data was English news and web text.

The regular English Slot Filling track involved mining information about entities from text using a specified set of 'slots', or attributes. The goal of the Sentiment Slot Filling task was to evaluate the quality of detectors for positive and negative sentiment. 

TAC KBP English Sentiment Slot Filling – Comprehensive Training and Evaluation Data 2013-2014 is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus. 2021 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for $1000.

 

Membership Coordinator

Linguistic Data Consortium

University of Pennsylvania

T: +1-215-573-1275

E: ldc@ldc.upenn.edu

M: 3600 Market St. Suite 810

      Philadelphia, PA 19104

