ISCA - International Speech
Communication Association

ISCApad #231

Sunday, September 10, 2017 by Chris Wellekens

4-13 News from ELRA (February 2017)

Dear ELRA Member,

Here is the latest news about the most noteworthy activities conducted at ELRA and ELDA in February 2017. We would like to remind you that we welcome your suggestions and comments on the topics presented below, and on any other topic you would like to include in the next bulletins.

1.   ABOUT MEMBERSHIP
1.1.   Membership
For the period from 1st to 28th February 2017, the total number of paid-up members is 25.

1.2.   Membership Drive
As a follow-up to the September 2016 meeting, a brainstorming meeting on the ELRA membership drive and related ELRA services took place on 31st January 2017 in Paris with Nicoletta Calzolari, Nick Campbell, Khalid Choukri, Henk van den Heuvel and Joseph Mariani. A report will be drafted and shared with the ELRA Board and members by Spring 2017.

1.3.   LREC 2018
In mid-February, the First Call for Papers was published on http://www.lrec-conf.org/lrec2018/lrec2018-cfp.htm and circulated on the mailing lists and on Twitter (@LREC2018, #LREC2018). It was also sent to all the LREC 2016 participants. The 11th edition of LREC will be held on May 7-12, 2018 in Miyazaki, Japan. A temporary web page has been set up at http://www.lrec-conf.org/lrec2018/lrec2018.htm and will be updated until the publication of the permanent web site.

2.   RESOURCES
We are happy to announce that 1 new Evaluation Package is now available in our catalogue.

ELRA-E0046 ETAPE Evaluation Package
ISLRN: 425-777-374-455-4

The ETAPE Evaluation Package consists of ca. 30 hours of radio and TV data, selected to include mostly unplanned speech and a reasonable proportion of multiple-speaker data. All data were carefully transcribed, including named entity annotation.
This package includes the material that was used for the ETAPE evaluation campaign. It includes resources, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of this evaluation package is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
For more information, see: http://catalog.elra.info/product_info.php?products_id=12

For more information on the catalogue, please contact Valérie Mapelli: mapelli@elda.org
If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.
Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/LRs-Announcements.html

2.1.   ISLRN
This month, the following resources have been allocated an ISLRN.

Title	ISLRN
SALA II US English database (2000 speakers)	829-229-153-801-9
ETAPE Evaluation Package	425-777-374-455-4
A multilingual, multi-style and multi-granularity dataset for cross-language textual similarity detection	723-785-513-738-2
First-Year Law Students' Court Memoranda	141-827-463-794-4
GALE Phase 3 Arabic Broadcast News Speech Part 2	459-849-510-597-1
GALE Phase 3 Arabic Broadcast News Transcripts Part 2	539-362-793-352-9
IARPA Babel Haitian Creole Language Pack IARPA-babel201b-v0.2b	763-119-338-310-1

3.   PROJECTS AND INITIATIVES
3.1.   Production Projects
Sentiment annotation in French tweets
ELDA has started a large annotation project consisting of deep sentiment and opinion tagging of tweets in French. Several annotators have been hired and work is already under way. For this project, several natural language processing and data validation tools developed at ELDA for previous projects are being re-used to increase the productivity of the annotation team and to improve the quality of the annotations.
In February, ELDA pursued its activities in the French tweet opinion annotation project and made several deliveries, to the full satisfaction of the customer.

3.2.   Projects
CRACKER (Cracking the Language Barrier: Coordination, Evaluation and Resources for European MT Research)
CRACKER is a Coordination and Support Action under the H2020 Programme of the European Commission. This action has just started and held its kick-off meeting in Berlin last February 10th; the meeting was organised by its coordinator, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI). The other members of the Consortium are: Charles University of Prague (CUNI), Czech Republic; Evaluations and Language Resources Distribution Agency SA (ELDA), France; Fondazione Bruno Kessler (FBK), Italy; Athena Research and Innovation Center in Information, Communication and Knowledge Technologies (ATHENA RC), Greece; University of Edinburgh (UEDIN), UK; and University of Sheffield (USFD), UK.
CRACKER aims at providing planned coordination and support to the European machine translation research community, which is suffering from the pressure of the current challenges and needs of the Digital Single Market.
ELDA decided to undertake META-SHARE upgrades again, by working in close cooperation with the ILSP. The first step is to merge ELDA and ILSP's contributions and to publish them on the META-SHARE GitHub repository.

CEF Language Resource Coordination
The SMART 2014/1074 Language Resource Coordination, funded by the CEF (Connecting Europe Facility) programme, was launched during the Riga Summit, held late April 2015 in Latvia. The objectives of this 2-year project are to:
•   improve availability and simplify access to language resources (LRs) relevant for MT,
•   establish an observatory for language resources across EU Member States and CEF associated countries,
•   raise awareness among stakeholders about the value and use of data for automated translation,
•   clarify legal and commercial issues related to the data.
Targeted data are those produced by the public sector in the EU, which can be made available for re-use through the EU Open Data portal, with suitable copyright protection.
The project is coordinated by Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI) and the other members of the European Language Resources Coordination Consortium (ELRC) are ELRA, TILDE, ILSP and TAUS.
Eight tasks have been specified for this programme and ELDA will lead three of them: the setup of the technical Helpdesk (T2), the organization of 30+ training workshops (T6) and the Language Resources collection (T7).
In February, ELDA continued its main activities regarding the production and validation of data, the upgrading of the data processing and packaging tools, as well as discussions with potential donors. Specific effort was dedicated to 1) updating the validation guidelines and drafting a validation report template to be used in the coming validation phase, 2) running a deeper analysis of legal issues with respect to donated data and supporting partners in drafting specific user agreements with donors, and 3) maintaining and upgrading the crawled data management toolkit, mainly to enhance the integration of manual validation and to allow the toolkit to be used for handling and validating donated data.
The ELRC website, which provides information on the project and access to services such as the Helpdesk, can be found at http://elrc.tilde.com/home.

European Language Resource Coordination +
Following the work of the European Language Resource Coordination (ELRC) action (http://lr-coordination.eu/) within CEF.AT, the European Commission has launched two further actions under the same principles and also within the Connecting Europe Facility (CEF) Programme:
•   SMART 2015/1091 Tools and Resources for CEF Automated Translation Lot 2 (ELRC+2)
•   SMART 2015/1091 Tools and Resources for CEF Automated Translation Lot 3 (ELRC+3)
Both of them are 3-year actions and their inception meetings with the European Commission took place on January 17th, 2017, in Luxembourg.

European Language Resource Coordination +2 (ELRC+2)
The inception meeting of the European Language Resource Coordination +2 (ELRC+2) took place at the EC premises in Luxembourg, with the participation of the ELRC+2 Consortium, namely, ELDA (France), DFKI (Germany), ILSP (Greece) and TILDE (Latvia), as well as representatives of DG Connect and DG Translation from the EC.
The goals of this 3-year project are to:
•   set up and operate a repository to host Language Resources to support MT systems within the CEF Automated Translation platform;
•   set up and operate an intellectual property rights (IPR) support and clearance desk for Language Resources;
•   complement and continue the Language Resource coordination activities undertaken under the ELRC service contract (SMART 2014/1074), such as improving the availability of LRs held by the public sector, establishing an observatory for LRs across EU Member States and CEF associated countries and raising awareness among public data holders of the value of LRs for MT.
The project, which will be coordinated by DFKI, comprises ten tasks. ELDA will be leading three of them:
•   the technical helpdesk (T3),
•   the legal helpdesk (T4),
•   the IPR Clearance of 200 LRs (T5).
In February, the consortium finalized the inception report, which further specifies the methodology, agreed progress indicators, resources and objectives in accordance with the feedback provided by the EC during the inception meeting. Within T8 (country-specific workshops), the consortium produced a draft workshop concept and master agenda to be approved by the EC. ELRC+2 workshops constitute the second round of ELRC outreach activities. The main novelty of this new series of workshops lies in the reinforcement of the policy-level component targeting decision-makers, as well as in the introduction of a hands-on session for data holders and potential contributors.
Fortnightly web conferences with the EC have continued to take place in order to discuss topics such as the involvement of DGT in ELRC activities and the organisation of the ELRC conference (T7) to be held before the end of 2017.

European Language Resource Coordination +3 (ELRC+3)
ELRC+3 counts on the ELRC Consortium Members as partners of the present action: Tilde (coordinator - Latvia), ELDA (France), DFKI (Germany) and ILSP (Greece). The main objective of ELRC+3 is to continue the ELRC's ongoing work in helping the EC obtain resources for the training and optimization of the CEF Automated Translation platform, for the CEF languages, and in domains of interest to the CEF Digital Service Infrastructures (DSIs). For that purpose, this action aims to identify, collect, clear, produce, process and make available further resources to the EC.
In this context, ELDA will be leading the following activities:
•   Adaptation of the existing ELRC database of sources, revising and customising it for the new needs and requirements.
•   Identification of licensing conditions and right holder(s) for the new resources.
•   Dissemination activities, also in support of the ELRC+2 action.
•   Anonymisation of language resource databases: this will depend on the requirements of the language resource stakeholders and regulations on personal data protection.
•   Validation of language resources and their metadata, which implies the quality evaluation of each deliverable language resource (both monolingual and parallel).
•   Clearing of IPRs and other legal issues that may arise for the data collected.
In February, the inception report was reviewed by the European Commission (EC), and its final version is under preparation for March, following the EC's recommendations. In the meantime, work has started on the different tasks, in particular concerning the identification and processing of an initial batch of language resources. With regard to dissemination, the ELRC website is going to be enhanced so as to accommodate the needs of the new ELRC+2 and ELRC+3 projects. Discussion has also started on the new on-site assistance instrument that is planned within the project. This assistance is intended to go beyond that currently offered within ELRC, supporting data owners with their technical questions related to data processing and provision.
