ISCApad #235
Wednesday, January 10, 2018 by Chris Wellekens
11th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA
Co-located with LREC 2018, Phoenix Seagaia Resort, Miyazaki, Japan
Tuesday, May 8, 2018
Submission deadline: January 20, 2018
SHARED TASK: Identifying parallel sentences in comparable corpora
********************************************************************
MOTIVATION
In the language engineering and the linguistics communities, research in comparable corpora has been motivated by two main reasons. In language engineering, on the one hand, it is chiefly motivated by the need to use comparable corpora as training data for statistical NLP applications such as statistical and neural machine translation or cross-lingual retrieval. In linguistics, on the other hand, comparable corpora are of interest in themselves because they make cross-language discoveries and comparisons possible. It is generally accepted in both communities that comparable corpora are documents in one or several languages that are comparable in content and form to various degrees and along various dimensions. We believe that the linguistic definitions and observations related to comparable corpora can improve methods to mine such corpora for applications of statistical NLP. As such, it is of great interest to bring together builders and users of such corpora.
TOPICS
Given that LREC takes place for the first time in Asia, this year's
special theme is 'Comparable Corpora for Asian Languages'. But we also solicit contributions on all other topics related to comparable corpora, including but not limited to the following:
Building Comparable Corpora:
* Human translations
* Automatic and semi-automatic methods
* Methods to mine parallel and non-parallel corpora from the Web
* Tools and criteria to evaluate the comparability of corpora
* Parallel vs non-parallel corpora, monolingual corpora
* Rare and minority languages, across language families
* Multi-media/multi-modal comparable corpora
Applications of comparable corpora:
* Human translations
* Language learning
* Cross-language information retrieval & document categorization
* Bilingual projections
* Machine translation
* Writing assistance
* Machine learning techniques using comparable corpora
Mining from Comparable Corpora:
* Induction of morphological, grammatical, and translation rules from comparable corpora
* Extraction of parallel segments or paraphrases from comparable corpora
* Extraction of bilingual and multilingual translations of single words and multi-word expressions, proper names, and named entities from comparable corpora
* Induction of multilingual word classes from comparable corpora
* Cross-language distributional semantics
SUBMISSION INFORMATION
Please follow the style sheet and templates provided for the main conference at http://lrec2018.lrec-conf.org/en/submission/authors-kit/
The submission website is https://www.softconf.com/lrec2018/BUCC2018/
Papers should be submitted as a PDF file. Submissions must describe original and unpublished work and range from four (4) to eight (8) pages including references. Reviewing will be double blind, so the papers should not reveal the authors' identity. Accepted papers will be published in the workshop proceedings.
Double submission policy: Parallel submission to other meetings or publications is possible, but the workshop organizers must be notified immediately.
For further information, please contact Reinhard Rapp: reinhardrapp (at) gmx (dot) de
For further information see the BUCC 2018 website: http://comparable.limsi.fr/bucc2018/
IMPORTANT DATES
Paper submission deadline: 20 January 2018
Notification of acceptance: 10 February 2018
Early bird registration (reduced rates): 15 February 2018
Camera ready final papers: 25 February 2018
Workshop date: 8 May 2018
SHARED TASK: Identifying parallel sentences in comparable corpora
As a continuation of the previous year's shared task, we announce a modified
shared task for 2018. As is well known, a bottleneck in statistical machine translation is the scarcity of parallel resources for many language pairs and domains. Previous research has shown that this bottleneck can be reduced by utilizing parallel portions found within comparable corpora. These are useful for many purposes, including automatic terminology extraction and the training of statistical MT systems. The aim of the shared task is to quantitatively evaluate competing methods for extracting parallel sentences from comparable monolingual corpora, so as to give an overview of the state of the art and to identify the best-performing approaches.
Any submission to the shared task is expected to be accompanied by a short paper (4 pages plus references). This paper will be accepted for publication in the workshop proceedings after a basic quality check; hence the submission will go via Softconf with the standard peer-review process.
SHARED TASK SCHEDULE
Shared task sample and training sets released: 22 December 2017
Shared task test set release: 22 January 2018
Shared task test submission deadline: 29 January 2018
Shared task paper submission deadline: 2 February 2018
Shared task camera ready papers: 25 February 2018
For further information concerning the shared task see https://comparable.limsi.fr/bucc2018/bucc2018-task.html
WORKSHOP ORGANIZERS
Reinhard Rapp (Magdeburg-Stendal University of Applied Sciences and University of Mainz, Germany), Chair
Pierre Zweigenbaum (LIMSI, CNRS, Université Paris-Saclay, Orsay, France), Shared task organizer
Serge Sharoff (University of Leeds, United Kingdom)
PROGRAMME COMMITTEE
Ahmet Aker (University of Sheffield, UK)
Caroline Barrière (CRIM, Montréal, Canada)
Hervé Déjean (Xerox Research Centre Europe, Grenoble, France)
Éric Gaussier (Université Joseph Fourier, Grenoble, France)
Silvia Hansen-Schirra (University of Mainz, Germany)
Natalie Kubler (Université Paris Diderot USPC, France)
Philippe Langlais (Université de Montréal, Canada)
Michael Mohler (Language Computer Corp., US)
Emmanuel Morin (Université de Nantes, France)
Dragos Stefan Munteanu (Language Weaver, Inc., US)
Lene Offersgaard (University of Copenhagen, Denmark)
Ted Pedersen (University of Minnesota, Duluth, US)
Reinhard Rapp (Magdeburg-Stendal University of Applied Sciences and University of Mainz, Germany)
Serge Sharoff (University of Leeds, UK)
Michel Simard (National Research Council Canada)
Richard Sproat (OGI School of Science & Technology, US)
Pierre Zweigenbaum (LIMSI, CNRS, Université Paris-Saclay, Orsay, France)
IDENTIFY, DESCRIBE AND SHARE YOUR LANGUAGE RESOURCES
Please make sure that your papers take into account the following information from the LREC organizers about the LRE Map, the 'Share your LRs!' initiative and the ISLRN number:
* Describing your LRs in the LRE Map is now a normal practice in the submission procedure of LREC (introduced in 2010 and adopted by other conferences). To continue the efforts initiated at LREC 2014 about 'Sharing LRs' (data, tools, web-services, etc.), authors will have the possibility, when submitting a paper, to upload LRs in a special LREC repository. This effort of sharing LRs, linked to the LRE Map for their description, may become a new 'regular' feature for conferences in our field, thus contributing to creating a common repository where everyone can deposit and share data.
* As scientific work requires accurate citations of referenced work so as to allow the community to understand the whole context and also replicate the experiments conducted by other researchers, LREC 2018 endorses the need to uniquely identify LRs through the use of the International Standard Language Resource Number (ISLRN, www.islrn.org), a Persistent Unique Identifier to be assigned to each Language Resource. The assignment of ISLRNs to LRs cited in LREC papers will be offered at submission time.