LREC Workshop on Cross-Language Search and Summarization of Text and Speech
May 16, 2020
Palais du Pharo, Marseilles, France
Call for Papers (http://users.umiacs.umd.edu/~oard/clssts)
In today's global world, Cross-Language Information Retrieval (CLIR) enables end users to issue queries in their own language and receive results from multiple languages around the world, often using translation so that the end user can quickly understand whether the retrieved results are relevant. Cross-language summarization can make it easier for an end user to determine whether a document is relevant by providing a summary of the foreign-language document in the user's language, highlighting the evidence for relevance. When the foreign language is a low-resource language, cross-language search and summarization are more difficult: translation capabilities may be poor, and the lack of resources makes it difficult to train CLIR and summarization systems. To complicate matters further, when the collection contains speech as well as text, producing accurate search results and generating comprehensible summaries is harder still.
This workshop aims to stimulate the collection and provision of resources that can improve systems that perform cross-language search and summarization. To facilitate dissemination of information about existing resources, the workshop will feature keynote speeches and panels by people who have worked in this area, have cross-language resources to share, or can describe ongoing research programs and shared tasks. Papers are also solicited that describe recent and current research in these areas, that describe relevant resources, or that stake out positions on the directions in which the authors think the field should move.
To set the stage, the organizers will provide two small spoken language test collections that include waveforms, transcriptions, and possibly queries with relevance judgments. Both are conversational genres, one in Somali (a very low-resource language) and the other in Bulgarian (a moderate-resource language), and each includes approximately 80 hours of speech. We welcome papers that report results on these test collections as well as results on any datasets that are available from ELDA, LDC, or other repositories. Participants are also encouraged to describe other datasets that they have access to and to report results on these.
We solicit papers on research that broadly relates to supporting information access to lower-resource languages, addressing topics such as the following:
Test collections for evaluating CLIR
Development of new cross-lingual resources
Datasets for cross-lingual summarization
Methods for CLIR
CLIR over speech
Evidence generation for CLIR
Methods for cross-lingual summarization
Methods for cross-lingual query-focused summarization
Snippet generation
Speech summarization
Multilingual language generation
Zero-shot learning and domain adaptation
Explainable methods for cross-lingual NLP
Paper length: Both long papers (8 pages plus references) and short papers (4 pages plus references) are welcome. Papers must follow the LREC stylesheet. Papers must be submitted through START at this link: https://www.softconf.com/lrec2020/CLSSTS2020/
Important dates:
Submissions due: February 15th, 11:59pm AOE
Acceptance notifications: March 12th
Camera ready copy due: April 1st
Workshop date: May 16th
Contact person: Kathy McKeown, Kathy@cs.columbia.edu
Organizing Committee:
James Allan, UMass Amherst (USA)
Lu Wang, Northeastern University (USA)
Kathy McKeown, Columbia University (USA)
Douglas W. Oard, University of Maryland (USA)
Steve Renals, University of Edinburgh (UK)
Richard Schwartz, BBN (USA)
Identify, describe and share your Language Resources (LRs):
Authors will have the opportunity, when submitting a paper, to upload LRs to a special LREC repository. This effort of sharing LRs, linked to the LRE Map for their description, contributes to creating a common repository where everyone can deposit and share data. Because scientific work requires accurate citation of referenced work, so that the community can understand the whole context and replicate the experiments conducted by other researchers, LREC 2020 endorses the need to uniquely identify LRs through the use of the International Standard Language Resource Number (ISLRN, www.islrn.org), a Persistent Unique Identifier to be assigned to each Language Resource. The assignment of ISLRNs to LRs cited in LREC papers will be offered at submission time.