| POSTDOCTORAL POSITION AT LIMSI-CNRS
Topic: New methods for learning Named Entity Recognition systems in a multilingual context.
The development of Natural Language Processing systems is impeded by the absence of annotated data in some languages. A possible solution consists in transferring analyses available in one language to comparable texts in another language. This makes it possible to train new systems based on these new annotations.
The proposed project will focus specifically on Named Entity Recognition in a context where a 'synchronous comparable corpus' (or 'noisy parallel corpus') is available: AFP news in French, English and Spanish (German, Portuguese and Arabic are also available).
The goal is to take advantage of the parallelism of news written in multiple languages to help recognize named entities:
- in a language for which a Named Entity Recognition system is already available, strengthen recognition through entities found in parallel articles in other languages;
- in a (target) language for which no NER system is available, transfer into that language the analyses obtained in a (source) language for which an NER system is available, and use them to train a system in that (target) language.
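As an illustration only, the transfer idea in the second item can be sketched in a few lines of Python. This is a minimal sketch under strong assumptions, not the method used at LIMSI: it assumes a token-level alignment between a source sentence and a target sentence is already given, whereas the project's corpus is only comparable. The function name project_labels, the BIO label scheme and the example alignment are all hypothetical.

```python
from typing import List, Optional, Tuple


def project_labels(
    source_labels: List[str],
    alignment: List[Tuple[int, int]],
    target_len: int,
) -> List[str]:
    """Project BIO entity labels from source tokens onto target tokens.

    `alignment` is a list of (source_index, target_index) pairs; it is
    assumed to be given (hypothetical input, e.g. from a word aligner).
    """
    # Entity type received by each target token (None = no entity).
    types: List[Optional[str]] = [None] * target_len
    for src, tgt in alignment:
        label = source_labels[src]
        if label != "O" and types[tgt] is None:
            # Drop the "B-"/"I-" prefix and keep only the entity type.
            types[tgt] = label.split("-", 1)[1]

    # Rebuild BIO tags on the target side. Simplification: adjacent
    # target tokens of the same type are merged into a single entity.
    projected: List[str] = []
    previous: Optional[str] = None
    for entity_type in types:
        if entity_type is None:
            projected.append("O")
        elif entity_type == previous:
            projected.append("I-" + entity_type)
        else:
            projected.append("B-" + entity_type)
        previous = entity_type
    return projected


# Source (English): Barack Obama visited Paris
# Target (French):  Barack Obama a visité Paris
source_labels = ["B-PER", "I-PER", "O", "B-LOC"]
alignment = [(0, 0), (1, 1), (2, 2), (2, 3), (3, 4)]
projected = project_labels(source_labels, alignment, target_len=5)
print(projected)
```

The projected labels could then serve as (noisy) training annotations for a system in the target language, which is the scenario the item above describes.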
LIMSI has:
- NER systems for French, English and Spanish (with varying levels of performance);
- methods for detecting translation relations among news articles.
This work takes place in the context of the EDyLex project, funded by the French National Research Agency, whose goal is to process unknown words in texts (http://sites.google.com/site/projetedylex/). The successful candidate will focus on the detection and categorization of named entities in a multilingual context.
The work will be performed at LIMSI-CNRS in Orsay (http://www.limsi.fr/) on the campus of University Paris-South. Two teams at LIMSI are involved in the project: ILES (written and signed language processing) and TLP (spoken language processing).
QUALIFICATIONS AND POSITION
The successful candidate will have a track record of research in Machine Learning for Natural Language Processing. Strong preference will be given to candidates with experience in Named Entity Recognition or Spoken Language Understanding. Fluency in one or more languages of the project beyond English is mandatory. Applicants should have received (by the starting date) a PhD in Machine Learning, Computational Linguistics or related areas.
This position is for 12 months and may begin as early as Oct 1st, 2011, or soon thereafter. Salary follows CNRS scales and depends on the candidate's experience (the minimum monthly net salary is about 2,000 €).
To apply, please send a cover letter describing how your knowledge and research background will contribute to the project, a CV, and the names and contact information of two referees to:
Pierre Zweigenbaum (pz@limsi.fr) and Sophie Rosset (rosset@limsi.fr) |