6-20 (2018-12-14) Master R2 internship at Loria, Nancy, France

Master R2 internship in Natural Language Processing: Introduction of semantic information in a speech recognition system



Supervisors: Irina Illina, MdC, Dominique Fohr, CR CNRS


Team: Multispeech, LORIA-INRIA




Duration: 5-6 months


Deadline to apply: January 20th, 2019


Required skills: background in statistics and natural language processing, and computer programming skills (Perl, Python). Candidates should email a detailed CV with diploma


Motivations and context


Semantic and thematic spaces are vector spaces used to represent words, sentences or textual documents. The corresponding models and methods have a long history in computational linguistics and natural language processing. Almost all models rely on the hypothesis of statistical semantics, which states that the statistical patterns of word occurrence (the contexts in which a word appears) can be used to describe the underlying semantics. The most widely used method to learn these representations is to predict a word from the context in which it appears [Mikolov et al., 2013b, Pennington et al., 2014], which can be done with neural networks. These representations have proved effective for a range of natural language processing tasks [Baroni et al., 2014]. In particular, the Skip-gram and CBOW models of Mikolov et al. [Mikolov et al., 2013b, Mikolov et al., 2013a] have become very popular because they can process large amounts of unstructured text data at reduced computing cost. The efficiency and semantic properties of these representations motivate us to explore them for our speech recognition system.

Robust automatic speech recognition (ASR) remains a very ambitious goal. Despite constant efforts and some dramatic advances, the ability of a machine to recognize speech is still far from equaling that of a human being. The performance of current ASR systems decreases significantly when the conditions under which they were trained differ from those in which they are used. The causes of variability may be related to the acoustic environment, sound capture equipment, microphone change, etc.


The speech recognition (ASR) stage will be supplemented by a semantic analysis that detects words in the processed sentence that may have been misrecognized and finds words with a similar pronunciation that fit the context better. For example, the sentence « Silvio Berlusconi, prince de Milan » can be recognized by the speech recognition system as « Silvio Berlusconi, prince de mille ans ». A good semantic representation of the sentence context could help to find and correct this error.
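As a purely illustrative sketch of this idea (not the method to be developed during the internship), one simple way to use semantic context is to compare each acoustically confusable candidate with the average embedding of the surrounding words and keep the closest one. The 3-dimensional vectors, the EMBEDDINGS table and the function names below are all hypothetical and invented for the example; a real system would use embeddings trained with a model such as Skip-gram or CBOW.

```python
import math

DIM = 3

# Toy embeddings invented for illustration only (assumption: not real data).
EMBEDDINGS = {
    "silvio": [0.9, 0.1, 0.0],
    "berlusconi": [0.8, 0.2, 0.1],
    "prince": [0.5, 0.4, 0.1],
    "milan": [0.9, 0.2, 0.0],
    "mille": [0.0, 0.1, 0.9],
    "ans": [0.1, 0.0, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(words):
    """Average the embeddings of the known words; zero vector if none is known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def best_candidate(context, candidates):
    """Return the candidate word sequence most similar to the context."""
    context_vector = mean_vector(context)
    return max(candidates, key=lambda c: cosine(context_vector, mean_vector(c)))


# Context words recognized with confidence, and two hypotheses with
# similar pronunciation for the uncertain part of the sentence.
context = ["silvio", "berlusconi", "prince"]
candidates = [["mille", "ans"], ["milan"]]
print(best_candidate(context, candidates))
```

With these invented vectors, the hypothesis « Milan » is closer to the context than « mille ans ». In practice the semantic score would be combined with the acoustic and language model scores rather than used alone.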

The Master internship will be devoted to an innovative study of how to take semantics into account through predictive representations that capture the semantic features of words and their context. Research will be conducted on combining semantic information with information from denoising to improve speech recognition. As deep neural networks (DNNs) can model complex functions and achieve outstanding performance, they will be used in all our modeling.


