ISCApad #216

Tuesday, June 14, 2016 by Chris Wellekens

6 Jobs

6-1

(2015-12-02) Master2 position at Multispeech Team, LORIA (Nancy, France)

Automatic speech recognition: contextualisation of a neural-network language model by dynamic adjustment

Framework: ANR project ContNomina

The technologies involved in information retrieval in large audio/video databases are often based on the analysis of large, but closed, corpora, and on machine learning techniques and statistical modeling of written and spoken language. The effectiveness of these approaches is now widely acknowledged, but they nevertheless have major flaws, particularly concerning proper names, which are crucial for interpreting the content.

With diachronic data (data that change over time), new proper names appear constantly, requiring dynamic updates of the lexicons and language models used by the speech recognition system.

The ANR project ContNomina (2013-2017) therefore addresses the problem of proper names in automatic audio processing systems by making the most efficient use of the context of the processed documents. Within this project, the student will work on contextualizing the recognition module by dynamically adjusting the language model to make it more accurate.

Subject

Current automatic speech recognition systems are based on statistical approaches. They require three components: an acoustic model, a lexicon and a language model. This internship will focus on the language model. The language model of our recognition system is a neural network trained on a large text corpus. The task is to re-estimate the language model parameters for a new proper name from its context and a small amount of adaptation data. Several directions can be explored: adapting the language model, using a class model, or studying the notion of analogy.
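
To make the class-model direction concrete, here is a minimal sketch of the generic class-based factorisation P(w | h) = P(c | h) * P(w | c). It is not the team's system: the class label PERSON, the helper names (add_proper_name, in_class_prob, word_prob) and the counts are hypothetical, and P(c | h) is simply assumed to be supplied by the neural language model as a number.

```python
from collections import Counter


def add_proper_name(class_counts: dict[str, Counter], cls: str, name: str, count: int) -> None:
    """Register a new proper name in class `cls` with a count taken from adaptation data."""
    class_counts.setdefault(cls, Counter())[name] += count


def in_class_prob(class_counts: dict[str, Counter], cls: str, word: str) -> float:
    """Relative-frequency estimate of P(word | cls) from the in-class counts."""
    counts = class_counts.get(cls, Counter())
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


def word_prob(p_class_given_history: float, class_counts: dict[str, Counter], cls: str, word: str) -> float:
    """Class-based factorisation: P(w | h) = P(c | h) * P(w | c).

    p_class_given_history is assumed to come from the neural language model,
    which only has to predict the class token, not the new name itself.
    """
    return p_class_given_history * in_class_prob(class_counts, cls, word)


if __name__ == "__main__":
    # Hypothetical in-class counts for a PERSON class (illustrative numbers only).
    counts = {"PERSON": Counter({"name_a": 6, "name_b": 2})}
    # Add a new proper name seen twice in the (hypothetical) adaptation data.
    add_proper_name(counts, "PERSON", "name_new", 2)
    # Assume the neural LM gives P(PERSON | h) = 0.05 for the current history h.
    print(word_prob(0.05, counts, "PERSON", "name_new"))
```

With these numbers the new name gets P(name_new | PERSON) = 2/10 = 0.2, so P(name_new | h) is about 0.01. Adding a name only changes the in-class distribution; the network that predicts P(c | h) does not need to be retrained.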

Our team has developed a fully automatic speech recognition system that transcribes radio broadcasts from the corresponding audio files. The student will develop a new module that integrates new proper names into the language model.

Required skills

Background in statistics and object-oriented programming.

Location and contacts

LORIA laboratory, Multispeech team, Nancy, France

Irina.illina@loria.fr, dominique.fohr@loria.fr

Candidates should email a detailed CV and their diploma.

References

[1] J. Gao, X. He, and L. Deng. Deep Learning for Web Search and Natural Language Processing. Microsoft slides, 2015.

[2] X. Liu, Y. Wang, X. Chen, M. J. F. Gales, and P. C. Woodland. Efficient lattice rescoring using recurrent neural network language models. In Proc. ICASSP, 2014, pp. 4941-4945.

[3] M. Sundermeyer, H. Ney, and R. Schlüter. From Feedforward to Recurrent LSTM Neural Networks for Language Modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 3, pp. 517-529, March 2015.
