ISCA - International Speech
Communication Association



ISCApad #239

Friday, May 11, 2018 by Chris Wellekens

6-76 (2018-04-16) PhD grant, INRIA Nancy France
  
 
Natural language processing: adding new words to a speech recognition system using Deep Neural Networks
 
 
- Location: INRIA/LORIA Nancy Grand Est research center France
- Project-team: Multispeech
- Scientific Context:

Voice is seen as the next big field for computer interaction. The research company Gartner reckons that by 2018, 30% of all interactions with devices will be voice-based: people can speak up to four times faster than they can type, and the technology behind voice interaction is improving all the time.

As of October 2017, Amazon Echo is present in about 4% of American households. Voice assistants are proliferating in smartphones too: Apple's Siri handles over 2 billion commands a week, and 20% of Google searches on Android-powered handsets in America are done by voice input.

Proper nouns (PNs) play a particular role: they are often essential to understanding a message, and they vary enormously. For example, a voice assistant should know the names of all your friends; a search engine should know the names of all famous people and places, names of museums, etc.

An automatic speech recognition (ASR) system uses a lexicon containing the most frequent words of the language, and only the words of the lexicon can be recognized by the system. It is impossible to add all possible proper names, because there are millions of proper names and new ones appear every day. A competitive solution is to dynamically add new PNs to the ASR system. The idea is to add only relevant proper names: for instance, if we want to transcribe a video document about football results, we should add the names of famous football players and not those of politicians.

In this study, we will focus on the problem of proper names in automatic recognition systems. The problem is to find relevant proper names for the audio document we want to transcribe. To select the relevant proper names, we propose to use an artificial neural network.

- Missions:

We assume that an audio document to be transcribed contains missing proper names, i.e. proper names that are pronounced in the audio document but are not in the lexicon of the automatic speech recognition system. These out-of-vocabulary proper names (OOV PNs) cannot be recognized.

The goal of this PhD thesis is to find a list of relevant OOV PNs that correspond to an audio document and to integrate them into the speech recognition system. We will use a deep neural network (DNN) to find relevant OOV PNs. The input of the DNN will be the approximate transcription of the audio document, and the output will be the list of relevant OOV PNs with their probabilities. The retrieved proper names will be added to the lexicon, and a new recognition pass of the audio document will be performed.
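The retrieval step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy 2-dimensional embeddings, the placeholder names (PLAYER_A, etc.) and the helper functions are hypothetical, and a simple bag-of-words average followed by a softmax stands in for the DNN that the thesis would actually design and train.

```python
import math

# Hypothetical 2-d word embeddings (dimension 0 ~ "sport", 1 ~ "politics").
# In practice these would be learned, e.g. with word2vec-style training.
WORD_VECS = {
    "match": (1.0, 0.0),
    "goal": (0.9, 0.1),
    "league": (0.8, 0.0),
    "election": (0.0, 1.0),
    "parliament": (0.1, 0.9),
}

# Hypothetical output-layer vectors, one per candidate OOV proper name.
PN_VECS = {
    "PLAYER_A": (2.0, -1.0),
    "PLAYER_B": (1.5, -0.5),
    "POLITICIAN_C": (-1.0, 2.0),
}


def document_vector(transcript):
    """Average the embeddings of known words (bag-of-words representation)."""
    vecs = [WORD_VECS[w] for w in transcript.lower().split() if w in WORD_VECS]
    dim = len(next(iter(WORD_VECS.values())))
    if not vecs:
        return tuple(0.0 for _ in range(dim))
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def rank_oov_pns(transcript, top_k=2):
    """Return the top_k candidate OOV PNs with softmax probabilities."""
    doc = document_vector(transcript)
    scores = {
        pn: sum(d * p for d, p in zip(doc, vec)) for pn, vec in PN_VECS.items()
    }
    norm = sum(math.exp(s) for s in scores.values())
    probs = {pn: math.exp(s) / norm for pn, s in scores.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


def extend_lexicon(lexicon, ranked_pns):
    """Add the retrieved proper names to the lexicon before re-decoding."""
    return set(lexicon) | {pn for pn, _ in ranked_pns}


# Approximate first-pass transcription of a (hypothetical) football report.
ranked = rank_oov_pns("the match ended with a late goal")
lexicon = extend_lexicon({"the", "match", "goal"}, ranked)
```

Here the sport-related transcript ranks the two placeholder player names above the politician; the extended lexicon would then be used for a second recognition pass.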

During the thesis, the student will investigate methodologies based on deep neural networks [Deng2013]. The candidate will study different DNN structures and different representations of documents [Mikolov2013]. The student will validate the proposed approaches using the automatic radio broadcast transcription system developed in our team.

- Bibliography:

 

[Mikolov2013] Mikolov, T., Chen, K., Corrado, G. and Dean, J. "Efficient estimation of word representations in vector space", Workshop at ICLR, 2013.

 

[Deng2013] Deng, L., Li, J., Huang, J.-T., Yao, K., Yu, D., Seide, F., Seltzer, M., Zweig, G., He, X., Williams, J., Gong, Y. and Acero, A. "Recent advances in deep learning for speech research at Microsoft", Proceedings of ICASSP, 2013.

 

[Sheikh2016] Sheikh, I., Illina, I., Fohr, D. and Linarès, G. "Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition", Interspeech, 2016.

- Skills and profile: Master's degree in computer science, background in statistics and natural language processing, experience with deep learning tools (Keras, Kaldi, etc.) and programming skills (Perl, Python).

- Additional information:

 

Supervision and contact: Irina Illina, LORIA/INRIA (illina@loria.fr), https://members.loria.fr/IIllina/; Dominique Fohr, INRIA/LORIA (dominique.fohr@loria.fr), https://members.loria.fr/DFohr/

Additional links: Ecole Doctorale IAEM Lorraine

 

Duration: 3 years

Starting date: between Oct. 1st 2018 and Jan. 1st 2019

Deadline to apply: May 1st 2018

 

Candidates are required to provide the following documents in a single PDF or ZIP file:

  • CV

  • A cover/motivation letter describing their interest in the topic 

  • Degree certificates and transcripts for Bachelor and Master (or the last 5 years)

  • Master's thesis (or equivalent) if it is already completed; otherwise, a description of the work in progress

  • The publications (or web links) of the candidate, if any (it is not expected that they have any)

In addition, one recommendation letter from the person who supervises or supervised the Master's thesis (or research project or internship) should be sent directly by its author to the prospective PhD advisor.

