PhD Thesis position at INRIA Nancy
Motivations
In collaboration with a company that sells documentary rushes,
we are interested in indexing these rushes through automatic recognition
of their dialogues.
The speech team has developed a system for automatic transcription
of broadcast news: ANTS.
Automatic transcription systems are now
reliable for transcribing read or 'prepared' speech such as broadcast
news, but their performance decreases on spontaneously uttered speech [1, 4, 5].
Spontaneous speech is characterized by:
- speech disfluencies (filled pauses, repetitions, repairs, false starts and partial words),
- pronunciation variants such as word and syllable contractions (/want to/ > /wanna/),
- speaking rate variations (reduced articulation of some phonemes and lengthening of others),
- live environment (laughter, applause) and simultaneous speech.
In addition to disfluencies, spontaneous speech is characterized by ungrammatical sentences and by a language register that
is difficult to model because little transcribed data is available. Processing spontaneous speech is therefore one of
the challenges of Automatic Speech Recognition (ASR).
Subject
The aim of this thesis is to take into account the phenomena specific
to spontaneous speech, such as hesitations, pauses, false starts, etc., in order to improve the recognition rate [4, 6, 7].
To do this, it will be necessary to model these phenomena.
We have a speech corpus in which these events have been
labeled. This corpus will be used to select parameters, estimate models and evaluate the results.
Scope of Work
The work will be done within the Speech team of Inria-Loria.
The student will use ANTS, the automatic speech recognition software developed by the team.
Profile of candidate
Applicants for this PhD position should be fluent in English or French. Competence in French is optional, though applicants will be encouraged to acquire it during their training.
Strong software skills are required, especially Unix/Linux, C, Java, and a scripting language such as Perl or Python.
Contact:
fohr@loria.fr or illina@loria.fr or mella@loria.fr
[1] S. Galliano, E. Geoffrois, D. Mostefa, K. Choukri, J.-F. Bonastre and G. Gravier, The ESTER Phase II Evaluation Campaign for Rich Transcription of French broadcast news, EUROSPEECH 2005
[2] I. Illina, D. Fohr, O. Mella and C. Cerisara, The Automatic News Transcription System: ANTS some realtime experiments, ICSLP 2004
[3] D. Fohr, O. Mella, I. Illina and C. Cerisara, Experiments on the accuracy of phone models and liaison processing in a French broadcast news transcription system, ICSLP 2004
[4] J.-L. Gauvain, G. Adda, L. Lamel, L. F. Lefevre and H. Schwenk, Transcription de la parole conversationnelle, Revue TAL, vol. 45, no. 3
[5] M. Garnier-Rizet, G. Adda, F. Cailliau, J.-L. Gauvain, S. Guillemin-Lanne, L. Lamel, S. Vanni and C. Waaste-Richard, CallSurf: Automatic transcription, indexing and structuration of call center conversational speech for knowledge extraction and query by content, LREC 2008
[6] J. Ogata and M. Goto, The use of acoustically detected filled and silent pauses in spontaneous speech recognition, ICASSP 2009
[7] F. Stouten, J. Duchateau, J.-P. Martens and P. Wambacq, Coping with disfluencies in spontaneous speech recognition: Acoustic detection and linguistic context manipulation, Speech Communication, vol. 48, 2006