ISCApad Archive » 2012 » ISCApad #166 » Jobs » (2012-03-12) Postdoc position: Acoustic to articulatory mapping of fricative sounds LORIA Nancy France |
ISCApad #166 |
Wednesday, April 11, 2012 by Chris Wellekens |
Postdoc position: Acoustic to articulatory mapping of fricative sounds
15 months, start between September and December 2012 at LORIA (Nancy, France).
Contact : Yves.Laprie@loria.fr
Context This subject deals with acoustic to articulatory mapping [Maeda et al. 2006], i.e. the recovery of the vocal tract shape from the speech signal possibly supplemented by images of the speaker’s face. This is one of the great challenges in the domain of automatic speech processing which did not receive satisfactory answer yet. The development of efficient algorithms would open new directions of research in the domain of second language learning, language acquisition and automatic speech recognition.
The objective is to develop inversion algorithms for fricative sounds. Indeed, there exist now numerical simulation models for fricatives. Their acoustics and dynamics are better known than those of stops and it will be the first category of sounds to be inverted after vowels for which the Speech group has already developed efficient algorithms. The production of fricatives differs from that of vowels about two points:
The approach proposed is analysis-by-synthesis. This means that the signal, or the speech spectrum, is compared to a signal or a spectrum synthesized by means of a speech production model which incorporates two components: an articulatory model intended to approximate the geometry of the vocal tract and an acoustical simulation intended to generate a spectrum or a signal from the vocal tract geometry and the noise source. The articulatory model is geometrically adapted to a speaker from MRI images and is used to build a table made up of couples associating one articulatory vector and the corresponding acoustic image vector. During inversion, all the articulatory shapes whose acoustic parameters are close to those observed in the speech signal are recovered. Inversion is thus an advanced table lookup method which we used successfully for vowels [Ouni & Laprie 2005] [Potard et al. 2008].
Activities The success of an analysis by synthesis method relies on the implicit assumption that synthesis can correctly approximate the speech production process of the speaker whose speech is inverted. There exist fairly realistic acoustic simulations of fricative sounds but they strongly depend on the precision of the geometrical approximation of the vocal tract used as an input. There also exist articulatory models of the vocal tract which yield very good results for vowels. On the other hand, these models are inadequate for those consonants which often require a very accurate articulation at the front part of the vocal tract. The first part of the work will be about the elaboration of articulatory models that are adapted to the production of consonants and vowels. The validation will consist of piloting the acoustic simulation from the geometry and of assessing the quality of the synthetic speech signal with respect to the natural one. This work will be carried out for some X-ray films, whose the acoustic signal recorded during the acquisition of them is sufficiently good.
The second part of the work will be about several aspects of the inversion strategy. Firstly, it is now accepted that spectral parameters implying a fairly marked smoothing and frequency integration have to be used, which is the case of MFCC (Mel Frequency Cepstral Coefficients) vectors. However, the most adapted spectral distance to compare natural and synthetic spectra has to be investigated. Another solution consists in modeling the source so as to limit its impact on the computation of the spectral distance.
The second point is about the construction of the articulatory table which has to be revisited for two reasons: (i) only the cavity downstream the constriction plays an acoustic role, (ii) the location of the noise source is an additional parameter but it depends on the other articulatory parameters. The third point concerns the way of taking into account the vocal context. Indeed, the context is likely to provide important information about the vocal tract deformations before and after the fricative sound, and thus constraints for inversion.
A very complete software environment already exists in the Speech group for acoustic-to-articulatory inversion, which can be exploited by the post-doctoral student.
References - [S. Ouni and Y. Laprie 2005] Modeling the articulatory space using a hypercube codebook for acoustic-to-articulatory inversion, Journal of the acoustical Society of America, Vol. 118, pp. 444-460 - [B. Potard, Y. Laprie and S. Ouni], Incorporation of phonetic constraints in acoustic-to-articulatory inversion, JASA, 123(4), 2008 (pp.2310-2323). - [Maeda et al. 2006] Technology inventory of audiovisual-to-articulatory inversion http://aspi.loria.fr/Save/survey-1.pdf
Expected skills Knowledge of speech processing and articulatory modeling, Acoustics, Computer sciences, Applied mathematics
|
Back | Top |