ISCA Services

ISCA - International Speech
Communication Association

ISCApad Archive » 2010 » ISCApad #147 » Jobs

ISCApad #147

Sunday, September 12, 2010 by Chris Wellekens

6 Jobs

6-1

(2010-04-21) Post doc at LIMSI Paris

Post-doctoral position at LIMSI – Audio & Acoustic Group

The Audio & Acoustic group at LIMSI (http://www.limsi.fr/Scientifique/aa/) is currently recruiting for a for a 1 year postdoctoral research position on a CNRS grant. Two research subjects are available for this singular position. Candidates should send a CV, letter of motivation for the selected topic, and at least 2 references to Christophe d’Alessandro: cda@limsi.fr. Candidature letters of motivation should cite previous experience, relevance, and specific interests related to the details of the project. These documents should be received by then end of May. Notification should be made by the end of June. The selected candidate should be available to start between 1-July and 1-October.

Research Subject 1- A study on expressive prosody and voice quality using a gesturally driven voice synthesizer.

The modeling and analysis of expressive prosody raise many problems both on the perceptive and on the signal measurement sides. The analysis of voice source quality changes for expressive speech in particular faces the limitation of inversion procedures. The Audio & Acoustic group at LIMSI has developed a real-time version of the CALM voice source synthesizer (Doval et al., 2003; d’Alessandro et al., 2006; Le Beux, 2009), mapped on to several gestural controllers (graphic tablet, joystick, cyber glove...). This device constitutes a powerful tool for the analysis of expressive voice quality in an analysis by synthesis paradigm.

Hand gestures have been proven to be adequate in order to control prosodic variations (d’Alessandro et al., 2007). By playing this speech synthesizer, like a music through gestural device, a user is able to generate language specific interjections, based on vocalic expressive non-words. Such non-words are meaningful in a given language and culture and convey strong cues to meaning during a spoken interaction (see Wierzbicka, 1992, and also Contini, 1989, for Sardinian or Campbell, 2007, for Japanese).

The proposed project aims at acquiring data from such gesturally driven speech production. Then, the analysis of the synthesizer’s parameters in light of perception tests’ results may help to gain a better understanding of the use of voice quality variations during expressive speech. The different stages of the project (gestural production of expressive speech, subjective evaluation of the productions, modeling of the acoustic parameters of the voice) require different skills and the successful candidate will be able to focus on parts of the projects according to his/her own research interests. This project may also be extended towards the use and evaluation of an immersive 3D expressive speech synthesizer (see Research Subject 2).

The successful candidate will have a PhD in either phonetic, language sciences or psycholinguistics or any related field (with a strong emphasis on speech prosody analysis), and/or a PhD in signal processing or natural language processing (with a good knowledge of acoustic voice analysis). Musical training/practice would be an advantage.

References:

d'Alessandro, C., Rilliard, A. & Le Beux, S. (2007). Computerized chironomy : evaluation of hand-controlled intonation reiteration. INTERSPEECH 2007, Antwerp, Belgium : 2007, 1270-1273.

d'Alessandro, N., d'Alessandro, C., Le Beux, S. & Doval, B. (2006). Real-time CALM synthesizer : new approaches in hands-controlled voice synthesis. New Interface for Musical Expression, Paris, France, 266-271.

Contini, M. (1989). L’interjection en Sarde. Une approche linguistique.In Espaces Romans. Études de dialectologie et de géolinguistique offertes à Gaston Tuaillon, Volume 2, ELLUG:Grenoble, 320-329.

Campbell, N. (2007). The role and use of speech gestures in discourse. Archives of Acoustics, 32(4), 803–814.

Doval, B. ; D'alessandro, C. ; Henrich, N. (2003). The voice source as a causal/anticausal linear filter. VOQUAL'03 - Voice Quality : functions, analysis and synthesis, Geneva, Switzerland : 2003.

Le Beux, S. (2009). Contrôle gestuel de la prosodie et de la qualité vocale. PhD thesis, Université Paris Sud/LIMSI, Orsay, France.

Wierbizcka, A. (1992). The semantic of interjection. Journal of pragmatics, 18, 159-192.

Research Subject 2- Vocal directivity, real and virtual, study and practice

Directivity of the human voice has been the topic of recent research efforts. This 1 year post-doc position concerns the development and combination of recent efforts at LIMSI pertaining to the measurement and understanding of vocal directivity, and its integration into an immersive virtual environment. Preliminary studies on vocal directivity patterns have recently been shown to vary significantly between phonemes. Relying predominantly on several recently acquired databases of sung and spoken voice directivity, a detailed analysis shall be carried out. Two branches of research are then open to follow. First, it is hoped that a numerical model could be employed (using Boundary Element Method modeling) to validate any conclusions on the physical basis for the directivity variations. A second direction of the proposed project concerns the incorporation of voice directivity patterns into an immersive virtual environment text-to-speech simulator. These variations have also been found to be perceptible, and there is an interest to study to what degree these variations are important to the perceived quality of immersive environments. As such, there shall be implementation of the effect, as well as the performance of subjective evaluations. This integration can be in the form of a text-to-speech synthesizer, or an expressive voice synthesizer (see Research Subject 1). The proportion of effort allocated to these two aspects of the project will depend on the skills and interest of the chosen candidate.

A thesis in acoustics, audio signal processing, or other similar field is required. Familiarization with measurement and analysis procedures is required. Familiarity with general software tools such as MatLab, real-time processing software, such as Max/MSP or PureData, and BEM software are a benefit. Candidates should be highly motivated, and capable of working both independently and in a multi-disciplinary group environment.

References:

Katz, Brian F.G. & d'Alessandro, Christophe, 'Directivity Measurements of the Singing Voice.' Proceedings of the 19th International Congress on Acoustics (ICA'2007), Madrid, 2-7 September 2007.

Brian F.G. Katz, Fabien Prezat, and Christophe d'Alessandro, 'Human voice phoneme directivity pattern measurements.' Fourth Joint Meeting: ASA and ASJ, Honolulu, November 2006, J. Acoust. Soc. Am., Vol. 120(5), Pt. 2, November 2006.

Martin, J.-C., d'Alessandro, C., Jacquemin, C., Katz, B., Max, A., Pointal, L. and Rilliard, A., '3D Audiovisual Rendering and Real-Time Interactive Control of Expressivity in a Talking Head.' Proceedings of the 7th

International Conference on Intelligent Virtual Agents (IVA'2007), Paris, France September 17-19, 2007.

Martin, J.-C., Jacquemin, C., Pointal, L., Katz, B., d'Alessandro, C., Max, A. and Courgeon, M., 'A 3D Audio-Visual Animated Agent for Expressive Conversational Question Answering.' International Conference on Auditory-Visual Speech Processing (AVSP'2007). Eds. J. Vroomen, M. Swerts, E. Krahmer. Hilvarenbeek, The Netherlands August 31 - September 3, 2007.

Martin, J.-C.; D'Alessandro, C.; Jacquemin, C.; Katz, B.F.G.; Max, A.; Pointal, L.; Rilliard, A., '3D audiovisual rendering and real-time interactive control of expressivity in a Talking Head.' IVA 2007. 7th International Conference on Intelligent Virtual Agents, Paris, 17-19 September 2007.

N. Misdariis, A. Lang, B. Katz and P. Susini, 'Perceptual effects of radiation control with a multi-louspeaker device.' Proceedings of the 155th ASA, 5th Forum Austicum, & 2nd ASA-EAA Joint Conference, Paris, 29Jun-6Jul 2008.

Markus Noisternig, Brian FG Katz, Samuel Siltanen, and Lauri Savioja, 'Framework for Real-Time Auralization in Architectural Acoustics.' Journal of Acta Acustica united with Acoustica, Vol. 94 (2008), pp. 1000-1015, doi 10.3813/aaa.918116

M. Noisternig, L. Savioja and B. Katz, 'Real-time auralization system based on beam-tracing and mixed-order Ambisonics.' Proceedings of the 155th ASA, 5th Forum Austicum, & 2nd ASA-EAA Joint Conference, Paris, 29Jun-6Jul 2008.

M. Noisternig, B. Katz and C. D'Alessandro, 'Spatial rendering of audio-visual synthetic speech use for immersive environments.' Proceedings of the 155th ASA, 5th Forum Austicum, & 2nd ASA-EAA Joint Conference, Paris, 29Jun-6Jul 2008.

The LIMSI is located approximately 30 minutes South of Paris by commuter train (RER B). The laboratory accommodates approximately 120 permanent personnel (researchers, professors and assistant professors, engineers, technicians) and about sixty PhD candidates. It undertakes multidisciplinary research in Mechanical and Chemical Engineering and in Sciences and Technologies for Information and Communication. The research fields cover a wide disciplinary spectrum from thermodynamics to cognition, encompassing fluid mechanics, energetics, acoustics and voice synthesis, spoken language and text processing, vision, virtual reality...

ISCApad #147

Engineer in HLT Evaluation Department

Programmer