6-26 (2014-12-06) 2 Post-doc positions at the Italian Institute of Technology.
  

New techniques for vision-assisted speech processing


BC 69721 (Research Challenge 1: vision)

BC 69724 (Research Challenge 2: speech processing)

Istituto Italiano di Tecnologia (http://www.iit.it) is a private Foundation with the objective of promoting Italy's technological development and higher education in science and technology. Research at IIT is carried out in highly innovative scientific fields with state-of-the-art technology.

The iCub Facility (http://www.iit.it/icub) and Robotics, Brain and Cognitive Sciences (http://www.iit.it/rbcs) departments are looking for two post-docs to work on the H2020 project EcoMode, funded by the European Commission under the H2020-ICT-2014-1 call (topic ICT-22-2014 – Multimodal and natural computer interaction).

Job Description: Robust automatic speech recognition in realistic environments for human-robot interaction, where speech is noisy and distant, is still a challenging task. Vision can increase the robustness of speech recognition by adding complementary, speech-production-related information. In this project, visual information will be provided by an event-driven (ED) camera. ED vision sensors transmit information as soon as a change occurs in their visual field, achieving very high temporal resolution coupled with an extremely low data rate and automatic segmentation of significant events. In an audio-visual speech recognition setting, ED vision can not only provide additional visual information to the speech recognizer, but also drive the temporal processing of speech by locating (in the temporal dimension) visual events related to speech-production landmarks.
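
As a rough illustration (a minimal Python sketch on synthetic data; the sensor resolution, bin width and threshold are our own assumptions, not the project's), ED output is commonly represented as a stream of (t, x, y, polarity) events, and bursts of event activity can be turned into candidate temporal landmarks:

import numpy as np

# Synthetic event stream: microsecond timestamps, pixel coordinates, polarity.
rng = np.random.default_rng(0)
n_events = 10_000
events = np.zeros(n_events, dtype=[("t", "u8"), ("x", "u2"), ("y", "u2"), ("p", "i1")])
events["t"] = np.sort(rng.integers(0, 2_000_000, n_events))  # ~2 s of activity
events["x"] = rng.integers(0, 128, n_events)                 # assume a 128x128 sensor
events["y"] = rng.integers(0, 128, n_events)
events["p"] = rng.choice([-1, 1], n_events)                  # brightness up/down

def event_rate(ts_us, bin_us=1_000):
    """Events per time bin; bursts of activity suggest visual speech landmarks."""
    edges = np.arange(0, int(ts_us[-1]) + bin_us, bin_us)
    counts, _ = np.histogram(ts_us, bins=edges)
    return edges[:-1], counts

t_bins, rate = event_rate(events["t"])
# Candidate 'visual events': bins whose activity exceeds an arbitrary threshold.
landmarks_us = t_bins[rate > rate.mean() + 2 * rate.std()]
print(len(landmarks_us), "candidate landmark bins")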

The goal of the proposed research is the exploitation of highly dynamic information from ED vision sensors for robust speech processing. The temporal information provided by ED sensors will allow us to experiment with new models of speech temporal dynamics based on events, as opposed to the typical fixed-length segments (i.e., frames).
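
To make the contrast concrete, here is a minimal sketch (our own illustration, with invented landmark times, not the project's design) that cuts the same audio signal into conventional fixed-length frames and into variable-length segments bounded by visual landmarks such as an ED camera could supply:

import numpy as np

sr = 16_000                                               # audio sample rate (Hz)
audio = np.random.default_rng(0).standard_normal(2 * sr)  # 2 s placeholder signal

def fixed_frames(x, sr, frame_ms=25, hop_ms=10):
    """Conventional fixed-length framing (25 ms windows, 10 ms hop)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    return [x[i:i + frame] for i in range(0, len(x) - frame + 1, hop)]

def event_segments(x, sr, landmark_times_s):
    """Variable-length segments bounded by visual landmark times (in seconds)."""
    bounds = [0] + [int(t * sr) for t in landmark_times_s] + [len(x)]
    return [x[a:b] for a, b in zip(bounds[:-1], bounds[1:])]

landmarks_s = [0.31, 0.58, 1.02, 1.47]  # would come from the ED camera in practice
print(len(fixed_frames(audio, sr)), "fixed-length frames")
print([len(s) for s in event_segments(audio, sr, landmarks_s)], "segment lengths")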

In this context, we are looking for two highly motivated post-docs, tackling vision (Research Challenge 1) and speech processing (Research Challenge 2) respectively, as outlined below:

Research Challenge 1 (vision @ iCub Facility – BC 69721): the post-doc will mostly work on the detection of features from event-driven cameras that are instrumental for improving speech recognition (e.g., lip closure, protrusion, and shape). The temporal features extracted from the visual signal will be used for crossmodal, event-driven speech segmentation that will drive the processing of speech. To increase robustness to acoustic noise and atypical speech, acoustic and visual features will be combined to recover the phonetic gestures of the inner vocal tract (articulatory features).
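
One conceivable feature of this kind is sketched below; the region of interest, the event-count threshold and the percentile-based aperture proxy are illustrative assumptions on our part, not the project's method:

import numpy as np

def lip_aperture(events, roi, t0_us, t1_us):
    """Estimate lip opening as the vertical spread of events inside a lip ROI.

    events: structured array with fields t, x, y; roi: (x0, x1, y0, y1)."""
    x0, x1, y0, y1 = roi
    m = ((events["t"] >= t0_us) & (events["t"] < t1_us)
         & (events["x"] >= x0) & (events["x"] < x1)
         & (events["y"] >= y0) & (events["y"] < y1))
    ys = events["y"][m]
    if ys.size < 10:  # too few events: the mouth is likely static
        return 0.0
    # Robust vertical extent (10th to 90th percentile) as an aperture proxy.
    return float(np.percentile(ys, 90) - np.percentile(ys, 10))

# Synthetic events clustered near an assumed mouth region.
rng = np.random.default_rng(1)
evts = np.zeros(5_000, dtype=[("t", "u8"), ("x", "u2"), ("y", "u2")])
evts["t"] = np.sort(rng.integers(0, 1_000_000, 5_000))
evts["x"] = rng.integers(40, 90, 5_000)
evts["y"] = rng.integers(60, 100, 5_000)
print(lip_aperture(evts, roi=(40, 90, 60, 100), t0_us=0, t1_us=50_000))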

Research Challenge 2 (speech processing @ RBCS – BC 69724): the post-doc will mainly develop a novel speech recognition system based on visual, acoustic and (recovered) articulatory features, targeted at users with mild speech impairments. The temporal information provided by ED sensors will allow us to experiment with new strategies for modelling the temporal dynamics of normal and atypical speech. The main outcome of the project will be an audio-visual speech recognition system that robustly recognizes the most relevant commands (key phrases) delivered by users to devices in real-world usage scenarios.
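
Purely to illustrate the fusion step (a real system would use sequence models such as the DBN-HMM approach in the references, not this toy classifier; the phrase set and feature dimensions are invented), the sketch below concatenates per-segment acoustic, visual and articulatory feature vectors and labels a small closed set of key phrases with a nearest-centroid rule:

import numpy as np

rng = np.random.default_rng(2)
PHRASES = ["call home", "open mail", "stop"]  # hypothetical command set

def fuse(acoustic, visual, articulatory):
    """Early fusion: concatenate the three per-segment feature vectors."""
    return np.concatenate([np.asarray(s, dtype=float)
                           for s in (acoustic, visual, articulatory)])

# Fake training data: 20 examples per phrase; class k has stream means near k.
X = np.stack([fuse(rng.normal(k, 1.0, 13), rng.normal(k, 1.0, 4), rng.normal(k, 1.0, 6))
              for k in range(3) for _ in range(20)])
y = np.repeat(np.arange(3), 20)
centroids = np.stack([X[y == k].mean(axis=0) for k in range(3)])

def recognize(acoustic, visual, articulatory):
    """Assign the fused feature vector to the nearest phrase centroid."""
    f = fuse(acoustic, visual, articulatory)
    return PHRASES[int(np.argmin(np.linalg.norm(centroids - f, axis=1)))]

print(recognize(rng.normal(1.0, 1.0, 13), rng.normal(1.0, 1.0, 4), rng.normal(1.0, 1.0, 6)))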

The resulting methods for improving speech recognition will be exploited to implement a tablet with robust speech processing. Because the speech processing adapts automatically to the rhythm of speech production, the speech recognition system will target speakers with mild speech impairments, specifically subjects with the atypical speech flow and rhythm characteristic of some disabilities and of the ageing population. The same approach will then be applied to the humanoid robot iCub to improve its interaction with humans in cooperative tasks.

Skills: We are looking for highly motivated, inquisitive candidates who are curious to use a new and challenging technology that requires a rethinking of visual and speech processing, with a potentially high payoff in terms of speed, efficiency and robustness. Candidates should also have the following skills:
  • a PhD in Computer Science, Robotics, Engineering (or equivalent), with a background in machine learning, signal processing or related areas;
  • the ability to analyze, improve and propose new algorithms;
  • good knowledge of the C and C++ programming languages, with proven experience.
Teamwork, PhD tutoring and general lab-related activities are expected.

An internationally competitive salary depending on experience will be offered.

Please note that these positions are pending the signature of the grant agreement with the European Commission (expected start date in early 2015).


How to apply:
Challenge 1: Send applications and informal enquiries to jobpost.69721@icub.iit.it
Challenge 2: Send applications and informal enquiries to jobpost.69724@rbcs.iit.it

The application should include a curriculum vitae listing all publications, together with PDF files of the most representative publications (maximum 2). If possible, please also indicate three independent reference persons.

Presumed Starting Dates:
Challenge 1: January 2015 (but later starts are also possible).
Challenge 2: June 2015 (but later starts are also possible).
Evaluation of the candidates starts immediately and officially closes on November 10th, 2014, but will continue until the positions are filled.

References:
Lichtsteiner, P., Posch, C., & Delbruck, T. (2008). A 128×128 120 dB 15 μs latency asynchronous temporal contrast vision sensor. IEEE Journal of Solid-State Circuits, 43(2), 566-576.
Rea, F., Metta, G., & Bartolozzi, C. (2013). Event-driven visual attention for the humanoid robot iCub. Frontiers in Neuroscience, 7.
Benosman, R., Clercq, C., Lagorce, X., Ieng, S.-H., & Bartolozzi, C. (2014). Event-based visual flow. IEEE Transactions on Neural Networks and Learning Systems, 25(2), 407-417. doi:10.1109/TNNLS.2013.2273537
Potamianos, G., Neti, C., Gravier, G., Garg, A., & Senior, A. W. (2003). Recent advances in the automatic recognition of audiovisual speech. Proceedings of the IEEE, 91, 1306-1326.
Glass, J. (2003). A probabilistic framework for segment-based speech recognition. Computer Speech and Language, 17, 137-152.
Badino, L., Canevari, C., Fadiga, L., & Metta, G. (2012). Deep-level acoustic-to-articulatory mapping for DBN-HMM based phone recognition. In IEEE SLT 2012, Miami, Florida.
