ISCA Services

ISCA - International Speech
Communication Association

ISCApad Archive » 2014 » ISCApad #194 » Jobs » (2014-06-012) 2 PhD scholarships at Italian Institute of Technology, Genova, Italy

ISCApad #194

Monday, August 04, 2014 by Chris Wellekens

6-40 (2014-06-012) 2 PhD scholarships at Italian Institute of Technology, Genova, Italy

1. Acoustic-articulatory modeling for automatic speech recognition

Tutors: Leonardo Badino, Lorenzo Rosasco, Luciano Fadiga

Department: Robotics Brain and Cognitive Sciences (Italian Istitute of Technology), Genova, Italy

http://www.iit.it/rbcs

Description: State-of-the art Automatic Speech Recognition (ASR) systems produce remarkable results in some scenarios but still lags behind human level performance in several real usage scenarios and often perform poorly whenever the type of acoustic noise, the speaker’s accent and speaking style are 'unknown' to the system, i.e., are not sufficiently covered in the data used to train the ASR system.

The goal of the present theme is to improve ASR accuracy by learning representations of speech that combine the acoustic and the (vocal tract) articulatory domain as opposed to purely acoustic representations, which only consider the surface level of speech (i.e., speech acoustics) and ignore its causes (the vocal tract movements). Although in real usage settings the vocal tract cannot be observed during recognition it is still possible to exploit the articulatory representations of speech where phonetic targets (i.e., the articulatory targets necessary to produce a given sound) are largely invariant (e.g., to speaker variability) and complex (in the acoustic domain) speech phenomena have simple descriptions.

Joint acoustic-articulatory modeling will be applied in two different ASR training settings: a typical supervised machine learning setting where phonetic transcriptions of the training utterances are provided by human experts, and a weakly supervised machine learning setting where much sparser and less informative labels (e.g., word-level rather than phone level labels) are available.

Requirements: The successful candidate will have a degree in computer science, bioengineering, physics or related disciplines, and a background in machine learning. Interest in neuroscience.

Reference: King, S., Frankel, J., Livescu, K., McDermott, E., Richmond, K., Wester, M. (2007). 'Speech production knowledge in automatic speech recognition'. Journal of the Acoustical Society of America, vol. 121(2), pp. 723-742.

Contacts: leonardo.badino@iit.it, lorosasco@mit.edu, luciano.fadiga@iit.it 2

2. Speech production for automatic speech recognition in human–robot verbal interaction

Tutors: Giorgio Metta, Leonardo Badino, Luciano Fadiga

Department: iCub Facility (Istituto Italiano di Tecnologia), Genova, Italy

http://www.iit.it/iCub

Description: State-of-the art Automatic Speech Recognition (ASR) systems produce remarkable results in partially controlled scenarios but still lags behind human level performance in unconstrained real usage situations and perform poorly whenever the type of acoustic noise, the speaker’s accent and speaking style are 'unknown' to the system, i.e., are not sufficiently covered in the data used to train the ASR system. The goal of this PhD theme is to attack the problem of ASR in a human to robot conversation. To this aim, we will create a robust Key Phrases Recognition system where commands delivered by the user to the robot (i.e., the key phrases) have to be recognized in unconstrained utterances (i.e., utterances with hesitations, disfluencies, additional out-of-task words, etc.), in the challenging conditions of human-robot verbal interaction where speech is typically distant (to the robot) and noisy. To increase the robustness of the ASR, articulatory information will be integrated into a Deep Neural Network – Hidden Markov Model system.

This work will be carried out and tested on the iCub platform.

Requirements: background in computer science, bioengineering, computer engineering, physics or related disciplines. Solid programming skills in C++, Matlab, GPU (CUDA) are a plus. Attitude for problem solving. Interests in understanding/learning basic biology.

Reference: Barker, J., Vincent, E., Ma, N., Christensen, H., Green, P., (2013) 'The PASCAL CHiME Speech Separation and Recognition Challenge'. Computer Speech and Language, vol. 27(3), pp. 621-633.

Contacts: leonardo.badino@iit.it, giorgio.metta@iit.it, luciano.fadiga@iit.it

Additional information

Starting date: November 2014.

PhD scholarship: the scholarship will cover all fees with a gross salary of 16500 euros/year (≈1250 euros/month after taxes)

Back

Top

Organisation	Events	Membership	Help
> Board	> Interspeech	> Join - renew	> Sitemap
> Legal documents	> Workshops	> Membership directory	> Contact
> Logos			> FAQ
			> Privacy policy