ISCApad #257
Tuesday, November 12, 2019 by Chris Wellekens
Applications are invited for a three-year Early Stage Researcher (PhD) position in speech technology for pathological speech.
Description
The thesis focuses on the link between the internal representations of Deep Neural Networks (DNNs) and the subjective representation of speech intelligibility. We propose to explore the saliency detection capabilities of DNNs when they are used in a regression task to predict speech intelligibility scores given by human experts. By saliency, we mean identifying which frequency bands a DNN relies on to make its predictions.
The ultimate aim is to identify regions of interest in the speech signal, both in time and frequency, that characterise the level of speech impairment.
The experiments will be conducted on speech samples produced by 150 people (100 patients and 50 healthy controls). This database was recorded within the INCA C2SI project and contains speech from patients treated for cancer of the oral cavity or pharynx. It also contains various metadata, such as the location of the tumor, the impairment in terms of severity and intelligibility as assessed by human experts, self-evaluation questionnaires on the patient's quality of life, and so on. Various tasks were recorded, such as a sustained vowel, read speech, nonsense words, prosodic exercises, picture description, etc.
There will also be the possibility to extend the work to another corpus composed of the voices of patients suffering from Parkinson's disease.
First, the PhD student will build on the various analyses and descriptions produced during the C2SI project, which tried to correlate the impact of the tumor with communication ability. Those results will help characterise the human representation of the impact of the disease. Then, a DNN will be modeled to fit the data, taking data sparsity into account. The last part of the work, and the final goal of the thesis, will be to explore the internal representation of the DNN in order to determine which parts of the signal help it decide on the impact of the disease, that is, to study the automatic representation that lies in the model the student will propose.
This work is funded by the TAPAS project (https://www.tapas-etn-eu.org), a Horizon 2020 Marie Skłodowska-Curie Actions Innovative Training Network European Training Network (MSCA-ITN-ETN) project that aims to transform the well-being of people across Europe with debilitating speech pathologies (e.g., due to stroke, Parkinson's, etc.). These groups face communication problems that can lead to social exclusion. They are now being further marginalised by a new wave of speech technology that is increasingly woven into everyday life but is not robust to atypical speech.
The PhD will be supervised by the SAMoVA team at the IRIT laboratory in Toulouse. SAMoVA does research in the domain of "analysis, modeling and structuring of audiovisual content". The application areas are diverse: speech processing, language identification, speaker verification, and speech and music indexing. The researchers' expertise covers novel machine learning and audio processing technologies and is now focused on deep learning methods, which has led to several publications in international conferences.
Eligibility Criteria:
- Early Stage Researchers (ESRs) shall, at the time of recruitment by the host organization, be in the first four years (full-time equivalent research experience) of their research careers.
- The ESR may be a national of a Member State, of an Associated Country or of any Third Country.
Applications can be submitted through the website: https://www.tapas-etn-eu.org/positions/recruitment
Contact: Julie Mauclair (mauclair@irit.fr)