ISCA - International Speech
Communication Association



ISCApad #275

Thursday, May 13, 2021 by Chris Wellekens

6 Jobs
6-1(2020-12-01) Funded PhD Position at University of Edinburgh, Scotland, UK

Funded PhD Position at University of Edinburgh

 

PhD Position: Automatic Affective Behaviour Monitoring through speech and/or multimodal means while preserving user’s privacy

 

For details please visit:

https://www.findaphd.com/phds/project/automatic-affective-behaviour-monitoring-through-speech-while-preserving-user-s-privacy/?p125956

……………………………………………………………………………………………………………………………………………………………………….

About the Project

The Advanced Care Research Centre at the University of Edinburgh is a new £20m interdisciplinary research collaboration aiming to transform later life with person-centred integrated care.

The vision of the ACRC is to play a vital role in addressing the Grand Challenge of ageing by transformational research that will support the functional ability of people in later life so they can contribute to their own welfare for longer. With fresh and diverse thinking across interdisciplinary perspectives our academy students will work to creatively embed deep understanding, data science, artificial intelligence, assistive technologies and robotics into systems of health and social care supporting the independence, dignity and quality-of-life of people living in their own homes and in supported care environments.

The ACRC Academy will equip future leaders to drive society’s response to the challenges of later life care provision, a problem which is growing in scale, complexity and urgency. Our alumni will become leaders across a diverse range of pioneering and influential roles in the public, private and third sectors.

Automatic affect recognition technologies can monitor a person’s mood and mental health by processing verbal and non-verbal cues extracted from the person’s speech. However, the speech signal contains biometric and other personal information which can, if improperly handled, threaten the speaker’s privacy. Hence there is a need for automatic inference and monitoring methods that preserve privacy for speech data in terms of collection, training of machine learning models and use of such models in prediction. This project will focus on research, implementation and assessment of solutions for handling of speech data in the user’s own environment while protecting their privacy. We are currently studying the use of speech in healthy ageing and care in combination with IoT/Ambient Intelligence technologies in a large research project. This project will build on our research in this area.

 

The goals of this PhD project are:

  • to establish and assess user privacy requirements,
  • to devise privacy-preserving automatic affect recognition methods,
  • to develop speech data collection methods and tools for privacy-sensitive contexts, and
  • to evaluate these methods with respect to performance and privacy preservation requirements.

 

Training outcomes include machine learning methods for inference of mental health status, privacy-preserving machine learning and signal processing, and applications of such methods in elderly care.


6-2(2020-12-03) 6 months internship at GIPSA-Lab, Grenoble, France

Deep learning-based speech coding and synthesis in adverse conditions.

Project: Vokkero 2023

Type: Internship, 6 months, start of 2021

Offer: vogo-bernin-pfe-2

Contact: r.vincent@vogo.fr

Keywords: Neural vocoding, deep learning, speech synthesis, training dataset, normalisation.

Summary: The project consists of evaluating the performance of the LPCNet neural vocoder for speech coding and decoding under adverse conditions (noisy environment, varied speech styles, etc.) and proposing learning techniques to improve the quality of synthesis.

1 The company VOGO and Gipsa-lab

Vogo is an SME based in Montpellier, in the south of France: www.vogo-group.com. Vogo is the first sportech company listed on Euronext Growth and develops solutions that enrich the experience of fans and professionals during sporting events. Its brand Vokkero specialises in the design and production of radio communication systems: www.vokkero.com. It offers solutions for teams working in very noisy environments and is notably a world reference in the professional sports refereeing market.

Gipsa-lab is a CNRS research unit run jointly with Grenoble-INP (Grenoble Institute of Technology) and Université Grenoble Alpes. With 350 people, including about 150 doctoral students, Gipsa-lab is a multidisciplinary research unit conducting both basic and applied research on complex signals and systems. Gipsa-lab is internationally recognised for its research in Automatic Control, Signal and Image Processing, and Speech and Cognition, and develops projects in the strategic areas of energy, environment, communication, intelligent systems, life and health, and language engineering.

2 The Vokkero 2023 project

Every 3 years, Vokkero renews its hardware (radio, CPU) and software (RTE, audio processing) platforms in order to design new generations of products. The project extends over several years of study, and it is within this framework that the internship is proposed. In partnership with Gipsa-lab, the project consists of studying speech coding using neural network approaches, in order to reach performance not yet achieved by classical approaches. The student will work at GIPSA-lab in the CRISSP team of the Speech and Cognition cluster under the supervision of Olivier PERROTIN, research fellow at CNRS, and at the R&D department of Vogo Bernin with Rémy VINCENT, project leader on the Vogo side.

3 Context & Objectives

The project consists of evaluating the performance of the LPCNet neural vocoder for speech coding and decoding under adverse conditions (noisy environment, varied speech styles, etc.) and proposing learning techniques to improve the quality of synthesis.

3.1 Context

Vocoders (voice coders) are models that allow a speech signal to be first reduced to a small set of parameters (this is speech analysis or coding) and then reconstructed from these parameters (this is speech synthesis or decoding). This coding/decoding process is essential in telecommunication applications, where speech is coded, transmitted and then decoded at the receiver. The challenge is to minimise the quantity of information transmitted while keeping the quality of the reconstructed speech signal as high as possible. Current techniques use high-quality speech signal models, with a constraint on algorithmic complexity to ensure real-time processing in embedded systems. Examples of widely used codecs are Speex (Skype) and its little brother, Opus (Zoom). Some typical figures: Opus converts a stream sampled at 16 kHz into a 16 kbit/s bitstream (i.e. a compression ratio of 1:16); the reconstructed signal is also at 16 kHz and has 20 ms of latency.

Since 2016 a new type of vocoder has emerged, called the neural vocoder. Based on deep neural network architectures, these vocoders are able to generate a speech signal from the classical input parameters of a vocoder, without a priori knowledge of an explicit speech model, but using machine learning. The first system, Google’s WaveNet [1], is capable of reconstructing a signal almost identical to natural speech, but at a very high computation cost (20 seconds to generate a sample, 16,000 samples per second). Since then, models have been simplified and are capable of generating speech in real time (WaveRNN [2], WaveGlow [3]). In particular, the LPCNet neural vocoder [4, 5], developed at Mozilla, is able to convert a 16 kHz sampled stream into a 4 kbit/s bitstream and reconstruct a 16 kHz audio signal. This mix of heavy compression and bandwidth extension leads to equivalent compression ratios much higher than 1:16!
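
The ratios quoted above can be checked with a short calculation. The sketch below assumes 16-bit linear PCM input, which the posting does not state explicitly but which is consistent with the 1:16 figure given for Opus; the function name and constants are illustrative only.

```python
# Assumption (not stated in the posting): the uncompressed stream is
# 16-bit linear PCM, which matches the quoted 1:16 ratio for Opus.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16


def compression_ratio(coded_kbit_per_s: float,
                      sample_rate_hz: int = SAMPLE_RATE_HZ,
                      bits_per_sample: int = BITS_PER_SAMPLE) -> float:
    """Return the raw PCM bitrate divided by the coded bitrate."""
    raw_kbit_per_s = sample_rate_hz * bits_per_sample / 1000
    return raw_kbit_per_s / coded_kbit_per_s


opus_ratio = compression_ratio(16.0)    # 16 kbit/s, as quoted for Opus
lpcnet_ratio = compression_ratio(4.0)   # 4 kbit/s, as quoted for LPCNet

print(f"Opus at 16 kbit/s:  1:{opus_ratio:.0f}")
print(f"LPCNet at 4 kbit/s: 1:{lpcnet_ratio:.0f}")
```

Under this assumption the raw stream is 256 kbit/s, so the 4 kbit/s figure corresponds to roughly four times the compression of the 16 kbit/s one.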

However, the ability of these systems to generate high-quality speech has only been evaluated after training on large and homogeneous databases, i.e. 24 hours of speech read by a single speaker and recorded in a quiet environment [6]. In the Vokkero application, by contrast, speech is recorded in adverse conditions (very noisy environments) and shows significant variability (spoken voice, shouted voice, multiple referees, etc.). Is a neural vocoder trained on a read-speech database capable of decoding speech of this type? If not, is it possible to train the model on such data, even though they are only available in small quantities?

The aim of this internship is to explore the limits of the LPCNet vocoder when applied to the decoding of referee speech. Various learning strategies (curriculum training, transfer learning, learning on augmented data, etc.) will then be explored to try to adapt pre-trained models to our data.

3.2 Tasks

The student will evaluate the performance of a pre-trained LPCNet vocoder on referee speech data, and will propose learning strategies to adapt the model to this new data, in a coding/re-synthesis scenario:

1. Familiarisation with the system and performance evaluation on an audio-book database (baseline);

2. Evaluation of LPCNet on the Vokkero database and identification of its limits (ambient noise, pre-processing, voice styles, etc.);

3. Study of strategies to improve system performance by data augmentation:

- Creation of specific synthetic databases: noisy atmospheres, shouted voices;

- Recording campaigns on Vokkero systems, in anechoic rooms and/or real conditions if the public health situation allows it;

- Comparison of the two approaches according to various learning strategies for training a new model from this data.
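
As an illustration of the data augmentation item in the task list, one common way to build a synthetic noisy database is to add recorded noise to clean speech at a chosen signal-to-noise ratio (SNR). The sketch below is only a minimal example of that general technique, not the method planned for the internship; the function names and the toy signals (a sine wave standing in for speech, Gaussian noise standing in for stadium noise) are hypothetical.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio equals
    `snr_db`, then add it to `speech`.

    Returns the noisy mixture and the scaled noise.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech, p_noise = power(speech), power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both be non-silent")
    # Target noise power is p_speech / 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


# Toy signals: 0.1 s at 16 kHz. Real use would load recorded audio instead.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * n / 16_000) for n in range(1_600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1_600)]

mixture, scaled = mix_at_snr(speech, noise, snr_db=5.0)
achieved = 10 * math.log10(power(speech) / power(scaled))
print(f"achieved SNR: {achieved:.2f} dB")
```

Sweeping `snr_db` over a range of values would give training material of graded difficulty, which is one possible way to support a curriculum-style training schedule.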

3.3 Required Skills

The student is expected to have a solid background in speech signal processing and an interest in Python development. Experience in programming deep learning models in Python is a plus. The student is expected to show curiosity for research, scientific rigour in methodology and experimentation, and autonomy in technical and organisational matters. Depending on the candidate’s motivation, and subject to obtaining funding, it is possible to pursue this topic as a PhD thesis.

The student will be able to subscribe to the company’s insurance scheme, will receive meal vouchers and will be paid a monthly stipend of €800.

References

[1] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. W. Senior and K. Kavukcuoglu, “WaveNet: A Generative Model for Raw Audio”, CoRR, vol. abs/1609.03499, 2016. arXiv: 1609.03499.

[2] N. Kalchbrenner, E. Elsen, K. Simonyan, S. Noury, N. Casagrande, E. Lockhart, F. Stimberg, A. van den Oord, S. Dieleman and K. Kavukcuoglu, “Efficient Neural Audio Synthesis”, CoRR, vol. abs/1802.08435, 2018. arXiv: 1802.08435.

[3] R. Prenger, R. Valle and B. Catanzaro, “WaveGlow: A Flow-based Generative Network for Speech Synthesis”, in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK: IEEE, May 2019, pp. 3617-3621.

[4] J.-M. Valin and J. Skoglund, “LPCNet: Improving Neural Speech Synthesis through Linear Prediction”, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), ser. ICASSP ’19, Brighton, UK: IEEE, May 2019, pp. 5891-5895.

[5] J.-M. Valin and J. Skoglund, “A Real-Time Wideband Neural Vocoder at 1.6kb/s Using LPCNet”, in Proceedings of Interspeech, Graz, Austria: ISCA, Sept. 2019, pp. 3406-3410.

[6] P. Govalkar, J. Fischer, F. Zalkow and C. Dittmar, “A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction”,


 


6-3(2020-12-04) 6-month postdoctoral position, IRIT, Toulouse, France

As part of the interdisciplinary INGPRO project, funded by the Occitanie Region and studying the effect of gestures on pronunciation (Incidence des Gestes sur la PROnonciation), IRIT (SAMoVA team, https://www.irit.fr/SAMOVA/site/) is offering a 6-month postdoctoral position to work on the processing of speech data (manual and automatic evaluation) collected during an experiment that will take place in spring 2021. This experiment involves collecting spoken data under different experimental conditions and analysing the collected data.

This work will be carried out in collaboration with the company Archean Technologie (http://www.archean.tech/archean-labs-en.html) and the Octogone laboratory at UT2J (https://octogone.univ-tlse2.fr/), both partners in the project. If you are interested, the details of the offer are given below.

Offer: https://www.irit.fr/SAMOVA/site/wp-content/uploads/2020/12/Ficheposte_PostDoc_INGPRO_2021.pdf

Further information on the INGPRO project is available here: https://www.irit.fr/SAMOVA/site/projects/current/ingpro/

 

Position: fixed-term contract (CDD, postdoc, category A)
Duration: 6 months
Location: IRIT, 118, route de Narbonne - 31062 TOULOUSE, with occasional travel to Montauban (ARCHEAN LABS, 20 place Prax-Paris - 82000 MONTAUBAN)
Contacts: Isabelle Ferrané (isabelle.ferrane@irit.fr), Charlotte Alazard (charlotte.alazard@univ-tlse2.fr), project leader
Salary: depending on experience
Application: to be sent before 15 January 2021, for a start date of 1 April 2021 at the latest
Degree: PhD in linguistics with a specialisation in phonetics (a background in L2 acquisition would be a plus)


6-4(2020-12-05) 5/6 months Internship, LIS-Lab, Université Aix-Marseille, France

Deep learning for speech perception
(Apprentissage profond pour la perception de la parole)

Length of internship: 5-6 months
Start date: between January and March
Contact: Ricard Marxer

Context
---
Recent deep learning (DL) developments have been key to breakthroughs in many artificial
intelligence (AI) tasks such as automatic speech recognition (ASR) [1] and speech
enhancement [2]. In the past decade the performance of such systems on reference corpora
has consistently increased driven by improvements in data-modeling and representation
learning techniques. However, our understanding of human speech perception has not
benefited from such advancements. This internship sets the ground for a project that
proposes to gain knowledge about our perception of speech by means of large-scale
data-driven modeling and statistical methods. By leveraging modern deep learning
techniques and exploiting large corpora of data we aim to build models capable of
predicting human comprehension of speech at a higher level of detail than any other
existing approaches [3].

This internship is funded by the ANR JCJC project MIM (Microscopic Intelligibility
Modeling), which aims at predicting and describing speech perception at the stimulus,
listener and sub-word level. The project will also fund a PhD position; the call for
applications will be published in the coming months. The successful candidate for this
internship could potentially continue with a PhD.

Subject
---
In an attempt to use DL methods for speech perception tasks, this internship aims to
take part in the first Clarity challenge. This challenge tackles the difficult task
of performing speech enhancement to optimise the intelligibility of a speech signal in
noisy conditions. The challenge opens in January 2021; it is the first of its kind, with
the objective of advancing hearing-aid signal processing and the modelling of
speech-in-noise perception.

Several research directions will be explored, including but not limited to:
- perceptual-based loss functions
- advanced speech representation learning pipelines
- DL-based multichannel processing techniques

Given that the baseline and data of the challenge are to be published in January 2021 and
the difficulty of the task remains uncertain, a backup plan is foreseen for this
internship that is more tightly related to the context of the ANR project.

In the MIM project, we focus on corpora of consistent confusions: speech-in-noise stimuli
that evoke the same misrecognition among multiple listeners. In order to simplify this
first approach to microscopic intelligibility prediction, we will restrict ourselves to
single-word data. This should reduce the lexical factors to aspects such as usage
frequency and neighborhood density, significantly limiting the complexity of the required
language model. Consistent confusions are valuable experimental data about the human
speech perception process. They provide targets for how intelligibility models should
differ from automatic speech recognition (ASR) systems. While ASR models are
optimised to recognise what has been uttered, the proposed models should output what has
been perceived by a set of listeners. A sub-task encompasses creating baseline models
that predict listeners' responses to the noisy speech stimuli. We will target predictions
at different levels of granularity such as predicting the type of confusion, which phones
are misperceived or how a particular phone is confused.

Several models regularly used in speech recognition tasks will be trained and evaluated
in predicting the misperceptions of the consistent confusion corpora. We will first focus
on well established models such as GMM-HMM and/or simple deep learning architectures.
Advanced neural topologies such as TDNNs, CTC-based or attention-based models will also
be explored, even though the relatively small amount of training data in the corpora is
likely to be a limiting factor. As a starting point we envisage solving the 3 tasks
described in [3] consisting of 1) predicting the probability of occurrence of
misrecognitions at each position of the word, 2) given the position, predicting a
distribution of particular phone misperceptions, and 3) predicting the words and the
number of times they have been perceived among a set of listeners.  Predictions will be
evaluated using the metrics also defined in [3] and random and oracle predictions will be
used as references. These baseline models will be trained using only in-domain data and
optimized on word recognition tasks.
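
To make the prediction targets more concrete, the sketch below shows how listener
responses to a single stimulus could be summarised into targets resembling tasks 1 and 3.
It is only an illustration under stated assumptions: the data are invented, the function
names are hypothetical, confusions are assumed to be substitutions only (responses have
the same number of phones as the target), and the metrics defined in [3] are not
reproduced here.

```python
from collections import Counter


def response_counts(responses):
    """Task 3 style target: each perceived word and how many listeners
    reported it."""
    return Counter(word for word, _ in responses)


def positionwise_error_rate(target_phones, responses):
    """Task 1 style target: fraction of listeners who misperceived the
    phone at each position of the target word.

    Assumes substitution-only confusions, so every response must have
    the same number of phones as the target.
    """
    errors = [0] * len(target_phones)
    for _, phones in responses:
        if len(phones) != len(target_phones):
            raise ValueError("response length differs from target length")
        for i, (expected, heard) in enumerate(zip(target_phones, phones)):
            if expected != heard:
                errors[i] += 1
    return [e / len(responses) for e in errors]


# Invented toy data: the word "pin" presented in noise to four listeners.
target = ["p", "ih", "n"]
responses = [
    ("bin", ["b", "ih", "n"]),
    ("bin", ["b", "ih", "n"]),
    ("bin", ["b", "ih", "n"]),
    ("pin", ["p", "ih", "n"]),
]

print(response_counts(responses))
print(positionwise_error_rate(target, responses))
```

A model for task 2 would additionally need, for each position, a distribution over the
phones that replaced the target phone; insertions and deletions would require an
alignment step that this sketch leaves out.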


Profile
---
The candidate shall have the following profile:
- Master 2 level or equivalent in one of the following fields: machine learning, computer
science, applied mathematics, statistics, signal processing
- Good English written and spoken language skills
- Programming skills, preferably in Python

Furthermore the ideal candidate would have:
- Experience in one of the main DL frameworks (e.g. PyTorch, TensorFlow)
- Notions in speech or audio processing

Application procedure
---
To apply, send the following to the contact address ricard.marxer@lis-lab.fr:
- CV
- Motivation letter
- Latest grades transcript (M1 and the 1st semester of M2 if available)

References
---
[1] Barker, J., Marxer, R., Vincent, E., & Watanabe, S. (2017). The third 'CHiME' speech
separation and recognition challenge: Analysis and outcomes. Computer Speech & Language,
46, 605–626.
[2] Marxer, R., & Barker, J. (2017). Binary Mask Estimation Strategies for Constrained
Imputation-Based Speech Enhancement. In Proc. Interspeech 2017 (pp. 1988–1992).
[3] Marxer, R., Cooke, M., & Barker, J. (2015). A framework for the evaluation of
microscopic intelligibility models. In Proceedings of the Annual Conference of the
International Speech Communication Association, INTERSPEECH (Vol. 2015-January, pp.
2558–2562).


6-5(2020-12-06) Part-time researcher, Université de Mons, Belgium

The phonetics laboratory of UMONS (https://sharepoint1.umons.ac.be/FR/universite/facultes/fpse/serviceseetr/sc_langage/Pages/default.aspx), attached to the Institut de Recherches en Sciences du Langage (https://langage.be/), is a humanities and social sciences lab and a partner in an R&D project led by the company https://roomfourzero.com/ and funded by DG06 Wallonie (https://recherche-technologie.wallonie.be/): the "SALV" project, for "sens à la voix" ("meaning to the voice").

 

To carry out this project, we are recruiting a researcher, either half-time for an initial period of 6 months or full-time for an initial period of 3 months, starting in January 2021 (with a possible extension to 1 or 2 years).

 

The role is twofold:

  • The candidate will be in charge of the tasks assigned to UMONS within the SALV project (more information on the project below); as such, he or she will act as the interface with Room 40 and will coordinate the activities of any other project participants in the lab (e.g. students, interns, researchers).
  • The candidate will develop a general strategy (and possibly specific tools) enabling the lab's researchers (most of whom come from the humanities and social sciences) to optimise their procedures for collecting, transcribing and analysing speech data using automatic speech processing techniques.

 

The practical arrangements of the appointment are very flexible at this stage and will be discussed in light of the constraints and wishes of the successful candidate. The salary is set according to degree (ideally a PhD in computer science; other backgrounds to be discussed) and experience.

 

If you are interested, or if you simply want to know more: veronique.delvaux@umons.ac.be

 

 

****************

SALV

 

Room 40 (https://roomfourzero.com/) is a young and dynamic company that provides a range of products and services including, in particular, real-time analysis and anomaly detection in audio and video streams, as well as fine-grained, ontology-based contextual analysis of the explicit and implicit meanings of texts and text fragments such as those exchanged on social networks or via SMS.

 

The SALV project aims to develop a technology for transcribing and analysing voice content that includes a set of contextual paralinguistic information (emotions, stress, attitudes). It relies in part on real-time text analysis technologies and specific ontologies that Room 40 holds under licence and already markets. For this project, new speech-to-text transcription tools will have to be developed in order to integrate this paralinguistic information as metadata attached to the transcribed text. Combining these two types of information should greatly improve the quality of the analysis.

 

By the end of the project, the goal is a speech analysis solution comprising: (i) an approach for transcribing speech into text; (ii) a method for annotating the text with metadata capturing paralinguistic elements; (iii) a system for relevant analysis of the audio content combining the transcription and the paralinguistic elements; and finally (iv) an application layer integrating the above elements and comprising content analysis algorithms, graphical interfaces specific to certain market segments, and a number of APIs ensuring the interoperability of the system with the partners' existing infrastructures.


6-6(2020-12-07) Internships at IRIT, Toulouse, France

The SAMoVA team at IRIT in Toulouse is offering several final-year internships (M2, engineering degree) in 2021:

- Deep learning for audio representation
- Deep learning for online (streaming) speaker diarization
- Characterisation and modelling of pathological voices
- Intelligibility measurement for pathological speech

All details (topics, contacts) are available in the 'Jobs' section of the team's website:
https://www.irit.fr/SAMOVA/site/jobs/


6-7(2020-11-20) 6-month internship, Ludo-Vic SAS, Paris, France

6-month Master 2 internship

Detection of drops in engagement

OVERVIEW

The main objective of this internship is to detect drops in engagement during an interaction with our conversational agents. The solution may rely on rule-based models or on machine/deep learning techniques [1, 2, 3, 4, 5].

OBJECTIVES

1. Analysis of behaviour (head movements, emotion, ...) to identify the features that characterise a drop in engagement

2. Modelling and detection of drops in engagement

3. Evaluation

4. Real-time application

Internship conditions

The internship will last 6 months and will take place in the R&D department of Ludo-Vic SAS.

Remote working tools are available within the company.

Candidate profile

Master's-level degree (Bac +5) in computer science and AI.

Ability to build 3D interactions and animations.

Experience with Unity3D and skills in C# are a real plus.

Remuneration: standard internship pay conditions.

CONTACTS AND APPLICATION

Please send your CV (with transcripts, project/internship reports…) to:

- Jack Amberg: jack@ludo-vic.com

- Atef Ben-Youssef: atef@ludo-vic.com


6-8(2020-12-16) Master 2 / PFE internship at GIPSA-lab (Grenoble)

MASTER / PFE internship 2020-2021

REAL-TIME SILENT SPEECH SYNTHESIS

BASED ON END-TO-END DEEP LEARNING MODELS

Context

Various pathologies affect the voice sound source, i.e. the vibration of the vocal folds, thus preventing any sound production despite the normal functioning of the articulators (movements of the jaw, tongue, lips, etc.): this is known as silent speech. Silent speech interfaces [Denby et al., 2010] convert inaudible cues such as articulator movements into an audible speech signal to rehabilitate the speaker’s voice. At GIPSA-lab, we have a system that measures the articulators using ultrasound imaging and video and converts this data into acoustic parameters that describe a speech signal, using machine learning [Hueber and Bailly, 2016, Tatulli and Hueber, 2017]. The speech signal is then reconstructed from the predicted acoustic parameters using a vocoder [Imai et al., 1983]. Current silent speech interfaces have two main limitations: 1) the intonation (or speech melody), normally produced by the vibration of the vocal folds, is absent in the type of pathology considered and is difficult to reconstruct from articulatory information alone; 2) the quality of the generated speech is often limited by the type of vocoder used. While the recent emergence of neural vocoders has allowed a leap in the quality of speech synthesis [van den Oord et al., 2016], they have not yet been integrated into silent speech interfaces, where the constraint of real-time generation is crucial.

Objectives


In this internship, we propose to address these two problems by implementing an end-to-end silent speech synthesis system with deep learning models. In particular, the work will consist of interfacing our system for articulation measurement and acoustic parameter generation with the LPCNet neural vocoder [Valin and Skoglund, 2019]. The latter takes as input acoustic parameters derived from articulation on the one hand, and the intonation on the other. This distinction makes it possible to decorrelate the two controls, for example by offering gestural control of the intonation [Perrotin, 2015].

Regarding the acoustic parameters, the first step will be to adapt the acoustic output of our system to match the input of LPCNet. Moreover, LPCNet is trained by default on acoustic parameters extracted from natural speech, for which large databases are available. However, the acoustic parameters predicted from silent speech are degraded and available only in small quantities. We will thus study the robustness of LPCNet to degraded input, and several re-training strategies (adaptation of LPCNet to new data, end-to-end learning, etc.) will be explored. Once the system is functional, the second part of the internship will consist of implementing the system in real time, so that speech synthesis is generated synchronously with the user’s articulation. All stages of implementation (learning strategies, real-time system) will be evaluated in terms of intelligibility, sound quality, and intonation reconstruction.

Tasks

The tasks expected during this internship are:

  • Implement the full silent speech synthesis pipeline by interfacing the lab ultrasound system with LPCNet, and explore training strategies.

  • Evaluate the performance of the system regarding speech quality and reconstruction errors.

  • Implement and evaluate a real-time version of the system.

Required skills

  • Signal processing and machine learning.

  • Knowledge of Python and C is required for implementation.

  • Knowledge of the Max/MSP environment would be a plus for real-time implementation.

  • Strong motivation for methodology and experimentation.

Allowance

The internship allowance is fixed by ministerial decree (about 570 euros / month).

Grenoble Images Parole Signal Automatique

UMR CNRS 5216 – Grenoble Campus

38400 Saint Martin d’Hères - FRANCE


Contact

Olivier PERROTIN + 33 4 76 57 45 36 olivier.perrotin@grenoble-inp.fr

Thomas HUEBER + 33 4 76 57 49 40 thomas.hueber@grenoble-inp.fr

References

[Denby et al., 2010] Denby, B., Schultz, T., Honda, K., Hueber, T., Gilbert, J. M., and Brumberg, J. S. (2010). Silent speech interfaces. Speech Communication, 52(4):270–287.

[Hueber and Bailly, 2016] Hueber, T. and Bailly, G. (2016). Statistical conversion of silent articulation into audible speech using full-covariance HMM. Computer Speech & Language, 36(Supplement C):274–293.

[Hueber et al., 2010] Hueber, T., Benaroya, E.-L., Chollet, G., Denby, B., Dreyfus, G., and Stone, M. (2010). Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips. Speech Communication, 52(4):288–300.

[Imai et al., 1983] Imai, S., Sumita, K., and Furuichi, C. (1983). Mel log spectrum approximation (MLSA) filter for speech synthesis. Electronics and Communications in Japan (Part I: Communications), 66(2):10–18.

[Perrotin, 2015] Perrotin, O. (2015). Chanter avec les mains: Interfaces chironomiques pour les instruments de musique numériques. PhD thesis, Université Paris-Sud, Orsay, France.

[Tatulli and Hueber, 2017] Tatulli, E. and Hueber, T. (2017). Feature extraction using multimodal convolutional neural networks for visual speech recognition. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), ICASSP ’17, pages 2971–2975, New Orleans, LA, USA.

[Valin and Skoglund, 2019] Valin, J.-M. and Skoglund, J. (2019). LPCNet: Improving neural speech synthesis through linear prediction. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), ICASSP ’19, pages 5891–5895, Brighton, UK. IEEE.

[van den Oord et al., 2016] van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A. W., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. CoRR, abs/1609.03499.


6-9(2020-12-18) Research engineer (fixed-term contract), ALAIA Lab, France

The ALAIA joint laboratory (LabCom), dedicated to AI-assisted language learning (Apprentissage des Langues Assisté par Intelligence Artificielle), is recruiting a research engineer on a fixed-term contract (12 months, with possible extension).

The work will be carried out in coordination with the two partners involved in the LabCom: IRIT (Institut de Recherche en Informatique de Toulouse) and the company Archean Technologies, in particular its R&D division Archean Labs (Montauban, 82).

ALAIA focuses on oral expression and comprehension in a foreign language (L2). The role consists of designing, developing and integrating innovative services based on the analysis of L2 learners' productions and on the detection and characterisation of errors ranging from the phonetic to the linguistic level.

The expected skills cover automatic speech and language processing and machine learning, and are essential for being operational from the start of the position. Good knowledge of web application development would be a plus.

Applications should be sent to Isabelle Ferrané (isabelle.ferrane@irit.fr) and Lionel Fontan (lfontan@archean.tech). Do not hesitate to contact us for further information.


6-10(2021-01-03) Postdoc researcher, Seikei University, Japan

We are seeking a highly motivated and ambitious post-doctoral researcher for the project "Socially and culturally aware human-agent interaction," led by Prof. Yukiko Nakano at Seikei University in Tokyo, Japan. The project is part of a larger government-funded project for a human-avatar symbiotic society. The mission of our group (http://iui.ci.seikei.ac.jp/en/) is to research and develop technologies for human-agent/robot interaction, behavior generation, and behavior adaptation by focusing on social and cultural awareness.

Qualifications
- The ideal candidate must have a PhD degree and a strong background in machine learning and human-agent/robot interaction.
- Skills: Good skills in programming, such as Python and C#. Solid knowledge and experience in machine learning and deep learning using PyTorch. Solid knowledge of statistical analysis.
- Preferred qualifications: Programming skills in Unity or another animation engine.
- Research interests: human-agent/robot interaction, behavior generation (gestures, facial expressions, eye gaze, posture etc.), multimodal dialogue systems, social signal processing, multimodal machine learning, avatar communication, cross-cultural communication.

Employment
- Full time position.
- The employment contract can be extended based on the annual evaluation until November 2025 at the longest.
- Start date: after April 2021.
- The salary will be determined based on experience and expertise.
 
Application
Please submit your application by e-mail to y.nakano@st.seikei.ac.jp as soon as possible. Recruitment will end once a candidate is selected. The application should include:
1. Curriculum vitae including relevant professional experience and knowledge
2. One-page summary of your research background and interests
3. Summary of your Doctoral dissertation

Contact
If you have any questions, please contact Yukiko Nakano (y.nakano@st.seikei.ac.jp).


6-11(2021-01-04) Open positions at IDIAP, Martigny, Switzerland

There is a fully funded PhD position open at Idiap Research Institute on 'Neural
Architectures for Speech Technology'.

The research will build on work done over the past few years at Idiap on creating tools
for physiologically plausible modelling of speech. The current 'toolbox' contains
rudimentary muscle models and means to drive these using conventional (deep) neural
networks. More recently, the focus has been on theoretical underpinnings via rigorous
Bayesian techniques.

Although the project remit is quite open, a significant research thread will be to
factorise current neural vocoders into physiological and probabilistic components; this
will be with a focus on identifying how they may be controlled by external agents such as
dialogue managers. Another possible thread is to examine these models in the context of
speech recognition. In doing this, we hope not only to enable the next generation of
expressive speech recognition and synthesis, but also to make inference about the
underlying physiological mechanisms of speech production and perception.

For more information, and to apply, please follow this link:
 https://www.idiap.ch/education-and-jobs/job-10313

Idiap is located in Martigny in French speaking Switzerland, but functions in English and
hosts many nationalities. PhD students are typically registered at EPFL. All positions
offer quite generous salaries. Martigny is a local art, culture and viticulture hub, and
is close to all manner of skiing, hiking and mountain life.

There are other open positions on Idiap's main page:
 https://www.idiap.ch/en/join-us/job-opportunities


6-12(2021-01-04) research scientist in spoken language processing at Naver Labs Europe, Grenoble, France
We are seeking to recruit a research scientist in spoken language processing at Naver Labs Europe (Grenoble, France) - https://europe.naverlabs.com 
More details below (you can apply online here as well)
 

DESCRIPTION

NAVER LABS Europe's mission is to create new ways to interact with digital and physical agents, while paving the way for these innovations into a number of NAVER flagship products and services. This includes research in models and algorithms to give humans faster and better access to data and to allow them to interact with technology in simpler and more natural ways. To fulfill our vision of intelligent devices communicating seamlessly with us, we need to considerably improve existing technology and methods that solve natural language processing problems.

We are looking for applications from research scientists to make outstanding contributions to the invention, development and benchmarking of spoken language processing techniques. The research scientist would be part of the Natural Language Processing group of NAVER LABS Europe and her mission would be to develop research on one or more of the following themes: spoken language translation, speech recognition, text-to-speech synthesis, voice-based conversational search (with potential collaborations with the Search&Recommendation group).

At NAVER LABS we encourage participation in the academic community. Our researchers collaborate closely with universities and regularly publish in venues such as ACL, EMNLP,  Interspeech, KDD, SIGIR, ICLR, ICML and NeurIPS.

REQUIRED SKILLS

- Ph.D. in spoken language processing, speech processing, NLP or machine learning.

- Knowledge of latest developments in statistical and deep learning as applied to NLP and speech.

- Strong publication record in top-tier NLP, speech or machine learning conferences.

- Strong development skills, preferably in Python, and knowledge of relevant frameworks (TensorFlow, PyTorch, etc.).

APPLICATION INSTRUCTIONS

You can apply for this position online. Don't forget to upload your CV and cover letter before you submit. Incomplete applications will not be accepted.

ABOUT NAVER LABS

NAVER LABS Europe has full-time positions, PhD and PostDoc opportunities throughout the year which are advertised here and on international conference sites that we sponsor such as CVPR, ICCV, ICML, NeurIPS, EMNLP, ACL etc.

NAVER LABS Europe is an equal opportunity employer.

NAVER LABS are in Grenoble in the French Alps. We have a multi and interdisciplinary approach to research with scientists in machine learning, computer vision, artificial intelligence, natural language processing, ethnography and UX working together to create next generation technology and services that deeply understand users and their contexts.


6-13(2021-01-07) Speech-NLP Master 2 Internship Year 2020-2021 at LISN (ex LIMSI), University Paris-Saclay, France

Speech-NLP Master 2 Internship Year 2020-2021

Speech Segmentation and Automatic Detection of Conflicts in Political Interviews

LISN – Université Paris-Saclay

Internship for Last Year Engineer or Master 2 Students

Keywords: Machine Learning, Diarization, Digital Humanities, Political Speech, Prosody, Expressive Speech

Context

This internship is part of the Ontology and Tools for the Annotation of Political Speech project (OOPAIP 2018), a transdisciplinary project funded under the DIM-STCN (Text Sciences and New Knowledge) by the Regional Council of Ile de France. The project is carried out by the European Center for Sociology and Political Science (CESSP) of the University of Paris 1 Panthéon-Sorbonne, the National Audiovisual Institute (INA), and the LISN. Its objective is to design new approaches to develop detailed, qualitative, and quantitative analyses of political speech in the French media. Part of the project concerns the study of the dynamics of conflicting interactions in interviews and political debates, which requires a detailed description and a large corpus to allow for the models’ generalization. Some of the main challenges concern the performance of speaker and speech style segmentation, e.g., improving the segmentation accuracy, detecting superimposed speech, measuring vocal effort and other expressive elements.

Objectives

The main objective of the internship is to improve the automatic segmentation of political interviews. In this context, we will be particularly interested in the detection of hubbub (strong and prolonged overlapped speech). More precisely, we would like to extract features from the speech signal (Eyben et al. 2015) that are correlated with the level of conflictual content in the exchanges, based, for example, on the arousal level in the speaker’s voice (an intermediate level between the speech signal analysis and the expressivity description; Rilliard, d’Alessandro, and Evrard 2018) or on vocal effort (Liénard 2019).

The internship will initially be based on two corpora of 30 political interviews manually annotated in speech turns and speech acts within the framework of the OOPAIP project. It will begin with a state-of-the-art review of speech diarization and overlapped speech detection (Chowdhury et al. 2019). The aim will then be to propose solutions based on recent frameworks (Bredin et al. 2020) to improve the precise localization of speaking segments, in particular when the frequency of speaker changes is high.

In the second part of the internship, we will look at a more detailed measurement and prediction of the conflicting level of exchanges. We will search for the most relevant features to describe the conflicting level and adapt or develop a neural network architecture for its modeling.

The programming language used for this internship will be Python. The candidate will have access to the LISN computing resources (servers and clusters with recent generation GPUs).

 

Publications

Depending on the degree of maturity of the work carried out, we expect the applicant to:
- Distribute the tools produced under an open-source license
- Write a scientific publication

Conditions

The internship will take place over a period of 4 to 6 months at the LISN (formerly LIMSI) in the TLP group (spoken language processing). The laboratory is located near the plateau de Saclay, university campus building 507, rue du Belvédère, 91400 Orsay. The candidate will be supervised by Marc Evrard (evrard@limsi.fr). The allowance follows the official standards (service-public.fr).

Applicant profile

- Student in the last year of a 5-year diploma in the field of computer science (AI is a plus)
- Proficiency in Python and experience in using ML libraries (Scikit-Learn, TensorFlow, PyTorch)
- Strong interest in digital humanities and political science in particular
- Experience in automatic speech processing is preferred
- Ability to carry out a bibliographic study from scientific articles written in English

To apply: Send an email to evrard@limsi.fr including a résumé and a cover letter.

Bibliography

Bredin, Hervé, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, and Marie-Philippe Gill. 2020. “pyannote.audio: Neural Building Blocks for Speaker Diarization.” In ICASSP. IEEE.

Chowdhury, Shammur Absar, Evgeny A Stepanov, Morena Danieli, and Giuseppe Riccardi. 2019. “Automatic Classification of Speech Overlaps: Feature Representation and Algorithms.” Computer Speech & Language 55: 145–67.

Eyben, Florian, Klaus R Scherer, Björn W Schuller, Johan Sundberg, Elisabeth André, Carlos Busso, Laurence Y Devillers, et al. 2015. “The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing.” IEEE Transactions on Affective Computing 7 (2): 190–202.

Liénard, Jean-Sylvain. 2019. “Quantifying Vocal Effort from the Shape of the One-Third Octave Long-Term-Average Spectrum of Speech.” The Journal of the Acoustical Society of America 146 (4): EL369–75.

OOPAIP. 2018. “(Ontologie Et Outil Pour l’annotation Des Interventions Politiques).” DIM STCN (Sciences du Texte et connaissances nouvelles) Conseil régional d’Ile de France. http://www.dim-humanites-numeriques.fr/projets/oopaip-ontologie-et-outilspour-lannotation-des-interventions-politiques/.

Rilliard, Albert, Christophe d’Alessandro, and Marc Evrard. 2018. “Paradigmatic Variation of Vowels in Expressive Speech: Acoustic Description and Dimensional Analysis.” The Journal of the Acoustical Society of America 143 (1): 109–22.


6-14(2021-01-13) PhD position at CWI, Amsterdam, The Netherlands
We have a PhD position available at CWI, on the topic of user-centered optimisation for immersive media.
The full ad, including the link to apply, can be found here: https://www.cwi.nl/jobs/vacancies/868111
 
I would like to ask you if you could disseminate the call within your network. You can also redirect any potential candidate to me, if they have any questions: irene@cwi.nl
The deadline for applications is February 1st. 

6-15(2021-01-19) Associate professor, Telecom Paris, France

Telecom Paris is hiring an associate professor in machine learning for distributed/multi-view machine listening and audio content analysis


Institut Polytechnique de Paris [1] - Telecom Paris [2], LTCI lab [3], ADASP group [4]


-- Important Dates (UPDATED!)
• *March 20th 2021: closing date*
• End of April 2021: hearings of preselected candidates


Applications are invited for a permanent (indefinite tenure) faculty position at the Associate Professor level (Maitre de Conferences) in machine learning for distributed/multi-view machine listening and audio content analysis.

-- Context

Telecom Paris [2] is a French public institution for engineering higher education (grande ecole) and scientific research, founded in July 1878. It is a founding member of the Institut Polytechnique de Paris [1], a world-class scientific and technological institution. Located in Palaiseau, at the Plateau de Saclay (Paris outskirts), this Institution is a partnership between Ecole Polytechnique, ENSTA Paris, ENSAE Paris, Telecom Paris and Telecom SudParis, with HEC as a key partner. Students and faculty benefit from close relationships between the different institutions.
The Information Processing and Communication Laboratory (LTCI) [3] is Telecom Paris' in-house research laboratory. Since January 2017, it has continued the work previously carried out by the CNRS joint research unit of the same name. The LTCI was created in 1982 and is known for its extensive coverage of topics in the field of information and communication technologies. The LTCI's core subject areas are computer science, networks, data science, signal and image processing and digital communications. The laboratory is also active in issues related to systems engineering and applied mathematics.
The open position will be hosted by Telecom Paris' Audio Data Analysis and Signal Processing (ADASP) group [4], a subgroup of the statistics, signal processing and machine learning (S²A) team, within the Images, Data & Signals (IDS) department [5].


-- Main missions

The hired associate professor will be expected to:

[Research activities]
• Develop research in multi-view/distributed machine learning applied to machine listening, in line with the topics of Telecom Paris' Audio Data Analysis and Signal Processing (ADASP) group
• Develop both academic and industrial collaborations, including collaborative activities with other Telecom Paris research departments and teams, and research contracts with industrial players
• Submit proposals to national and international research project calls


[Teaching activities]
• Participate in teaching activities at Telecom Paris and its partners (as part of joint Master programs), especially in machine learning, signal processing, and machine listening, including life-long training programs (e.g. the local Data Scientist certificate)

[Impact]
• Publish high quality research work in leading journals and conferences
• Play an active role in the research communities relevant to the position (serving in scientific committees and boards, organizing seminars, workshops, special sessions...)


-- Candidate profile

As a minimum requirement, the successful candidate will have:

• A PhD degree
• A track record of research and publication in one or more of the following areas: machine learning, signal processing or machine listening
• Experience in deep learning, audio data analysis, machine listening, music data analysis, multi-view learning, distributed learning
• Experience in teaching
• Good command of English

The ideal candidate will also (optionally) have:
• Knowledge in frugal learning techniques
• Experience in source separation/enhancement and signal denoising techniques
• Experience in distributed computing environments

Other skills expected include:
• Capacity to work in a team and develop good relationships with colleagues and peers
• Good communication and pedagogical skills

-- Place of work

Palaiseau (Paris outskirts), France

-- How to apply (UPDATED!)

The application shall be submitted, through this link: https://institutminestelecom.recruitee.com/o/maitre-de-conference-en-machine-listening, as a single pdf file, including:

• a letter of motivation
• a complete and detailed curriculum vitae
• statements of research and teaching interests (4 pages)
• three main publications
• contact information for two references, to be sent to Slim Essid

-- Contact

Slim Essid (Coordinator of the ADASP group), https://perso.telecom-paris.fr/~essid/



[1] https://www.ip-paris.fr/en
[2] https://www.telecom-paris.fr/en/home
[3] https://www.telecom-paris.fr/en/research/laboratories/information-processing-and-communication-laboratory-ltci
[4] https://adasp.telecom-paris.fr
[5] http://www.tsi.telecom-paristech.fr/en/      


6-16(2021-02-15) Contract engineer, Police Technique et Scientifique (forensic science police), France

 

A contract engineer position is open in the audio section of the forensic science police (Police Technique et Scientifique).
For more information, see the following link:

https://place-emploi-public.gouv.fr/offre-emploi/police-scientifique---ipts--adjoint-au-chef-de-la-section-audio-reference-2021-545548/


6-17(2021-03-08) Fully funded PhD at KTH, Stockholm, Sweden

A fully funded PhD position in Deep Learning for Conversational AI

KTH, Royal Institute of Technology, Stockholm, Sweden. Apply here (deadline 2/4)

https://www.kth.se/en/om/work-at-kth/lediga-jobb/what:job/jobID:379667/where:4/


6-18(2021-03-08) PhD and RA positions at University of Trento, Italy
PhD and RA Positions in Conversational AI in the Health Domain at University of Trento, Italy
 
 

6-19(2021-03-08) Two PhD positions at NTNU, Trondheim, Norway.


Two PhD positions are open at NTNU Trondheim, Norway

 

https://www.jobbnorge.no/en/available-jobs/job/200820/2-phd-positions-in-machine-learning-for-speech-analysis-and-recognition


6-20(2021-03-09) Associate professor at Telecom Paris, France



 Telecom Paris is hiring an associate professor in machine learning for
distributed/multi-view machine listening and audio content analysis

See offer here:
https://adasp.telecom-paris.fr/news/job_offers/highlights/adasp_position_machine_listening_2021/ or read
on...


Institut Polytechnique de Paris [1] - Telecom Paris [2], LTCI lab [3], ADASP group [4]


-- Important Dates
• *March 20th 2021: closing date*
• End of April 2021: hearings of preselected candidates


Applications are invited for a permanent (indefinite tenure) faculty position at the
Associate Professor level (Maitre de Conferences) in machine learning for
distributed/multi-view machine listening and audio content analysis.

-- Context

Telecom Paris [2] is a French public institution for engineering higher education (grande
ecole) and scientific research, founded in July 1878. It is a founding member of the
Institut Polytechnique de Paris [1], a world-class scientific and technological
institution. Located in Palaiseau, at the Plateau de Saclay (Paris outskirts), this
Institution is a partnership between Ecole Polytechnique, ENSTA Paris, ENSAE Paris,
Telecom Paris and Telecom SudParis, with HEC as a key partner. Students and faculty
benefit from close relationships between the different institutions.
The Information Processing and Communication Laboratory (LTCI) [3] is Telecom Paris'
in-house research laboratory. Since January 2017, it has continued the work previously
carried out by the CNRS joint research unit of the same name. The LTCI was created in
1982 and is known for its extensive coverage of topics in the field of information and
communication technologies. The LTCI's core subject areas are computer science, networks,
data science, signal and image processing and digital communications. The laboratory is
also active in issues related to systems engineering and applied mathematics.
The open position will be hosted by Telecom Paris' Audio Data Analysis and Signal
Processing (ADASP) group [4], a subgroup of the statistics, signal processing and machine
learning (S²A) team, within the Images, Data & Signals (IDS) department [5].


-- Main missions

The hired associate professor will be expected to:

[Research activities]
• Develop research in multi-view/distributed machine learning applied to machine
listening, in line with the topics of Telecom Paris' Audio Data Analysis and Signal
Processing (ADASP) group
• Develop both academic and industrial collaborations, including collaborative
activities with other Telecom Paris research departments and teams, and research
contracts with industrial players
• Submit proposals to national and international research project calls


[Teaching activities]
• Participate in teaching activities at Telecom Paris and its partners (as part of
joint Master programs), especially in machine learning, signal processing, and machine
listening, including life-long training programs (e.g. the local Data Scientist
certificate)

[Impact]
• Publish high quality research work in leading journals and conferences
• Play an active role in the research communities relevant to the position (serving in
scientific committees and boards, organizing seminars, workshops, special sessions...)


-- Candidate profile

As a minimum requirement, the successful candidate will have:

• A PhD degree
• A track record of research and publication in one or more of the following areas:
machine learning, signal processing or machine listening
• Experience in deep learning, audio data analysis, machine listening, music data
analysis, multi-view learning, distributed learning
• Experience in teaching
• Good command of English

The ideal candidate will also (optionally) have:
• Knowledge in frugal learning techniques
• Experience in source separation/enhancement and signal denoising techniques
• Experience in distributed computing environments

Other skills expected include:
• Capacity to work in a team and develop good relationships with colleagues and peers
• Good communication and pedagogical skills

Note that you do *not* need to speak French to apply. 

-- Place of work

Palaiseau (Paris outskirts), France

-- How to apply

The application shall be submitted, through this link:
https://institutminestelecom.recruitee.com/o/maitre-de-conference-en-machine-listening,
as a single pdf file, including:

• a letter of motivation
• a complete and detailed curriculum vitae
• statements of research and teaching interests (4 pages)
• three main publications
• contact information for two references, to be sent to Slim Essid

-- Contact

Slim Essid (Coordinator of the ADASP group), https://perso.telecom-paris.fr/~essid/


6-21(2021-03-16) PhD position at INRIA, Nancy, France
********** PhD position *************
 

Title: Robust and Generalizable Deep Learning-based Audio-visual Speech Enhancement

The PhD thesis will be jointly supervised by Mostafa Sadeghi (Inria Starting Faculty Position) and Romain Serizel (Associate Professor, Université de Lorraine).

 

Contacts: Mostafa Sadeghi (mostafa.sadeghi@inria.fr) and Romain Serizel (romain.serizel@loria.fr)

 

Context: Audio-visual speech enhancement (AVSE) refers to the task of improving the intelligibility and quality of noisy speech by exploiting the complementary information of the visual modality (lip movements of the speaker) [1]. The visual modality can help distinguish target speech from background sounds, especially in highly noisy environments. Recently, owing to the success and progress of deep neural network (DNN) architectures, AVSE has been extensively revisited. Existing DNN-based AVSE methods are categorized into supervised and unsupervised approaches. In the former category, a DNN is trained to map noisy speech and the associated video frames of the speaker into a clean estimate of the target speech. The unsupervised methods [2] follow a traditional maximum likelihood-based approach combined with the expressive power of DNNs. Specifically, the prior distribution of clean speech is learned using deep generative models such as variational autoencoders (VAEs) and combined with a likelihood function based on, e.g., non-negative matrix factorization (NMF), to estimate the clean speech in a probabilistic way. As there is no training on noisy speech, this approach is unsupervised.

Supervised methods require deep networks, with millions of parameters, as well as a large audio-visual dataset with diverse enough noise instances to be robust against acoustic noise. There is also no systematic way to achieve robustness to visual noise, e.g., head movements, face occlusions, changing illumination conditions, etc. Unsupervised methods, on the other hand, show a better generalization performance and can achieve robustness to visual noise thanks to their probabilistic nature [3]. Nevertheless, their test phase involves a computationally demanding iterative process, hindering their practical use.

 

Objectives: In this PhD project, we are going to bridge the gap between supervised and unsupervised approaches, benefiting from both worlds. The central task of this project is to design and implement a unified AVSE framework having the following features: 1- Robustness to visual noise, 2- Good generalization to unseen noise environments, and 3- Computational efficiency at test time. To achieve the first objective, various techniques will be investigated, including probabilistic switching (gating) mechanisms [3], face frontalization [4], and data augmentation [5]. The main idea is to adaptively lower bound the performance by that of audio-only speech enhancement when the visual modality is not reliable. To accomplish the second objective, we will explore techniques such as acoustic scene classification combined with noise modeling inspired by unsupervised AVSE, in order to adaptively switch to different noise models during speech enhancement. Finally, concerning the third objective, lightweight inference methods, as well as efficient generative models, will be developed. We will work with the AVSpeech [6] and TCD-TIMIT [7] audio-visual speech corpora.

 

References:

[1] D. Michelsanti, Z. H. Tan, S. X. Zhang, Y. Xu, M. Yu, D. Yu, and J. Jensen, "An overview of deep-learning based audio-visual speech enhancement and separation," arXiv:2008.09586, 2020.

[2] M. Sadeghi, S. Leglaive, X. Alameda-Pineda, L. Girin, and R. Horaud, "Audio-visual speech enhancement using conditional variational auto-encoders," IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 28, pp. 1788–1800, 2020.

[3] M. Sadeghi and X. Alameda-Pineda, "Switching variational autoencoders for noise-agnostic audio-visual speech enhancement," in ICASSP, 2021.

[4] Z. Kang, M. Sadeghi, R. Horaud, "Face Frontalization Based on Robustly Fitting a Deformable Shape Model to 3D Landmarks," arXiv:2010.13676, 2020.

[5] S. Cheng, P. Ma, G. Tzimiropoulos, S. Petridis, A. Bulat, J. Shen, M. Pantic, "Towards Pose-invariant Lip Reading," in ICASSP, 2020.

[6] A. Ephrat, I. Mosseri, O. Lang, T. Dekel, K. Wilson, A. Hassidim, W.T. Freeman, M. Rubinstein, "Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation," SIGGRAPH 2018.

[7] N. Harte and E. Gillen, "TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech," IEEE Transactions on Multimedia, vol. 17, no. 5, pp. 603-615, May 2015.

 

Skills:

  • Master's degree, or equivalent, in the field of speech/audio processing, computer vision, machine learning, or in a related field,
  • Ability to work independently as well as in a team,
  • Solid programming skills (Python, PyTorch),
  • A decent level of written and spoken English.

Benefits package:

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural, and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration:

Salary: 1982 € gross/month for 1st and 2nd year. 2085 € gross/month for 3rd year.

Monthly salary after taxes: around 1596,05 € for 1st and 2nd year. 1678,99 € for 3rd year. (medical insurance included).


6-22(2021-03-20) Post-doc at Nara Institute of Science and Technology, Japan

[Postdoctoral researcher, Nara Institute of Science and Technology]
Augmented Human Communication Laboratory directed by Professor Satoshi
Nakamura has a postdoctoral research position available in the area of human
information processing, dialogue systems, statistical modeling, machine
learning, and brain science for the Training Adapted Personalised Affective
Social Skills with Cultural Virtual Agents (ANR-CREST: JPMJCR19A).
Affiliation: Division of Information Science, Nara Institute of Science and
Technology, Japan.
Position: Postdoctoral researcher
Number of positions: 1
Appointment period: From May 1, 2021 (or as soon as possible thereafter) to March 31,
2022.
Term: Contract can be renewed every year. The longest employment is until
March 31, 2025.
Trial period: No trial period

Job description:
The research area is related to the Training Adapted Personalised Affective
Social Skills with Cultural Virtual Agents (ANR-CREST: JPMJCR19A) project, and to
fields such as dialogue systems using machine learning and multimodal information
processing.
Application conditions:
- Doctoral researcher
A person with a Ph.D. who builds dialogue systems using statistical
methods, machine learning, and multimodal information processing. Applicants should have
the necessary knowledge and experience in the area and be willing to
conduct research independently.

Salary: Determined based on the university regulations
Benefits: Enrollment in health insurance, pension insurance, accident compensation
insurance, and employment insurance

Workplace:
Augmented Human Communication Laboratory, Nara Institute of Science and
Technology.
Employment period: From May 1, 2021 (or as soon as possible thereafter) to March 31,
2022. Contract can be renewed every year. The longest employment is until
March 31, 2025.

Work style:
• Working days: Monday to Friday
• Holidays: Saturdays, Sundays, national holidays, summer holidays,
year-end and new year holidays, and the anniversary of the university
foundation (October 1)
• Working hours: Discretionary work system

Deadline: April 9, 2021 (Friday)


[Application method]
Documents to be submitted:
(1) Resume (using the university's format: see the URL below)
https://www.naist.jp/en/about_naist/job_opportunities/resume_format.html
(2) List of research achievements
(3) Research motivation (A4 one page)
(4) Three major publications
(5) Letters from two references with their address, telephone number and
email address included.
Where to submit application documents:
Please submit your documents by e-mail to the following contact address, with
'Postdoctoral researcher application' in the subject line.
E-mail: tapas-positions@is.naist.jp

[Selection details (selection method, acceptance / rejection decision),
result notification]
(1) 1st selection: document screening
(2) 2nd selection: online interview
After screening the documents, we will contact you for an interview.
* Application documents will be used only for recruitment screening and for
no other purpose. They will not be returned, regardless of the outcome.
To limit the risk of personal information leaking through misdelivery, the
application documents of unsuccessful applicants will be securely deleted
at the end of the recruitment process.

Contact:
〒630-0192
8916-5 Takayama-Cho, Ikoma, Nara, Japan
Professor Satoshi Nakamura, Augmented Human Communication Laboratory, Nara
Institute of Science and Technology
E-mail:  tapas-positions@is.naist.jp
############################


6-23(2021-04-05) Researchers in Speech, Text and Multimodal Machine Translation @ DFKI Saarbrücken, Germany

Researchers in Speech, Text and Multimodal Machine Translation @ DFKI Saarbrücken, Germany

--------------------------------------------------------------

The MT group at MLT@DFKI Saarbrücken is looking for

     senior researchers/researchers/junior researchers

in speech, text and multimodal machine translation using deep learning.

3-year contracts, with the possibility of extension. Ideal starting dates are around June/July 2021.

Key responsibilities:
- Research and development in speech, text and multimodal MT
- Scientific publications
- Co-supervision of BSc/MSc students and research assistants
- Possibility of teaching at Saarland University (UdS)
- Senior: PhD co-supervision
- Senior: Project/grant acquisition and management

Qualifications senior researchers/researchers:
- PhD in NLP/Speech/MT/ML/CS or related
- strong scientific and publication track record in speech/text/multimodal-NLP/MT

Qualifications junior researchers:
- MSc in CS/NLP/Speech/ML/MT or related (possibility to do a PhD at
DFKI/UdS)

All:
- Strong background in machine learning and deep learning
- Strong problem solving and programming skills
- Strong communication skills in written and spoken English (German an asset, but not a requirement)

Working environment: the posts are in the “Multilinguality and Language Technology” (MLT) Lab at DFKI (the German Research Center for Artificial Intelligence, https://www.dfki.de/en/web/) in Saarbrücken, Germany. MLT is led by Prof. Josef van Genabith. MLT is a highly international team and does basic and applied research.

Application: a short cover letter indicating which level (senior / researcher / junior) you apply for, a CV, a brief summary of research interests, and contact information for three references. Please submit your application by Friday April 23rd, 2021 as PDF to Prof. Josef van Genabith (josef.van_genabith@dfki.de) indicating your earliest possible start date. Positions remain open until filled.

Selected MT@MLT group publications 2020/21:
- Xu et al. Probing Word Translation in the Transformer and Trading Decoder for Encoder Layers. NAACL-HLT 2021.
- Chowdhury et al. Understanding Translationese in Multi-View Embedding Spaces. COLING 2020.
- Pal et al. The Transference Architecture for Automatic Post-Editing. COLING 2020.
- Ruiter et al. Self-Induced Curriculum Learning in Self-Supervised Neural Machine Translation. EMNLP 2020.
- Zhang et al. Translation Quality Estimation by Jointly Learning to Score and Rank. EMNLP 2020.
- Xu et al. Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change. ACL 2020.
- Xu et al. Learning Source Phrase Representations for Neural Machine Translation. ACL 2020.
- Xu et al. Lipschitz Constrained Parameter Initialization for Deep Transformers. ACL 2020.
- Herbig et al. MMPE: A Multi-Modal Interface for Post-Editing Machine Translation. ACL 2020.
- Herbig et al. MMPE: A Multi-Modal Interface using Handwriting, Touch Reordering and Speech Commands for Post-Editing Machine Translation. ACL 2020.
- Alabi et al. Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of Yorùbá and Twi. LREC 2020.
- Costa-jussà et al. Multilingual and Interlingual Semantic Representations for Natural Language Processing: A Brief Introduction. In: Computational Linguistics (CL) Special Issue: Multilingual and Interlingual Semantic Representations for Natural Language Processing.
- Xu et al. Efficient Context-Aware Neural Machine Translation with Layer-Wise Weighting and Input-Aware Gating. IJCAI 2020.

DFKI is one of the leading AI centers worldwide, with several sites in Germany. DFKI Saarbrücken is part of the Saarland University (UdS) Informatics Campus. UdS has exceptionally strong CS and CL schools and, in addition to DFKI, a Max Planck Institute for Informatics, a Max Planck Institute for Software Systems, the Center for Bioinformatics, and the CISPA Helmholtz Center for Information Security.

Geographic environment: Saarbrücken (http://www.saarbruecken.de/en) is the capital of Saarland, one of the Federal States in Germany, located right in the heart of Europe and a cultural center in the border region of Germany, France and Luxembourg. Frankfurt and Paris are less than 2 hours away by train. Living costs are moderate in comparison with other cities in Germany and Europe.



6-24(2021-04-02) PhD at Université d'Avignon, France

**** If you don't read French and are interested in a PhD position in AI/NLP please
contact us directly for further information. French speaking is not required for the
position. ****

 Applications should reach us preferably **before May 10**.

PHD TOPIC PROPOSAL

DOCTORAL CONTRACTS 2021-2024

Target call (please tick the corresponding box):

X Ministerial doctoral contract ED 536

□ Ministerial doctoral contract ED 537

------------------------------------------------------------------------------------------------------------------------

Thesis supervisor: Fabrice LEFEVRE

Co-supervisor (if any):

Co-advisor (if any): Bassam JABAIAN

Title in French: Transformer et renforcer pour le transfert et l’apprentissage en ligne des agents conversationnels vocaux

Title in English: Transformer and Reinforce for transfer and online learning of vocal conversational agents

Keywords: AI, natural language processing, human-machine vocal interactions, deep learning, deep reinforcement learning, transfer learning

Joint supervision (cotutelle): XXX - No    Country:

Opportunities for international mobility for the doctoral student during the thesis: yes

Candidate profile:

The candidate must hold a Master's degree in computer science with a component in machine learning methods and/or language engineering. The PhD grant will be awarded through a competition within Doctoral School 536 of Avignon University, with an interview of the candidate selected by the thesis supervisors.

To apply, please send an e-mail before May 10, 2021 to Fabrice Lefèvre (fabrice.lefevre@univ-avignon.fr) and Bassam Jabaian (bassam.jabaian@univ-avignon.fr) including: your CV, a cover letter stating your position on the study proposals below, any letters of recommendation, and your academic transcripts.

Detailed description of the topic:

Field / Theme: AI/NLP

Objective: Enable the transfer and online learning of vocal conversational agents with a Transformers/Reinforcement combination

Context and challenges: Among research activities in artificial intelligence, improving voice interaction with machines remains a major current challenge. The LIA works on many aspects of voice interaction, but through this thesis it seeks in particular to deepen research on learning techniques for vocal conversational agents based on supervised and reinforced deep neural networks. Such dialogue agents are a key issue for improving our societies' capacity to manage controlled social distancing, notably by delegating certain risky tasks to efficient hardware artefacts that are well accepted by the general public.

Recent advances in neural networks have made it possible to build high-quality text generation systems (or language models). To that end they are trained on gigantic quantities of documents, but in return they offer very broad coverage of human language. The most advanced representatives in this area are Transformers, which eliminate the need for recurrence in networks (computationally expensive) by favouring a multi-head self-attention mechanism. Many derivatives of these models exist and have yielded substantial performance gains on numerous tasks involving natural-language text generation. BERT [1] and GPT thus form the major families (along with their many descendants DistilBERT, ALBERT, GPT-2…). But while such models raise our language modelling capabilities to a higher level of performance, it remains to be seen how to deploy them for more specific or demanding tasks, such as spoken interaction systems.

The problem of applying them to conversational agents therefore remains open, because on the one hand direct interaction with humans accentuates the impact of model errors and imperfections, and on the other hand interactions are managed in a goal-oriented context, where the aim is not the mere exchange of linguistic data but the achievement of a latent objective (obtaining precise information, performing an action or having it performed…). The main challenge we wish to address in the thesis is thus to adapt the capabilities of a pre-trained Transformer to a particular task, notably for building a conversational agent. Transfer learning approaches have already been initiated, but their results are mixed and need to be strengthened [2]. We identify two major axes for the thesis:

Axis 1/ Transfer and online learning / First, transfer approaches always rely on new pre-collected data to which the models are exposed [2]. In the continuity of our previous work on online learning of dialogue systems, we would like to design and evaluate effective strategies for enabling the use of reinforcement learning [3, 4]. To make artificial systems capable of learning from data, two strong assumptions are generally made: (1) stationarity of the system (the machine's environment does not change over time), (2) independence between data collection and the learning process (the user does not modify their behaviour over time). Yet users naturally tend to adapt their behaviour to the machine's reactions, which hinders the convergence of learning towards an equilibrium that would allow it to continually satisfy the user's expectations. Voice interfaces must therefore evolve towards a new generation of interactive systems, capable of learning dynamically over the long term from interactions while anticipating variations in the behaviour of humans, who are themselves seen as evolving systems.

The challenge is then, in the context of deep reinforcement learning [5], to demonstrate the optimal convergence of the algorithms used to update the weights of certain layers of the model as interactions with users proceed, without risking a degradation of the initial performance. The optimal choice of which parameters to modify should be automatable. This project also falls within the framework of continual learning [6] of a conversational agent.

Axis 2/ Modelling spoken language / Second, most of the aforementioned models exclusively model written language and incorporate few mechanisms dedicated to the nature of spoken language. We would therefore like to increase the ability of such machines to cope with: 1) more natural user inputs, which therefore include many deviations from written language (ungrammaticality, confusions, restarts, corrections, hesitations…), and 2) errors in the transcriptions due to the speech recognition component. It is thus necessary to interface the speech analysis component with the downstream language modelling chain (semantic analysis, dialogue state tracking, dialogue management, generation and speech synthesis) so as to take into account multiple realistic hypotheses (and no longer only the best one). Finally, arbitration between these hypotheses should take the subsequent processing steps into account, in line with the equivalent human cognitive process (which can re-process its most probable acoustic hypotheses in case of conflict with its semantic inferences).

This study may be carried out in several application settings, to be specified at the start of the thesis: for example a conversational Pepper robot assigned to managing the reception of a public place (for example a hospital or a museum). It will then be possible to delegate first-contact and orientation tasks to artefacts that are insensitive to biological transmission, which is a highly strategic asset for improving the management of a crisis situation such as the ongoing global coronavirus pandemic.

[1] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” arXiv.org, Oct. 2018.

[2] T. Wolf, V. Sanh, J. Chaumond, and C. Delangue, “TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents,” arXiv.org, Jan. 2019.

[3] E. Ferreira, B. Jabaian, and F. Lefèvre, “Online adaptative zero-shot learning spoken language understanding using word-embedding,” in Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, 2015, pp. 5321–5325.

[4] M. Riou, B. Jabaian, S. Huet, and F. Lefèvre, “Joint On-line Learning of a Zero-shot Spoken Semantic Parser and a Reinforcement Learning Dialogue Manager,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019, pp. 3072–3076.

[5] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “A Brief Survey of Deep Reinforcement Learning,” IEEE Signal Process. Mag., Special Issue on Deep Learning for Image Understanding, Aug. 2017.

[6] Z. Chen and B. Liu, Lifelong Machine Learning, Second Edition, vol. 12, no. 3. Morgan & Claypool Publishers, 2018.

Topic proposals are to be sent to

secretariat-ed@univ-avignon.fr




6-25(2021-04-15) Director, Center for Language and Speech Processing, Baltimore, MD, USA

POSITION: Director, Center for Language and Speech Processing

REPORTS TO: Ed Schlesinger, Benjamin T. Rome Dean, Johns Hopkins University, Whiting School of Engineering

INSTITUTION: Johns Hopkins University, Baltimore, MD https://engineering.jhu.edu/

2.23.21

The Whiting School of Engineering at Johns Hopkins University invites nominations and applications for the position of Director of the Center for Language and Speech Processing (CLSP). The Director will be appointed as a full-time tenured faculty member in the Whiting School of Engineering and will be encouraged to remain active in research, with strategic leadership of the Center as their top priority. This is an outstanding opportunity for an accomplished scholar with leadership experience to further strengthen an exceptional interdisciplinary research center at the nation’s first research university. The best candidates will embody the intellectual distinction, entrepreneurial capacity, collaborative spirit, transparency, inclusiveness, and creativity that characterize the School’s culture and will bring a scholarly record deserving appointment as tenured professor at The Johns Hopkins University.

The Center for Language and Speech Processing

CLSP is one of the Whiting School’s 25 Interdisciplinary Centers and Institutes. The Center currently comprises over 25 tenure-line and research faculty whose primary appointments are in the Whiting School of Engineering or in other closely related schools, along with over 70 PhD students. CLSP was established in 1992 and grew to prominence under the directorship of the late Frederick Jelinek. It aims to understand how human language is used to communicate ideas, and to develop technology for machine analysis, translation, and transformation of multilingual speech and text. In 2007 CLSP gained a sibling, the national Human Language Technology Center of Excellence (https://hltcoe.jhu.edu), a government-funded research center at Johns Hopkins that develops critical speech and language technology for government use; several HLTCOE researchers are tightly integrated into CLSP. Recently, CLSP has further expanded its research portfolio by adding several prominent researchers in computer vision and related fields. As part of its educational mission, CLSP coordinates a full complement of courses dealing with a diverse array of topics in language and speech. It offers a weekly seminar featuring prominent visiting speakers in speech and language processing. It also runs the Fred Jelinek Memorial Workshop in Speech and Language Technology (JSALT), a widely known residential research workshop that annually assembles teams of researchers from around the world to spend 6 summer weeks conducting intensive research on fundamental problems. Held annually since 1995, the workshop has produced many important advances in speech and language technology.

Opportunities for the Center Director

The CLSP Director will work with colleagues in and beyond CLSP to increase its impact by both enhancing its historic strengths and positioning it as a central element of a set of AI-related initiatives across the Whiting School and the University more broadly. To these ends, the Director will identify ways in which the Center will continue to grow and evolve and through which the Center, the Whiting School, and Hopkins can recruit, sustain, and deploy the human and financial resources needed to further distinguish themselves. The Director will work to maintain the Center’s position as the disciplinary and intellectual hub of language and speech processing research within the University, enabling CLSP to contribute to and benefit from the success of significant institutional investment in artificial intelligence and machine learning more broadly, including potential applications to key societal problems such as healthcare and scientific endeavors such as linguistics and neuroscience. Collaborations with the Applied Physics Lab (www.jhuapl.edu) present opportunities to bring additional resources, expertise, and scale to advance CLSP research, potentially including classified research. Beyond Hopkins, CLSP’s Director will foster connections with industry as part of the Center’s efforts to expand its base of resources and relationships, to disseminate knowledge and discoveries, and to develop and transfer technologies that may have an impact in the world. In these various external activities, the Director will work with the University’s technology ventures office (https://ventures.jhu.edu), with faculty and students, and with alumni and donors. Specific strategies for enhancing CLSP’s strengths, broadening its impact, and positioning it relative to Hopkins-wide initiatives, along with measures of success and the prioritization of activities designed to achieve success, will be developed by the Director in collaboration with CLSP’s faculty and the Dean.

Diversity, equity, and inclusion at the Whiting School

WSE has a stated commitment to diversity, equity, and inclusion: “Diversity and inclusion enrich our entire community and are critical to both educational excellence and to the advancement of knowledge. Discovery, creativity, and innovation flourish in an environment where the broadest range of experiences are shared, where all voices are heard and are valued, and where individuals from different cultures and backgrounds can collaborate freely to understand and solve problems in entirely new ways.” As the leader of the Center and within the School, CLSP’s Director will work to enhance and expand diversity and inclusion at all levels and will ensure that the Center is a welcoming and supportive environment for all.

Position Qualifications

The new Director will be a proven, entrepreneurial leader who can bring faculty, staff, and students together to pursue a compelling vision of CLSP as an international hub for Language and Speech Processing research and as a site of innovation, teaching, and translation. They will have strong skills for mentoring junior faculty and will promote the interests of the Center. Intellectual curiosity and fundraising experience are valued. They will have a dossier that represents a distinguished track record of scholarship and teaching; a passionate commitment to research, discovery, and application; and an interest in and success at academic administration. Expected educational background and qualifications include:

• An earned doctorate in an area such as electrical and computer engineering, computer science, or a closely related field and a scholarly record deserving appointment as tenured professor at The Johns Hopkins University;

• Recognized leadership in their respective field with a distinguished national and international reputation for research and education;

• Excellent communication skills in both internal and external interactions;

• Strong commitment to diversity and inclusion at all levels among faculty, students, and staff, along with measurable and sustained impact on the diversity and inclusiveness of organizations they have led or been part of; and

• Leadership and administrative experience within a complex research environment or in national/international organizations connected to their respective field.

 

                                                                     *

The Whiting School of Engineering has engaged Opus Partners (www.opuspartners.net) to support the recruitment of the CLSP Director. Craig Smith, Partner, and Jeff Stafford, Senior Associate, are leading the search. Applicants should submit their CV and a letter of interest outlining their research and leadership experience to Jeffrey.stafford@opuspartners.net. Nominations, expressions of interest, and inquiries should go to the same address. Review of credentials will begin promptly and will continue until the appointment is finalized. Every effort will be made to ensure candidate confidentiality. The Whiting School of Engineering and CLSP are committed to building a diverse educational environment, and women and minorities are strongly encouraged to apply. Johns Hopkins University is an equal opportunity employer and does not discriminate on the basis of gender, marital status, pregnancy, race, color, ethnicity, national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, other legally protected characteristics or any other occupationally irrelevant criteria. The University promotes Affirmative Action for minorities, women, individuals who are disabled, and veterans. Johns Hopkins University is a drug-free, smoke-free workplace.

 


6-26(2021-04-11) CIFRE PhD thesis: Controlled question-answering dialogue system: application to women's health forums, LIG, Univ. Grenoble, France

CIFRE PhD position: Controlled question-answering dialogue system: application to women's health forums

Laboratoire d'Informatique de Grenoble / Université Grenoble Alpes (http://lig-getalp.imag.fr/), Grenoble

Shesmet (company) (https://www.shesmet.com), Paris

The objective of this PhD thesis is to design methods that enable a dialogue system to answer precisely a question about women's intimate health. Women's sexual and reproductive health is still too rarely addressed as a whole and is too often reduced to reproductive health alone. Yet women physiologically go through several life stages that affect their mental and physical well-being to a greater or lesser degree: puberty, maternity, menopause and post-menopause. Women's sexual health is also a public policy issue that has evolved over the years and remains at the heart of our society's concerns: period poverty, contraception, access to abortion, sexual violence. Access to high-quality, personalised and fully anonymous information is a strong driver of empowerment and of equality of care for the entire female population. Yet today, women seeking information on these topics often face a flood of information that may be contradictory, incomplete and from unverifiable sources (e.g., user-fed health forums). This is why Shesmet and the Laboratoire d'Informatique de Grenoble (LIG) are joining forces to propose a dialogue-based question-answering method that adapts an expert, verified answer to the particular context of a health question asked by a user. This approach is original in that it draws on the best of human capabilities (relevant, error-free answers) and computational ones (the ability of deep models to process data at scale).

Thesis objective

Over the past decade, natural language processing systems have made great progress thanks to the emergence of deep learning. The technique is now mature enough to be integrated into personal assistants [Chen and Gao, 2017] and question-answering systems. Current neural network architectures include RNNs (LSTM/GRU) [Hochreiter and Schmidhuber, 1997; Cho et al., 2014] and Transformers [Vaswani et al., 2017], in combination with attention mechanisms [Bahdanau et al., 2014] to allow the use of contextual information beyond a single or a few dialogue turns [Bothe et al., 2018]. However, these models are trained on such large and poorly controlled masses of data that they tend to reproduce the behaviours found in those data. For example, large general-interest newspaper corpora typically give pride of place to the masculine gender. Likewise, question-answering systems are generally limited to finding an excerpt in a large corpus or generating an answer from a deep model. Unlike these classic question-answering systems, the aim here is to use the expertise of health specialists to adapt an answer to the context of the question [Wu2019]. Human experts thus design high-quality, verified answers, while deep systems adapt them to the widest audience while avoiding the usual errors of deep models.

The task is therefore to design a system capable of:

1. classifying the dialogue utterances and associating them with a set of pre-established answers;

2. editing the pre-established answers to adapt them to the question and the dialogue context;

3. estimating the degree of reassurance that needs to be inserted in the answer;

4. explaining the answers given.

Within this indicative work programme, the PhD will address the following challenges.

  • Low-resource domains. Few corpora are accessible apart from the data available within the company. One line of research will be to use LIG's pre-trained models for French (including FlauBERT [Le2020], a BERT model for French to whose development the GETALP team contributed substantially), available via Hugging Face's Transformers library, and to transfer them [Wolf et al., 2019] to the new conversation task.

  • Model biases. The topic lends itself to analysing a societal issue specific to NLP development: accounting for gender bias when the target population is predominantly female. LIG has developed expertise on this problem for both text and speech models [Garnerin2020].

  • Contextualisation in dialogue. In a forum, the interaction cannot be assumed to be dyadic as in classic dialogue (one person + one system), since more than two people take part. How to take into account the contributions of several participants in order to personalise the answer given to a single person remains an open problem.

  • Explainability. To guarantee the system's transparency and allow users to interpret the answers provided, the system must be able to explain why a particular answer was given. One technique is to provide the dialogue elements on the basis of which the answer was selected and adapted [Atanasova2020], but other methods may be explored.

Scientific environment

The thesis will be carried out within the GETALP team of the LIG laboratory (https://lig-getalp.imag.fr/). The recruited person will be hosted within the team, which offers a stimulating, multinational and pleasant working environment. In addition, the recruited person will spend a significant amount of time at the company Shesmet. Shesmet is an e-health startup working both on research and development projects and on consulting assignments around health innovation for public and private health institutions. In 2020 the company launched My S Life, an information platform on women's intimate and sexual health (www.myslife.co).

The means to carry out the PhD will be provided, both for travel in France and abroad and for equipment (personal computer, access to LIG's GPU servers, the CNRS Jean Zay computing grid).

How to apply?

Candidates must hold a Master's degree in computer science or natural language processing (or be about to obtain one). They must have a good knowledge of machine learning methods and ideally experience in corpus collection and management. They must also have a good command of French. Experience in dialogue, question-answering systems or automatic text generation would be a plus.

Applications are expected by May 3, 2021. They must include: CV + motivation letter/message + Master's transcripts + recommendation letter(s); and be sent to François Portet (Francois.Portet@imag.fr), Didier Schwab (Didier.Schwab@imag.fr) and Juliette Mauro (juliette.mauro@shesmet.com).

References

[Atanasova2020] P. Atanasova, J. G. Simonsen, C. Lioma, I. Augenstein. 'A Diagnostic Study of Explainability Techniques for Text Classification', Proceedings of EMNLP 2020.

[Bahdanau2014] D. Bahdanau, K. Cho, Y. Bengio. 'Neural machine translation by jointly learning to align and translate', arXiv preprint arXiv:1409.0473, 2014.

[Bothe2018] C. Bothe, C. Weber, S. Magg, S. Wermter. 'A Context-based Approach for Dialogue Act Recognition using Simple Recurrent Neural Networks', LREC 2018.

[Chen2017] Y.-N. Chen, J. Gao. 'Open-Domain Neural Dialogue Systems', IJCNLP 2017.

[Cho2014] K. Cho, B. van Merrienboer, Ç. Gülçehre, F. Bougares, H. Schwenk, Y. Bengio. 'Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation', CoRR, 2014.

[Garnerin2020] M. Garnerin, S. Rossato, L. Besacier. 'Gender Representation in Open Source Speech Resources', LREC 2020, pp. 6599-6605.

[Hochreiter1997] S. Hochreiter, J. Schmidhuber. 'Long Short-Term Memory', Neural Computation, vol. 9, no. 8, pp. 1735-1780, November 1997.

[Le2020] H. Le, L. Vial, J. Frej, V. Segonne, M. Coavoux, B. Lecouteux, A. Allauzen, B. Crabbé, L. Besacier, D. Schwab. 'FlauBERT: Unsupervised Language Model Pre-training for French', Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 2020, pp. 2479-2490. https://github.com/getalp/Flaubert

[ParlAI] https://parl.ai/docs/tutorial_basic.html

[Vaswani2017] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al. 'Attention is all you need', 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

[Wolf2019] T. Wolf, V. Sanh, J. Chaumond, C. Delangue. 'TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents', arXiv, 2019. https://github.com/huggingface/transfer-learning-conv-ai

[Wu2019] Y. Wu, F. Wei, S. Huang, Y. Wang, Z. Li, M. Zhou. 'Response generation by context-aware prototype editing', Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 7281-7288, July 2019.


6-27(2021-04-09) Development engineer - Inria Bordeaux, France

Development engineer - Inria Bordeaux Sud-Ouest

Topic: Design of a software architecture for a statistical learning application (analysis and classification of pathological voices)

Contract type: fixed-term contract (CDD)

Start: from June 1, 2021 until July 31, 2021 (extension possible)

Application deadline: May 15, 2021

Location: Inria Bordeaux Sud-Ouest

Required degree level: Bac + 5 (Master's level) or equivalent

Other valued degree: PhD

Position: Contract research engineer

Desired experience: 3 to 12 years

Gross monthly salary: 2632€ to 3543€, depending on degrees and professional experience in a similar position

Supervisor: Khalid Daoudi

Context and advantages of the position

Inria, the national research institute dedicated to digital science, promotes scientific excellence in the service of technology transfer and society.

Inria employs 2700 staff from the world's best universities, who take on the challenges of computer science and mathematics. Its agile model allows it to explore original avenues with its industrial and academic partners, and to address the multidisciplinary and application challenges of the digital transition.

Committed to working with innovation stakeholders, Inria creates the conditions for fruitful encounters between public research, private R&D and companies. Inria transfers its results and skills to startups, SMEs and large groups, in areas such as health, transport, energy, communication, security and privacy protection, the smart city, the factory of the future... Inria also fosters an entrepreneurial culture that has led to the creation of 120 startups.

The Inria Bordeaux Sud-Ouest centre is one of Inria's nine centres and has 20 research teams. The Inria centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, and research and higher-education institutions.

GEOSTAT is an Inria research team whose research topic is the processing of complex natural signals, particularly in biophysics (geostat.bordeaux.inria.fr/).

Assigned mission

Several diseases and pathologies can cause dysfunctions or alterations in speech production. The best known are neurodegenerative diseases (such as Parkinson's and Alzheimer's diseases) and respiratory diseases (such as asthma, COPD or Covid-19). These are referred to as speech disorders or pathological speech.

It is now established that some of these diseases are characterized by an early manifestation of speech disorders. The development of objective vocal biomarkers has thus become a major issue for assisting the diagnosis and monitoring of these diseases. The mission of the recruited engineer falls within this framework.

The objective of the mission is to design a software architecture, in Python, to:

1. develop a generic signal processing toolbox dedicated to the analysis of pathological speech;

2. implement a vocal biomarker of respiratory function using statistical learning techniques, including Deep Learning.

The latter task is part of a clinical research project in partnership with AP-HP (Assistance Publique - Hôpitaux de Paris), in particular the pneumology and intensive care department of La Pitié-Salpêtrière hospital. The aim of this project is to develop a vocal biomarker of respiratory state and its evolution, to support the remote monitoring of patients with a respiratory condition, including Covid-19.

Main activities

For security and confidentiality reasons, patients' voice and clinical data are hosted on the EDS (Entrepôt de Données de Santé, health data warehouse) servers of AP-HP.

The first task will therefore be to develop an API enabling communication with the hosting infrastructure.

The second task will be to implement proven techniques for pathological speech analysis, followed by others drawn from recent research. Where appropriate, this task will rely on Parselmouth (parselmouth.readthedocs.io/en/stable/), a Python library for Praat (www.fon.hum.uva.nl/praat/).

The third step will consist in implementing and experimenting with statistical learning techniques using patient data. This task will rely on the usual Machine Learning frameworks (TensorFlow, PyTorch, scikit-learn).

Supervision

The engineer will receive scientific supervision from Khalid Daoudi of the GEOSTAT team, and technical supervision from Dan Dutartre and François Rué of the Experimentation and Development Service (SED) at Inria Bordeaux.

Skills

• Hold an engineering degree and/or a PhD in digital sciences;
• Have significant experience in developing or leading a software project in Python;
• Have solid training in statistical learning (Machine Learning) as well as notable experience in this field;
• Have solid expertise in software development, so as to be able to adapt to the most standard languages (Python, C, C++); strong Python skills are required;
• Knowledge of signal processing would be a much-appreciated plus;
• Master the concepts, methodology and tools of software quality;
• Master collaborative software project management methodologies;
• Master modular software architecture methodologies;
• Excellent interpersonal skills;
• Be able to work in multidisciplinary teams;
• Be able to adapt to the project context;
• Be autonomous in personal organization and reporting;
• Have good written and oral communication skills in French;
• Master technical and scientific English.

Application

Candidates are invited to send their application to khalid.daoudi@inria.fr; francois.rue@inria.fr; dan.dutartre@inria.fr


6-28(2021-04-11) Proposal for a postdoctoral position at INRIA, Bordeaux, France

Proposal for a postdoctoral position at INRIA, Bordeaux, France

Title: Sparse predictive models for the analysis and classification of pathological speech

Keywords: Pathological speech processing, Sparse modeling, Optimization algorithms, Machine learning, Parkinsonian disorders, Respiratory diseases

Contact and Supervisor: Khalid Daoudi (khalid.daoudi@inria.fr)

INRIA team: GEOSTAT (geostat.bordeaux.inria.fr)

Duration: from 01/11/2021 to 31/12/2022 (could be extended to an advanced or a permanent position)

Salary: 2653€ / month

Profile: PhD degree obtained after August 2019 or to be defended by the end of 2021. High quality applications with a PhD obtained before August 2019 could be considered for an advanced research position.

Required knowledge and background: A solid knowledge of speech/signal processing; a good mathematical background; basics of machine learning; programming in Matlab and Python.

Scientific research context

During this century, there has been an ever increasing interest in the development of objective vocal biomarkers to assist in diagnosis and monitoring of neurodegenerative diseases and, recently, respiratory diseases because of the Covid-19 pandemic. The literature is now relatively rich in methods for objective analysis of dysarthria, a class of motor speech disorders [1], where most of the effort has been made on speech impaired by Parkinson's disease. However, relatively few studies have addressed the challenging problem of discrimination between subgroups of Parkinsonian disorders which share similar clinical symptoms, particularly in early disease stages [2]. As for the analysis of speech impaired by respiratory diseases, the field is relatively new (with existing developments in very specialized areas) but has attracted great attention since the beginning of the pandemic.

On the other hand, the large majority of existing processing methods (of pathological speech in general) still rely heavily on a core of feature estimators designed and optimized for healthy speech. There is thus a strong need for a framework to infer/design speech features and cues which remain robust to the perturbations caused by (classes of) disordered speech. The first and main objective of this proposal is to explore the framework of sparse modeling of speech, which allows a certain flexibility in the design and parameter estimation of the source-filter model of speech production. This exploration will be essentially based on theoretical advances developed by the GEOSTAT team, which have led to a significant impact in the field of image processing, not only at the scientific level [3] but also at the technological level (www.inria.fr/fr/i2s-geostat-un-innovation-lab-enimagerie-numerique).

The second objective of this proposal is to use the resulting representations as inputs to basic machine learning algorithms in order to conceive a vocal biomarker to assist in the discrimination between subgroups of Parkinsonian disorders (Parkinson's disease, Multiple-System Atrophy, Progressive Supranuclear Palsy) and in the monitoring of respiratory diseases (Covid-19, Asthma, COPD).

Both objectives benefit from a rich dataset of speech and other biosignals recently collected in the framework of two clinical studies in partnership with university hospitals in Bordeaux and Toulouse (for Parkinsonian disorders) and in Paris (for respiratory diseases).

Work description

As stated above, the work to be carried out is decomposed into two parts. The main part consists in developing new algorithms, based on sparse modeling, for the analysis of a class of disordered speech. The second part consists in exploring machine learning tools to develop vocal biomarkers for the purpose of (differential) diagnosis and monitoring of the diseases under study.

1. Sparse modeling for disordered speech analysis

The first task will be to investigate sparsity in the framework of linear prediction modeling of speech. The latter is indeed one of the building blocks for the estimation of core glottal, phonation and articulatory features. Sparse linear prediction (SLP) has recently been investigated in a convex setting using the L1-norm and applied, essentially, to speech coding [4]. We will start by investigating the potential of this convex setting in disordered speech analysis. We will then explore the use of non-convex penalties that allow sparsity control and a better decoupling of the vocal tract filter from the excitation source. We will study the spectral properties of the different models and revisit a set of acoustic features which are not robust to perturbations arising in dysarthric speech. We will then explore the potential of SLP in designing new features which could be informative about dysarthria.

The algorithmic developments will be evaluated using a rich set of biosignals obtained from patients with Parkinsonian disorders and from healthy controls. The biosignals are electroglottography and aerodynamic measurements of oral and nasal airflow as well as intra-oral and sub-glottic pressure.
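
To make the convex starting point concrete, here is a minimal sketch of one variant of sparse linear prediction: the 1-norm is applied to the prediction residual only (no penalty on the coefficients), and the problem is solved by iteratively reweighted least squares (IRLS). This is an illustrative assumption, not the method of [4] or of the GEOSTAT team; the function names, the toy two-pole signal and all parameter values are hypothetical.

```python
import numpy as np

def lp_system(x, order):
    """Stack the linear prediction equations x[n] ~ sum_k a[k] * x[n - k]."""
    n = len(x)
    # Column k-1 holds the signal delayed by k samples.
    X = np.column_stack([x[order - k:n - k] for k in range(1, order + 1)])
    return X, x[order:]

def sparse_lp(x, order, n_iter=30, eps=1e-6):
    """1-norm linear prediction solved by IRLS (assumed solver, for illustration)."""
    X, y = lp_system(x, order)
    a = np.linalg.lstsq(X, y, rcond=None)[0]  # conventional 2-norm LP as a start
    for _ in range(n_iter):
        r = y - X @ a
        # Scaling each row by 1/sqrt(|r|) makes the weighted 2-norm mimic the 1-norm.
        s = 1.0 / np.sqrt(np.maximum(np.abs(r), eps))
        a = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)[0]
    return a

# Toy voiced-like signal: a stable two-pole filter driven by a sparse pulse train.
a_true = np.array([1.3, -0.6])
excitation = np.zeros(400)
excitation[::10] = 1.0  # one pulse every 10 samples (hypothetical pitch period)
x = np.zeros(400)
for n in range(400):
    x[n] = excitation[n]
    if n >= 1:
        x[n] += a_true[0] * x[n - 1]
    if n >= 2:
        x[n] += a_true[1] * x[n - 2]

X, y = lp_system(x, 2)
a_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # 2-norm estimate
a_l1 = sparse_lp(x, 2)                       # 1-norm estimate
l1_ls = np.abs(y - X @ a_ls).sum()
l1_sparse = np.abs(y - X @ a_l1).sum()
print("2-norm coefficients:", a_ls, "residual 1-norm:", l1_ls)
print("1-norm coefficients:", a_l1, "residual 1-norm:", l1_sparse)
```

On this toy signal the pulses overlap with the filter response, so the 2-norm estimate is biased by the excitation, whereas the 1-norm treats the pulses as sparse outliers and leaves them in the residual. Real disordered speech would of course be far less clean than this synthetic case.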

After dysarthria analysis, we will study speech impairments caused by respiratory deficits. The main goal here will be to automatically identify respiratory patterns and to design features to quantify the impairments. The developments will be evaluated using manual annotations, by an expert phonetician, of speech signals obtained from patients with respiratory deficit and from healthy controls.

Depending on the work progress and time constraints, we may also explore sparsity beyond the linear prediction model through existing nonlinear representations of speech. It is well known indeed that the linear source-filter model of speech cannot capture several nonlinearities which exist in the speech production process, particularly in disordered speech.

2. Machine learning for disease diagnosis and monitoring

Using the outcomes of the first part, the (experimental) objective of the second part is to apply basic machine learning algorithms (LDA, logistic regression, decision trees, SVM…) using standard tools (such as scikit-learn) to conceive robust algorithms that could help, first, in the discrimination between Parkinsonian disorders and, second, in the monitoring of respiratory deficit.
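
As an illustration of the kind of baseline comparison described above, the sketch below cross-validates the four named classifier families with scikit-learn. The data are synthetic stand-ins generated by `make_classification` (three classes standing in for three diagnostic groups); the feature dimensions, hyperparameters and fold count are arbitrary assumptions, not values from the project.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder for per-speaker feature vectors and group labels.
X, y = make_classification(n_samples=150, n_features=10, n_informative=4,
                           n_redundant=2, n_classes=3, n_clusters_per_class=1,
                           class_sep=2.0, random_state=0)

# One candidate per algorithm family; scaling is added where it usually matters.
models = {
    "LDA": LinearDiscriminantAnalysis(),
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {name: cross_val_score(model, X, y, cv=cv).mean()
          for name, model in models.items()}

for name, acc in scores.items():
    print(f"{name}: mean accuracy {acc:.2f}")
```

With real clinical recordings, the folds would presumably need to be split by speaker so that no patient appears in both training and test sets; scikit-learn's `GroupKFold` is one tool for that.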

3. Work synergy

- The postdoc will interact closely with an engineer who is developing an open-source software architecture dedicated to pathological speech processing. The validated algorithms will be implemented in this architecture by the engineer, under the co-supervision of the postdoc.

- Given the multidisciplinary nature of the proposal, the postdoc will interact with the clinicians participating in the two clinical studies.

References:

[1] J. Duffy. Motor Speech Disorders: Substrates, Differential Diagnosis, and Management. Elsevier, 2013.

[2] J. Rusz et al. Speech disorders reflect differing pathophysiology in Parkinson's disease, progressive supranuclear palsy and multiple system atrophy. Journal of Neurology, 262(4), 2015.

[3] H. Badri. Sparse and Scale-Invariant Methods in Image Processing. PhD thesis, University of Bordeaux, France, 2015.

[4] D. Giacobello et al. Sparse Linear Prediction and Its Applications to Speech Processing. IEEE Transactions on Audio, Speech, and Language Processing, 20(5), 2012.


6-29(2021-04-19) Technical engineer at ELDA, Paris

The European Language resources Distribution Agency (ELDA), a company specialized in Human Language Technologies within an international context, acting as the distribution agency of the European Language Resources Association (ELRA), is currently seeking to fill an immediate vacancy for a Technical Engineer position.

Job description
Under the supervision of the CEO, the responsibilities of the Technical Engineer include planning and implementing technical development of tools, software components or applications for language resource production and management.
He/she will be in charge of contributing to the current language resources production workflows and managing R&D projects, while also being hands-on whenever required by the language resource production and management team. He/she will liaise with external partners at all phases of the projects (submission to calls for proposals, building and management of project teams) within the framework of international, publicly- or privately-funded projects.

This yields excellent opportunities for creative and motivated candidates wishing to participate actively in the Language Engineering field.

The position is based in Paris 13.

Salary: Commensurate with qualifications and experience (between 45-55K€).
Other benefits: complementary health insurance and meal vouchers.

Required profile
• Master 2 or PhD in Computer Science, Natural Language Processing, or equivalent
• Experience in Natural Language Processing (speech processing, data mining, machine translation, etc.)
• Familiarity with open source and free software
• Good level of English, with strong writing and documentation skills
• Dynamic and communicative, flexible to work on different tasks in parallel
• Ability to work independently and as part of a multidisciplinary team
• Citizenship (or residency papers) of a European Union country
• Proficiency in Python
• Knowledge of and hands-on experience with XML and JSON
• Proficiency in classic shell scripting in a Linux environment (POSIX tools, Bash, awk)

About
ELDA is a human-sized company (15 people) acting as the distribution agency of the European Language Resources Association (ELRA). ELRA was established in February 1995, with the support of the European Commission, to promote the development and exploitation of Language Resources (LRs). Language Resources include all data necessary for language engineering, such as monolingual and multilingual lexica, text corpora, speech databases and terminology. The role of this non-profit membership Association is to promote the production of LRs, to collect and to validate them and, foremost, make them available to users. The association also gathers information on market needs and trends.

For further information about ELDA/ELRA, visit: http://www.elda.org

Applicants should email a cover letter addressing the points listed above together with a curriculum vitae to:

ELDA
9 rue des Cordelières
75013 Paris FRANCE
Email: job@elda.org


6-30(2021-04-19) Web Developer at ELDA, Paris, France

The European Language resources Distribution Agency (ELDA), a company specialized in Human Language Technologies within an international context is currently seeking to fill an immediate vacancy for a permanent Web Developer position.

Job description
Under the supervision of the technical department manager, the responsibilities of the Web Developer consist in designing and developing web applications and software tools for linguistic data management.
Some of these software developments are carried out within the framework of European research and development projects and are published as free software.
Depending on the profile, the Web Developer could also participate in the maintenance and upgrading of the current linguistic data processing toolchains, while being hands-on whenever required by the language resource production and management team.

The position is based in Paris 13.

Salary: Commensurate with qualifications and experience (between 36-45K€).
Other benefits: complementary health insurance and meal vouchers

Required profile
• Master (BAC + 5 or higher) in Computer Science or a related field (experience in natural language processing is a strong plus)
• Proficiency in Python
• Hands-on experience in Django
• Hands-on knowledge of a distributed version control system (Git preferred)
• Knowledge of SQL and of RDBMS (PostgreSQL preferred)
• Basic knowledge of JavaScript and CSS
• Basic knowledge of Linux shell scripting
• Practice of free software
• Proficiency in French and English
• Curious, dynamic and communicative, flexible to work on different tasks in parallel
• Ability to work independently and as part of a multidisciplinary team
• Citizenship (or residency papers) of a European Union country

About
ELDA is a human-sized company (15 people) acting as the distribution agency of the European Language Resources Association (ELRA). ELRA was established in February 1995, with the support of the European Commission, to promote the development and exploitation of Language Resources (LRs). Language Resources include all data necessary for language engineering, such as monolingual and multilingual lexica, text corpora, speech databases and terminology. The role of this non-profit membership Association is to promote the production of LRs, to collect and to validate them and, foremost, make them available to users. The association also gathers information on market needs and trends.

For further information about ELDA/ELRA, visit: http://www.elda.org

Applicants should email a cover letter addressing the points listed above together with a curriculum vitae to:
ELDA
9 rue des Cordelières
75013 Paris FRANCE
Email: job@elda.org


6-31(2021-04-22) Post-doc at GIPSA-Lab Grenoble, France

General information

Reference: UMR5216-ALLBEL-024
Workplace: ST MARTIN D HERES
Publication date: Tuesday, April 13, 2021
Contract type: Fixed-term scientific contract (CDD Scientifique)
Contract duration: 12 months
Expected start date: June 1, 2021
Working time: Full time
Salary: between 3768€ and 3938€ gross per month, depending on experience
Desired level of education: PhD
Desired experience: 2 to 10 years

Missions

This post-doctoral position is part of the ANR project GEPETO (GEstures and PEdagogy of InTOnation), whose aim is to study the use of manual gestures, through human-machine interfaces, for the design of tools and methods for learning to control intonation (melody) in speech.

In particular, this position is set in the context of vocal rehabilitation, in cases of degraded or absent vocal fold vibration in patients with laryngeal disorders. Current medical solutions for replacing this vibration consist in injecting an artificial sound source into the vocal tract, either directly through the mouth or by transmission through the neck tissues, using an electrolarynx. This vibrator generates a substitute voice source on which the user can articulate speech normally. An alternative is to capture, with a microphone, the unvoiced speech produced by a person in the absence of vocal fold vibration (for example a whisper), and to reintroduce voicing in real time through speech synthesis. The reconstructed voice is then played in real time through a loudspeaker. Today, all of these systems generate relatively constant intonation (melody) signals, resulting in very robotic voices.

The aim of the GEPETO project at GIPSA-lab is to add to these two solutions real-time control of intonation by hand gestures, captured by various interfaces (tablet, accelerometer, etc.), and to study the use of such systems in situations of spoken interaction.

The post-doctoral work will focus on the whisper-to-speech conversion solution that is already available in the laboratory. The work will be divided into two tasks.

First, gestural control of intonation will be added to the whisper-to-speech conversion system. This will be done in the Max/MSP environment (C/C++ language), where various modules are already available in the laboratory (interface management, synthesis engine, analysis of whispered speech). Various interfaces for capturing manual gestures in different spaces (trajectory on a surface, in space, pressure, etc.) will be studied.

Second, we will seek to evaluate the use of such a system in a voice substitution application, and in particular the coordination between manual control of intonation and natural control of articulation.
To begin with, various control strategies will be studied given the available interfaces. In particular, the question of voicing control (activation or not of the glottal source) will be addressed. This first step will be evaluated on simple sentence imitation tasks, according to criteria of rhythmic coordination between control of the source and of articulation, as well as the cognitive load associated with combining the two controls.
Next, we will work on the use of such a system in communication situations. This is a context in which the user must produce intelligible and expressive sentences for the interlocutor, with no reference to imitate. We will propose learning strategies for using such a system, and evaluate them over several time scales (days, weeks, months). These strategies will be developed according to protocols proposed by project partners working on learning to control the intonation of foreign languages.

Activities

- Get to grips with the various whisper-to-speech conversion modules available in the laboratory (whisper analysis, synthesis engine, interface management) in the Max/MSP environment
- Connect the various modules and develop the gesture-controlled voice substitution system, testing various gestural controllers for the control of intonation and voicing
- Propose a protocol for evaluating these sensors in terms of rhythmic synchronization of manual and articulatory control, as well as cognitive load
- Evaluate these sensors on a group of users
- Propose learning methods for using such a system
- Propose a protocol for evaluating learning over several time scales (days, weeks, months)
- Evaluate learning on a group of users

Skills

- C/C++ language (in-depth knowledge)
- Matlab (in-depth knowledge)
- Max/MSP programming (desirable)
- Signal processing (general knowledge)
- Speech processing (desirable)
- Strong motivation for methodology and experimentation
- Fluency in French (the language used for developing and evaluating the system)

Desired experience:
Speech synthesis, real-time Max/MSP coding, human-machine interfaces, cognitive experiments

Work context

Gipsa-lab is a joint research unit of CNRS, Grenoble INP and Université Grenoble Alpes; it has agreements with Inria and the Observatoire des Sciences de l'Univers de Grenoble.
With 350 people, including about 150 PhD students, Gipsa-lab is a multidisciplinary laboratory conducting fundamental and applied research on complex signals and systems. It is internationally recognized for its research in Automatic Control, Signal and Images, and Speech and Cognition, and develops projects in the strategic areas of energy, environment, communication, intelligent systems, life sciences and health, and language engineering.
By the nature of its research, Gipsa-lab maintains a constant link with the economic sector through strong industrial partnerships.
Its teaching and research staff are involved in education at the universities and engineering schools of the Grenoble site (Université Grenoble Alpes).
Gipsa-lab carries out its research through 16 research teams organized into 4 departments.
It has 150 permanent staff and about 250 non-permanent staff (PhD students, post-doctoral researchers, visiting researchers, Master's interns, etc.).

The post-doctoral researcher will be attached to the CRISSP team (Cognitive Robotics, Interactive Systems, Speech Processing) of the Speech and Cognition department of GIPSA-lab.





© Copyright 2024 - ISCA International Speech Communication Association - All rights reserved.
