ISCA - International Speech Communication Association



ISCApad #300

Saturday, June 10, 2023 by Chris Wellekens

6 Jobs

6-1

(2022-12-05) Permanent academic post in Speech Technology@ University of Edinburgh, Scotland, UK

The School of Informatics at the University of Edinburgh is recruiting for a permanent academic post in Speech Technology. The appointment will be at Lecturer or Reader grade (equivalent to US Assistant Professor/Associate Professor). You will contribute to research and teaching in the Centre for Speech Technology Research (CSTR) and the Institute for Language, Cognition, and Computation (ILCC). There is extensive scope to collaborate with other Institutes and Schools within the University.

The successful candidate will have (or be close to completing) a PhD, an established research agenda, and the enthusiasm and ability to undertake original research, to lead a research group, and to engage with teaching and academic supervision. We are seeking current and future leaders in the field who are able to forge new collaborations both within the field and across disciplines. We are particularly looking for a candidate with the potential to extend the breadth of our research beyond our traditional core strengths in speech recognition and synthesis towards emerging applications, for example in spoken dialogue systems; spoken language understanding; healthcare and assistive technology applications; explainable speech processing; human-computer interaction; or autonomous systems.

For more details, including how to apply, view the full advert at https://elxw.fa.em3.oraclecloud.com/hcmUI/CandidateExperience/en/job/5973

Applications close on 12 January.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

6-2

(2022-12-08) Ph.D. Position in Cognitive Neuroscience@ GIPSA, Grenoble, France

The GIPSA-lab, Grenoble, is offering a

Ph.D. Position in Cognitive Neuroscience

Senses of confidence and effort in sensorimotor adaptation

(speech and reaching)

Application deadline: 31/12/22; Starting date: 1/04/23 at the latest

Context

Numerous studies have explored sensorimotor learning in hand movements and speech production. They have shown how individuals adapt their gestures to partially compensate for a perturbation of their visual, auditory or somatosensory feedback. Varying degrees of compensation have been observed across individuals (1–3), in particular in pathological populations (4,5), as well as across different sensorimotor perturbations (e.g. pitch- or formant-shifted feedback (6,7)), different languages (8) and different tasks (e.g. with or without linguistic confusion) (9,10).

Scientific objectives

Complementing this existing literature (7,11), the current project aims to explore in more detail the factors that influence the degree of compensation to a sensorimotor perturbation. In particular, we will explore the hypothesis that it may depend on the relative attention paid to, and confidence placed in, our different sensory feedback channels (visual and proprioceptive feedback for hand movements, auditory and proprioceptive feedback for speech), and on the related sense of effort felt during the task.

To this end, several visuo-motor and audio-motor perturbation experiments will be conducted (see Figure 1). In a first, behavioral step, we will explore how the degree of arm or speech compensation varies with an increasing rotation of the visual feedback or an increasing pitch shift of the auditory feedback, how it is influenced by the location or the pitch level of the target, and how it may be affected by an increasing degree of visual blurring or auditory masking. We will pay particular attention to possible reorganizations of the compensatory behavior, detected as discontinuities in the compensation/perturbation relationship.
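As a rough illustration of how such a compensation/perturbation relationship might be quantified, the Python sketch below expresses produced F0 in cents relative to a baseline, computes the percentage of an auditory pitch shift that the speaker opposes, and flags the largest jump in compensation between consecutive perturbation levels as a crude hint of a possible reorganization. The cents-based measure and the function names (`hz_to_cents`, `compensation_percent`, `largest_jump`) are assumptions made for illustration; the posting does not specify how compensation will be computed. The same percentage logic would apply to a visual rotation measured in degrees.

```python
import math
from typing import Optional, Sequence, Tuple


def hz_to_cents(f0_hz: float, ref_hz: float) -> float:
    """Frequency relative to a reference, in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f0_hz / ref_hz)


def compensation_percent(baseline_hz: float, adapted_hz: float, shift_cents: float) -> float:
    """Percentage of an auditory pitch shift that the speaker opposes.

    Assumed convention: if feedback is shifted up by shift_cents and the speaker
    lowers produced F0 by the same amount, compensation is 100 %; no change is 0 %.
    """
    response_cents = hz_to_cents(adapted_hz, baseline_hz)
    return -response_cents / shift_cents * 100.0


def largest_jump(shifts: Sequence[float],
                 compensations: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Largest change in compensation between consecutive perturbation levels.

    Returns (lower_shift, upper_shift, jump), or None with fewer than two levels.
    A large jump is only a hint of a possible discontinuity, not a statistical test.
    """
    points = sorted(zip(shifts, compensations))
    best = None
    for (s0, c0), (s1, c1) in zip(points, points[1:]):
        jump = abs(c1 - c0)
        if best is None or jump > best[2]:
            best = (s0, s1, jump)
    return best
```

For example, a speaker with a 200 Hz baseline who lowers F0 by 50 cents under a +100-cent feedback shift compensates by 50 %.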

Depending on the candidate’s interests and on funding opportunities, the second step will either further explore the neural correlates of these compensatory mechanisms (12–16), and of their possible reorganization with an increasing level of perturbation, using fMRI; or explore the possible impairment of these mechanisms in people who stutter, who show reduced compensation to auditory perturbations (17–20), reduced tactile sensitivity of the oral cavity (21–23), and an increased sense of effort (24).

Required skills

We are searching for a highly motivated candidate with:

- a Master's degree (M.Sc., M.Eng. or equivalent) in (neuro)cognitive sciences, computer science, or signal processing

- knowledge of and interest in motor control, neuroscience and speech

- good programming skills in Matlab, Python or R

- experimental skills and interests

Lab and supervision

The PhD candidate will be supervised by Maëva Garnier, Fabien Cignetti and Pascal Perrier, as part of a collaboration between GIPSA-lab and TIMC-IMAG in Grenoble. They will join the PCMD team of GIPSA-lab in Grenoble, composed of six PhD students and 12 researchers and engineers (http://www.gipsa-lab.grenoble-inp.fr/en/pcmd.php).

Application instructions

The application consists of a motivation letter, a CV (with a detailed list of courses related to computer science, signal processing, and neurocognitive science), the names and contact details of two references, and transcripts of grades from undergraduate and graduate programs.

Contact

Maëva Garnier, email: maeva.garnier@gipsa-lab.fr, phone: (+33) 4 76 57 50 61

Fabien Cignetti, email: fabien.cignetti@univ-grenoble-alpes.fr, phone: (+33) 4 76 63 71 10

Pascal Perrier, email: pascal.perrier@gipsa-lab.fr, phone: (+33) 4 76 57 48 25

References

1. Ghosh SS, Matthies ML, Maas E, Hanson A, Tiede M, Ménard L, et al. An investigation of the relation between sibilant production and somatosensory and auditory acuity. J Acoust Soc Am. 2010;128(5):3079–87.

2. Villacorta VM, Perkell JS, Guenther FH. Sensorimotor adaptation to feedback perturbations of vowel acoustics and its relation to perception. J Acoust Soc Am. 2007;122(4):2306–19.

3. Savariaux C, Perrier P. Compensation strategies for the perturbation of the rounded vowel [u] using a lip tube: A study of the control space in speech production. J Acoust Soc Am. 1995;98(5):2428–42.

4. Loucks T, Chon H, Han W. Audiovocal integration in adults who stutter. Int J Lang Commun Disord. 2012 Jul;47(4):451–6.

5. Mollaei F, Shiller DM, Baum SR, Gracco VL. Sensorimotor control of vocal pitch and formant frequencies in Parkinson’s disease. Brain Res. 2016;1646:269–77.

6. Jones JA, Munhall KG. Perceptual calibration of F0 production: evidence from feedback perturbation. J Acoust Soc Am. 2000 Sep;108(3 Pt 1):1246–51.

7. MacDonald EN, Goldberg R, Munhall KG. Compensations in response to real-time formant perturbations of different magnitudes. J Acoust Soc Am. 2010 Feb 1;127(2):1059–68.

8. Mitsuya T, MacDonald EN, Purcell DW, Munhall KG. A cross-language study of compensation in response to real-time formant perturbation. J Acoust Soc Am. 2011;130(5):2978–86.

9. Bourguignon NJ, Baum SR, Shiller DM. Lexical-perceptual integration influences sensorimotor adaptation in speech. Front Hum Neurosci. 2014;8:208.

10. Frank AF. Integrating linguistic, motor, and perceptual information in language production. University of Rochester; 2011.

11. Liu H, Larson CR. Effects of perturbation magnitude and voice F0 level on the pitch-shift reflex.

12. Behroozmand R, Korzyukov O, Sattler L, Larson CR. Opposing and following vocal responses to pitch-shifted auditory feedback: Evidence for different mechanisms of voice pitch control. J Acoust Soc Am. 2012 Oct;132(4):2468–77.

13. Parkinson AL, Flagmeier SG, Manes JL, Larson CR, Rogers B, Robin DA. Understanding the neural mechanisms involved in sensory control of voice production. Neuroimage. 2012;61(1):314–22.

14. Toyomura A, Fujii T, Kuriki S. Effect of external auditory pacing on the neural activity of stuttering speakers. NeuroImage. 2011 Aug 15;57(4):1507–16.

15. Zarate JM, Wood S, Zatorre RJ. Neural networks involved in voluntary and involuntary vocal pitch regulation in experienced singers. Neuropsychologia. 2010;48(2):607–18.

16. Zarate JM, Zatorre RJ. Experience-dependent neural substrates involved in vocal pitch regulation during singing. Neuroimage. 2008;40(4):1871–87.

17. Kim KS, Daliri A, Flanagan JR, Max L. Dissociated Development of Speech and Limb Sensorimotor Learning in Stuttering: Speech Auditory-motor Learning is Impaired in Both Children and Adults Who Stutter. Neuroscience. 2020 Dec 15;451:1–21.

18. Daliri A, Wieland EA, Cai S, Guenther FH, Chang SE. Auditory-motor adaptation is reduced in adults who stutter but not in children who stutter. Dev Sci. 2018;21(2):e12521.

19. Cai S, Beal DS, Ghosh SS, Tiede MK, Guenther FH, Perkell JS. Weak Responses to Auditory Feedback Perturbation during Articulation in Persons Who Stutter: Evidence for Abnormal Auditory-Motor Transformation. Larson CR, editor. PLoS ONE. 2012 Jul 23;7(7):e41830.

20. Sengupta R, Shah S, Gore K, Loucks T, Nasir SM. Anomaly in neural phase coherence accompanies reduced sensorimotor integration in adults who stutter. Neuropsychologia. 2016;93:242–50.

21. De Nil LF, Abbs JH. Kinaesthetic acuity of stutterers for oral and non-oral movements. Brain. 1991;114(5):2145–58.

22. Loucks TM, De Nil LF. Oral kinesthetic deficit in adults who stutter: a target-accuracy study. J Mot Behav. 2006;38(3):238–47.

23. Loucks TMJ, De Nil LF. Oral kinesthetic deficit in stuttering evaluated by movement accuracy and tendon vibration. Speech Mot Control Norm Disord Speech. 2001;307–10.

24. Ingham RJ, Warner A, Byrd A, Cotton J. Speech effort measurement and stuttering: Investigating the chorus reading effect. 2006.

25. Caudrelier T, Rochet-Capellan A. Changes in speech production in response to formant perturbations: An overview of two decades of research. 2019.

6-3

(2022-12-05) Real time speaker separation Master internship, Lille (France), 2022@SteelSeries France R&D team (former Nahimic R&D team), France

Real time speaker separation

Master internship, Lille (France), 2022

Advisors:
- Nathan Souviraà-Labastie, R&D Engineer, PhD, nathan.souviraa-labastie@steelseries.com
- Damien Granger, R&D Engineer, damien.granger@steelseries.com

Company description

About GN Group

GN was founded 150 years ago with a truly innovative and global mindset. Today, we honour that legacy with world-leading expertise in the human ear, sound and video processing, wireless technology, miniaturization and collaborations with leading technology partners. GN’s solutions are marketed by the brands ReSound, Beltone, Interton, Jabra, BlueParrott, SteelSeries and FalCom in 100 countries. The GN Group employs 6,500 people and is listed on Nasdaq Copenhagen (GN.CO).

About SteelSeries

SteelSeries is the worldwide leader in gaming and esports peripherals focused on premium quality, innovation, and functionality. SteelSeries’ family of professionals and gaming enthusiasts is the driving force behind the company and helps influence, design, and craft every single accessory and the brand’s software ecosystem, SteelSeries GG. In 2020, SteelSeries acquired Nahimic, the leader in 3D sound solutions for gaming. We are currently looking for a machine learning / audio signal processing intern to join the R&D team of SteelSeries’ Software & Services Business Unit in our French office (former Nahimic R&D team).

Internship subject

Audio source separation consists of extracting the different sound sources present in an audio signal, in particular by estimating their frequency distributions and/or spatial positions. Applications range from karaoke generation to speech denoising. In 2020, our separation approaches [1, 2] matched the state of the art [3, 4] on a music separation task. Since then, our speech denoising product has reached the market [5] and the team continues to explore many avenues for improvement (see for instance [6, 7]).

Real-time speaker separation

This internship targets speaker separation, which the scientific community formalizes as the task of separately retrieving a given number of speech (speaker) signals from a monaural mixture signal. Most scientific challenges [8] compare offline (not real-time) approaches. The objective of the internship is to address the following targets (roughly in order):

- Based on our current speech denoising trainsets, the candidate will create a trainset for the speaker separation task that meets the same in-house requirements. Indeed, most datasets available in the scientific community lack quantity, ground-truth audio quality, high sampling rates, and diversity of speakers and noise types. In addition, for SteelSeries use cases, the temporal overlap between speech sources might be lower than in the scenarios used by the scientific community, and its statistical distribution will need to be clearly identified and defined.

- Once our offline and online baseline algorithms have been trained on such a trainset, the candidate could benchmark them on different scenarios (number of speakers, signal ratio between speakers, effect of additional noise, various and mixed languages) to identify, and potentially address, weaknesses of the trainset.

- Initial subjective listening tests could lead the candidate to design complementary metrics, for instance representing false positives in speaker attribution, or statistics on the time a real-time DNN needs to attribute a signal to the correct speaker after a silence.

- While all of the above could be done using state-of-the-art loss functions, the candidate could also adapt our internal loss to be permutation invariant [9].

- The scientific community is very active in proposing new DNN architectures (offline [10, 8] and online [11, 12]). The candidate could also re-implement existing architectures or propose their own. In particular, a multi-task approach in which the DNN also outputs the number of active speakers would be of great interest.
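To make the permutation-invariant idea concrete, the sketch below computes an utterance-level permutation invariant loss in plain Python: it evaluates a pairwise loss for every assignment of estimated sources to reference sources and keeps the lowest, so the network is not penalized for emitting speakers in a different order. Mean squared error stands in for the team's internal loss, which is not described in the posting; `pit_loss` is a hypothetical name, and a real training loop would operate on framework tensors, typically with a signal-level loss such as SI-SDR. The number of permutations grows factorially with the number of speakers, which remains cheap for small speaker counts.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

Signal = Sequence[float]


def mse(a: Signal, b: Signal) -> float:
    """Mean squared error between two equal-length signals (placeholder loss)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates: Sequence[Signal], references: Sequence[Signal],
             pair_loss: Callable[[Signal, Signal], float] = mse
             ) -> Tuple[float, Tuple[int, ...]]:
    """Utterance-level permutation invariant loss.

    Returns (loss, perm) where estimates[perm[i]] is matched to references[i]
    and loss is the mean pairwise loss under the best assignment.
    """
    n = len(references)
    if len(estimates) != n:
        raise ValueError("need as many estimated sources as reference sources")
    best_loss, best_perm = float("inf"), tuple(range(n))
    for perm in permutations(range(n)):
        loss = sum(pair_loss(estimates[p], references[i]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

If the network outputs the two speakers in swapped order, the identity assignment yields a large loss while the swapped assignment `(1, 0)` yields zero, which is the value that would be back-propagated.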

Skills

Who are we looking for?

You are preparing an engineering degree or a master’s degree and preferably have knowledge of the development and implementation of advanced machine learning algorithms. Digital audio signal processing skills are a plus. While not mandatory, notions in the following fields would be appreciated:

- Audio effects in general: compression, equalization, etc.
- Statistics, probabilistic approaches, optimization.
- Programming languages: Python, PyTorch, Keras, TensorFlow, Matlab.
- Voice recognition, voice command.
- Computer programming and development: Max/MSP, C/C++/C#.
- Audio editing software: Audacity, Adobe Audition, etc.
- Scientific publications and patent applications.
- Fluency in English and French.
- Demonstrated intellectual curiosity.

References

[1] I. Alaoui Abdellaoui and N. Souviraà-Labastie. "Blending the attention mechanism in TasNet". Working paper or preprint. Nov. 2020.

[2] E. Pierson Lancaster and N. Souviraà-Labastie. "A frugal approach to music source separation". Working paper or preprint. Nov. 2020.

[3] F.-R. Stöter, A. Liutkus and N. Ito. "The 2018 signal separation evaluation campaign". In: International Conference on Latent Variable Analysis and Signal Separation. Springer. 2018, pp. 293-305.

[4] N. Takahashi and Y. Mitsufuji. "Multi-scale Multi-band DenseNets for Audio Source Separation". In: 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). 29 June 2017. arXiv: 1706.09588.

[5] ClearCast AI Noise Canceling - Promotion video. https://www.youtube.com/watch?v=RD4eXKEw4Lg

[6] M. Vial and N. Souviraà-Labastie. Learning rate scheduling and gradient clipping for audio source separation. Tech. rep. SteelSeries France, Dec. 2022.

[7] The torch_custom_lr_schedulers GitHub repository. https://github.com/SteelSeries/torch_custom_lr_schedulers

[8] Speech separation task referenced on the paperswithcode website. https://paperswithcode.com/task/speech-separation

[9] X. Liu and J. Pons. "On permutation invariant training for speech source separation". In: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2021, pp. 6-10.

[10] Music separation task referenced on the paperswithcode website. https://paperswithcode.com/sota/music-source-separation-on-musdb18

[11] DNS challenge on the paperswithcode website. https://paperswithcode.com/sota/speechenhancement-on-deep-noise-suppression

[12] H. Dubey et al. "ICASSP 2022 deep noise suppression challenge". In: ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2022, pp. 9271-9275.

6-4

(2022-12-12) Master internship @ LISN Lab, Orsay, Gif sur Yvette, France

Creation of a speech synthesis model from spontaneous speech

Keywords:

Speech synthesis, spontaneous speech, low-resource languages, Nigerian Pidgin

Context

Nigerian Pidgin is a widely spoken but under-resourced language that increasingly serves as the primary vernacular of Africa’s most populous country. Once stigmatized as a “broken” variety of English spoken only by the uneducated, Nigerian Pidgin is now a source of pride for many speakers who view it as a home-grown vehicle for communication. It transcends class and ethnicity, lacking both the tribal associations of indigenous languages and the colonial baggage associated with English. The language can now be seen and heard on college campuses, in houses of worship and advertisements, in Nigerian expatriate communities, and even on a local branch of the British Broadcasting Corporation.

Objectives

Despite Nigerian Pidgin’s growing prestige and a pool of speakers rivaling that of major languages like Turkish or Korean, the grammatical and intonational properties of the language remain comparatively understudied. This internship extends an ongoing research project that aims to better understand the language’s linguistic properties through the development and adaptation of NLP technologies. The principal aim of this research is to produce a natural-sounding text-to-speech (TTS) model that will allow researchers to conduct perception tests to determine how intonation influences the interpretation of meaning. Thanks in part to the recent explosion of neural network-based speech technologies, researchers can now produce high-quality synthesis from relatively simple datasets using models like Tacotron 2, complementing classical approaches such as those based on Hidden Markov Models. Specifically, the intern will assist in developing a text-to-speech platform trained on an existing database of Nigerian Pidgin recordings. In addition to producing natural-sounding speech, a central goal of this project will be to build a TTS model that allows intonational patterns to be modified directly via explicit parameters provided by researchers. The intern’s work will contribute to the exploration of the language’s melodic and tonal properties by allowing researchers to produce variants of novel utterances that differ only in their intonational patterns.
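The posting does not say how intonation parameters will be exposed. One possibility, assumed here purely for illustration, is a model that accepts an explicit frame-level F0 contour, which researchers edit before synthesis. The hypothetical `modify_f0` helper below shows the kind of manipulation that could produce utterance variants differing only in intonation: a global shift in semitones and an expansion or compression of the pitch range around the utterance mean. It performs no synthesis, and the lexical and tonal constraints of Nigerian Pidgin are not modeled.

```python
import math
from typing import List, Sequence


def _to_semitones(f_hz: float, ref_hz: float) -> float:
    """Frequency in semitones relative to ref_hz."""
    return 12.0 * math.log2(f_hz / ref_hz)


def modify_f0(f0_hz: Sequence[float], shift_semitones: float = 0.0,
              range_scale: float = 1.0, ref_hz: float = 100.0) -> List[float]:
    """Shift and expand/compress a frame-level F0 contour.

    Unvoiced frames are assumed to be coded as 0.0 and are left untouched.
    Voiced values are converted to semitones, scaled around the utterance mean
    by range_scale (1.0 = unchanged, 0.0 = monotone), shifted by
    shift_semitones, then converted back to Hz.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return [float(f) for f in f0_hz]
    mean_st = sum(_to_semitones(f, ref_hz) for f in voiced) / len(voiced)
    out: List[float] = []
    for f in f0_hz:
        if f <= 0:
            out.append(0.0)  # keep unvoiced frames unvoiced
            continue
        st = mean_st + range_scale * (_to_semitones(f, ref_hz) - mean_st) + shift_semitones
        out.append(ref_hz * 2.0 ** (st / 12.0))
    return out
```

Working in semitones rather than Hz makes a given range scaling perceptually comparable across speakers with different average pitch.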

Primary tasks

• Surveying existing TTS models and selecting the most suitable approach

• Training a model on a corpus of Nigerian Pidgin

• Optimizing and evaluating the model

Profile

A second-year master’s student with:

• A solid background in machine learning (speech synthesis is a plus)

• Good academic writing skills in English

• A strong interest in language and linguistics

LISN: www.lisn.upsaclay.fr | Twitter @LisnLab | LinkedIn LisnLab. Belvédère site: Campus Universitaire, Bâtiment 507, Rue du Belvédère, 91405 Orsay Cedex. Plaine site: Campus Universitaire, Bâtiment 650, Rue Raimond Castaing, 91190 Gif-sur-Yvette.

Modalities

The internship will start in March 2023 and last 5 to 6 months, at the LISN lab in the Sciences and Language Technologies department, as well as in the MoDyCo lab at Paris Nanterre University (primarily at whichever location offers the shorter commute).

• The LISN Belvédère site is located on the Plateau de Saclay: University campus, building 507, rue du Belvédère, 91400 Orsay.

• The MoDyCo lab is located at Université Paris Ouest Nanterre La Défense: Bâtiment A, 200, avenue de la République, 92001 Nanterre.

The candidate will be supervised by Emmett Strickland (MoDyCo) and Marc Evrard (LISN). The internship stipend follows the official rates (service-public.fr).

To apply

Please send a CV and a brief cover letter highlighting your interest in the project to:

• Emmett Strickland (emmett.strickland@parisnanterre.fr)

• Marc Evrard (marc.evrard@lisn.upsaclay.fr)

Further reading

1. Tan, X., Qin, T., Soong, F., & Liu, T. Y. (2021). A survey on neural speech synthesis. arXiv preprint arXiv:2106.15561. https://arxiv.org/abs/2106.15561

2. Ning, Y., He, S., Wu, Z., Xing, C., & Zhang, L. J. (2019). A Review of Deep Learning Based Speech Synthesis. Applied Sciences (2076-3417), 9(19). https://www.mdpi.com/2076-3417/9/19/4050

3. Bigi, B., Caron, B., & Abiola, O. S. (2017). Developing resources for automated speech processing of the African language Naija (Nigerian Pidgin). In 8th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (pp. 441-445). https://hal.archives-ouvertes.fr/hal-01705707/document

6-5

(2022-12-23) Postdoctoral Research Fellows, National University of Singapore

Two full-time Postdoctoral Research Fellow positions in automatic lyrics generation and automatic singing voice/speech evaluation.

You can find the detailed job descriptions here:

https://smcnus.comp.nus.edu.sg/postdoct_job_description_2022

6-6

(2023-01-02) POSTDOC 21 MONTHS at GIPSA-Lab, Grenoble-France

A 21-month postdoc at GIPSA-Lab, Grenoble, France, on the automatic evaluation of the reading fluency of young French readers, for computer-assisted reading instruction.

For more information and to apply: https://emploi.univ-grenoble-alpes.fr/offres/chercheur-chercheuse-post-doctoral-evaluation-automatique-de-la-fluence-pour-l-apprentissage-de-la-lecture-1164624.kjsp?RH=1135797159702996

Contact: Gerard Bailly at gerard.bailly@gipsa-lab.fr

6-7

(2023-01-03) Research internship, CAIBots project: Conversational AI with teams of robots, LIA, Avignon, France

RESEARCH INTERNSHIP FORM

Project title: CAIBots: Conversational AI with teams of robots

Supervisor: Prof. Fabrice Lefèvre

Internship description

The goal of the internship is to study the setup of a robotic platform for simulating spoken conversational AI (CAI) under “real conditions”. Training and then testing conversational AI (chatbots, dialogue systems) is costly and complex; we aim to greatly reduce this difficulty by providing an autonomous physical robotic solution for training and evaluating new CAI modules before using them with real human users. Initially, the work will mainly consist of testing existing, off-the-shelf solutions for the components of the spoken language processing chain and checking their performance in a robot-robot configuration. Research into embedded solutions will then be carried out. These should reduce the latency of the system and also provide better protection of personal data (by removing the need to go through proprietary clouds). Overall, the spoken interaction system should support an open-ended discussion between a human and a machine on general topics. The envisaged use case is thus in line with the Amazon Alexa Prize challenge (https://developer.amazon.com/alexaprize): developing a bot that can sustain a conversation for a few minutes. A simulated user will therefore also be needed to enable autonomous robot-robot interaction (multi-party human-robot conversations may also be tested, although this is not a priority of the internship). The aim is to initiate the platform, that is, to set up the components in a basic configuration that nevertheless illustrates the capabilities that could be achieved with more development time.

The robotic and software solutions considered for this work include, for example, the Pepper robot, Google Cloud ASR, SpeechBrain, RASA and/or pre-trained models (BERT, GPT, BlenderBot…). These are mainly fairly complete open-source platforms. The work will consist of quickly deploying a real system so that it can be improved in a robot-robot configuration and then tested with a representative panel of potential users. While an interest in machine learning and natural language processing is essential, the intern is also expected to have good software development skills. The internship will be an opportunity to acquire skills in natural language processing in the context of embedded robotics experiments. Several avenues for continuing with a PhD are open.

Duration: 6 months

Stipend: approximately €540/month

Related topics: human-machine dialogue systems, speech recognition and understanding, cognitive interfaces, robotics
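A robot-robot test bench of the kind described above can be pictured as a loop in which a bot and a simulated user take turns, each running speech recognition, response generation and speech synthesis. The Python sketch below is a minimal, hypothetical harness: `Agent`, `run_dialogue` and the text-only stubs are illustrative names, not the APIs of Pepper, Google Cloud ASR, SpeechBrain or RASA, which would replace the stubs in a real setup. It records per-turn processing time, one simple way to compare cloud-based and embedded configurations for latency.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Agent:
    """A spoken agent as three swappable stages (hypothetical interface)."""
    name: str
    asr: Callable[[bytes], str]      # audio -> transcript
    respond: Callable[[str], str]    # transcript -> reply text
    tts: Callable[[str], bytes]      # reply text -> audio


def run_dialogue(bot: Agent, user_sim: Agent, opening: str,
                 turns: int) -> List[Tuple[str, str, float]]:
    """Alternate turns between the bot and a simulated user.

    Returns (speaker, reply, seconds) per turn, where seconds is the
    wall-clock time spent in ASR + response + TTS for that turn.
    """
    audio = user_sim.tts(opening)  # the simulated user speaks first
    speakers = (bot, user_sim)
    log: List[Tuple[str, str, float]] = []
    for turn in range(turns):
        agent = speakers[turn % 2]
        start = time.perf_counter()
        heard = agent.asr(audio)
        reply = agent.respond(heard)
        audio = agent.tts(reply)
        log.append((agent.name, reply, time.perf_counter() - start))
    return log


# Text-only stubs standing in for real ASR/TTS back ends.
def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


bot = Agent("bot", fake_asr, lambda text: "You said: " + text, fake_tts)
user = Agent("user", fake_asr, lambda text: "Tell me more.", fake_tts)
example_log = run_dialogue(bot, user, "Hello", turns=3)
```

Keeping each stage behind a plain function makes it straightforward to swap a cloud recognizer for an embedded one and compare the logged latencies.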

6-8

(2023-01-05) Post-doctoral and engineer positions@ LORIA-INRIA, Nancy, France

Automatic speech recognition for non-native speakers in a noisy environment

Post-doctoral and engineer positions

Starting date: beginning of 2023

Duration: 24 months for a post-doc position and 12 months for an engineer position

Supervisor: Irina Illina, Associate Professor, HDR, Lorraine University, LORIA-INRIA Multispeech Team, illina@loria.fr

Context

When a person's hands are busy performing a task like driving a car or piloting an airplane, voice is a fast and efficient means of interaction. In aeronautical communications, English is most often compulsory. However, many pilots are not native English speakers: they speak with an accent that depends on their native language and are therefore influenced by its pronunciation mechanisms. Inside an aircraft cockpit, the pilots' non-native speech and the surrounding noise are the most difficult challenges to overcome in order to achieve efficient automatic speech recognition (ASR). Non-native speech raises numerous problems: incorrect or approximate pronunciations, errors of gender and number agreement, use of non-existent words, missing articles, grammatically incorrect sentences, etc. The acoustic environment adds an interfering component to the speech signal. Much of the success of speech recognition relies on the ability to account for different accents and ambient noise in the models used for ASR.

Automatic speech recognition has made great progress thanks to the spectacular development of deep learning. In recent years, end-to-end automatic speech recognition, which directly optimizes the probability of the output character sequence given the input acoustic features, has advanced considerably [Chan et al., 2016; Baevski et al., 2020; Gulati et al., 2020].

Objectives

The recruited person will have to develop methodologies and tools to obtain high-performance non-native automatic speech recognition in the aeronautical context and more specifically in a (noisy) aircraft cockpit.

This project will be based on an end-to-end automatic speech recognition system [Shi et al., 2021] using wav2vec 2.0 [Baevski et al., 2020], one of the most effective models in the current state of the art. wav2vec 2.0 learns speech representations from raw, untranscribed audio through self-supervised learning.
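In the original wav2vec 2.0 work, the pre-trained model is fine-tuned for speech recognition with a Connectionist Temporal Classification (CTC) loss over characters. Assuming such a CTC output layer, the simplest way to turn per-frame character probabilities into text is best-path (greedy) decoding, sketched below in plain Python: take the most likely token in each frame, merge consecutive repeats, then remove the blank token. The vocabulary and the function name `ctc_greedy_decode` are illustrative; a production system for cockpit speech would more likely use beam search, possibly with a language model.

```python
from itertools import groupby
from typing import Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      vocab: Sequence[str], blank: int = 0) -> str:
    """Best-path CTC decoding.

    frame_probs[t][k] is the probability (or logit) of token k at frame t;
    vocab[k] is the character for token k, and index `blank` is the CTC blank.
    """
    # 1. Most likely token index at each frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge runs of identical indices, e.g. a a _ a -> a _ a.
    merged = [k for k, _ in groupby(best)]
    # 3. Drop blanks; a blank between two identical tokens keeps them distinct.
    return "".join(vocab[k] for k in merged if k != blank)
```

For instance, frames whose most likely tokens are `a a _ a b b` decode to `aab`: the blank is what lets the decoder emit the same character twice in a row.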

How to apply: Interested candidates are encouraged to contact Irina Illina (illina@loria.fr) with the required documents (CV, transcripts, motivation letter, and recommendation letters).

Requirements & skills:

- M.Sc. or Ph.D. degree in speech/audio processing, computer vision, machine learning, or in a related field,

- ability to work independently as well as in a team,

- solid programming skills (Python, PyTorch), and deep learning knowledge,

- good level of written and spoken English.

References

[Baevski et al., 2020] A. Baevski, H. Zhou, A. Mohamed, and M. Auli. Wav2vec 2.0: A framework for self-supervised learning of speech representations, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), 2020.

[Chan et al., 2016] W. Chan, N. Jaitly, Q. Le and O. Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960–4964, 2016.

[Chorowski et al., 2017] J. Chorowski, N. Jaitly. Towards better decoding and language model integration in sequence to sequence models. Interspeech, 2017.

[Houlsby et al., 2019] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, S. Gelly. Parameter-efficient transfer learning for NLP. International Conference on Machine Learning, PMLR, pp. 2790–2799, 2019.

[Gulati et al., 2020] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang. Conformer: Convolution-augmented transformer for speech recognition. Interspeech, 2020.

[Shi et al., 2021] X. Shi, F. Yu, Y. Lu, Y. Liang, Q. Feng, D. Wang, Y. Qian, and L. Xie. The accented English speech recognition challenge 2020: open datasets, tracks, baselines, results and methods. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6918–6922, 2021.

6-9

(2023-01-15) M2 internship, LIA, Avignon, France

M2 internship topic: Decoding EEG signals using machine learning methods

Context

EEG (electroencephalography) is a non-invasive technique that measures the electrical activity of the brain using electrodes placed on the head. These electrodes record the electrical activity generated by neurons. The collected data are recorded and, with machine learning methods, can be used for various purposes, such as analysis, classification or brain-computer interfaces [Cao20]. In the field of language and speech processing, initial work with preliminary results has appeared recently (e.g. EEG classification with a Transformer-based approach [Sun21], EEG data recorded during sentence reading [Hollenstein18], and methods combining EEG with natural language processing techniques to detect user preferences [Gauba17]).

Objective

The objective of this internship is to automatically recognize, from EEG signals, first spoken characters and then spoken isolated words. The steps of the internship can be summarized as follows:

1. Getting started with the Emotiv EPOC-X EEG headset (https://www.emotiv.com/epoc-x/).

2. Designing an experimental protocol and collecting a corpus for the experiments.

3. Evaluating and selecting machine learning algorithms to recognize characters and/or isolated words from the EEG signals of the corpus collected during the internship.
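As a purely illustrative starting point for step 3, the plain-Python sketch below turns each EEG epoch (one list of samples per channel) into per-channel log-variance features and classifies it with a nearest-centroid rule. Log-variance features and the nearest-centroid classifier are simple, commonly used baselines chosen here as assumptions, not methods prescribed by the internship; a realistic pipeline would add band-pass filtering, artifact rejection and cross-validation, and would compare stronger models. Nothing in the code depends on the EPOC-X channel count or sampling rate.

```python
import math
from statistics import fmean, pvariance
from typing import Dict, List, Sequence

Epoch = Sequence[Sequence[float]]  # epoch[channel][sample]


def log_var_features(epoch: Epoch) -> List[float]:
    """One feature per channel: log of the signal variance over the epoch."""
    return [math.log(pvariance(channel) + 1e-12) for channel in epoch]


class NearestCentroid:
    """Assign an epoch to the class whose mean feature vector is closest."""

    def fit(self, epochs: Sequence[Epoch], labels: Sequence[str]) -> "NearestCentroid":
        feats_by_label: Dict[str, List[List[float]]] = {}
        for epoch, label in zip(epochs, labels):
            feats_by_label.setdefault(label, []).append(log_var_features(epoch))
        # Class centroid = per-feature mean over that class's training epochs.
        self.centroids = {
            label: [fmean(column) for column in zip(*feats)]
            for label, feats in feats_by_label.items()
        }
        return self

    def predict(self, epoch: Epoch) -> str:
        x = log_var_features(epoch)
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))
```

The small constant added to the variance only guards against `log(0)` on a flat channel; it does not otherwise affect the features.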

Candidate profile

The student must be in the final year of an engineering degree or in the second year of a Master's in computer science (M2). They must have programming skills, be comfortable with the Linux environment, and know standard machine learning methods.

Duration: 6 months, starting between February and March 2023.

Location: The internship will take place either at the Laboratoire Informatique d’Avignon (LIA), Avignon Université, or at the Laboratoire des Sciences du Numérique de Nantes (LS2N), Université de Nantes.

Stipend: The internship will be paid at the hourly rate in force on 01/01/2023, based on an internship agreement of 35 hours per week (€4.05 per hour, i.e. ≃€550/month).

How to apply? Please email the following documents to Mickael Rouvier (mickael.rouvier@univavignon.fr) and Richard Dufour (richard.dufour@univ-nantes.fr): 1) CV; 2) academic transcripts (bachelor's and master's); and 3) cover letter.

References

[Cao20] Cao, Z. (2020). A review of artificial intelligence for EEG-based brain-computer interfaces and applications. Brain Science Advances, 6(3), 162-170.

[Gauba17] Gauba, H., Kumar, P., Roy, P. P., Singh, P., Dogra, D. P., & Raman, B. (2017). Prediction of advertisement preference by fusing EEG response and sentiment analysis. Neural Networks, 92, 77-88.

[Hollenstein18] Hollenstein, N., Rotsztejn, J., Troendle, M., Pedroni, A., Zhang, C., & Langer, N. (2018). ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading. Scientific Data, 5(1), 1-13.

[Sun21] Sun, J., Xie, J., & Zhou, H. (2021). EEG classification with transformer-based models. In 2021 IEEE 3rd Global Conference on Life Sciences and Technologies (LifeTech) (pp. 92-93). IEEE.

6-10

(2023-01-15) Ref 6581/22, Postdoctoral Research Fellow, MARCS Institute for Brain, Behaviour & Development, Westmead (Sydney), Australia

Ref 6581/22, Postdoctoral Research Fellow, MARCS Institute for Brain, Behaviour & Development

About Western

Western Sydney University is a modern, forward-thinking, research-led university, located at the heart of Australia’s fastest-growing and economically significant region, Western Sydney. Boasting 11 campuses – many in Western Sydney CBD locations – and more than 200,000 alumni, 49,500 students and 3,500 staff, the University has 14 Schools with an array of well-designed programs and degrees carefully structured to meet the demands of future industry. The University is ranked in the top two per cent of universities worldwide, and as a research leader, over 85 per cent of the University’s assessed research is rated at ‘World Standard’ or above.

About the Role

The MARCS Institute for Brain, Behaviour and Development is seeking to appoint a Postdoctoral Research Fellow to join the Brain Sciences research program at Western Sydney University. This two-year Postdoctoral Research Fellow position is funded by an ARC Discovery grant, “Investigating the characteristics of older adults' conversation behaviour”. The grant was awarded to Chief Investigators (CIs) Professor Chris Davis and Professor Jeesun Kim at the MARCS Institute for Brain, Behaviour and Development, Western Sydney University, in collaboration with Partner Investigator (PI) Emeritus Professor Valerie Hazan at University College London. This project will investigate the production and perception of naturalistic conversations by young and older adults, with a particular interest in probing individualised semantic processing. The aim of the project is to understand the factors that affect the engagement of older adults in conversations. In this role you will work with the team to develop, pilot, and apply new methods for probing individual-specific semantic processing, and to collect sensory, perceptual, cognitive and electrophysiological (EEG) data. You will also implement procedures for analysing the data. The successful applicant will work closely with the above investigators and the rest of the research team. The role involves conducting studies and helping to supervise and/or train the PhD students, Research Assistants, Honours students, and interns enlisted into the project. For further information about the role, please refer to the attached Position Description.

This is a full-time, 2-year fixed-term contract. The position is located in the new Westmead Innovation Quarter (WIQ), a visionary research, health, education and business hub in the Westmead Health Precinct.

About MARCS

The MARCS Institute for Brain, Behaviour and Development is an interdisciplinary research institute of Western Sydney University. The vision of the Institute is to optimise human interaction and wellbeing across the lifespan. It strives to solve the problems that matter most through three themes: sensing and perceiving, interacting with each other, and technologies for humans. Researchers in MARCS come from many disciplines, including cognitive science, developmental psychology, language science, music science, cognitive neuroscience, and biomedical, electrical, electronic and software engineering. Further information is available from our website: http://www.westernsydney.edu.au/marcs

About You

You will hold a relevant doctoral qualification, or have made substantial progress toward a PhD, in Psychology, Cognitive Neuroscience, Computational Linguistics or a related discipline. You must have experience in conducting neuroimaging and behavioural experiments.

Culture

Western Sydney University highly values equity and inclusiveness. We have a proud history of doing so and consider this an important part of our social and civic responsibilities as a University. We strive to contribute to tackling inequalities and promoting wellbeing within our own institution, the Greater Western Sydney region, nationally and internationally.

Remuneration Package:

Academic Level A: $103,304 to $124,808 p.a. (comprising Salary of $87,293 to $105,464 p.a., plus Superannuation and Leave Loading)

Academic Level B: $131,136 to $154,738 p.a. (comprising Salary of $110,811 to $130,846 p.a., plus Superannuation and Leave Loading)

Position Enquiries: Please contact Professor Chris Davis via email at chris.davis@westernsydney.edu.au

Closing Date:

8:30pm AEDT, Sunday, 12 February 2023

Immigration Sponsorship: Employer visa sponsorship will be provided if required.

Western Sydney University is committed to diversity and social inclusion. Applications are encouraged from people of culturally and linguistically diverse backgrounds and from equity target groups, including women, people with disabilities, people who identify as LGBTIQ, and people of Aboriginal and Torres Strait Islander descent.

Professor Chris Davis, PhD

The MARCS Institute for Brain, Behaviour and Development

Western Sydney University

Westmead Innovation Quarter

Building U, Level 4

160 Hawkesbury Road (Corner of Farmhouse Road)

Westmead NSW 2145

<chris.davis@westernsydney.edu.au>

6-11

(2023-01-16) Research Fellow Chairs @ MIAI, Grenoble Interdisciplinary Institute in Artificial Intelligence, France

MIAI, the Grenoble Interdisciplinary Institute in Artificial Intelligence (https://miai.univ-grenoble-alpes.fr/), is opening three research fellow chairs in AI reserved for researchers who have spent most of their research career outside France (see below). MIAI is one of the four AI institutes created by the French government and is dedicated to AI for human beings and the environment. Research activities in MIAI aim to cover all aspects of AI and its applications. The current focus is on embedded and hardware architectures for AI, learning and reasoning, perception and interaction, AI & society, AI for health, AI for environment & energy, and AI for industry 4.0.

These research fellow chairs aim to address important and ambitious research problems in AI-related fields and will partly pave the way for future research in MIAI. Successful candidates will be appointed by MIAI. For the whole duration of the chair, each will be allocated a budget of 250k€ covering PhD and/or postdoc salaries, internships, travel, etc. They will be part of MIAI and of the French network of AI institutes, which comprises, in addition to MIAI, the AI institutes in Paris, Toulouse and Nice. This network provides a very dynamic environment for conducting research in AI.

Eligibility

To be eligible, candidates must hold a PhD from a non-French university. The PhD must have been obtained after January 2014 for male applicants, and after 2014-n for female applicants, where n is the number of children. Candidates must also have spent more than two-thirds of their research career since the beginning of their PhD outside France. Lastly, they should be pursuing internationally recognized research in AI-related fields (including applications of AI to any research field).

To apply

Interested candidates should first contact Eric Gaussier (eric.gaussier@univ-grenoble-alpes.fr) to discuss salary and application procedures. Candidates should identify a local collaborator working in one of the Grenoble academic research labs with whom they will interact; if selected, they will join this collaborator's research team. They should then send their application to Manel Boumegoura (manel.boumegoura@univ-grenoble-alpes.fr) and Eric Gaussier (eric.gaussier@univ-grenoble-alpes.fr) by March 11, 2023. Each application should comprise:
- a 2-page CV;
- a complete list of publications;
- 2 reference letters;
- a letter from the local collaborator indicating the relevance and importance of the proposed project;
- a 4-page description of the research project, which can target any topic of AI or applications of AI.

The description should emphasize the ambition, originality and potential impact of the proposed research. It should also describe the collaborations the candidate has developed, or will develop, with Grenoble researchers in order to achieve their research goals.

Starting date and duration

Each chair is intended for 3 to 4 years, starting no later than September 2023.

Location

The work will take place in Grenoble, in the research lab of the identified collaborator.

For any question, please contact Eric Gaussier (eric.gaussier@univ-grenoble-alpes.fr) or Manel Boumegoura (manel.boumegoura@univ-grenoble-alpes.fr).


6-12

(2023-01-17) Two postdoctoral positions @ University of Cambridge, UK

Senior Postdoctoral Position in SLP

University of Cambridge, Department of Engineering (UK)

The ALTA Institute is looking for a senior postdoctoral researcher in spoken language processing to join our research team investigating the automated assessment and learning of L2 English speaking.

Website: https://www.jobs.cam.ac.uk/job/39114/

Postdoctoral Position in SLP

University of Cambridge, Department of Engineering (UK)

The ALTA Institute is looking for a postdoctoral researcher in spoken language processing to join our research team investigating the automated assessment and learning of L2 English speaking.

Website: https://www.jobs.cam.ac.uk/job/39023/

6-13

(2023-01-25) Master 2 internship @ LISN, Orsay, France

Creation of a speech synthesis model from spontaneous speech

Keywords: Machine learning, speech synthesis, low resource languages, Nigerian Pidgin

Objectives

The main aim is to produce a natural-sounding text-to-speech (TTS) model that can be used to run perceptual tests for experimental linguistics. Thanks in part to recent advances in neural network-based speech technologies, researchers can now produce high-quality synthesis from relatively simple datasets. Models like Tacotron 2 make this possible and complement classical approaches such as those based on Hidden Markov Models. Specifically, the intern will assist in developing a text-to-speech platform trained on an existing database of Nigerian Pidgin recordings. Beyond producing natural-sounding speech, a central goal of this project is to build a TTS model that lets researchers modify intonational patterns directly via explicit parameters. The intern’s work will contribute to the exploration of the language’s melodic and tonal properties. Researchers will be able to produce variants of novel utterances that differ only in their intonational patterns.

Context

This work is part of a larger project to study Nigerian Pidgin, a widely spoken but under-resourced language that increasingly serves as the primary vernacular of Africa’s most populous country. Once stigmatized as a “broken” variety of English spoken only by the uneducated, Nigerian Pidgin is now a source of pride for many speakers who view it as a home-grown vehicle for communication. It transcends class and ethnicity, lacking both the tribal associations of indigenous languages and the colonial baggage associated with English. The language can now be seen and heard on college campuses, in houses of worship, in advertisements, in Nigerian expat communities, and even on a local branch of the BBC.

Primary tasks

• Surveying existing TTS models and selecting the most suitable approach

• Training a model on a corpus of Nigerian Pidgin

• Optimizing and evaluating the model

Profile

A second-year master’s student with:

• A solid background in machine learning (speech synthesis is a plus)

• Good academic writing skills in English

• A strong interest in language and linguistics

6-14

(2023-01-26) Associate professor (maître de conférences) position in computer science, Nantes, France

Nantes Université is opening an associate professor (maître de conférences) position in computer science for September 2023. Teaching will take place within the Faculté des Langues et Cultures Etrangères (FLCE). Research will be conducted within the LS2N (Laboratoire des Sciences du Numérique de Nantes).

Several research profiles are possible, including automatic processing of language data, with integration into the TALN team (Traitement Automatique du Langage Naturel).
More information about the team is available online (http://taln.ls2n.fr).

For the expectations of the position, see the job description at: https://uncloud.univ-nantes.fr/index.php/s/ERdm9t8WPNdCn8m

Contacts: Pascal.Andre@univ-nantes.fr (teaching) / Emmanuel.Morin@ls2n.fr (research)

6-15

(2023-01-30) Vacancy for a university professor in computer science at Bordeaux INP, France

Call for applications:

- Research in computer music in the image and sound department of the LaBRI (www.labri.fr) and at SCRIME (scrime.u-bordeaux.fr)
- Teaching at ENSEIRB-MATMECA (https://enseirb-matmeca.bordeaux-inp.fr/fr) in the computer science department
- Schedule: applications from February 23, 2023 to March 30, 2023; the position starts in September 2023

- More information: https://enseirb-matmeca.bordeaux-inp.fr/fr/enseignants

- Contact: myriam.desainte-catherine@labri.fr

Applicants must propose a research project that fits within the image and sound department of the LaBRI to work in particular with the Sound and Music Modeling group, and create links with the Manao team and the Analysis and Indexing group. Candidates must also propose a project for the SCRIME research platform (Studio de Recherche et de Création en Informatique et Musiques Expérimentales) following the departure of the current director. The research area is computer music (sound and music computation, musical interaction). The candidates must be involved in at least one of the following themes:
- Computer processing of music and sound: computational approaches (algorithms, signal processing, learning) to the analysis, transformation and generation of music and sound, across all dimensions of music and sound (timbre, pitch, dynamics and spatialization). This includes environmental sounds and the soundtracks of ecological videos.
- Sound and music interaction: designing new interfaces between users and computers to create new means of musical expression. This can be done through virtual/mixed/augmented sound reality systems, new models of musical scores and instruments, interaction with images and other media, and the use of sound to convey information.
- Understanding and modeling of sound and music: music information retrieval, computational musicology, computational approaches to music cognition, formal models and languages for music (time and space of sound and music parameters).
- Design of new tools for sound and music creation, performance and pedagogy. Topics include:
  - tools to assist sound design and music composition, scenarization, sonification and spatialization (including algorithmic composition, especially using learning techniques);
  - research on software architectures and languages combining micro (sound) and macro (musical form) levels;
  - frugality of computation and of sound and music data transfers;
  - minimization of sound transmission delay;
  - formal specifications for tool preservation.

6-16

(2023-02-01) Post-doctoral and engineer positions@ LORIA-INRIA, Nancy, France

Automatic speech recognition for non-native speakers in a noisy environment

Post-doctoral and engineer positions

Starting date: beginning of 2023

Duration: 24 months for a post-doc position and 12 months for an engineer position

Supervisor: Irina Illina, Associate Professor, HDR, Lorraine University, LORIA-INRIA Multispeech Team, illina@loria.fr

Context

When a person's hands are busy with a task such as driving a car or piloting an airplane, voice offers a fast and efficient means of interaction. In aeronautical communications, English is most often compulsory. Unfortunately, many pilots are not native English speakers. Their accent depends on their native language, whose pronunciation mechanisms influence their English. Inside an aircraft cockpit, the pilots' non-native speech and the surrounding noise are the most difficult challenges to overcome in order to achieve efficient automatic speech recognition (ASR). The problems of non-native speech are numerous: incorrect or approximate pronunciations, errors of agreement in gender and number, use of non-existent words, missing articles, grammatically incorrect sentences, etc. The acoustic environment adds a disturbing component to the speech signal. Much of the success of speech recognition relies on the ability of ASR models to take different accents and ambient noise into account.

Automatic speech recognition has made great progress thanks to the spectacular development of deep learning. In recent years, end-to-end automatic speech recognition has advanced considerably [Chan et al., 2016; Baevski et al., 2020; Gulati et al., 2020]. This approach directly optimizes the probability of the output character sequence given the input acoustic features.

Objectives

The recruited person will have to develop methodologies and tools to obtain high-performance non-native automatic speech recognition in the aeronautical context and more specifically in a (noisy) aircraft cockpit.

This project will be based on an end-to-end automatic speech recognition system [Shi et al., 2021] using wav2vec 2.0 [Baevski et al., 2020], one of the most effective models in the current state of the art. The wav2vec 2.0 model enables self-supervised learning of representations from raw audio data (without transcriptions).

How to apply: Interested candidates are encouraged to contact Irina Illina (illina@loria.fr) with the required documents (CV, transcripts, motivation letter, and recommendation letters).

Requirements & skills:

- M.Sc. or Ph.D. degree in speech/audio processing, computer vision, machine learning, or in a related field,

- ability to work independently as well as in a team,

- solid programming skills (Python, PyTorch), and deep learning knowledge,

- good level of written and spoken English.

References

[Baevski et al., 2020] A. Baevski, H. Zhou, A. Mohamed, and M. Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), 2020.

[Chan et al., 2016] W. Chan, N. Jaitly, Q. Le, and O. Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960–4964, 2016.

[Chorowski et al., 2017] J. Chorowski, N. Jaitly. Towards better decoding and language model integration in sequence to sequence models. Interspeech, 2017.

[Houlsby et al., 2019] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, S. Gelly. Parameter-efficient transfer learning for NLP. International Conference on Machine Learning, PMLR, pp. 2790–2799, 2019.

[Gulati et al., 2020] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang. Conformer: Convolution-augmented transformer for speech recognition. Interspeech, 2020.

[Shi et al., 2021] X. Shi, F. Yu, Y. Lu, Y. Liang, Q. Feng, D. Wang, Y. Qian, and L. Xie. The accented English speech recognition challenge 2020: open datasets, tracks, baselines, results and methods. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6918–6922, 2021.

6-17

(2023-02-01) Master or engineer internship at Loria, Nancy, France

Master or engineer internship at Loria (France)

Development of language model for business agreement use cases

Duration: 6 months, starting February or March 2023

Location: Loria (Nancy) and Station F, 5 Parvis Alan Turing, 75013, Paris

Supervision: Tristan Thommen (tristan@koncile.ai), Irina Illina (illina@loria.fr) and Jean-Charles Lamirel (jean-charles.lamirel@loria.fr)

Please apply by sending your CV and a short motivation letter directly to Tristan Thommen and Irina Illina.

Motivations and context

Pre-trained models have achieved state-of-the-art results on various Natural Language Processing (NLP) tasks. Examples include Embeddings from Language Models (ELMo) (Peters et al., 2018), Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019), Robustly optimized BERT approach (RoBERTa) (Liu et al., 2019c) and Generative Pre-Trained Transformer (GPT) (Radford et al., 2018). These models are trained on huge unlabeled corpora and can easily be fine-tuned for various downstream tasks using task-specific datasets. Fine-tuning involves adding new task-specific layers to the model, then updating the pre-trained model parameters while the new layers are learned.

Objectives

The goal of the internship is to develop a language model specific to business agreement use cases. This model should be able to identify and extract non-trivial information from a large mass of procurement contracts, in English and in French. This information falls into two types:
- simple contract identification data, such as the signature date, names of the parties, contract title and signatories;
- more complex information to be deduced from clauses, in particular price determination according to parameters such as date or volume, renewal or expiry, and the obligations of the parties along with their conditions.

The difficulty of this task is that none of this information is standardized: it may be represented in different ways and in different places in an agreement. For instance, a price could be based on a formula defined in the articles of the agreement and an index defined in one of its appendices.

To develop this language model, we propose to fine-tune a pre-trained language model on a dataset of business agreement use cases. The intern will identify the relevant pre-trained language model, prepare the training data and tune the fine-tuning parameters.

A distinctive feature of this internship is that it relies on real use-case data from the management of business agreements. Datasets will be provided by Koncile’s clients and partners. The models developed during the internship will be put directly into practice and tested with end users.

Koncile is a start-up founded in 2022 and based in Paris. It tackles the mismanagement of procurement agreements by companies, and intends to leverage natural language processing techniques to analyze supplier contracts and invoicing. Koncile is incubated by Entrepreneur First and hosted at Station F in Paris.

Additional information and requirements

Good Python skills and basic knowledge of Natural Language Processing techniques are required. Some notions of machine learning, both theoretical and practical (e.g., using PyTorch), are a plus.

References

Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019c). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2018). Improving language understanding by generative pre-training.

6-18

(2023-02-06) Associate professor (tenure position) @Telecom Paris in Machine Learning for Social Computing

Faculty position (Associate Professor, tenure position) at Telecom Paris in Machine Learning for Social Computing.

Telecom Paris has a new permanent (tenure) faculty position (Associate
Professor/ “Maître de conférences”) in the area of **machine learning for
social computing**. Applicants from the following sub-research areas are
welcome:

   - Neural models for the recognition and generation of socio-emotional behaviors
   - Natural language and speech processing
   - Dialogue, conversational systems, and social robotics
   - Reinforcement learning for dialogue
   - Sentiment analysis in social interactions
   - Bias and explainability in AI
   - Model tractability, multi-task learning, meta-learning

Salary:

between 40.58 k€ and 58.67 k€, depending on profile and experience

Important Dates

   - March 20th, 2023: closing date for applications
   - April 20th, 2023: hearings of the preselected candidates

Context

Social Computing team [1] - S²A (machine learning, statistics and signal
processing) group [2] - LTCI (laboratoire de traitement et communication de
l’information) [3] - Telecom Paris [4] .

Ecosystem

Telecom Paris [4] is a founding member of the Institut Polytechnique de
Paris <https://www.ip-paris.fr/en/> (IP Paris), a world-class scientific
and technological institution. Located on the Plateau de Saclay close to
Paris-Saclay University, this institution is a partnership between Ecole
Polytechnique, ENSTA Paris, ENSAE Paris, Télécom Paris and Télécom SudParis,
with HEC as a key partner.

Regularly ranked as one of the best engineering schools in France, Télécom
Paris is recognized for its excellent training, its very good employability
rate with high salaries, its high-level research, and its very close
proximity to companies. The THE (Times Higher Education) ranking places
Télécom Paris as the 2nd best French engineering school, the 5th best French
university, and the 6th « best small university »
<https://www.telecom-paris.fr/times-higher-education-telecom-paris-6th-best-small-university>.
The newly created institution IP Paris was ranked among the top 50
universities in the QS World University Rankings.

In the context of the Institut Polytechnique de Paris, the team's activities
in Data Science and AI benefit from the Hi!Paris center
(https://www.hi-paris.fr). The center offers seminars, workshops, and funding
through calls for projects.

Main missions/Research activities

   - Develop groundbreaking research in the field of machine learning applied
     to Social Computing, which includes: natural language and speech
     processing; dialogue, conversational systems, and social robotics;
     reinforcement learning for dialogue; sentiment analysis in social
     interactions; bias and explainability in AI; model tractability,
     multi-task learning, and meta-learning
   - Develop both academic and industrial collaborations on the same topics,
     including collaborative activities with other Telecom Paris research
     departments and teams (including social science researchers from the
     economics and social sciences department [6]), and research contracts
     with industrial players
   - Set up research grants and take part in national and international
     collaborative research projects
   - Publish high-quality research work in leading journals and conferences
   - Be an active member of the research community (serving on scientific
     committees and boards, organizing seminars, workshops, and special
     sessions...)

Main missions/Teaching activities

Participate in teaching activities at Telecom Paris and its partner
academic institutions (as part of joint Master programs), especially in
natural language processing, speech processing, machine learning, and Data
Science, including life-long training programs (e.g. the local “Mastères
Spécialisés”)

Candidate profile

As a minimum requirement, the successful candidate will have:

   - A Ph.D. degree
   - A track record of research and publication in one or more of the
     following areas: conversational artificial intelligence, machine learning,
     natural language processing, speech and signal processing, human-agent
     interactions, social robotics
   - Experience in teaching
   - International postdoctoral experience (welcome but not mandatory)
   - Excellent command of English

NOTE:

The candidate does *not* need to speak French to apply, just to be willing
to learn the language (teaching will be mostly given in English)

Other skills expected include:

• Capacity to work in a team and develop good relationships with
colleagues and peers

• Excellent writing and pedagogical skills

More about the position

• Place of work: Saclay (Paris outskirts)

How to apply?

Applications must be submitted via one of the following websites:

French Version:

https://institutminestelecom.recruitee.com/o/enseignantchercheur-en-machine-learning-pour-la-modelisation-des-comportements-socioemotionnels-a-telecom-paris-cdi

English Version:

https://institutminestelecom.recruitee.com/l/en/o/enseignantchercheur-en-machine-learning-pour-la-modelisation-des-comportements-socioemotionnels-a-telecom-paris-cdi

Applicants should submit a single PDF file that includes:

- cover letter,

- curriculum vitae,

- statements of research and teaching interests (4 pages)

- three publications

- contact information for two references

Contacts (please do not hesitate to contact us directly before applying):

Chloé Clavel (Coordinator of the Social Computing team)

Stéphan Clémençon (Head of the S²A group)

Florence d’Alché-Buc (Head of the IDS department)

[1]
https://www.telecom-paris.fr/en/research/laboratories/information-processing-and-communication-laboratory-ltci/research-teams/signal-statistics-learning/social-computing

[2]
https://www.telecom-paris.fr/en/research/laboratories/information-processing-and-communication-laboratory-ltci/research-teams/signal-statistics-learning

[3]
https://www.telecom-paris.fr/fr/lecole/departements-enseignement-recherche/image-donnees-signal

[4]
https://www.telecom-paris.fr/en/research/laboratories/information-processing-and-communication-laboratory-ltci

[5] https://www.telecom-paris.fr/en/home
[6]
https://www.telecom-paris.fr/en/the-school/teaching-research-departments/economics-and-social-sciences

6-19

(2023-02-08) Post-doctoral and engineer positions @ LORIA-INRIA Nancy, France

Automatic speech recognition for non-native speakers in a noisy environment

Post-doctoral and engineer positions

Starting date: beginning of 2023

Duration: 24 months for a post-doc position and 12 months for an engineer position

Supervisor: Irina Illina, Associate Professor (HDR), Lorraine University, LORIA-INRIA Multispeech Team, illina@loria.fr

Context

When a person's hands are busy with a task such as driving a car or piloting an airplane, voice is a fast and efficient means of interaction. In aeronautical communications, English is usually compulsory. However, a large proportion of pilots are not native English speakers: they speak with an accent that depends on their native language and are influenced by its pronunciation patterns. Inside an aircraft cockpit, the pilots' non-native speech and the surrounding noise are the most difficult challenges to overcome for efficient automatic speech recognition (ASR). Non-native speech raises many problems: incorrect or approximate pronunciations, errors of gender and number agreement, use of non-existent words, missing articles, ungrammatical sentences, etc. The acoustic environment adds a disturbing component to the speech signal. Much of the success of speech recognition depends on the ability of the ASR models to account for different accents and ambient noise.

Automatic speech recognition has made great progress thanks to the spectacular development of deep learning. In recent years, end-to-end ASR, which directly optimizes the probability of the output character sequence given the input acoustic features, has advanced considerably [Chan et al., 2016; Baevski et al., 2020; Gulati et al., 2020].

Objectives

The recruited person will have to develop methodologies and tools to obtain high-performance non-native automatic speech recognition in the aeronautical context and more specifically in a (noisy) aircraft cockpit.

This project will build on an end-to-end automatic speech recognition system [Shi et al., 2021] using wav2vec 2.0 [Baevski et al., 2020], one of the most effective models in the current state of the art. wav2vec 2.0 learns speech representations from raw audio through self-supervised learning, without requiring transcriptions.

How to apply: Interested candidates are encouraged to contact Irina Illina (illina@loria.fr) with the required documents (CV, transcripts, motivation letter, and recommendation letters).

Requirements & skills:

- M.Sc. or Ph.D. degree in speech/audio processing, computer vision, machine learning, or in a related field,

- the ability to work independently as well as in a team,

- solid programming skills (Python, PyTorch), and deep learning knowledge,

- good level of written and spoken English.

References

[Baevski et al., 2020] A. Baevski, H. Zhou, A. Mohamed, and M. Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations. 34th Conference on Neural Information Processing Systems (NeurIPS), 2020.

[Chan et al., 2016] W. Chan, N. Jaitly, Q. Le, and O. Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960–4964, 2016.

[Chorowski et al., 2017] J. Chorowski and N. Jaitly. Towards better decoding and language model integration in sequence to sequence models. Interspeech, 2017.

[Gulati et al., 2020] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang. Conformer: Convolution-augmented transformer for speech recognition. Interspeech, 2020.

[Houlsby et al., 2019] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for NLP. International Conference on Machine Learning (ICML), PMLR, pp. 2790–2799, 2019.

[Shi et al., 2021] X. Shi, F. Yu, Y. Lu, Y. Liang, Q. Feng, D. Wang, Y. Qian, and L. Xie. The accented English speech recognition challenge 2020: open datasets, tracks, baselines, results, and methods. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6918–6922, 2021.

6-20

(2023-02-15) Project Lead Engineer, CRI Nancy, France

Project Lead Engineer, Language Resources and Technologies

Inria centre: CRI Nancy - Grand Est

City: Nancy, France
Desired start date: 2023-04-03
Contract type: fixed-term contract (CDD), 4 years
Required degree: BAC+5 (Master's level) or equivalent
Desired experience: 3 to 5 years

To apply: https://recrutement.inria.fr/public/classic/fr/offres/2023-05788
For more information, contact: Slim.Ouni@loria.fr

Full job description:
https://recrutement.inria.fr/public/classic/fr/offres/2023-05788

Position: Project Lead Engineer, Language Resources and Technologies

CONTEXT

This position is part of the Inria Challenge (Défi Inria) COLaF (Corpus et Outils pour les Langues de France, i.e. Corpora and Tools for the Languages of France), a collaboration between the ALMAnaCH and MULTISPEECH teams. The goal of the Challenge is to develop and make available digital language technologies for the French-speaking world and the languages of France, by contributing to the creation of inclusive data corpora, models, and software components. The ALMAnaCH team focuses on text and the MULTISPEECH team on multimodal speech. The two main objectives of this project are:

(1) Collecting massive, inclusive French-language corpora: the aim is to build very large text and speech corpora with rich metadata to improve model robustness to linguistic variation, with particular attention to geographical and dialectal variation across the French-speaking world. Part of these corpora may be multimodal (audio, image, video), or even specific to French Sign Language (LSF). The multimodal speech data will cover, among others, dialects, accents, the speech of elderly people, children and adolescents, LSF, and the other languages widely spoken in France.

Corpus collection will rely primarily on existing data. These (multimodal speech) data may come from the archives of INA and of regional or foreign radio and television broadcasters, though rarely in a directly usable form, or from specialists, though as small, scattered corpora. The difficulty lies, on the one hand, in identifying and pre-processing the relevant data to obtain homogeneous corpora and, on the other hand, in clarifying (and if possible relaxing) the legal constraints and financial terms governing their use, so as to ensure the widest possible impact. When legal constraints do not allow existing data to be used, additional data collection will be needed. This will probably be the case for children (educational applications) and elderly people (health applications). Depending on the situation, this effort will be subcontracted to field linguists or will lead to a large-scale campaign. This will be carried out in collaboration with Le VoiceLab and the DGLFLF.

(2) Developing and making available inclusive language technologies: the language technologies considered by the MULTISPEECH team in this project are speech recognition, speech synthesis, and sign language generation. Many of these technologies are already commercially available. The goal is therefore not to reinvent these tools, but to make the modifications needed for them to exploit the inclusive corpora created. The technologies used in this project will address the following tasks, among others:

• (Semi-)automatic identification and pre-processing of relevant data within large existing datasets. This includes detecting and replacing named entities for anonymization purposes.

• Neural architectures and approaches suited to low-resource scenarios (data augmentation, transfer learning, weakly supervised or unsupervised learning, active learning, and combinations of these forms of learning)

MISSIONS

The project lead engineer will have two main missions:

• Project management and practical coordination of the MULTISPEECH team's contribution to the Inria Challenge. The project lead engineer will work closely with a junior engineer, a researcher, and two PhD students, all working on this project. They will closely supervise the junior engineer and interact very frequently with the researcher and the PhD students. They will also be in contact with the members of the MULTISPEECH team, and will consult and collaborate closely with their counterpart in the ALMAnaCH team.

• Data collection and creation of multimodal speech corpora (covering certain dialects, accents, elderly people, children and adolescents, LSF, and certain languages other than French that are widely spoken in France). Much of the data collection will be carried out with speaker associations, content producers, and any other partner relevant for obtaining data. The project lead engineer will hold discussions with these contacts, notably on legal aspects.

MAIN ACTIVITIES

• Defining the different types of corpora to collect (identifying potentially usable corpora, setting priorities and a collection schedule)

• Collecting speech corpora from content producers or any other partner (ensuring that the data meet quality norms and standards)

• Negotiating data usage contracts in compliance with legal requirements (negotiating the terms of data use with content producers or partners, ensuring that intellectual property rights are respected and that legal aspects are taken into account)

• Creating and making available language technologies for processing these corpora: once collected, the data must be analysed and processed to extract useful information. The project lead engineer must select, from existing technologies and tools, those needed for this analysis, and ensure that they are accessible to users.

• Close supervision of the junior engineer: support and advice on technical and strategic development choices

• Coordinating and facilitating exchanges between project members: (1) with the researcher and the two PhD students (discussions on the data and their suitability for the Challenge); (2) coordination with the project members in the ALMAnaCH team

• Technology watch, particularly in the field of this Challenge

• Writing and presenting technical documentation

Note: This is an indicative list of activities, which may be adapted in keeping with the mission described above.

SKILLS

DESIRED PROFILE:

• Degree in computer science, linguistics, or any other field related to automatic speech or language processing

• Proven experience in project management and communication

• In-depth knowledge of language technologies

• Ability to work in a team and meet deadlines

• Good command of English

KNOWLEDGE

• Ability to write, publish, and present in French and English

• Command of project management and negotiation techniques

• Legal basics (personal data, intellectual property, business law)

KNOW-HOW

• Analytical, writing, and synthesis skills

• Ability to support and advise

• Ability to build a professional network

• Ability to handle several projects at the same time

• Negotiation skills

PERSONAL QUALITIES

• Sense of responsibility and autonomy

• Good interpersonal contact and enjoyment of teamwork

• Rigour, sense of priorities and of reporting

• Interpersonal skills (listening, diplomacy, persuasiveness)

• Aptitude for negotiation (Le VoiceLab, DGLFLF, etc.)

• Ability to anticipate

• Initiative and intellectual curiosity

ADDITIONAL INFORMATION
Full-time position, to be filled as soon as possible.
Salary according to experience.
Applications must be submitted online on the Inria website. Applications sent through other channels are not guaranteed to be processed.

ABOUT INRIA

Inria is the French national research institute for digital science and technology. World-class research, technological innovation, and entrepreneurial risk are part of its DNA. In 200 project teams, most of them shared with major research universities, more than 3,500 researchers and engineers explore new paths, often across disciplines and in collaboration with industrial partners, to meet ambitious challenges. Inria supports a diversity of innovation pathways, from open-source software publishing to the creation of technology startups (Deeptech).

ABOUT THE INRIA NANCY – GRAND EST CENTRE

The Inria Nancy – Grand Est centre is one of Inria's eight centres, with 400 people in 22 research teams and 8 research support services. All of these research teams are shared with academic partners, and three of them are based in Strasbourg.

This research centre is a major, recognized player in digital sciences. It lies at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large groups, start-ups, incubators and accelerators, competitiveness clusters, research and higher education institutions, and technological research institutes.

WORKING ENVIRONMENT

The project lead engineer will work in the MULTISPEECH project team at the Inria Nancy research centre. MULTISPEECH's research focuses on multimodal speech, in particular its analysis and generation in the context of human-machine interaction. A central point of this work is the design of machine learning models and techniques to extract information about the linguistic content, the speaker's identity and states, and the speech environment, and to synthesize multimodal speech using limited amounts of labelled data.

To apply
https://recrutement.inria.fr/public/classic/fr/offres/2023-05788

6-21

(2023-02-17) NLP Research Engineer internship, LUNII, Paris


TECH · LUNII PARIS · HYBRID REMOTE WORK

NLP Research Engineer [Internship, 6 months]

Lunii is a human and entrepreneurial adventure launched in August 2016 with "Ma Fabrique à Histoires", a literary, technological, and playful device for children aged 3 to 8.

Your responsibilities

You will join the Tech team to take part in an applied research project on narrative speech synthesis. Speech synthesis has made spectacular progress thanks to deep neural networks, but the procedures for preparing and labelling training data are still very time-consuming. To address this problem, you will mainly help improve the automatic analysis and labelling tools used to prepare a speech corpus for a speech synthesis system.

Your tasks will be to:

👩‍🎓 Study and improve existing phonetizers and aligners

Survey, test, evaluate, and compare existing phonetizers and aligners,
Build an accurate training corpus for automatic phonetization of text, together with the corresponding audio,
Adapt and train a neural model for phonetization and text/audio alignment, with the aim of improving on existing tools,
Evaluate this model and compare it with existing tools,
Publish the results in conference papers.

🙋 Evaluate and compare methods for the structural analysis of a story

Survey, test, evaluate, and compare methods and tools for detecting speaking turns from text,
Explore automatic detection of characters and their respective speaking turns,
Survey, test, evaluate, and compare methods and tools for the structural analysis of a story from text.

💃 Build a narrative speech corpus

Use the tools developed to perform phonetization, text/audio alignment, and automatic labelling of audiobooks,
Check and correct alignment and phonetization errors with the existing manual correction tools,
Evaluate the time saved on manual corrections by the new model compared with existing tools.

Non-exhaustive list.

Lunii recruits and values all talents: we are deeply committed to gender balance and diversity, and we look forward to hearing from you!

Desired profile

Strong knowledge of NLP / machine learning / signal processing.
Excellent programming skills (Python).
A strong interest in speech science.
Familiarity with one or more machine learning frameworks (TensorFlow, PyTorch, etc.).
A dynamic person who is proactive in making proposals.
Good oral and written communication skills in both French and English.

Recruitment process

Send us your CV; we will take the time to study your application carefully and, if it matches the offer, our discussions (by video call or in person) will continue:

First with Mélissa, from our HR Learning & Development team, for an initial conversation,
Then with Mélissa, Samuel (speech synthesis researcher), and Ludi (CTO), centred on a practical case study.

Additional information

Contract: 6-month internship under an internship agreement (stage conventionné)
Stipend: €1,000 gross/month
Public transport pass: 50% covered by Lunii
Meal vouchers: €8.50/day
Hybrid remote work

6-22

(2023-02-19) Two professorships @ Technische Universität Darmstadt, Germany

Technische Universität Darmstadt is one of Germany’s leading technical universities with broad excellence in research, an interdisciplinary profile, and an explicit focus on engineering sciences.

The department of Electrical Engineering and Information Technology invites applications for a

Full Professorship (W3) or Assistant Professorship (W2 with Tenure Track) for Signal Theory and Statistical Learning (Code No 6)

We are looking for an excellent researcher with credentials in at least one of the following areas:

• Theory and methods of statistical inference

• Robust statistical signal processing

• Statistical learning theory

• Theoretical performance analysis and interpretability of signal processing methods

• Guarantees and interpretability of statistical learning methods

• New trends in statistical signal and learning theory

Application deadline: March 15, 2023

All further information about the position and the application process can be found at:

https://www.tu-darmstadt.de/universitaet/karriere_an_der_tu/stellenangebote/aktuelle_stellenangebote/stellenausschreibungen_detailansichten_1_502208.en.jsp

6-23

(2023-02-28) Associate Professor (Maître de conférences), Paris-Saclay/LISN, Orsay, France

Université Paris-Saclay is recruiting an Associate Professor (Maître·sse de Conférences) in computer science (CNU section 27) for the 2023 academic year. The person recruited will work at LISN, the Laboratoire Interdisciplinaire des Sciences du Numérique of Université Paris-Saclay, in the Language Science and Technology department (Sciences et Technologies des Langues, STL). The research profile is multimodal language processing. Teaching will take place at the UFR de Sciences (Faculty of Sciences).

Do not hesitate to contact me, or any member of the Language Science and Technology department, if you have questions. Details of the research and teaching profiles are given below.

Best regards,

Gilles Adda

Teaching

The person recruited may teach in any programme of the computer science department of the Faculté des Sciences d'Orsay, at Bachelor's (Licence) and Master's level (standard and work-study tracks). They will be expected to teach in the areas that need strengthening, namely databases and data science. They may of course also teach in their own areas of interest.

Teaching is one of the founding missions of the university. The quality of the training provided and of student learning is more than ever at the heart of Université Paris-Saclay's concerns. Accordingly, the teaching profile for this position includes the ability to design teaching sequences around explicit learning objectives and skills, and possibly to experiment with innovative teaching methods.

The person recruited will also be expected to take part quickly in the life of the institution (programme management, involvement in one of the university's bodies, etc.). Experience with collective responsibilities is highly desirable.

The candidate must clearly state their plan for integration into teaching, within the university's range of programmes and in agreement with the computer science department of the Faculty of Sciences.

Research

The candidate will conduct their research at the Laboratoire Interdisciplinaire des Sciences du Numérique (LISN - UMR9015) at Université Paris-Saclay.

The candidate will join the Language Science and Technology (STL) department and strengthen its activities in multimodal language processing.

Natural language includes written, spoken, and signed modalities; it may also be accompanied by social attitudes and non-verbal dimensions. Language processing therefore involves the joint processing of multiple information channels. Moreover, language is often used to describe concepts and refer to entities that are inherently multimodal (describing an image, an event, etc.). The topics of interest to the laboratory in this area are:

automatic sign language processing (modelling, recognition, generation, translation, etc.);
the development of joint models, or of transfer from one modality to another;
natural language processing research that is open to extension to several modalities, for example:

automatic speech recognition and synthesis;
dialogue and question-answering systems;
document understanding, translation, and summarization.

--

Since January 1st 2021, LIMSI has merged with LRI and become LISN (Laboratoire Interdisciplinaire des Sciences du Numérique, Interdisciplinary Computer Science Laboratory).

Gilles Adda
Head of the Language Science and Technology Department
http://www.limsi.fr/Individu/gadda/

6-24

(2023-03-02) Full Professor in Computer Science @ Grenoble-INP Phelma/Gipsa-Lab, Grenoble, France

Grenoble-INP Phelma is recruiting a Full Professor in Computer Science (CNU section 27) in 2023. The host research laboratory will be GIPSA-lab (UMR 5216). The research profile, entitled 'Computer Science and Learning for Image and Signal Processing', covers all of GIPSA-lab's scientific themes related to information processing, including automatic speech and language processing, with affiliation to the « Speech and Cognition » group and the CRISSP (Cognitive Robotics, Interactive Systems and Speech Processing) team. The job description is available at https://phelma.grenoble-inp.fr/fr/l-ecole/concours-enseignants-chercheurs-2023

Contacts : Nicolas Marchand, Laurent Girin, Thomas Hueber (firstname.lastname@gipsa-lab.grenoble-inp.fr)

6-25

(2023-03-15) PhD student in Phonetics, Stockholm, Sweden

PhD student in Phonetics

Stockholm

Ref. No. SU FV-0793-23

at the Department of Linguistics. Closing date: 15 April 2023.

The Department of Linguistics conducts research and offers education in a number of areas such as child language development, computational linguistics, general linguistics, multilingualism in the deaf and hard of hearing, phonetics and sign language. The department hosts three infrastructures, Språkstudion Language Learning Resource Centre, the Phonetics Laboratory and SUBIC Stockholm University Brain Imaging Center, and several research groups conduct research in experimental paradigms related to the mentioned infrastructures.

Project description
The position is linked to the research profile phonetics. This implies that the research plan attached to the application must have an experimental phonetics topic linked to research on acoustic and/or physiological (e.g. breathing, phonation, articulation) aspects of speech and conversation conducted at the Department of Linguistics.

Qualification requirements
In order to meet the general entry requirements, the applicant must have completed a second-cycle degree, completed courses equivalent to at least 240 higher education credits, of which 60 credits must be in the second cycle, or must have otherwise acquired equivalent knowledge in Sweden or elsewhere.

In order to meet the specific entry requirements for doctoral studies in linguistics, the general syllabus stipulates that an applicant must have received a passing grade on course work of at least 30 higher education credits from the second cycle in Linguistics, including a degree project of at least 15 credits on a topic relevant to the proposed research plan. In addition, the applicant is required to be proficient in the language of the proposed doctoral thesis (English, Swedish or Swedish Sign Language). Proficiency is demonstrated in the research plan and other relevant parts of the application (such as undergraduate theses, publications, grades, certificates and an interview).

The entry requirements can also be met by students who have acquired equivalent knowledge in Sweden or elsewhere. Assessment of eligibility is decided on in accordance with the department's local admission procedure and the department's decision and delegation procedure.

The qualification requirements must be met by the deadline for applications. Individual study plan for doctoral studies in linguistics (in Swedish).

Selection
Selection among the eligible candidates will be based on their capacity to benefit from the training. Selection is made by the Department Board, applying the following assessment criteria:

Education in general. Assessment is made with regard to both depth and breadth in previous education
Scholarly production. On the basis of the applicant’s degree projects from the first and second cycle and, where applicable, other scholarly production, the applicant’s ability to benefit from the training is assessed according to the following criteria: critical ability, analytical skills, creativity, independence and scholarly precision. The time aspect is also considered, that is, to what extent the applicant has demonstrated an ability to complete previous academic projects within specified time limits. In addition, based on a comparison of previous academic output, an assessment is made of the applicant’s academic development
Research plan. The applicant must describe their proposed field of research in a research plan. The dissertation project must focus on phonetics. The plan should not exceed three numbered A4 pages in Arial, font size 11, with single line spacing and 2.5 cm margins, including references and any images and examples. The research plan should contain one or more research problems and an outline of the research project. It is assessed on the basis of relevance, originality, and potential for completion within the specified time limits (i.e. a period equivalent to four years of full-time study for a doctoral degree)
Available supervisor resources
Teamwork skills. The applicant's ability to collaborate is assessed on the basis of, for example, references, certificates or interviews.

In selecting applicants for postgraduate education in linguistics, the department board must take into account rules and regulations of the Faculty of Humanities. In addition to the above selection criteria, the following will be of great importance in the assessment:

Relevance of the proposed research project to the department's research environment
Experience in research-related work within phonetics.

Admission Regulations for Doctoral Studies at Stockholm University are available at: www.su.se/rules and regulations.

Terms of employment
Only a person who will be or has already been admitted to a third-cycle programme may be appointed to a doctoral studentship.

The term of the initial contract may not exceed one year. The employment may be extended for a maximum of two years at a time. However, the total period of employment may not exceed the equivalent of four years of full-time study.

Doctoral students should primarily devote themselves to their own education, but may engage in teaching, research, and administration corresponding to a maximum of 20 % of a full-time position. For this particular position, the doctoral student is expected to perform departmental duties corresponding to 20 % of full time. Where applicable, the total time of the appointment is extended to correspond to a full-time doctoral programme for four years.

The proportion of departmental duties may be unevenly distributed across the duration of the doctoral programme.

Please note that admission decisions cannot be appealed.

Stockholm University strives to be a workplace free from discrimination and with equal opportunities for all.

Contact
For more information, please contact the Head of Department Mattias Heldner, telephone: +46 8 16 19 88, mattias.heldner@ling.su.se. For questions about the doctoral programme, contact the Director of Studies for postgraduate education, Bernhard Wälchli, +46 8 16 23 44, studierektorfu@ling.su.se.

Union representatives
Ingrid Lander (Saco-S), telephone: +46 708 16 26 64, saco@saco.su.se, Alejandra Pizarro Carrasco (Fackförbundet ST/OFR), telephone: +46 8 16 34 89, alejandra@st.su.se, seko@seko.su.se (SEKO), and PhD student representative, doktorandombud@sus.su.se.

Application
Apply for the PhD student position at Stockholm University's recruitment system. It is the responsibility of the applicant to ensure that the application is complete in accordance with the instructions in the job advertisement, and that it is submitted before the deadline.

Please include the following information with your application

Your contact details and personal data
Your highest degree
Your language skills
Contact details for 2–3 referees (please, specify their relationship to you as an applicant and what they are expected to testify to or comment on)

and, in addition, please include the following documents

Cover letter
CV – degrees and other completed courses, work experience and a list of degree projects/theses
Dissertation plan/research proposal, inclusive of the following:
- research question(s)
- brief background (research context and relationship to own interests/qualifications)
- method and data
- expected results (specify the scope, especially if the results are mainly descriptive)
- time plan

Note that the proposal must address the following questions: why your project is suitable to be carried out at the Department of Linguistics at Stockholm University, how you intend to contribute to the research environment at the department with your research project, what makes you particularly suitable (to carry out the proposed research project).

Degree certificates and grades confirming that you meet the general and specific entry requirements (no more than 6 files)
Degree projects/theses (no more than 6 files).

The instructions for applicants are available at: How to apply for a position.

You are welcome to apply!

Stockholm University contributes to the development of sustainable democratic society through knowledge, enlightenment and the pursuit of truth.

Closing date: 15/04/2023

URL to this page
https://www.su.se/english/about-the-university/work-at-su/available-jobs/phd-student-positions-1.507588?rmpage=job&rmjob=20262&rmlang=UK

6-26

(2023-03-06) 2 open postdoc positions at LISN (ex-LIMSI), Paris, France

We currently have 2 open postdoc positions at LISN (ex-LIMSI). You can apply online and get more details here:
- multilingual ASR
https://emploi.cnrs.fr/Offres/CDD/UMR9015-LUCOND-003/Default.aspx?lang=EN
- ASR for low computational resource environment
https://emploi.cnrs.fr/Offres/CDD/UMR9015-LUCOND-002/Default.aspx?lang=EN

Feel free to contact me if you have any questions.

6-27

(2023-03-08) PhD in ML/Speech Processing @LIA, Avignon, France

PhD in ML/Speech Processing - Speaker recognition systems against voice attacks: voice synthesis and voice transformation

Starting date: September 1st, 2023 (flexible)

Application deadline: July 10th, 2023

Interviews (tentative): July 15th, 2023

Salary: ~2000€ gross/month (social security included)

Mission: research oriented (teaching possible but not mandatory)

Keywords: speech processing, automatic speaker recognition, anti-spoofing, deep neural network

CONTEXT

It is now widely accepted that automatic speaker verification (ASV) systems are vulnerable not only to speech produced artificially by text-to-speech (TTS) [1], but also to other forms of attack such as voice conversion (VC) and replay [2]. Voice conversion, which can be used to manipulate the voice identity of a speech signal, has progressed extremely rapidly in recent years [3] and has become a serious threat.

The progress made in recent years in deep neural network training has enabled spectacular advances in the fields of text-to-speech (TTS) and voice conversion (VC): DeepVoice, Tacotron 1 and 2 [4], AutoVC [5,6]. Existing architectures now make it possible to produce synthesized or manipulated artificial voices with a realism close to or equal to that of human voices [4]. At the same time, voice conversion algorithms (from one speaker to another) have also made spectacular advances. It is now possible to clone a voice identity using a small amount of data. In the space of two years, extremely significant advances have been made [5,6,7]. The ability of these algorithms to forge voice identities capable of deceiving speaker recognition and countermeasure systems is an urgent topic of research.

Progress in the fight against identity theft has been led by the ASVspoof community, formed in 2013 and now internationally recognized [8]. The most significant efforts have been made at the level of acoustic parametrization (the front-end), making it possible to better differentiate authentic (human) utterances from fraudulent ones. The best-performing system [9] combined Mel cepstral features, cochlear filter cepstral features and instantaneous frequency features with a classifier based on a Gaussian mixture model (GMM).

In recent years, research efforts have focused on the back-end. As in speaker recognition research, the anti-spoofing community has embraced the power of deep learning and, unsurprisingly, the neural architectures used are almost the same. Advances in anti-spoofing have followed the rapid advances in TTS and VC. The best anti-spoofing system again used traditional acoustic parameters, with a classifier based on ResNet-18 [10].

SCIENTIFIC OBJECTIVES

As part of this thesis, the robustness of existing countermeasures against new forms of adversarial attacks designed specifically to deceive them will be assessed. One of the advances expected in this thesis concerns the design of new countermeasures to detect such emerging, increasingly adversarial attacks. To do this, two avenues will be explored. The first is to redesign front-end feature extraction to capture cues that characterize adversarial attacks, and then to use these features with re-trained classifiers. As it is not always easy to identify reliable characteristics, the second direction will aim at adopting end-to-end architectures able to learn characteristics automatically. Although these advances improve robustness to adversarial attacks, it will be important to ensure that the resulting countermeasures remain robust to previous attacks. This is known as the generalization problem: an effective anti-spoofing countermeasure must reliably detect any form of attack it encounters, not just the specific attacks it was trained to detect. Finally, improving adversarial attack detection performance should not come at the cost of increased false positives (genuine speech labeled as spoofed speech), which can hurt usability and convenience. The progress and results targeted in this thesis are therefore countermeasures capable of defending speaker recognition systems against both adversarial and non-adversarial attacks.

In parallel to this competition between research teams specializing in attacks and research teams specializing in countermeasures, the speaker recognition community has focused on designing high-performance systems that are robust to acoustic variability. Recognition systems are trained to recognize speakers in increasingly difficult conditions (presence of several types of noise: additive, reverberation, etc.). This robustness to difficult acoustic conditions can lead to weakness against attack recordings that were not taken into account during training. This vulnerability can of course be reduced by using countermeasure (CM) systems. However, this approach can impact the usability of ASV systems, since the countermeasures can also reject genuine clients (authentic users). This thesis will therefore go beyond the state of the art by optimizing the ASV and CM systems jointly, so that they work together to achieve the best possible compromise between security and usability/convenience.

REQUIRED SKILLS

- Master 2 in speech processing, computer science or data science

- Good command of Python programming and deep learning frameworks

- Previous experience in bias in machine learning would be a plus

- Good communication skills in English

- Good command of French would be a plus but is not mandatory

LAB AND SUPERVISION

The PhD position will be co-supervised by Nicholas Evans from EURECOM and Driss Matrouf from LIA-Avignon. Joint meetings are planned on a regular basis, and the student is expected to spend time at LIA-Avignon. The student will collaborate closely with the project partners (IRCAM, specialized in attack generation, and EURECOM, specialized in countermeasures).

INSTRUCTIONS FOR APPLYING

Applications must contain a CV, a letter/message of motivation and Master's grade transcripts (applicants should also be ready to provide letter(s) of recommendation), and must be addressed to Driss Matrouf (driss.matrouf@univ-avignon.fr), Mickael Rouvier (mickael.rouvier@univ-avignon.fr) and Nicholas Evans (evans@eurecom.fr).

REFERENCES

[1] https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-1567157402

[2] N. Evans, T. Kinnunen and J. Yamagishi, "Spoofing and countermeasures for automatic speaker verification", in Proc. Interspeech 2013, pp. 925-929, 2013.

[3] Z. Yi et al., "Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion", in Proc. ISCA Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020.

[4] J. Shen et al., "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions", in Proc. ICASSP, 2018.

[5] K. Qian, Y. Zhang, S. Chang, X. Yang and M. Hasegawa-Johnson, "AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss", in Proc. International Conference on Machine Learning (ICML), pp. 5210-5219, 2019.

[6] J.-X. Zhang, Z.-H. Ling and L.-R. Dai, "Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations", IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 28, pp. 540-552, 2020.

[7] Y. Jia, Y. Zhang, R. Weiss, Q. Wang, J. Shen, F. Ren, ... and Y. Wu, "Transfer learning from speaker verification to multispeaker text-to-speech synthesis", in Advances in Neural Information Processing Systems, pp. 4480-4490, 2018.

[8] N. Evans, T. Kinnunen and J. Yamagishi, "Spoofing and countermeasures for automatic speaker verification", in Proc. Interspeech 2013, pp. 925-929, 2013.

[9] T. B. Patel and H. A. Patil, "Combining Evidences from Mel Cepstral, Cochlear Filter Cepstral and Instantaneous Frequency Features for Detection of Natural vs. Spoofed Speech", in Proc. Interspeech 2015, pp. 2062-2066, 2015.

[10] X. Cheng, M. Xu and T. F. Zheng, "Replay detection using CQT-based modified group delay feature and ResNeWt network in ASVspoof 2019", in Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 540-545, 2019.

6-28

(2023-03-09) Doctoral position : Acoustic to Articulatory Inversion by using dynamic MRI images @LORIA, Nancy, France

Doctoral position : Acoustic to Articulatory Inversion by using dynamic MRI images

Loria ("Lorraine Research Laboratory in Computer Science and its Applications") is a research unit common to CNRS, the Université de Lorraine and INRIA. Loria brings together 450 scientists, and its missions mainly concern fundamental and applied research in computer science. Among its teams, the MultiSpeech team focuses on automatic speech processing, audiovisual speech and speech production. IADI is a research unit common to Inserm and the Université de Lorraine whose specialty is developing techniques and methods to improve the imaging of moving organs via the acquisition of MR images.

This PhD project, funded by LUE (Lorraine Université d'Excellence), brings together the MultiSpeech team and the IADI laboratory.

Start date is (expected to be) 1 September 2023 or as soon as possible thereafter.

Supervisors

Yves Laprie, email yves.laprie@loria.fr

Pierre-André Vuissoz, email pa.vuissoz@chru-nancy.fr

The project

Articulatory synthesis mimics the speech production process by first generating the shape of the vocal tract from the sequence of phonemes to be pronounced, then the acoustic signal by solving the aeroacoustic equations. Compared to other approaches to speech synthesis, which already offer a very high level of quality, its main interest is that it controls the whole production process, beyond the acoustic signal alone.

The objective of this PhD is to achieve the inverse transformation, called acoustic-to-articulatory inversion, in order to recover the geometric shape of the vocal tract from the acoustic signal. A simple voice recording will then allow the dynamics of the different articulators to be followed during the production of a sentence.

Beyond the scientific challenge it represents, acoustic-to-articulatory inversion has many potential applications. On its own, it can be used as a diagnostic tool to evaluate articulatory gestures in an educational or medical context.

Description of work

The objective is to invert the acoustic signal to recover the temporal evolution of the medio-sagittal slice. Dynamic MRI provides very good quality two-dimensional images in the medio-sagittal plane at 50 Hz, and the speech signal acquired with an optical microphone can be very efficiently deconstructed with the algorithms developed in the MultiSpeech team (examples available at https://artspeech.loria.fr/resources/). We plan to use corpora already acquired or in the process of being acquired. These corpora represent a very large volume of data (several hundred thousand images), and an approach for tracking the contours of articulators in MRI images, which gives very good results, has been developed to process them. The automatically tracked contours can therefore be used to train the inversion. The goal is to perform the inversion using an LSTM-based approach on data from a small number of speakers for whom sufficient data exists. This approach will have to be adapted to the nature of the data and made able to identify the contribution of each articulator. In itself, successfully recovering the shape of the vocal tract in the medio-sagittal plane would be a remarkable achievement, since current results only cover a very small part of the vocal tract (a few points on its front part). However, it is important to be able to transfer this result to any subject, which raises the question of speaker adaptation, the second objective of the PhD.

What we offer

A position funded by LUE (Lorraine Université d'Excellence) at a leading technical university that generates knowledge and skills for a sustainable future.
A highly complementary scientific environment across the two teams: MRI and anatomy in the IADI laboratory, and deep learning and speech processing in the MultiSpeech team at Loria.
Engaged and ambitious colleagues along with a creative, international and dynamic working environment.
At Loria, there are lively research groups in a number of areas, for example natural language processing, deep learning, computer graphics, robotics, etc. At the moment, there are about 150 PhD students at Loria and IADI.
Work in the very center of Europe, in close proximity to nature.
Help to relocate and be settled in France and at Université de Lorraine.

Application

Your application, including all attachments, must be in English and submitted electronically through the recruitment system linked below.

Please include:

Motivation letter (max. one page)
Your motivation for applying for the specific PhD project
Curriculum vitae including information about your education, experience, language skills and other skills relevant for the position
Publication list (if possible)
Reference letters (if available)

The deadline for applications is April 15 2023, 23:59 GMT +2.

Log into Inria's recruitment system (https://jobs.inria.fr/public/classic/en/offres/2023-05790) to apply for this position.

6-29

(2023-03-10) Engineer (ingénieur d'étude), GIPSA-LAB, Grenoble, France

The Platforms Service of the GIPSA-LAB laboratory (CNRS/G-INP/UGA, Grenoble) is recruiting an engineer (ingénieur.e d'étude) in instrumentation.
He/She will provide support for the laboratory's experimental platforms, in particular those related to aero-acoustics, automatic control or robotics. In particular, he/she will be technically responsible for the AERO platform, dedicated to the study of the aero-acoustic phenomena of speech production: interfacing instrumentation, managing and maintaining the measuring equipment, and preparing and taking part in scientific experiments.

For more information, the detailed job description is available here: https://filesender.renater.fr/?s=download&token=2ae39938-121d-44c4-b6ce-385029ecf331

Do not hesitate to contact me if you need further information.

Coriandre Vilain
Research Engineer (Ingénieur de Recherche), UGA
PCMD team, Speech & Cognition Department (Pôle Parole & Cognition), GIPSA-LAB
Head of the GIPSA Platforms Service
04 76 82 77 80
www.gipsa-lab.fr/~coriandre.vilain

6-30

(2023-03-12) Associate-Assistant Professor in artificial intelligence for semantic and multi-modal multimedia analysis, Telecom SudParis, France

Dear All,

Telecom SudParis welcomes applications for a permanent position of Associate-Assistant Professor in artificial intelligence for semantic and multi-modal multimedia analysis. Telecom SudParis is a public graduate engineering school recognized at the highest level in the field of digital technology. Telecom SudParis is a founding member of the Institut Polytechnique de Paris and part of the Institut Mines-Telecom, the number one group of engineering schools in France. The recruited assistant/associate professor will join the ARTEMIS (Advanced Research and TEchniques for Multidimensional Imaging Systems) Department of Télécom SudParis and the SAMOVAR laboratory. The targeted research theme concerns the field of artificial intelligence applied to the semantic analysis of massive, distributed and heterogeneous multimedia data. This covers the automatic, multi-modal interpretation of complex audio-visual documents, computer vision, multimedia indexing, knowledge extraction and machine learning methodologies. The expected contributions will focus on deep neural network learning methods and target the entire multimedia content processing chain.

Detailed information can be found at the following URL: https://institutminestelecom.recruitee.com/l/en/o/maitre-de-conferences-en-intelligence-artificielle-pour-lanalyse-semantique-de-donnees-multimedia-cdi

The application deadline is March 31, 2023. Please do not hesitate to contact me for any further information.

Best regards,

Titus ZAHARIA, Professor

Head of the ARTEMIS Department Télécom SudParis

Institut Polytechnique de Paris

titus.zaharia@telecom-sudparis.eu

6-31

(2023-03-16) PhD Position in Deep Cascaded Representation Learning for Speech Modelling, Univ.Sheffield, UK

Title of Project: Deep Cascaded Representation Learning for Speech Modelling

Supervisor: Professor Thomas Hain

Deadline for Applications: 13th April 2023

The LivePerson Centre for Speech and Language offers a 3-year fully funded PhD studentship covering standard maintenance, fees and travel support, to work on cascaded deep learning structures to model speech. The Centre is connected with the Speech and Hearing (SpandH) and the Natural Language Processing (NLP) research groups in the Department of Computer Science at the University of Sheffield.

Auto-encoding is a powerful concept that allows us to compress signals and find essential representations. The concept was expanded to include context, which is usually referred to as self-supervised learning. On very large amounts of speech data, this has led to very successful methods and models for representing speech for a wide range of downstream processes. Examples of such models are wav2vec and WavLM. Use of their representations often requires fine-tuning to a specific task with small amounts of data. When encoding speech, it is desirable to represent a range of attributes at different temporal specificity. Such attributes often reflect a hierarchy of information.

The aim of this PhD project is to explore the use of knowledge about natural hierarchies in speech in cascaded auto- and contextual encoder/decoder models. The objective is to describe a structured way to understand such hierarchies. The successful candidate is expected to propose methods to combine different kinds of supervision (auto, context, label) and build hierarchies of embedding extraction. These proposals may have to be considered in the context of data availability and complexity. All proposals are to be implemented and tested on speech data. Experiments should be conducted on a range of speech data sets with different speech types and data set sizes.

The student will join a world-leading team of researchers in speech and language technology. The LivePerson Centre for Speech and Language Technology was established in 2017 with the aim of conducting research into novel methods for speech recognition and general speech processing, including end-to-end modelling, direct waveform modelling and new approaches to modelling of acoustics and language. It has recently extended its research remit to spoken and written dialogue. The Centre hosts several Research Associates, PhD researchers, graduate and undergraduate project students, Researchers and Engineers from LivePerson, and academic visitors. Being fully connected with SpandH brings collaboration, and access to a wide range of academic research and opportunities for collaboration inside and outside of the University. The Centre has access to extensive dedicated computing resources (GPU, large storage) and local storage of over 60TB of raw speech data.

The successful applicant will work under the supervision of Prof. Hain, who is the Director of the LivePerson Centre and also Head of the SpandH research group. SpandH has been, and remains, involved in a large number of national and international projects funded by national bodies, EU sources and industry. Prof. Hain also leads the UKRI Centre for Doctoral Training in Speech and Language Technologies and their Applications (https://slt-cdt.ac.uk/) - a collaboration between the NLP research group and SpandH. Jointly, NLP and SpandH host more than 110 active researchers in these fields.
This project will start as soon as possible.
If English is not your first language, you must have an IELTS score of 6.5 overall, with no less than 6.0 in each component.

How to Apply: All applications must be made directly to the University of Sheffield using the Postgraduate Online Application Form.
Information on what documents are required and a link to the application form can be found here: https://www.sheffield.ac.uk/postgraduate/phd/apply/applying
On your application, please name Prof. Thomas Hain as your proposed supervisor and include the title of the studentship you wish to apply for.
Your research proposal should:
● Be no longer than 4 A4 pages, including references
● Outline your reasons for applying for this studentship
● Explain how you would approach the research, including details of your skills and experience in the topic area
If you have any queries, please contact phd-compsci@sheffield.ac.uk

Funding Details:
This position is fully funded by LivePerson, covering all tuition fees and a stipend at the standard UKRI rate.

6-32

(2023-03-15) Research Associate in Integrated Multitask Neural Speech Labelling, Univ.Sheffield, UK

Job Title: Research Associate in Integrated Multitask Neural Speech Labelling
For further information and the link to apply, please visit:
https://www.jobs.ac.uk/job/CXH168/research-associate-in-integrated-multitask-neural-speech-labelling
Deadline for Applications: 15th March 2023

We are seeking an outstanding Research Associate in Integrated Multitask Neural Speech Labelling to join the LivePerson Centre for Speech and Language Technology based at the University of Sheffield, which is linked with the Speech and Hearing (SpandH) research group in the Department of Computer Science.

You will join a world-leading team of researchers in speech and language technology to work on new ways to integrate a variety of speech technology labelling, clustering or segmentation tasks into a single algorithm or process, in the context of deep neural networks. Even end-to-end (E2E) automatic speech recognition is typically considered a standalone process, independent of other speech audio technology tasks such as diarisation, acoustic event detection or intent recognition.

The LivePerson Centre for Speech and Language Technology was established in 2017 with the aim of conducting research into novel methods for speech recognition and general speech processing, including end-to-end modelling, direct waveform modelling and new approaches to modelling of acoustics and language. It has recently extended its research remit to spoken and written dialogue. The Centre hosts several Research Associates, PhD researchers, graduate and undergraduate project students, Researchers and Engineers from LivePerson, and academic visitors, which provides a vibrant work environment within the University. Being fully connected with SpandH brings collaboration, and access to a wide range of academic research and opportunities for collaboration inside and outside of the University. The postholder will work closely with Prof. Thomas Hain, who is the Director of the LivePerson Centre and also Head of the SpandH group. Prof. Hain also leads the UKRI Centre for Doctoral Training in Speech and Language Technologies and their Applications (slt-cdt.ac.uk) - a collaboration between the Natural Language Processing (NLP) research group and SpandH. Jointly, NLP and SpandH host more than 110 active researchers in these fields.

We’re one of the best not-for-profit organisations to work for in the UK. The University’s Total Reward Package includes a competitive salary, a generous Pension Scheme and annual leave entitlement, as well as access to a range of learning and development courses to support your personal and professional development.
We build teams of people from different heritages and lifestyles from across the world, whose talent and contributions complement each other to the greatest effect. We believe diversity in all its forms delivers greater impact through research, teaching and student experience.
To find out what makes the University of Sheffield a remarkable place to work, watch this short film: www.youtube.com/watch?v=7LblLk18zmo, and follow @sheffielduni and @ShefUniJobs on Twitter for more information.

6-33

(2023-03-15) PhD Position in Adaptive Deep Learning for Speech and Language, Univ.Sheffield, UK

Title of Project: PhD Position in Adaptive Deep Learning for Speech and Language

Supervisor: Professor Thomas Hain
Deadline for Applications: 13th April 2023
The LivePerson Centre for Speech and Language offers a 3-year fully funded PhD studentship covering standard maintenance, fees and travel support, to work on deep neural network adaptive learning modules for speech and language. The Centre is connected with the Speech and Hearing (SpandH) and the Natural Language Processing (NLP) research groups in the Department of Computer Science at the University of Sheffield.

Domain mismatch remains a key issue for speech and language technologies, for which the traditional solutions are transfer learning and adaptation. Adaptation was widely used for modelling speech in the context of generative models, but less so with modern neural network approaches. Such adaptation targeted features or models and was often informed by previous model output and estimates of latent factors. These approaches were often informed by observations of the human ability to adapt and adjust to new acoustic or semantic situations. Adaptation in neural networks is model-based and often implicit, through attention or dynamic convolution. However, these methods still fail to reproduce the rapid learning and adaptation that humans exhibit when exposed to new contexts.

The objective of this project is to conduct research into neural network structures that are capable of rapidly adjusting to a change in latent factors while at the same time allowing robust control. This will require rapid feedback mechanisms on the mismatch between the observed data and the model expectation. A range of strategies may be applied, through instantaneous feedback or through control of transformational model parameters. All proposals are to be implemented and tested on speech and, where suitable, also on language data. Experiments should be conducted on a range of tasks of different complexity in the context of different data types.

The student will join a world-leading team of researchers in speech and language technology. The LivePerson Centre for Speech and Language Technology was established in 2017 with the aim of conducting research into novel methods for speech recognition and general speech processing, including end-to-end modelling, direct waveform modelling and new approaches to modelling of acoustics and language. It has recently extended its research remit to spoken and written dialogue. The Centre hosts several Research Associates, PhD researchers, graduate and undergraduate project students, Researchers and Engineers from LivePerson, and academic visitors. Being fully connected with SpandH brings collaboration, and access to a wide range of academic research and opportunities for collaboration inside and outside of the University. The Centre has access to extensive dedicated computing resources (GPU, large storage) and local storage of over 60TB of raw speech data.

The successful applicant will work under the supervision of Prof. Hain, who is the Director of the LivePerson Centre and also Head of the SpandH research group. SpandH has been, and remains, involved in a large number of national and international projects funded by national bodies, EU sources and industry. Prof. Hain also leads the UKRI Centre for Doctoral Training in Speech and Language Technologies and their Applications (https://slt-cdt.ac.uk/) - a collaboration between the NLP research group and SpandH. Jointly, NLP and SpandH host more than 110 active researchers in these fields.
This project will start as soon as possible.
If English is not your first language, you must have an IELTS score of 6.5 overall, with no less than 6.0 in each component.
How to Apply: All applications must be made directly to the University of Sheffield using the Postgraduate Online Application Form.
Information on what documents are required and a link to the application form can be found here: https://www.sheffield.ac.uk/postgraduate/phd/apply/applying
On your application, please name Prof. Thomas Hain as your proposed supervisor and include the title of the studentship you wish to apply for.
Your research proposal should:
● Be no longer than 4 A4 pages, including references
● Outline your reasons for applying for this studentship
● Explain how you would approach the research, including details of your skills and experience in the topic area
If you have any queries, please contact phd-compsci@sheffield.ac.uk
Funding Details:
This position is fully funded by LivePerson, covering all tuition fees and a stipend at the standard UKRI rate.

6-34

(2023-03-17) Two Associate Professor (MCF) positions in phonetics at Université Paul-Valéry Montpellier 3, France

Two Associate Professor (maître de conférences, MCF) positions in phonetics are open for competition at Université Paul-Valéry Montpellier 3 this year:

General phonetics and computer-aided processing of spoken language (UMR 5267 Praxiling): https://www.galaxie.enseignementsup-recherche.gouv.fr/ensup/ListesPostesPublies/ANTEE/2023_1/0341089Z/FOPC_0341089Z_4352.pdf

Phonetics and didactics of spoken language in French as a foreign language (FLE) (language acquisition/appropriation): https://www.galaxie.enseignementsup-recherche.gouv.fr/ensup/ListesPostesPublies/ANTEE/2023_1/0341089Z/FOPC_0341089Z_4353.pdf

Deadline: 30 March 2023

6-35

(2023-03-20) 2 postdocs for project ASTRID DeTOX @IRCAM Paris and EURECOM Sophia Antipolis, France

As part of the ASTRID DeTOX project on combating deepfake videos of French public figures, two positions are available:

- A 15-month postdoc at IRCAM on the generation of audio-visual deepfakes

https://www.ircam.fr/job-offer/chargee-de-recherche-et-developpement-generation-de-deep-fakes-audio-visuel

- An 18-month postdoc or a 36-month PhD at EURECOM on the detection of audio-visual deepfakes

https://www.eurecom.fr/en/job/deepfake-detection-post-doc

https://www.eurecom.fr/en/job/deepfake-detection-phd

6-36

(2023-03-15) PhD student in Phonetics, Stockholm University, Sweden

PhD student in Phonetics

Stockholm

Ref. No. SU FV-0793-23

at the Department of Linguistics. Closing date: 15 April 2023.

The Department of Linguistics conducts research and offers education in a number of areas such as child language development, computational linguistics, general linguistics, multilingualism in the deaf and hard of hearing, phonetics and sign language. The department hosts three infrastructures, Språkstudion Language Learning Resource Centre, the Phonetics Laboratory and SUBIC Stockholm University Brain Imaging Center, and several research groups conduct research in experimental paradigms related to the mentioned infrastructures.

Project description
The position is linked to the research profile in phonetics. This means that the research plan attached to the application must have an experimental phonetics topic linked to the research on acoustic and/or physiological (e.g. breathing, phonation, articulation) aspects of speech and conversation conducted at the Department of Linguistics.

Qualification requirements
In order to meet the general entry requirements, the applicant must have completed a second-cycle degree, completed courses equivalent to at least 240 higher education credits, of which 60 credits must be in the second cycle, or must have otherwise acquired equivalent knowledge in Sweden or elsewhere.

In order to meet the specific entry requirements for doctoral studies in linguistics, the general syllabus stipulates that an applicant must have received a passing grade on course work of at least 30 higher education credits from the second cycle in Linguistics, including a degree project of at least 15 credits on a topic relevant to the proposed research plan. In addition, the applicant is required to be proficient in the language of the proposed doctoral thesis (English, Swedish or Swedish Sign Language). Proficiency is demonstrated in the research plan and other relevant parts of the application (such as undergraduate theses, publications, grades, certificates and an interview).

The entry requirements can also be met by students who have acquired equivalent knowledge in Sweden or elsewhere. Assessment of eligibility is decided on in accordance with the department's local admission procedure and the department's decision and delegation procedure.

The qualification requirements must be met by the deadline for applications. Individual plan for studies in linguistics (in Swedish).

Selection
Selection among the eligible candidates will be based on their capacity to benefit from the training. Selection is made by the Department Board, applying the following assessment criteria:

Education in general. Assessment is made with regard to both depth and breadth in previous education
Scholarly production. On the basis of the applicant’s degree projects from the first and second cycle and, where applicable, other scholarly production, the applicant’s ability to benefit from the training is assessed according to the following criteria: critical ability, analytical skills, creativity, independence and scholarly precision. The time aspect is also considered, that is, to what extent the applicant has demonstrated an ability to complete previous academic projects within specified time limits. In addition, based on a comparison of previous academic output, an assessment is made of the applicant’s academic development
The applicant must describe his/her proposed field of research in a research plan. The dissertation project must focus on phonetics. The plan should not exceed 3 numbered A4 pages in Arial, font size 11, with single line spacing and 2.5 cm margins, including references and any images and examples. The research plan should contain one or more research problems and an outline of the research project. The research plan is assessed on the basis of: relevance, originality, and potential for completion within the specified time limits (i.e. a period equivalent to four years of full-time study for a doctoral degree)
Available supervisor resources
Teamwork skills. The applicant's ability to collaborate is assessed on the basis of, for example, references, certificates or interviews.

In selecting applicants for postgraduate education in linguistics, the department board must take into account rules and regulations of the Faculty of Humanities. In addition to the above selection criteria, the following will be of great importance in the assessment:

Relevance of the proposed research project to the department's research environment
Experience in research-related work within phonetics.

Admission Regulations for Doctoral Studies at Stockholm University are available at: www.su.se/rules and regulations.

Terms of employment
Only a person who will be or has already been admitted to a third-cycle programme may be appointed to a doctoral studentship.

The term of the initial contract may not exceed one year. The employment may be extended for a maximum of two years at a time. However, the total period of employment may not exceed the equivalent of four years of full-time study.

Doctoral students should primarily devote themselves to their own education, but may engage in teaching, research, and administration corresponding to a maximum of 20 % of a full-time position. For this particular position, the doctoral student is expected to perform departmental duties corresponding to 20 % of full time. Where applicable, the total time of the appointment is extended to correspond to a full-time doctoral programme for four years.

The proportion of departmental duties may be unevenly distributed across the duration of the doctoral programme.

Please note that admission decisions cannot be appealed.

Stockholm University strives to be a workplace free from discrimination and with equal opportunities for all.

Contact
For more information, please contact the Head of Department Mattias Heldner, telephone: +46 8 16 19 88, mattias.heldner@ling.su.se. For questions about the doctoral programme, contact the Director of Studies for postgraduate education, Bernhard Wälchli, +46 8 16 23 44, studierektorfu@ling.su.se.

Union representatives
Ingrid Lander (Saco-S), telephone: +46 708 16 26 64, saco@saco.su.se, Alejandra Pizarro Carrasco (Fackförbundet ST/OFR), telephone: +46 8 16 34 89, alejandra@st.su.se, seko@seko.su.se (SEKO), and PhD student representative, doktorandombud@sus.su.se.

Application
Apply for the PhD student position via Stockholm University's recruitment system. It is the responsibility of the applicant to ensure that the application is complete in accordance with the instructions in the job advertisement, and that it is submitted before the deadline.

Please include the following information with your application

Your contact details and personal data
Your highest degree
Your language skills
Contact details for 2–3 referees (please, specify their relationship to you as an applicant and what they are expected to testify to or comment on)

and, in addition, please include the following documents

Cover letter
CV – degrees and other completed courses, work experience and a list of degree projects/theses
Dissertation plan/research proposal, inclusive of the following:
- research question(s)
- brief background (research context and relationship to own interests/qualifications)
- method and data
- expected results (specify the scope, especially if the results are mainly descriptive)
- time plan

Note that the proposal must address the following questions: why your project is suitable to be carried out at the Department of Linguistics at Stockholm University, how you intend to contribute to the research environment at the department with your research project, what makes you particularly suitable (to carry out the proposed research project).

Degree certificates and grades confirming that you meet the general and specific entry requirements (no more than 6 files)
Degree projects/theses (no more than 6 files).

The instructions for applicants are available at: How to apply for a position.

You are welcome to apply!

Stockholm University contributes to the development of sustainable democratic society through knowledge, enlightenment and the pursuit of truth.

Closing date: 15/04/2023

URL to this page
https://www.su.se/english/about-the-university/work-at-su/available-jobs/phd-student-positions-1.507588?rmpage=job&rmjob=20262&rmlang=UK

6-37

(2023-03-20) PhD student position in experimental phonetics @ Stockholm University, Sweden

The Department of Linguistics at Stockholm University invites applications for a PhD student position in experimental phonetics, including (but not limited to) topics in prosody. For details, see: https://www.su.se/english/about-the-university/work-at-su/available-jobs/phd-student-positions-1.507588?rmpage=job&rmjob=20262&rmlang=UK

6-38

(2023-03-20) PhD student, Bielefeld University, Germany

The Digital Linguistics Lab at Bielefeld University (head: JProf. Dr.-Ing. Hendrik Buschmeier) is seeking to fill a research position (PhD student, E13 TV-L, 100%, fixed-term) in the area of multimodal human-robot interaction in the research project “Hybrid Living”.

Join us to work in an interdisciplinary team on research questions in the intersection of human-robot interaction and computational linguistics. Specifically, you will work (1) on the use of multimodal communication (verbal and nonverbal) to situatively instruct a service robot, (2) on making the robot's behaviour transparent to its users, and (3) on models for solving human-robot interaction problems through communication.

The formal job advertisement, with information on how to apply, can be found here: https://uni-bielefeld.hr4you.org/job/view/2265/research-position-in-multimodal-human-robot-interaction?page_lang=en

Questions? Don't hesitate to get in touch: hbuschme@uni-bielefeld.de

Hendrik Buschmeier

6-39

(2023-03-24) Research assistant, McGill University, Montreal, Canada

We are seeking a multimodal designer to take on a central role in the Shared Reality Lab's open source IMAGE project (image.a11y.mcgill.ca), focused on making photos, charts, and maps available to people who are blind or low vision. We are currently operating under two grants, focused on integrating haptic force feedback and pin array devices into IMAGE. You will work with a multidisciplinary team of user experience researchers, designers, and developers who will support you in designing and releasing multimodal audio and haptic experiences that will delight our end users. The primary requirement is a strong background and passion for owning both design and iterative testing of combined audio/haptic end-user experiences. Since the goal of IMAGE is to release a practical solution that can be used on a daily basis, the candidate will work directly with developers and the rest of the team to make sure that ideas and designs get translated into implementable requirements, then deployed into production.

Requirements:

Demonstrated experience designing and creating multimodal experiences, including audio and haptics. Please include portfolio, or links to projects.
Ability to work with user experience researchers to clearly define user needs and specific implementation requirements to meet them.
Comfortable working directly with developers to make sure the experiences are implementable within the project timeline and available resources.
Prototyping experience.
Designing and carrying out methodologically sound user testing in consultation with our existing user studies lead.
Strong plus: Familiarity with development issues, software release cycles, software architecture, client-server tradeoffs, audio synthesis.
Bonus: French, spoken and/or written.
Bonus: Experience working with the blind/low vision community.

Other useful skills (not required). If the candidate has the desire and capability, they are also welcome to participate in software architecture and implementation, for example:

Audio synthesis experience (IMAGE currently uses SuperCollider).
Software architecture, design, and documentation, including creating usable solutions for developers and content creators in a client/server model.
Mentoring junior designers/developers.

Candidates applying as a research assistant must be eligible to work in Canada.

Those applying as postdoctoral candidates should refer to McGill's requirements on postdoctoral appointments, www.mcgill.ca/gps/postdocs, for conditions and additional information on the status of the position. (In Quebec, Postdoctoral Fellow is a full-time student status and trainee category, and the Ministère de l'Education, Enseignement Supérieur et Recherche (MESRS) stipulates that all postdocs must be registered on a university student registration system.) Please note that non-Canadian postdoctoral fellows must obtain a valid Citizenship and Immigration Canada (CIC) work permit to legally work in Canada.

To apply for the position, please email the following items to Jeremy Cooperstock (srl-jobs@cim.mcgill.ca):

A brief letter of application, describing your qualifications and experience relevant to the position of interest, along with your dates of availability.
Detailed CV with links to online papers and/or project portfolios.
Two (2) reference letters (sent separately).

The position is available immediately, with an initial appointment of up to one year. Informal inquiries are welcome.

About us: The Shared Reality Lab conducts research in audio, video, and haptic technologies, building systems that leverage their capabilities to facilitate and enrich both human-computer and computer-mediated human-human interaction. The lab is part of the Centre for Intelligent Machines and Department of Electrical and Computer Engineering of McGill University. McGill, one of Canada's most prestigious universities, is located in Montreal, a top city to live in, especially for students.

McGill University is committed to equity in employment and diversity. It welcomes applications from Aboriginal persons, persons with disabilities, ethnic minorities, persons of minority sexual orientation or gender identity, visible minorities, women and others who may contribute to diversification. All qualified applicants are encouraged to apply, although Canadians and permanent residents will be given priority.

6-40

(2023-04-07) Researcher position at the School of Informatics, Kyoto University, Japan

Researcher position at the School of Informatics, Kyoto University, Japan

Job description:
Research & Development in the Moonshot program 'Avatar Symbiotic Society',
in particular spoken dialogue design and implementation for semi-autonomous avatars

Expert area:
Spoken Dialogue Systems OR Human-Robot Interaction

Qualifications:
- Ph.D. degree in one of the above expert areas
- Programming skills (Python)
- Fluent in English
- At least beginner-level Japanese

Work Place:
Kyoto University, School of Informatics, Speech and Audio Processing Lab.
Sakyo-ku, Kyoto, Japan
http://sap.ist.i.kyoto-u.ac.jp/EN

Work Hours:
Discretionary work system (standard: 7 hours 45 minutes per day)
Monday to Friday except for national holidays and summer holidays

Salary:
Determined by work experience, following the University's guidelines

Starting Date:
As early as possible

Employment Term:
Renewable every year, up to November 2025.

Contact:
Tatsuya Kawahara, Professor
School of Informatics, Kyoto University
Sakyo-ku, Kyoto 606-8501, JAPAN
E-mail: kawahara@i.kyoto-u.ac.jp

Documents to be submitted:
- Resume (CV)
- List of publications
- List of reference persons

Application Deadline:
Open until a suitable candidate is found.

6-41

(2023-04-08) PhD Position @ SPEAC, Radboud University, Nijmegen, The Netherlands

We have an open 4-year PhD position in the SPEAC lab (Speech Perception in Audiovisual Communication; https://hrbosker.github.io) at the Donders Institute, Radboud University, Nijmegen, The Netherlands.

The position is funded through an ERC Starting Grant (HearingHands, 101040276) awarded to Dr. Hans Rutger Bosker. We are looking for candidates with a strong background in speech perception and an interest in audiovisual prosody and gesture-speech integration.

You will work closely with Dr. Hans Rutger Bosker (PI) and Prof. James McQueen (promotor). The PhD project aims to determine how and when the timing of seemingly meaningless up-and-down hand gestures influences audiovisual speech perception, specifically targeting more naturalistic communicative settings. You will use virtual avatars, allowing careful control of their gestural movements, to establish which kinematic and communicative properties of hand gestures influence low-level speech perception. You will assess how challenging listening conditions impact the perceptual weighting of visual, auditory and audiovisual cues to prominence, as well as determine the time-course of these cues using eye-tracking. Finally, you will design training studies to test how humans adjust their perception to between-talker variation in gesture-speech alignment.

- 4 year contract, 1.0 FTE

- gross monthly salary: €2,541 - €3,247 (scale P)

- application deadline: May 22, 2023

- preferred starting date: September 1, 2023

More details about the project, the profile, and what we have to offer are available through the link below. If you have any questions, do get in touch at HansRutger.Bosker@donders.ru.nl

https://www.ru.nl/en/working-at/job-opportunities/phd-candidate-audiovisual-speech-perception-at-the-donders-centre-for-cognition

6-42

(2023-04-09) Postdoc @ University of Washington, USA

University of Washington, Seattle, WA, USA

Laboratory for Speech Physiology and Motor Control

Post-doctoral position in speech sensorimotor learning in typical adults, DBS patients, and adults who stutter

The Laboratory for Speech Physiology and Motor Control (PI Ludo Max, Ph.D.) at the University of Washington (Seattle) is seeking to fill a post-doctoral position in the areas of sensorimotor integration and sensorimotor learning for speech production. The position will involve experimental work on sensorimotor adaptation, sensory prediction, and error evaluation in typical adults, Parkinson’s and essential tremor patients with deep brain stimulation implants (DBS), and adults who stutter. The lab is located in the University of Washington's Department of Speech and Hearing Sciences and has additional affiliations with the Graduate Program in Neuroscience, the Department of Bioengineering, and the Department of Linguistics.

The successful candidate will use speech sensorimotor adaptation paradigms (with digital signal processing perturbations applied to the real-time auditory feedback or mechanical loads applied to the jaw by a robotic device) to investigate various aspects of motor learning and control. Data collection will involve acoustic, kinematic, and neural data to investigate auditory-motor interactions during speech movement planning and execution.

The appointment is initially for one year, with renewal possible contingent upon satisfactory performance and productivity. We are looking for a candidate available to start in the summer of 2023, and applicants should have completed all requirements for their Ph.D. degree by that time. Review of applications will begin immediately. Candidates with a Ph.D. degree in neuroscience, cognitive/behavioral neuroscience, motor control/kinesiology, biomedical engineering, communication disorders/speech science, and related fields are encouraged to apply.

For more information, please contact lab director Ludo Max, Ph.D. (LudoMax@uw.edu). Applications can be submitted to the same e-mail address. Interested candidates should submit (a) a cover letter describing their research experiences, interests, and goals, (b) a curriculum vitae, (c) the names and contact information of three individuals who can serve as references, and (d) reprints of relevant journal publications.

6-43

(2023-04-12) 3 Postdocs @ IRIT, Toulouse, France

As part of a PIA project, we have three 2-year postdoc positions open in Toulouse, in automatic speech recognition, sentiment analysis, and preference modeling, for an in-vehicle voice assistant.

More information here: https://www.irit.fr/~Thomas.Pellegrini/pdf/postdoc_3offers_2023.pdf

6-44

(2023-04-15) Research and development scientist (M/F), BRUEL project, IRCAM, Paris, France

Job offer: 1 research and development scientist (M/F), neural voice identity conversion for adversarial attacks

Availability and duration: 18 months, preferably starting 1 June 2023

Job description: As part of the ANR BRUEL project (2022-2026), the Analyse et Synthèse des sons (Sound Analysis and Synthesis) team is seeking a researcher to design, implement and train neural voice identity conversion algorithms for creating identity spoofing attacks. Starting from a set of attack scenarios, defined according to the means and resources available to carry them out (expertise, algorithms, data), the work will first consist of implementing an algorithm test bench to evaluate the robustness of authentication and detection systems against these attacks. The work will then address one or more of the following problems:
- learning identity conversion from freely accessible data (for example, from the internet) of heterogeneous and degraded quality (compression, noise, etc.), and transferring identity from limited data through few-shot neural adaptation strategies;
- generating conversions with control over the acoustic fingerprint, so that the attack is adapted to the acoustic environment and the communication channel of the envisaged scenarios (from professional conditions to degraded telephone or internet communication conditions).
All work will be evaluated according to the usual protocols in voice identity conversion, and also with the project partners, to measure the performance of authentication/detection systems under the envisaged scenarios. The advances made will be integrated into Ircam's neural voice identity conversion system and evaluated in situ in professional and/or artistic productions carried out at Ircam.
The researcher will also collaborate with the development team and take part in project activities (algorithm evaluation, meetings, specifications, deliverables, reports).

BRUEL project overview: The ANR BRUEL project (ElaBoRation d’Une méthodologie d’EvaLuation des systèmes d’identification par la voix, i.e. development of an evaluation methodology for voice identification systems) concerns the evaluation/certification of voice identification systems against adversarial attacks. Automatic speaker recognition systems are vulnerable not only to artificially produced speech from speech synthesis, but also to other forms of attack such as voice identity conversion and replay. The artifacts created when generating or manipulating these fraudulent attacks are traces left in the signal by speech synthesis algorithms, which make it possible to distinguish the original real voice from a spoofed one. Spoofing detection therefore requires evaluating spoofing countermeasures jointly with speaker recognition systems. The BRUEL project aims to propose the first evaluation/certification methodology for voice identification systems based on a Common Criteria approach.

Work context: The work will take place at IRCAM in the Analyse et Synthèse des sons team, supervised by Nicolas Obin and Axel Roebel (SU, CNRS, IRCAM). Part of the work may be done remotely, but participation in project progress meetings is required. Ircam is a non-profit association associated with the Centre National d'Art et de Culture Georges Pompidou, whose missions include research, creation and education activities around 20th-century music and its relations with science and technology. Within the joint research unit UMR 9912 STMS (Sciences et Technologies de la Musique et du Son), shared by Ircam, Sorbonne Université, CNRS and the Ministère de la Culture et de la Communication, specialized teams carry out research and software development in acoustics, audio signal processing, cognitive science, interaction technologies, computer music and musicology.

Ircam is located in central Paris, near the Centre Georges Pompidou, at 1, Place Stravinsky 75004 Paris.

Required experience and skills: We are looking for a candidate specialized in deep neural network training and in automatic speech processing or computer vision, preferably in deepfakes. The candidate must hold a PhD in computer science in the field of deep learning, and have publications in recognized conferences and journals in the field. The ideal candidate will have:

• Strong expertise in machine learning, particularly in deep neural networks;

• Good experience in automatic speech processing, preferably in speech generation or deepfakes;

• Proficiency in digital audio-video signal processing;

• Excellent command of the Python programming language, the TensorFlow framework for neural network training, and distributed computing on GPU servers;

• Excellent command of spoken and written scientific English;

• Autonomy, teamwork, productivity, rigor and methodology.

Salary: According to education and professional experience

Applications: Please send a cover letter and a CV detailing your level of experience/expertise in the areas mentioned above (and any other relevant information) to Nicolas.Obin@ircam.fr and Axel.Roebel@ircam.fr. Application deadline: 31 May 2023

6-45

(2023-04-15) Research and development scientist (M/F), DeTOX project, IRCAM, Paris, France

Job offer: 1 research and development scientist (M/F), audio-visual deepfake generation

Availability and duration: 15 months, preferably starting 1 June 2023

Job description: As part of the ASTRID DeTOX project (2023-2025), the Analyse et Synthèse des sons (Sound Analysis and Synthesis) team is seeking a researcher to implement and train algorithms for generating audio-visual deepfakes, with the following tasks:
- collecting, implementing and training representative state-of-the-art algorithms for audio and visual deepfake generation;
- implementing a new audio-visual deepfake generation algorithm that synchronizes the two modalities, in particular to ensure consistency between speech and the movements of the lips and lower face;
- building audio-visual databases of the targeted people and training deepfake generation models for these people.
The researcher will also collaborate with the development team and take part in project activities (algorithm evaluation, meetings, specifications, deliverables, reports).
DeTOX project overview: Recent challenges have shown that it is extremely difficult to develop universal detectors of manipulated videos, such as the 'deep fakes' used to counterfeit a person's identity. When detectors are exposed to videos generated by a new algorithm, i.e. one unknown during training, their performance is still extremely limited. For the video part, algorithms examine frames one by one, without taking into account the evolution of facial dynamics over time. For the voice part, the voice is generated independently of the video; in particular, audio-video synchronization between the voice and the lip movements is not taken into account. This is a major weakness of deepfake video generation algorithms.
The project aims to implement and train deepfake detection algorithms personalized to individuals for whom many real and falsified audio-video sequences are available and/or can be produced. Building on basic audio and video components taken from the state of the art, the project will focus on modeling the temporal evolution of audio-visual signals and their consistency, for both generation and detection. The goal is to demonstrate that by using audio and video simultaneously, and by focusing on a specific person during training and detection, it is possible to design effective detectors even against generators that have not yet been catalogued. Such tools will make it possible to scan the web and detect potential deepfake videos of prominent French figures (the President of the Republic, journalists, the Chief of the Defence Staff, ...) as soon as they are published.

Work context: The work will take place at IRCAM in the Analyse et Synthèse des sons team, supervised by Nicolas Obin and Axel Roebel (SU, CNRS, IRCAM). Part of the work may be done remotely, but participation in project progress meetings is required. Ircam is a non-profit association associated with the Centre National d'Art et de Culture Georges Pompidou, whose missions include research, creation and education activities around 20th-century music and its relations with science and technology. Within the joint research unit UMR 9912 STMS (Sciences et Technologies de la Musique et du Son), shared by Ircam, Sorbonne Université, CNRS and the Ministère de la Culture et de la Communication, specialized teams carry out research and software development in acoustics, audio signal processing, cognitive science, interaction technologies, computer music and musicology.

Ircam is located in central Paris, near the Centre Georges Pompidou, at 1, Place Stravinsky 75004 Paris.

Required experience and skills: We are looking for a candidate specialized in deep neural network training and in automatic speech processing or computer vision, preferably in deepfakes. The candidate must hold a PhD in computer science in the field of deep learning, and have publications in recognized conferences and journals in the field. The ideal candidate will have:

• Strong expertise in machine learning, particularly in deep neural networks;

• Good experience in automatic speech processing or computer vision, preferably in deepfakes;

• Proficiency in digital audio-video signal processing;

• Excellent command of the Python programming language, the TensorFlow framework for neural network training, and distributed computing on GPU servers;

• Excellent command of spoken and written scientific English;

• Autonomy, teamwork, productivity, rigor and methodology.

Salary: According to education and professional experience

Applications: Please send a cover letter and a CV detailing your level of experience/expertise in the areas mentioned above (and any other relevant information) to Nicolas.Obin@ircam.fr and Axel.Roebel@ircam.fr. Application deadline: 31 May 2023

6-46

(2023-05-09) PhD position @ IMT Atlantique, Brest, France and Instituto Superior Técnico, Lisbon, Portugal

PhD title: SUMMA-Sound: SUMMarization of Activities of daily living using Sound-based activity recognition

Partnership: IMT Atlantique, Brest campus. Laboratory: Lab-STICC. Doctoral school: SPIN. Funding: IMT Atlantique, co-tutelle with Instituto Superior Técnico.

Context: IMT Atlantique, internationally recognised for the quality of its research, is a leading general engineering school under the aegis of the French Ministry of Industry and Digital Technology, ranked in the three main international rankings (THE, SHANGHAI, QS). Located on three campuses (Brest, Nantes and Rennes), IMT Atlantique aims to combine digital technology and energy to transform society and industry through training, research and innovation. It aims to be the leading French higher education and research institution in this field on an international scale. With 290 researchers and permanent lecturers, 1000 publications and 18 M€ of contracts, it supervises 2300 students each year, and its training courses are based on cutting-edge research carried out within 6 joint research units: GEPEA, IRISA, LATIM, LABSTICC, LS2N and SUBATECH. The proposed thesis is part of the research activities of the RAMBO team (Robot interaction, Ambient systems, Machine learning, Behaviour, Optimization), the Lab-STICC laboratory, and the Computer Science department of IMT Atlantique.

Scientific context: The objective of this thesis is to develop a method for non-intrusively collecting and summarizing domestic health-related data relevant for medical diagnosis, using audio information. This research addresses the lack of practical tools that give medical staff high-level, succinct information on how the patients they follow are evolving, for diagnostic purposes. It is based on the assumption that valuable diagnostic data can be collected by observing short- and long-term lifestyle changes and behavioural anomalies, and it relies on the latest advances in audio-based activity recognition, summarization of human activity, and health diagnosis. Research on health diagnosis in domestic environments has already explored a variety of sensors and modalities for gathering data on human health indicators [5]. Nevertheless, audio-based activity recognition stands out for being less intrusive. Using state-of-the-art sound-based activity recognition models [2] to monitor domestic human activity, the thesis will investigate and develop methods for summarizing human activity [3] in human-understandable language, so as to produce data that doctors monitoring their patients remotely can easily interpret [4]. This work continues and extends the research of the RAMBO team at IMT Atlantique on ambient systems that help elderly or dependent people age well at home [1]. We expect this thesis to provide technology that can relieve the burden on gerontologists and elderly-care facilities and alleviate the caregiver shortage by offering automatic support for monitoring elderly or disabled people, enabling them to age at home while still being followed by medical specialists through automated means.

Expected contributions of the thesis
Scientific goals: (1) determine the set of human activities relevant for health diagnosis; (2) implement a state-of-the-art model for audio-based activity recognition and have its operation validated by clinicians; (3) develop a model for summarizing the evolution of human activity over time intervals of arbitrary duration (typically from days to months, and possibly years).
Expected outcomes of the PhD: (1) a model for semantic summarization of human activity, based on sound recognition of activities of daily living; (2) a proof of concept for this model.

Candidate profile and required skills:
• Master's degree in Computer Science (or equivalent)
• Programming and software engineering skills (Python, Git, software architecture design)
• Data science skills
• Machine learning skills
• English speaking and writing skills

References:
[1] Damien Bouchabou. “Human activity recognition in smart homes: tackling data variability using context-dependent deep learning, transfer learning and data synthesis”. PhD thesis. Ecole nationale supérieure Mines-Télécom Atlantique, May 2022. url: https://theses.hal.science/tel-03728064.
[2] Detection and Classification of Acoustic Scenes and Events (DCASE). url: https://dcase.community/challenge2022/task-soundevent-detection-in-domestic-environments (visited on 07/01/2022).
[3] P. Durga et al. “When less is better: A summarization technique that enhances clinical effectiveness of data”. In: Proceedings of the 2018 International Conference on Digital Health. 2018, pp. 116–120.
[4] Akshay Jain et al. “Linguistic summarization of in-home sensor data”. In: Journal of Biomedical Informatics 96 (2019), p. 103240. ISSN: 1532-0464.
[5] Mostafa Haghi Kashani et al. “A systematic review of IoT in healthcare: Applications, techniques, and trends”. In: Journal of Network and Computer Applications 192 (2021), p. 103164.

Work plan: The thesis will be organised in the following steps: (1) definition of pertinent sounds and activities for health diagnosis, (2) hardware set-up, (3) dataset constitution, (4) activity recognition, (5) diarization of activities, (6) summarization, (7) validation in a real environment.

Application: To apply for this position, please send an email with your Curriculum Vitae, a document with your academic results (if possible), and a few lines describing your motivation to pursue a PhD to mihai[dot]andries[at]imt-atlantique[dot]fr before 16 May 2023.

Additional information:
• Application deadline: 16 May 2023
• Start date: Fall 2023
• Contract duration: 36 months
• Location: Brest (France) and Lisbon (Portugal)
• Contact(s): Mihai ANDRIES (mihai[dot]andries[at]imt-atlantique[dot]fr), Plinio Moreno (plinio[at]isr.tecnico.ulisboa.pt)

6-47

(2023-05-11) PhD position @ ISIR and IRCAM Paris France

Multimodal behavior generation and style transfer for virtual agent animation

Catherine Pelachaud, Nicolas Obin (catherine.pelachaud@isir.upmc.fr, nicolas.obin@ircam.fr)

Humans communicate not only through speech but also through a wide range of multimodal signals: hand gestures, body posture, facial expressions, gaze, touch, speech prosody, etc. Verbal and nonverbal behaviors play a crucial role in conveying and perceiving information in human-human interaction. Depending on the communicative context and the audience, a person continuously adapts their style during an interaction. This stylistic adaptation involves both verbal and nonverbal modalities, such as language, speech prosody, facial expressions, hand gestures, and body posture. Virtual agents, also called Embodied Conversational Agents (ECAs; see [B] for an overview), are entities that can communicate verbally and nonverbally with human interlocutors. Their roles vary with the application: they can act as a tutor, an assistant, or even a companion. Matching an agent's behavioral style to its interaction context fosters better engagement and adherence from human users. A large number of generative models have been proposed in the past few years for synthesizing ECA gestures. Lately, style modeling and transfer have received increasing attention as a way to adapt the behavior of the ECA to its context and audience. The latest research proposes neural architectures comprising a content encoder, a style encoder, and a decoder conditioned on both, so as to generate ECA gestural behavior that corresponds to the given content and exhibits the desired style. While the first attempts focused on modeling the style of a single speaker [4, 5, 7], there is a rapidly growing effort towards multi-speaker and multi-style modeling and transfer [1, 2]. In particular, few-shot style transfer architectures attempt to generate gestural behavior in a given style from a minimal amount of data in that style and with minimal further training or fine-tuning.

Objectives and methodology:

The aim of this PhD is to generate human-like gestural behavior that empowers virtual agents to communicate verbally and nonverbally with different styles, extending the previous thesis work of Mireille Fares [A]. We view behavioral style as pervasive while speaking: it colors communicative behaviors, whereas content is carried by multimodal signals but expressed mainly through text semantics. The objective is to generate ultra-realistic verbal and nonverbal behaviors (text style, prosody, facial expressions, body gestures, and poses) corresponding to a given content (mostly driven by text and speech), and to adapt them to a specific style. This raises methodological and fundamental challenges in machine learning and human-computer interaction: 1) How should content and style be defined, and which modalities are involved, and in what proportion, in the gestural expression of content and style? 2) How can efficient neural architectures be implemented to disentangle content and style information from multimodal human behavior (text, speech, gestures)? The proposed directions will leverage cutting-edge research in neural networks, such as multimodal modeling and generation [8], information disentanglement [6], and text-prompt generation as popularized by DALL-E and ChatGPT [9].

The research questions can be summarized as follows:

· What is a multimodal style? What are the style cues in each modality (verbal, prosody, and nonverbal behavior)? How can the cues from each modality be fused into a multimodal style?

· How can the generation of verbal and nonverbal cues be controlled using a multimodal style? How can a multimodal style be transferred into generative models? How can style-oriented prompts/instructions be integrated into multimodal generative models while preserving the underlying intentions the agent is meant to convey?

· How should the generation be evaluated? How can content preservation and style transfer be measured? How should evaluation protocols with real users be designed?

The PhD candidate will contribute to the field of neural multimodal behavior generation for virtual agents, with a particular focus on multimodal style generation and control:

· Learning disentangled content and style encodings from multimodal human behavior using adversarial learning, bottleneck learning, and cross-entropy / mutual information formalisms.

· Generating expressive multimodal behavior using prompt-tuning, VAE-GAN, and stable diffusion algorithms.

To accomplish these objectives, we propose the following steps:

· Analyzing corpora to identify style and content cues in different modalities.

· Proposing generative models for multimodal style transfer according to different control levels (human mimicking or prompts/instructions)

· Evaluating the proposed models on dedicated corpora (e.g., PATS) and with real users. Several criteria will be evaluated: content preservation, style transfer, and coherence across the ECA's modalities. When evaluating with human users, we envision measuring the users' engagement, their knowledge retention, and their preferences.

Supervision team

Catherine Pelachaud is a CNRS director of research at ISIR, working on embodied conversational agents, affective computing, and human-machine interaction.
[A] M. Fares, C. Pelachaud, N. Obin (2022). Transformer Network for Semantically-Aware and Speech-Driven Upper-Face Generation. In EUSIPCO.
[B] C. Pelachaud, C. Busso, D. Heylen (2021). Multimodal behavior modeling for socially interactive agents. In The Handbook on Socially Interactive Agents: 20 Years of Research on Embodied Conversational Agents, Intelligent Virtual Agents, and Social Robotics, Volume 1: Methods, Behavior, Cognition.
Nicolas Obin is an associate professor at Sorbonne Université and a research scientist at Ircam, working on human speech generation, vocal deep fakes, and multimodal generation.
[C] L. Benaroya, N. Obin, A. Roebel (2023). Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations. In Entropy 25 (2), 375
[D] F. Bous, L. Benaroya, N. Obin, A. Roebel (2022). Voice Reenactment with F0 and timing constraints and adversarial learning of conversions.

The supervision team regularly publishes in top-tier conferences and journals in machine learning (e.g., AAAI, ICLR, DMKD), natural language processing & information access (e.g., EMNLP, SIGIR), agents (e.g., AAMAS), and speech (Interspeech).

Required Experiences and Skills

· Master's or engineering degree in Computer Science or Applied Mathematics, with knowledge of deep learning

· High proficiency in Python (NumPy, SciPy), TensorFlow/PyTorch, and distributed (GPU) computation

· High productivity, capacity for methodical and autonomous work, good communication skills.

Environment

The PhD will be hosted by two laboratories, ISIR and IRCAM, with expertise in machine learning, natural language/speech/human behavior processing, and virtual agents, with the support of the Sorbonne Center for Artificial Intelligence (SCAI). The PhD candidate is expected to publish in the most prominent conferences and journals in the domain (e.g., ICML, EMNLP, AAAI, IEEE TAC, AAMAS, IVA, ICMI). SCAI is equipped with a cluster of 30 nodes with 100 GPU cards, providing 1800 TFLOPS (FP32). The candidate can also use the Jean Zay cluster hosted by CNRS-IDRIS.

References

[1] C. Ahuja, D. Won Lee, and L.-P. Morency. 2022. Low-Resource Adaptation for Personalized Co-Speech Gesture Generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] S. Alexanderson, G. Eje Henter, T. Kucherenko, and J. Beskow. 2020. Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows. In Computer Graphics Forum, Vol. 39. 487–496.
[3] P. Bordes, E. Zablocki, L. Soulier, B. Piwowarski, P. Gallinari (2019). Incorporating Visual Semantics into Sentence Representations within a Grounded Space. In EMNLP/IJCNLP
[4] D. Cudeiro, T. Bolkart, C. Laidlaw, A. Ranjan, and M. J. Black. (2019). Capture, learning, and synthesis of 3D speaking styles. In IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10101–10111
[5] S. Ginosar, A. Bar, G. Kohavi, C. Chan, A. Owens, and J. Malik. 2019. Learning individual styles of conversational gesture. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
[6] S. Subramanian, G. Lample, E. M. Smith, L. Denoyer, M. Ranzato, Y-L. Boureau (2018). Multiple-Attribute Text Style Transfer. CoRR abs/1811.00552
[7] T. Karras, T. Aila, S. Laine, A. Herva, and J. Lehtinen. 2017. Audio-driven facial animation by joint end-to-end learning of pose and emotion. ACM Transactions on Graphics (TOG) 36, 1–12
[8] C. Rebuffel, M. Roberti, L. Soulier, G. Scoutheeten, R. Cancelliere, P. Gallinari (2022). Controlling hallucinations at word level in data-to-text generation. In Data Min. Knowl. Discov. 36(1): 318-354
[9] L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, Y. Shao, W. Zhang, M-H Yang, B Cui (2022). Diffusion Models: A Comprehensive Survey of Methods and Applications. CoRR abs/2209.00796

6-48

(2023-05-11) PhD-student position in artificial intelligence, human-robot interaction @ KTH, Stockholm, Sweden

We are looking for a PhD student who is interested in Artificial Intelligence, Machine Learning, Natural Language Processing and Human-Robot Interaction. The doctoral student will work in a newly funded project at the Department of Speech, Music and Hearing within the School of Electrical Engineering and Computer Science at KTH. The project is financed by the Swedish AI-program WASP (Wallenberg AI, Autonomous Systems and Software Program) which offers a graduate school with research visits, partner universities, and visiting lecturers.

The newly started project is titled 'Anticipatory Control in Conversational Human-Robot Interaction'. The aim of the project is to use self-supervised learning to develop generic language models for human-robot interaction and to explore how such models can be used in real time to predict and anticipate human behavior and thereby improve the interaction. Whereas traditional language models in NLP (such as BERT and GPT) have focused on written language, we want to model multimodal conversation, where aspects such as engagement, turn-taking, and incremental processing are important. This means that the models will have to process text, audio, and video, including how the human users move and their facial expressions.

In collaboration with industry and other projects, we will then explore how such models can be used for social robotic applications. Another important focus will be on model analysis and visualization.

For more information about the position, see https://www.kth.se/en/om/work-at-kth/lediga-jobb/what:job/jobID:623390/where:4/

If you have any questions, don’t hesitate to contact Prof. Gabriel Skantze (skantze@kth.se).

6-49

(2023-05-15) Post-doctoral and engineer positions@ LORIA-INRIA, Nancy, France

Automatic speech recognition for non-native speakers in a noisy environment

Post-doctoral and engineer positions

Starting date: July-September of 2023

Duration: 24 months for a post-doc position and 12 months for an engineer position

Supervisors: Irina Illina, Associate Professor, HDR Lorraine University LORIA-INRIA Multispeech Team, illina@loria.fr

Emmanuel Vincent, Senior Research Scientist & Head of Science, INRIA Multispeech Team, emmanuel.vincent@inria.fr

http://members.loria.fr/evincent/

Constraint: applications must meet the requirements of the French Directorate General of Armament (Direction générale de l'armement, DGA).

Context

When a person's hands are busy performing a task such as driving a car or piloting an airplane, voice is a fast and efficient means of interaction. In aeronautical communications, English is most often compulsory. However, many pilots are not native English speakers: they speak with an accent that depends on their native language and are influenced by its pronunciation mechanisms. Inside an aircraft cockpit, the pilots' non-native speech and the surrounding noise are the most difficult challenges to overcome for efficient automatic speech recognition (ASR). The problems raised by non-native speech are numerous: incorrect or approximate pronunciations, errors of gender and number agreement, use of non-existent words, missing articles, grammatically incorrect sentences, etc. The acoustic environment adds a disturbing component to the speech signal. Much of the success of speech recognition relies on the ability to take different accents and ambient noises into account in the models used for ASR.

Automatic speech recognition has advanced considerably thanks to the spectacular development of deep learning. In recent years, end-to-end automatic speech recognition, which directly optimizes the probability of the output character sequence given the input acoustic features, has made great progress [Chan et al., 2016; Baevski et al., 2020; Gulati et al., 2020].

Objectives

The recruited person will have to develop methodologies and tools to obtain high-performance non-native automatic speech recognition in the aeronautical context and more specifically in a (noisy) aircraft cockpit.

This project will be based on an end-to-end automatic speech recognition system [Shi et al., 2021] using wav2vec 2.0 [Baevski et al., 2020], one of the best-performing models in the current state of the art. wav2vec 2.0 enables self-supervised learning of representations from raw, untranscribed audio data.

How to apply: Interested candidates are encouraged to contact Irina Illina (illina@loria.fr) with the required documents (CV, transcripts, motivation letter, and recommendation letters).

Requirements & skills:

- Ph.D. degree in speech/audio processing, computer vision, machine learning, or in a related field,

- ability to work independently as well as in a team,

- solid programming skills (Python, PyTorch), and deep learning knowledge,

- good level of written and spoken English.

References

[Baevski et al., 2020] A. Baevski, H. Zhou, A. Mohamed, and M. Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), 2020.

[Chan et al., 2016] W. Chan, N. Jaitly, Q. Le and O. Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960–4964, 2016.

[Chorowski et al., 2017] J. Chorowski, N. Jaitly. Towards better decoding and language model integration in sequence to sequence models. Interspeech, 2017.

[Houlsby et al., 2019] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, S. Gelly. Parameter-efficient transfer learning for NLP. International Conference on Machine Learning, PMLR, pp. 2790–2799, 2019.

[Gulati et al., 2020] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang. Conformer: Convolution-augmented transformer for speech recognition. Interspeech, 2020.

[Shi et al., 2021] X. Shi, F. Yu, Y. Lu, Y. Liang, Q. Feng, D. Wang, Y. Qian, and L. Xie. The accented English speech recognition challenge 2020: open datasets, tracks, baselines, results and methods. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6918–6922, 2021.

6-50

(2023-05-18) PhD position @ GIPSA Lab, Grenoble, France

We are offering a PhD position in speech acoustics, aerodynamics and mechatronics, within the ANR project AVATARS ("Artificial Voice production: control of bio-inspired port-HAmilToniAn numeRical and mechatronic modelS", 2023-2027). The topic is 'Characterization of human vocal behavior in speech and singing on a robotized mechatronic test bench. Application to the development of biomimetic vocal folds.'

For more information and to apply:
https://www.gipsa-lab.grenoble-inp.fr/~nathalie.henrich/docs/PhDposition_ANR-AVATARS_GIPSA_2023.pdf

6-51

(2023-05-20) Teaching and research (enseignant-chercheur) position, Grenoble, France

For the 2023-2024 academic year, we are looking for someone to fill a 50% fixed-term contract (CDD)
in teaching and research in Phonetics.

Details are available here:
https://emploi.univ-grenoble-alpes.fr/offres/enseignants-enseignants-chercheurs-contractuels/enseignant-e-chercheur-e-en-phonetique-1240660.kjsp?RH=TLK_CDD_ENS

6-52

(2023-05-22) PhD position @Lille and Grenoble, France

We are looking for a candidate for a PhD on the computational modeling of the
perception-production link in speech. The thesis will be co-supervised by A. Basirat
(https://scalab.univ-lille.fr) and J. Diard (https://lpnc.univ-grenoble-alpes.fr/).
Information about the project, the expected skills, and the application procedure
is available at the address below:

https://emploi.cnrs.fr/Offres/Doctorant/UMR9193-ANABAS-001/Default.aspx

6-53

(2023-05-22) PhD Causal Machine Learning Applied to NLP and the Study of Large Language Models, Grenoble, France

Job Offer: PhD Causal Machine Learning Applied to NLP and the Study of Large Language
Models.
Starting date: November 1st, 2023 (flexible)
Application deadline: From now until the position is filled
Interviews (tentative): Beginning of June, and later if the position is still open
Salary: ~2000€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)
Place of work (no remote): Laboratoire d'Informatique de Grenoble, CNRS, Grenoble, France

Keywords: natural language processing, causal machine learning, interpretability,
analysis, robustness, large language models, controllability

Description:
Natural language processing (NLP) has undergone a paradigm shift in recent years, owing
to the remarkable breakthroughs achieved by large language models (LLMs). Despite being
purely 'correlation machines' [CorrelationMachine], these models have completely altered
the landscape of NLP by demonstrating impressive results in language modeling,
translation, and summarization. Nonetheless, the use of LLMs has also raised crucial
questions regarding their reliability and transparency. As a result, there is now an
urgent need to gain a deeper understanding of the mechanisms governing the behavior of
LLMs, to interpret their decisions and outcomes in principled and scientifically grounded
ways.

A promising direction to carry out such analysis comes from the fields of causal analysis
and causal inference [CausalAbstraction]. Examining the causal relationships between the
inputs, outputs, and hidden states of LLMs can help to build scientific theories about
the behavior of these complex systems. Furthermore, causal inference methods can help
uncover underlying causal mechanisms behind the complex computations of LLMs, giving hope
to better interpret their decisions and understand their limitations [Rome].

Thus, the use of causal analysis in the study of LLMs is a promising research direction
to gain deeper insights into the workings of these models.

As a Ph.D. student working on this project, you will be expected to develop a strong
understanding of the principles of causal inference and their application to machine
learning (see, for example, the invariant language model framework [InvariantLM]). You
will have the opportunity to work on cutting-edge research projects in NLP, contributing
to the development of more reliable and interpretable LLMs. Note that the Ph.D. research
project should be aligned with your interests and expertise. Therefore, the precise
direction of the research can and will be influenced by the personal interests and
research goals of the student. You are encouraged to bring your unique perspective and
ideas to the table.

SKILLS
Master's degree in Natural Language Processing, computer science or data science.
Strong Python programming skills and mastery of deep learning frameworks.
Experience in causal inference or working with LLMs.
Very good communication skills in English (French not required).

SCIENTIFIC ENVIRONMENT
The thesis will be conducted within the GETALP team of the LIG laboratory
(https://lig-getalp.imag.fr/). The GETALP team has strong expertise and a solid track
record in Natural Language Processing. The recruited person will be welcomed within the
team, which offers a stimulating, multinational and pleasant working environment.
The means to carry out the PhD will be provided, both for missions in France and abroad
and for equipment. The candidate will have access to the GPU cluster of the LIG.
Furthermore, access to the national supercomputer Jean-Zay will make it possible to run
large-scale experiments.
The Ph.D. position will be co-supervised by Maxime Peyrard and François Portet.
Additionally, the Ph.D. student will also be working with external academic collaborators
at EPFL and Idiap (e.g., Robert West and Damien Teney).

INSTRUCTIONS FOR APPLYING
Applications must contain a CV, a letter/message of motivation, and Master's transcripts
(be ready to provide letter(s) of recommendation), and must be addressed to Maxime Peyrard
(maxime.peyrard@epfl.ch) and François Portet (francois.Portet@imag.fr).

[InvariantLM] Peyrard, Maxime and Ghotra, Sarvjeet and Josifoski, Martin and Agarwal,
Vidhan and Patra, Barun and Carignan, Dean and Kiciman, Emre and Tiwary, Saurabh and
West, Robert, 'Invariant Language Modeling' Conference on Empirical Methods in Natural
Language Processing (2022): 5728–5743

[CorrelationMachine] Feder, Amir and Keith, Katherine A. and Manzoor, Emaad and Pryzant,
Reid and Sridhar, Dhanya and Wood-Doughty, Zach and Eisenstein, Jacob and Grimmer, Justin
and Reichart, Roi and Roberts, Margaret E. and Stewart, Brandon M. and Veitch, Victor and
Yang, Diyi, 'Causal Inference in Natural Language Processing: Estimation, Prediction,
Interpretation and Beyond' Transactions of the Association for Computational Linguistics
(2022), 10:1138–1158.

[CausalAbstraction] Geiger, Atticus and Wu, Zhengxuan and Lu, Hanson and Rozner, Josh and
Kreiss, Elisa and Icard, Thomas and Goodman, Noah and Potts, Christopher, 'Inducing
Causal Structure for Interpretable Neural Networks' Proceedings of Machine Learning
Research (2022): 7324-7338.

[Rome] Meng, Kevin, et al. 'Locating and Editing Factual Associations in GPT.' Advances
in Neural Information Processing Systems 35 (2022): 17359-17372.

6-54

(2023-05-27) PhD position@ EECS-KTH, Stockholm, Sweden

The School of Electrical Engineering and Computer Science (EECS) at the KTH Royal Institute of Technology has an open Ph.D position in Social Robotics at the division of Robotics, Perception and Learning (RPL).

ABOUT KTH

KTH Royal Institute of Technology in Stockholm has grown to become one of Europe’s leading technical and engineering universities, as well as a key center of intellectual talent and innovation. We are Sweden’s largest technical research and learning institution and home to students, researchers and faculty from around the world. Our research and education cover a wide area, including natural sciences and all branches of engineering, as well as architecture, industrial management, urban planning, history and philosophy.

PROJECT DESCRIPTION

This project addresses the challenge of enabling robots to learn in a scalable and cost-efficient manner by gradually acquiring new knowledge from non-expert, semi-situated teachers. To achieve this, computational methods will be developed for robots to query semi-situated teachers (e.g., crowd workers) and to incorporate the newly acquired knowledge into their existing decision-making for further use in situ. This project is funded by the Swedish Foundation for Strategic Research.

The starting date for the position is flexible, but preferably during the fall of 2023.

QUALIFICATIONS

The candidate must have a degree in Computer Science or a related field. Documented written and spoken English and programming skills are required. Experience with robotics, human-robot interaction, human-computer interaction, multimodal interaction and machine learning is important.

HOW TO APPLY

The application should include:

1. Curriculum vitae.

2. Transcripts from University/College.

3. Brief description of why the applicant wishes to become a doctoral student.

The application documents should be uploaded using KTH's recruitment system. More information here:

https://www.kth.se/en/om/work-at-kth/lediga-jobb/what:job/jobID:625926/where:4/

The application deadline is June 2, 2023.

6-55

(2023-05-28) Ph.D. position: Automatic speech recognition for non-native speakers in a noisy environment, LORIA-INRIA, Nancy, France

Automatic speech recognition for non-native speakers in a noisy environment

Ph.D. position

Starting date: September-October 2023

Duration: 36 months

Supervisors: Irina Illina, Associate Professor, HDR Lorraine University LORIA-INRIA Multispeech Team, illina@loria.fr

Emmanuel Vincent, Senior Research Scientist & Head of Science, INRIA Multispeech Team, emmanuel.vincent@inria.fr

http://members.loria.fr/evincent/

Constraint: applications must meet the requirements of the French Directorate General of Armament (Direction générale de l'armement, DGA).

Context

When a person's hands are busy performing a task such as driving a car or piloting an airplane, voice is a fast and efficient means of interaction. In aeronautical communications, English is most often compulsory. However, many pilots are not native English speakers: they speak with an accent that depends on their native language and are influenced by its pronunciation mechanisms. Inside an aircraft cockpit, the pilots' non-native speech and the surrounding noise are the most difficult challenges to overcome for efficient automatic speech recognition (ASR). The problems raised by non-native speech are numerous: incorrect or approximate pronunciations, errors of gender and number agreement, use of non-existent words, missing articles, grammatically incorrect sentences, etc. The acoustic environment adds a disturbing component to the speech signal. Much of the success of speech recognition relies on the ability to take different accents and ambient noises into account in the models used for ASR.

Automatic speech recognition has advanced considerably thanks to the spectacular development of deep learning. In recent years, end-to-end automatic speech recognition, which directly optimizes the probability of the output character sequence given the input acoustic features, has made great progress [Chan et al., 2016; Baevski et al., 2020; Gulati et al., 2020].

Objectives

The recruited person will have to develop methodologies and tools to obtain high-performance non-native automatic speech recognition in the aeronautical context and more specifically in a (noisy) aircraft cockpit.

This project will be based on an end-to-end automatic speech recognition system [Shi et al., 2021] using wav2vec 2.0 [Baevski et al., 2020], one of the best-performing models in the current state of the art. wav2vec 2.0 enables self-supervised learning of representations from raw, untranscribed audio data.

How to apply: Interested candidates are encouraged to contact Irina Illina (illina@loria.fr) with the required documents (CV, transcripts, motivation letter, and recommendation letters).

Requirements & skills:

- Master's degree in speech/audio processing, computer vision, machine learning, or in a related field,

- ability to work independently as well as in a team,

- solid programming skills (Python, PyTorch), and deep learning knowledge,

- good level of written and spoken English.

References

[Baevski et al., 2020] A. Baevski, H. Zhou, A. Mohamed, and M. Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), 2020.

[Chan et al., 2016] W. Chan, N. Jaitly, Q. Le, and O. Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960–4964, 2016.

[Chorowski et al., 2017] J. Chorowski and N. Jaitly. Towards better decoding and language model integration in sequence to sequence models. Interspeech, 2017.

[Houlsby et al., 2019] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for NLP. International Conference on Machine Learning, PMLR, pp. 2790–2799, 2019.

[Gulati et al., 2020] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang. Conformer: Convolution-augmented transformer for speech recognition. Interspeech, 2020.

[Shi et al., 2021] X. Shi, F. Yu, Y. Lu, Y. Liang, Q. Feng, D. Wang, Y. Qian, and L. Xie. The accented English speech recognition challenge 2020: open datasets, tracks, baselines, results and methods. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6918–6922, 2021.

6-56

(2023-05-30) PhD position in NLP @ Jožef Stefan International Postgraduate School (Slovenia) and La Rochelle University (France).

We are looking for candidates for a fully funded Ph.D. research position in the field of news analysis. The candidate will be enrolled in a joint doctoral programme between Jožef Stefan International Postgraduate School (Slovenia) and La Rochelle University (France) and will be co-supervised by Asst. Prof. Dr. Senja Pollak and Prof. Dr. Antoine Doucet. Call open until: June 12, 2023.

The candidate will:

Spend 18 months in Slovenia and 18 months in France
Receive a full fellowship for 3 years
Be part of research groups at Jožef Stefan Institute (Dept. of Knowledge Technologies) and L3i (La Rochelle University)

Possible topics:

News analysis
Opinion mining
Historical document processing
Diachronic analysis
Cross-lingual analysis
Other related topics

The doctoral candidate will benefit from being part of two active research groups, each involving several other PhD students and postdoctoral fellows. The collaboration builds on two recent Horizon 2020 projects coordinated in Ljubljana and La Rochelle (Embeddia and NewsEye, respectively).

The application should contain:

A CV, including a list of past publications (if available), MSc grades, and computational skills, including natural language processing experience (if available)
Motivation letter (1 A4 page)
Contact of 2 referees
Transcript of MSc grades

The candidates are expected to have:

Excellent knowledge of the English language
A completed second-cycle university study programme (or comparable education) by September 1st, 2023
Interest in scientific research
Good programming skills
Experience in natural language processing would be a plus

Apply by June 12, 2023 by email to senja.pollak@ijs.si and antoine.doucet@univ-lr.fr with the subject “Slovenian-French PhD fellowship”.

6-57

(2023-06-01) PhD position @ Computer Science Lab in Bordeaux, France (LaBRI) and LORIA (Nancy, France)

In the framework of the PEPR Santé numérique “Autonom-Health” project (Health, behaviors and autonomous digital technologies), the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) and LORIA (Nancy, France) are looking for candidates for a fully funded PhD position (36 months).

The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS. The abstract of the « Autonom-Health » project can be found at the end of this announcement.

Depending on their profile, the selected candidate will address some of the following tasks:

- Data collection tasks:
  - Definition of scenarios for collecting spontaneous speech using Socially Interactive Agents (SIAs)
  - Collection of patient/doctor interactions during clinical interviews
- ASR-related tasks:
  - Evaluation and improvement of the performance of our end-to-end ESPnet-based ASR system on real-world spontaneous French speech recorded from healthy subjects and patients
  - Adaptation of the ASR system to the clinical interview domain
  - Automatic phonetic transcription/alignment using end-to-end architectures
  - Adaptation of ASR transcripts for use with the semantic analysis tools developed at LORIA
- Speech analysis tasks:
  - Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, search for new biomarkers targeted to specific diseases

The position will be hosted at LaBRI, but depending on the profile of the candidate, close collaboration is expected with the LORIA teams « Multispeech » (contact: Emmanuel Vincent emmanuel.vincent@inria.fr) and/or « Sémagramme » (contact: Maxime Amblard maxime.amblard@loria.fr).

Gross salary: approx. 2044 €/month

Starting date: October 2023

Required qualifications: Master's degree in signal processing / speech analysis / computer science

Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, excellent command of French (interactions with French patients and clinicians), good level of scientific English.

Know-how: Familiarity with the ESPnet toolkit and/or deep learning frameworks, knowledge of automatic speech processing system design.

Social skills: good ability to integrate into multidisciplinary teams, ability to communicate with non-experts.

Applications:
To apply, please email jean-luc.rouas@labri.fr a single PDF file containing a full CV, a cover letter (describing your personal qualifications, research interests and motivation for applying), contact information for two referees, and academic certificates (Master's and Bachelor's certificates).

——
Abstract of the « Autonom-Health » project:

Western populations face increasing longevity, which mechanically increases the number of chronic disease patients to manage. Current healthcare strategies will not make it possible to maintain a high level of care at a controlled cost in the future, whereas e-health can optimize the management and costs of our healthcare systems. Healthy behaviors contribute to the prevention and optimized management of chronic diseases, but their implementation is still a major challenge. Digital technologies could help implement them through digital behavioral medicine programs, developed to complement (not substitute for) existing care, in order to focus human interventions on the most severe cases requiring medical intervention.

However, to do so, we need to develop digital technologies that are: i) Ecological (related to the real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii) Personalized (at initiation and adapted over the course of treatment); iv) Longitudinal (implemented over long periods of time); v) Interoperable (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and generating trust).

These challenges will be addressed through the following specific goals: Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activity, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients. This will require new autonomous digital technologies (e.g., virtual Socially Interactive Agents (SIAs), smartphones, wearable sensors). Goal 2: Optimize clinical phenotyping by collecting and analyzing non-intrusive data (e.g., voice, geolocation, body motion, smartphone footprints...), which will potentially complement clinical and biomarker data from patient cohorts. Goal 3: Better understand the psychological, economic and socio-cultural factors driving acceptance of and engagement with the autonomous digital technologies and the proposed digital behavioral interventions. Goal 4: Improve the interaction modalities of digital technologies to personalize and optimize long-term user engagement. Goal 5: Organize large-scale data collection, storage and interoperability with existing and new datasets (e.g., biobanks, hospital patient cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment.

Each goal will be addressed by expert teams through complementary work packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testing) will be carried out within this project. A second phase, funded via ANR calls, will make it possible to recruit new teams for a large-scale testing phase.

This project will rely on population-based interventions in existing digital cohorts (e.g., KANOPEE) where virtual agents interact with patients at home on a regular basis. Pilot hospital departments will also be involved for data management, supervised by information and decision systems coordinating autonomous digital cognitive behavioral interventions based on our virtual agents. The global solution, based on empathic Human-Computer Interaction, will help target, diagnose and treat subjects suffering from dysfunctional behaviors (e.g., sleep deprivation, substance use...) as well as sleep and mental disorders. The expected benefits of such a solution are increased adherence to treatment, strong self-empowerment to improve autonomy and, finally, a reduction of long-term risks for the subjects and patients using this system. Our program should massively improve healthcare systems and allow strong technology transfer to information systems / digital health companies and the pharma industry.

6-58

(2023-06-02) Open faculty position at KU Leuven, Belgium: junior professor in Synergistic Processing of Multisensory Data for Audio-Visual Understanding

KU Leuven's Faculty of Engineering Science has an open position for a junior professor (tenure track) in the area of audiovisual understanding. The successful candidate will conduct research on synergistic processing of multisensory data for audio-visual understanding, teach courses in the Master of Engineering Science, and supervise students in the Master's and PhD programs. The candidate will be embedded in the PSI research division of the Department of Electrical Engineering. The deadline for applications is September 29, 2023. More information is available at https://www.kuleuven.be/personeel/jobsite/jobs/60193566?hl=en&lang=en

KU Leuven is committed to creating a diverse environment. It explicitly encourages
candidates from groups that are currently underrepresented at the university to submit
their applications.

6-59

(2023-06-04) PhD in ML/NLP @ Dauphine Université PSL, Paris and Université Grenoble Alpes, France

PhD in ML/NLP – Fairness and self-supervised learning for speech processing
Starting date: October 1st, 2023 (flexible)
Application deadline: June 9th, 2023
Interviews (tentative): June 14th, 2023

Salary: ~2000€ gross/month (social security included)

Mission: research oriented (teaching possible but not mandatory)

Keywords: speech processing, fairness, bias, self-supervised learning, evaluation metrics

CONTEXT

This thesis is part of the ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive and Innovative Speech Technologies). Self-supervised learning (SSL) has recently emerged as one of the most promising artificial intelligence (AI) methods, as it is now feasible to take advantage of the colossal amounts of existing unlabeled data to significantly improve the performance of various speech processing tasks.

PROJECT OBJECTIVES

Speech technologies are widely used in our daily lives and are expanding their scope to decision-making systems, including in critical areas such as health and law. In these societal applications, the use of such tools raises the issue of possible discrimination against people based on criteria for which society requires equal treatment, such as gender, origin, religion or disability. Recently, the machine learning community has had to address the possible biases of algorithms, and many works have shown that the pursuit of the best performance is not the only goal [1]. For instance, recent evaluations of ASR systems have shown that performance can vary with gender, but these variations depend both on the training data and on the models [2]. Such systems are therefore increasingly scrutinized for bias, while trustworthy speech technologies are a crucial expectation.

Both the question of bias and the concept of fairness have now become important aspects of AI, and we now have to find the right trade-off between accuracy and fairness. Unfortunately, the notions of fairness and bias are challenging to define, and their meanings can differ greatly [3].

The goals of this PhD position are threefold:

- First, survey the many definitions of robustness, fairness and bias, with the aim of arriving at definitions and metrics suited to speech SSL models

- Then, gather speech datasets with large amounts of well-described metadata

- Finally, set up an evaluation protocol for SSL models and analyze the results.

SKILLS

Master 2 in Natural Language Processing, Speech Processing, computer science or data science.
Good command of Python programming and deep learning frameworks.
Previous experience with bias in machine learning would be a plus
Very good communication skills in English
Good command of French would be a plus but is not mandatory

SCIENTIFIC ENVIRONMENT

The PhD student will be co-supervised by Alexandre Allauzen (Dauphine Université PSL, Paris) and Solange Rossato and François Portet (Université Grenoble Alpes). Joint meetings are planned on a regular basis, and the student is expected to spend time in both places. Two other PhD positions are open in this project, and the students will collaborate closely with each other and with the partners; for instance, the other PhD students will develop specific SSL models and evaluation criteria. The PhD student will also collaborate with several other team members involved in the project, in particular the partners from LIA, LIG and Dauphine Université PSL, Paris. The resources needed to carry out the PhD will be provided, both for missions in France and abroad and for equipment. The candidate will have access to the GPU clusters of both LIG and Dauphine Université PSL. Furthermore, access to the national supercomputer Jean-Zay will make it possible to run large-scale experiments.

INSTRUCTIONS FOR APPLYING

Applications must contain a CV, a letter/message of motivation and Master's grades, and candidates should be ready to provide letter(s) of recommendation. Applications should be addressed to Alexandre Allauzen (alexandre.allauzen@espci.psl.eu), Solange Rossato (Solange.Rossato@imag.fr) and François Portet (francois.Portet@imag.fr). We celebrate diversity and are committed to creating an inclusive environment for all employees.

REFERENCES:

[1] Mengesha, Z., Heldreth, C., Lahav, M., Sublewski, J. & Tuennerman, E. “I don’t Think These Devices are Very Culturally Sensitive.”—Impact of Automated Speech Recognition Errors on African Americans. Frontiers in Artificial Intelligence 4. issn: 2624-8212. https://www.frontiersin.org/article/10.3389/frai.2021.725911 (2021).

[2] Garnerin, M., Rossato, S. & Besacier, L. Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech. In Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing (2021), 86–92.

[3] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. A Survey on Bias and Fairness in Machine Learning. ACM Comput. Surv. 54. issn: 0360-0300. https://doi.org/10.1145/3457607 (July 2021).

6-60

(2023-06-06) Postdoc in recognition and translation @ LaBRI, Bordeaux, France

In the framework of the European FETPROACT « Fvllmonti » project and the PEPR Santé numérique “Autonom-Health” project, the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) is looking for candidates for a 24-month postdoctoral position.

The « Fvllmonti » project is a collaborative project on new transistor architectures applied to speech recognition and machine translation between IMS, LaBRI, LAAS, INL, EPFL, GTS and Namlab. More information on the project is available at www.fvllmonti.eu

The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS. The abstract of the « Autonom-Health » project can be found at the end of this announcement.

Depending on their profile, the selected candidate will address some of the following tasks:

- Data collection tasks:
  - Definition of scenarios for collecting spontaneous speech using Socially Interactive Agents (SIAs)
- ASR-related tasks:
  - Evaluation and improvement of the performance of our end-to-end ESPnet-based ASR system on real-world spontaneous French speech recorded from healthy subjects and patients
  - Automatic phonetic transcription/alignment using end-to-end architectures
- Speech analysis tasks:
  - Automatic recognition of social affect/emotions/attitudes in speech samples
  - Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, search for new biomarkers targeted to specific diseases

The position will be hosted at LaBRI, but depending on the profile of the candidate, close collaboration is expected with the « Multispeech » (contact: Emmanuel Vincent) and/or « Sémagramme » (contact: Maxime Amblard) teams at LORIA.

Gross salary: approx. 2686 €/month

Starting date: As soon as possible

Required qualifications: PhD in Signal processing / speech analysis / computer science / language sciences

Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, good command of French (interactions with French patients and clinicians), good level of scientific English.

Know-how: Familiarity with the ESPnet toolkit and/or deep learning frameworks, knowledge of automatic speech processing system design.

Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.

Applications:

To apply, please email jean-luc.rouas@labri.fr a single PDF file containing a full CV (including a publication list), a cover letter (describing your personal qualifications, research interests and motivation for applying), evidence of software development experience (active GitHub/GitLab profile or similar), two of your key publications, contact information for two referees, and academic certificates (PhD, Diploma/Master's, Bachelor's certificates).

——
Abstract of the « Autonom-Health » project:

Western populations face increasing longevity, which mechanically increases the number of chronic disease patients to manage. Current healthcare strategies will not make it possible to maintain a high level of care at a controlled cost in the future, whereas e-health can optimize the management and costs of our healthcare systems. Healthy behaviors contribute to the prevention and optimized management of chronic diseases, but their implementation is still a major challenge. Digital technologies could help implement them through digital behavioral medicine programs, developed to complement (not substitute for) existing care, in order to focus human interventions on the most severe cases requiring medical intervention.

However, to do so, we need to develop digital technologies that are: i) Ecological (related to the real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii) Personalized (at initiation and adapted over the course of treatment); iv) Longitudinal (implemented over long periods of time); v) Interoperable (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and generating trust).

These challenges will be addressed through the following specific goals: Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activity, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients. This will require new autonomous digital technologies (e.g., virtual Socially Interactive Agents (SIAs), smartphones, wearable sensors). Goal 2: Optimize clinical phenotyping by collecting and analyzing non-intrusive data (e.g., voice, geolocation, body motion, smartphone footprints...), which will potentially complement clinical and biomarker data from patient cohorts. Goal 3: Better understand the psychological, economic and socio-cultural factors driving acceptance of and engagement with the autonomous digital technologies and the proposed digital behavioral interventions. Goal 4: Improve the interaction modalities of digital technologies to personalize and optimize long-term user engagement. Goal 5: Organize large-scale data collection, storage and interoperability with existing and new datasets (e.g., biobanks, hospital patient cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment.

Each goal will be addressed by expert teams through complementary work packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testing) will be carried out within this project. A second phase, funded via ANR calls, will make it possible to recruit new teams for a large-scale testing phase.

This project will rely on population-based interventions in existing digital cohorts (e.g., KANOPEE) where virtual agents interact with patients at home on a regular basis. Pilot hospital departments will also be involved for data management, supervised by information and decision systems coordinating autonomous digital cognitive behavioral interventions based on our virtual agents. The global solution, based on empathic Human-Computer Interaction, will help target, diagnose and treat subjects suffering from dysfunctional behaviors (e.g., sleep deprivation, substance use...) as well as sleep and mental disorders. The expected benefits of such a solution are increased adherence to treatment, strong self-empowerment to improve autonomy and, finally, a reduction of long-term risks for the subjects and patients using this system. Our program should massively improve healthcare systems and allow strong technology transfer to information systems / digital health companies and the pharma industry.

In the framework of the European FETPROACT « Fvllmonti » project and the PEPR Santé numérique “Autonom-Health” project, the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) is looking for candidates for a 24-months post-doctoral position.

The « Fvllmonti » project is a collaborative project on new transistor architectures applied to speech recognition and machine translation between IMS, LaBRI, LAAS, INL, EPFL, GTS and Namlab. More information on the project is available at www.fvllmonti.eu

The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS. The abstract of the « Autonom-Health » project can be found at the end of this email.

The missions that will be addressed by the retained candidate are among these selected tasks, according to the profile of the candidate:

- Data collection tasks:

- Definition of scenarii for collecting spontaneous speech using Social Interactive Agents (SIAs)

- ASR-related tasks

- Evaluate and improve the performances of our end2end ESPNET-based ASR system for French real-world spontaneous data recorded from healthy subjects and patients,

- Automatic phonetic transcription / alignment using end2end architectures

- Speech analysis tasks:

- Automatic social affect/emotion/attitudes recognition on speech samples

- Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, research of new biomarkers targeted to specific diseases.

The position is to be hosted at LaBRI, but depending on the profile of the candidate, close collaboration is expected either with the « Multispeech » (contact: Emmanuel Vincent) and/or the « Sémagramme » (contact: Maxime Amblard) teams at LORIA.

Gross salary: approx. 2686 €/month

Starting data: As soon as possible

Required qualifications: PhD in Signal processing / speech analysis / computer science / language sciences

Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, good command of French (interactions with French patients and clinicians), good level of scientific English.

Know-how: Familiarity with the ESPNET toolbox and/or deep learning frameworks, knowledge of automatic speech processing system design.

Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.

Applications:

To apply, please send by email at jean-luc.rouas@labri.fr a single PDF file containing a full CV (including publication list), cover letter (describing your personal qualifications, research interests and motivation for applying), evidence for software development experience (active Github/Gitlab profile or similar), two of your key publications, contact information of two referees and academic certificates (PhD, Diploma/Master, Bachelor certificates).

——
Abstract of the « Autonom-Health » project:

Western populations face an increase of longevity which mechanically increases the number of chronic disease patients to manage. Current healthcare strategies will not allow to maintain a high level of care with a controlled cost in the future and E health can optimize the management and costs of our health care systems. Healthy behaviors contribute to prevention and optimization of chronic diseases management, but their implementation is still a major challenge. Digital technologies could help their implementation through numeric behavioral medicine programs to be developed in complement (and not substitution) to the existing care in order to focus human interventions on the most severe cases demanding medical interventions.

However, to do so, we need to develop digital technologies which should be: i) Ecological (related to real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii) Personalized (at initiation and adapted over the course of treatment) ; iv) Longitudinal (implemented over long periods of time) ; v) Interoperated (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and generating trustability).

The above-mentioned challenges will be disentangled with the following specific goals: Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activities, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients. This will require new autonomous digital technologies (i.e. virtual Socially Interactive Agents SIAs, smartphones, wearable sensors). Goal 2: Optimize clinical phenotyping by collecting and analyzing non-intrusive data (i.e. voice, geolocalisation, body motion, smartphone footprints, ...) which will potentially complement clinical data and biomarkers data from patient cohorts. Goal 3: Better understand psychological, economical and socio-cultural factors driving acceptance and engagement with the autonomous digital technologies and the proposed numeric behavioral interventions. Goal 4: Improve interaction modalities of digital technologies to personalize and optimize long-term engagement of users. Goal 5: Organize large scale data collection, storage and interoperability with existing and new data sets (i.e, biobanks, hospital patients cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment.

Each goal will be addressed by expert teams through complementary work-packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testings), will be performed through this project. A second phase funded via ANR calls will allow to recruit new teams for large scale testing phase.

This project will rely on population-based interventions in existing numeric cohorts (i.e KANOPEE) where virtual agents interact with patients at home on a regular basis. Pilot hospital departments will also be involved for data management supervised by information and decision systems coordinating autonomous digital Cognitive Behavioral interventions based on our virtual agents. The global solution based on empathic Human-Computer Interactions will help targeting, diagnose and treat subjects suffering from dysfunctional behavioral (i.e. sleep deprivation, substance use...) but also sleep and mental disorders. The expected benefits from such a solution will be an increased adherence to treatment, a strong self-empowerment to improve autonomy and finally a reduction of long-term risks for the subjects and patients using this system. Our program should massively improve healthcare systems and allow strong technological transfer to information systems / digital health companies and the pharma industry.

6-61

(2023-06-02) Transcribers @ ELDA, Paris, France

ELDA (Evaluations and Language resources Distribution Agency) is looking for full-time or part-time transcribers to transcribe phone calls in the financial domain.

Location: ELDA (Paris, France)

Latest starting date: July 2023

Languages and mission details

German - 21 h of speech to be transcribed, plus a phonetisation task;
Spanish - 1 h of speech to be transcribed;
Italian - 1 h of speech to be transcribed;
Japanese - 21 h of speech to be transcribed, plus a phonetisation task;
Polish - number of hours not yet fixed, as the task is still being tested.

Profile

Native speaker of the selected language, with a very good level of spelling and grammar;
Good knowledge of French and/or English;
Good computer skills;
Ability to integrate and scrupulously follow transcription rules.

Salary and duration

Starting from the SMIC (French minimum wage), depending on skills;
Approximately 3 months (full time for the longer assignments).

Application

Send your CV to <lucille@elda.org> or <gabriele@elda.org>;
Include [Transcription *language*] in the subject line.

6-62

(2023-06-08) Postdoc @ ENS, Paris, France

DRhyaDS

A new framework for understanding the Dynamic Rhythms and Decoding of Speech

Job Title - Postdoctoral Researcher

Disciplines and Areas of Research - Speech science, Psycholinguistics, Psychoacoustics

Contract Duration - 1 Year

Research overview: The DRhyaDS project aims to develop a new framework for understanding the dynamic rhythms and decoding of speech. It focuses on the temporal properties of speech and their contribution to speech perception. The project challenges the conventional view that speech rhythm perception relies on a one-to-one association between specific modulation frequencies in the speech signal and linguistic units. One of its key objectives is to investigate how language-specific temporal characteristics affect speech dynamics. The project team will analyze two corpora of semi-spontaneous speech in French and German, representing syllable-timed and stress-timed languages, respectively. Various acoustic analyses will be conducted on these corpora to explore the variability of slow temporal modulations in speech at the individual level, covering prosody, spectral properties, temporal dynamics and rhythmic patterns. By examining these acoustic parameters, the project aims to uncover how the structure of speech signals varies across languages and speakers, contributing to a more nuanced understanding of the dynamic nature of spoken language and its role in human communication.

Environment: The selected candidate will be an integral part of an international research team and will work in a collaborative and stimulating lab environment. The project brings together a Franco-German team of experts in linguistics, psychoacoustics and cognitive neuroscience, led by Dr. Léo Varnet (CNRS, ENS Paris) and Dr. Alessandro Tavano (Max Planck Institute, Goethe University Frankfurt). The successful candidate will work under the supervision of Dr. Léo Varnet at the Laboratoire des Systèmes Perceptifs (ENS Paris).

Job description: This is a one-year postdoctoral contract, offering a net salary in accordance with French legislation (~2500 €/month, plus social and medical benefits). Women and minorities are strongly encouraged to apply. The successful candidate will participate in research activities, collaborate with team members, and contribute to scientific publications and communications. They will also have the autonomy to propose and implement their own analysis techniques and approaches. Their responsibilities will include:

- Taking a lead role in collecting a comprehensive corpus of French speech data, following a rigorous data collection protocol;
- Collaborating closely with the German team to leverage the existing German speech corpus for comparative and cross-linguistic analyses;
- Conducting in-depth acoustic analysis of the corpora, using advanced techniques to investigate the variability and dynamics of slow temporal modulations in speech;
- Actively participating in team meetings, workshops and conferences to present research progress, exchange ideas and contribute to the intellectual growth of the project;
- Engaging in science outreach activities to promote the project's research outcomes and foster public understanding of speech perception and language processing.

Qualifications:

- A recently obtained PhD in a relevant field (e.g., linguistics, psychology, neuroscience, computational sciences);
- Strong expertise in linguistics, speech perception, acoustic analysis and statistical methods;
- Proficiency in programming languages commonly used in speech research; knowledge of MATLAB would be particularly valuable for data processing and analysis within the project;
- Strong written and verbal communication skills in English. Proficiency in French and/or German would be particularly appreciated, as it would enable a deeper understanding of the linguistic characteristics of the respective corpora.

Application process: To apply for this position, please submit a CV and a cover letter (in French or English), along with the names and contact information of two referees, to Léo Varnet (leo.varnet@cnrs.fr). The application deadline is 31 July 2023. Interviews will be conducted in September. The ideal start date is October or November 2023, with some flexibility. Feel free to get in touch informally to discuss this position.

6-63

(2023-06-16) Funded PhD position @ Inria, France

Inria is opening a fully funded PhD position on multimodal speech
anonymization. For details and to apply, see:
https://jobs.inria.fr/public/classic/en/offres/2023-06410

Applications will be reviewed on a continuous basis until July 2.

© Copyright 2026 - ISCA International Speech Communication Association - All rights reserved.
