ISCA - International Speech
Communication Association


ISCApad Archive  »  2023  »  ISCApad #304  »  Jobs

ISCApad #304

Tuesday, October 10, 2023 by Chris Wellekens

6 Jobs
6-1(2023-04-07) Researcher position at the School of Informatics, Kyoto University, Japan
Researcher position at the School of Informatics, Kyoto University, Japan

Job description:
Research & Development in the Moonshot program 'Avatar Symbiotic Society',
in particular spoken dialogue design and implementation for semi-autonomous avatars

Expert area:
Spoken Dialogue Systems OR Human-Robot Interaction

Qualifications:
- Ph.D. degree related to the above expert areas
- Programming skills (Python)
- Fluent in English
- At least beginner-level Japanese

Work Place:
Kyoto University, School of Informatics, Speech and Audio Processing Lab.
Sakyo-ku, Kyoto, Japan
http://sap.ist.i.kyoto-u.ac.jp/EN

Work Hours:
Discretionary work system (7 hours 45 min. standard)
Monday to Friday except for national holidays and summer holidays

Salary:
Determined based on work experience and University guidelines

Starting Date:
As early as possible

Employment Term:
Renewable annually, until November 2025.

Contact:
Tatsuya Kawahara, Professor
School of Informatics, Kyoto University
Sakyo-ku, Kyoto 606-8501, JAPAN
E-mail: kawahara@i.kyoto-u.ac.jp

Documents to be submitted:
- Resume (CV)
- List of publications
- List of reference persons

Application Deadline:
Open until a suitable candidate is found.
 
Back  Top

6-2(2023-04-08) PhD Position @ SPEAC, Radboud University, Nijmegen, The Netherlands

We have an open 4-year PhD position in the SPEAC lab (Speech Perception in Audiovisual Communication; https://hrbosker.github.io) at the Donders Institute, Radboud University, Nijmegen, The Netherlands.

 

The position is funded through an ERC Starting Grant (HearingHands, 101040276) awarded to Dr. Hans Rutger Bosker. We are looking for candidates with a strong background in speech perception and an interest in audiovisual prosody and gesture-speech integration.

 

You will work closely with Dr. Hans Rutger Bosker (PI) and Prof. James McQueen (promotor). The PhD project aims to determine how and when the timing of seemingly meaningless up-and-down hand gestures influences audiovisual speech perception, specifically targeting more naturalistic communicative settings. You will use virtual avatars, allowing careful control of their gestural movements, to establish which kinematic and communicative properties of hand gestures influence low-level speech perception. You will assess how challenging listening conditions impact the perceptual weighting of visual, auditory and audiovisual cues to prominence, as well as determine the time-course of these cues using eye-tracking. Finally, you will design training studies to test how humans adjust their perception to between-talker variation in gesture-speech alignment.

 

- 4 year contract, 1.0 FTE

- gross monthly salary: €2,541 to €3,247 (scale P)

- application deadline: May 22, 2023

- preferred starting date: September 1, 2023

 

More details about the project, profile, and what we have to offer are available through the link below. If you have any questions, do get in touch at HansRutger.Bosker@donders.ru.nl.

 

https://www.ru.nl/en/working-at/job-opportunities/phd-candidate-audiovisual-speech-perception-at-the-donders-centre-for-cognition

Back  Top

6-3(2023-04-09) Postdoc @ University of Washington, USA

University of Washington, Seattle, WA, USA

Laboratory for Speech Physiology and Motor Control 

 

Post-doctoral position in speech sensorimotor learning in typical adults,

DBS patients, and adults who stutter

 

 

The Laboratory for Speech Physiology and Motor Control (PI Ludo Max, Ph.D.) at the University of Washington (Seattle) is seeking to fill a post-doctoral position in the areas of sensorimotor integration and sensorimotor learning for speech production. The position will involve experimental work on sensorimotor adaptation, sensory prediction, and error evaluation in typical adults, Parkinson’s and essential tremor patients with deep brain stimulation implants (DBS), and adults who stutter. The lab is located in the University of Washington's Department of Speech and Hearing Sciences and has additional affiliations with the Graduate Program in Neuroscience, the Department of Bioengineering, and the Department of Linguistics.


The successful candidate will use speech sensorimotor adaptation paradigms (with digital signal processing perturbations applied to the real-time auditory feedback or mechanical loads applied to the jaw by a robotic device) to investigate various aspects of motor learning and control. Data collection will involve acoustic, kinematic, and neural data to investigate auditory-motor interactions during speech movement planning and execution.
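The feedback-perturbation idea above can be sketched in a few lines. This is a hypothetical, minimal illustration (numpy resampling of a single frame); `perturb_feedback` and its cents parameter are invented for the example, and a real experiment would use a dedicated low-latency vocoder rather than this crude resampling:

```python
import numpy as np

def perturb_feedback(frame, shift_cents):
    """Crude pitch perturbation of one audio frame by resampling.

    Hypothetical illustration of an auditory-feedback manipulation;
    not the lab's actual real-time DSP chain.
    """
    ratio = 2.0 ** (shift_cents / 1200.0)      # cents -> frequency ratio
    src_idx = np.arange(len(frame)) * ratio    # read positions in the frame
    src_idx = np.clip(src_idx, 0, len(frame) - 1)
    # Linear interpolation keeps the output the same length as the input,
    # as required for streaming the perturbed signal back to the speaker.
    return np.interp(src_idx, np.arange(len(frame)), frame)

# Example: shift a 100 Hz test tone up by one semitone (+100 cents).
sr = 16000
t = np.arange(1024) / sr
tone = np.sin(2 * np.pi * 100 * t)
shifted = perturb_feedback(tone, shift_cents=100)
```

In an actual adaptation study, the shift would be applied continuously to the participant's microphone signal and played back over headphones with minimal latency.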

 

The appointment is initially for one year, with renewal possible contingent upon satisfactory performance and productivity. We are looking for a candidate available to start in the summer of 2023, and applicants should have completed all requirements for their Ph.D. degree by that time. Review of applications will begin immediately. Candidates with a Ph.D. degree in neuroscience, cognitive/behavioral neuroscience, motor control/kinesiology, biomedical engineering, communication disorders/speech science, and related fields, are encouraged to apply.

For more information, please contact lab director Ludo Max, Ph.D. (LudoMax@uw.edu). Applications can be submitted to the same e-mail address. Interested candidates should submit (a) a cover letter describing their research experiences, interests, and goals, (b) a curriculum vitae, (c) the names and contact information of three individuals who can serve as references, and (d) reprints of relevant journal publications.

Back  Top

6-4(2023-04-12) 3 Postdocs @ IRIT, Toulouse, France

As part of a PIA project, we have three 2-year postdoc positions open in Toulouse: automatic speech recognition, sentiment analysis, and preference modeling, for a voice assistant embedded in vehicles.

More information here: https://www.irit.fr/~Thomas.Pellegrini/pdf/postdoc_3offers_2023.pdf

Back  Top

6-5(2023-04-15) Research & Development position (M/F), BRUEL project, IRCAM, Paris, France

Job offer: one Research & Development position (M/F): neural voice identity conversion for generating adversarial attacks

Availability and duration: 18 months, preferably starting June 1, 2023

Position description: Within the ANR BRUEL project (2022-2026), the Sound Analysis and Synthesis team is seeking a researcher for the design, implementation, and training of neural voice identity conversion algorithms used to create identity-spoofing attacks. Starting from a set of attack scenarios, defined according to the available means and resources (expertise, algorithms, data), the work will first consist of implementing an algorithm test bench to evaluate the robustness of authentication and detection systems against these attacks. The work will then address one or more of the following problems:
- learning identity conversion from freely available data (e.g. from the internet) of heterogeneous and degraded quality (compression, noise, etc.), and transferring identity from little data via few-shot neural adaptation strategies;
- generating conversions with control over the acoustic footprint, so that the attack fits the acoustic environment and communication channel of the envisaged scenario (from professional conditions down to degraded telephone or internet communication).
All results will be evaluated with the usual voice identity conversion protocols, and also with the project partners, to measure the performance of authentication/detection systems under the envisaged scenarios. The resulting advances will be integrated into IRCAM's neural voice identity conversion system and evaluated in situ in professional and/or artistic productions at IRCAM.

The researcher will also collaborate with the development team and take part in project activities (algorithm evaluation, meetings, specifications, deliverables, reports).

About the BRUEL project: The ANR BRUEL project ('ElaBoRation d'Une méthodologie d'EvaLuation des systèmes d'identification par la voix') concerns the evaluation/certification of voice identification systems exposed to adversarial attacks. Automatic speaker recognition systems are vulnerable not only to speech artificially produced by text-to-speech synthesis, but also to other forms of attack such as voice identity conversion and replay. The artifacts introduced when such fraudulent attacks are created or manipulated are the marks left in the signal by the synthesis algorithms, and they make it possible to distinguish an original real voice from a spoofed one. Under these conditions, spoofing detection requires evaluating spoofing countermeasures together with the speaker recognition systems themselves. The BRUEL project aims to propose the first evaluation/certification methodology for voice identification systems based on a Common Criteria approach.

Working context: The work will be carried out at IRCAM within the Sound Analysis and Synthesis team, supervised by Nicolas Obin and Axel Roebel (SU, CNRS, IRCAM). It may be conducted partially remotely, but participation in project progress meetings is required. IRCAM is a non-profit association, associated with the Centre National d'Art et de Culture Georges Pompidou, whose missions comprise research, creation, and education around 20th-century music and its relations with science and technology. Within the joint research unit UMR 9912 STMS (Sciences and Technologies of Music and Sound), shared by IRCAM, Sorbonne Université, the CNRS, and the French Ministry of Culture, specialized teams conduct research and software development in acoustics, audio signal processing, cognitive science, interaction technologies, computer music, and musicology.

IRCAM is located in central Paris, near the Centre Georges Pompidou, at 1 Place Stravinsky, 75004 Paris.

Required experience and skills: We are looking for a candidate specialized in deep neural network learning and in automatic speech processing or computer vision, preferably with experience in deep fakes. The candidate must hold a Ph.D. in computer science in the field of deep learning, with publications in recognized conferences and journals in the field. The ideal candidate will have:

• solid expertise in machine learning, in particular deep neural networks;

• good experience in automatic speech processing, preferably in generation or deep fakes;

• proficiency in digital audio-video signal processing;

• an excellent command of the Python programming language, the TensorFlow environment for training neural networks, and distributed computation on GPU servers;

• an excellent command of spoken and written scientific English;

• autonomy, teamwork, productivity, rigor, and methodology.

Salary: according to education and professional experience

Applications: Please send a cover letter and a CV detailing your level of experience/expertise in the areas mentioned above (plus any other relevant information) to Nicolas.Obin@ircam.fr and Axel.Roebel@ircam.fr. Application deadline: May 31, 2023

Back  Top

6-6(2023-04-15) Research & Development position (M/F), DeTOX project, IRCAM, Paris, France

Job offer: one Research & Development position (M/F): audio-visual deep fake generation

Availability and duration: 15 months, preferably starting June 1, 2023

Position description: Within the ASTRID DeTOX project (2023-2025), the Sound Analysis and Synthesis team is seeking a researcher for the implementation and training of algorithms for audio-visual deep fake generation, with the following missions:
- collecting, implementing, and training state-of-the-art algorithms for audio and visual deep fake generation;
- implementing a new audio-visual deep fake generation algorithm that synchronizes the two modalities, in particular to keep the speech coherent with the movements of the lips and lower face;
- building audio-visual databases of the targeted individuals and training deep fake generation models for them.
The researcher will also collaborate with the development team and take part in project activities (algorithm evaluation, meetings, specifications, deliverables, reports).

About the DeTOX project: Recent challenges have shown that it is extremely difficult to build universal detectors of deep-faked videos, such as the 'deep fakes' used to counterfeit a person's identity. When detectors are exposed to videos generated by a new algorithm, i.e. one unseen during training, performance remains extremely limited. On the video side, algorithms examine frames one by one, without accounting for how facial dynamics evolve over time. On the audio side, the voice is generated independently of the video; in particular, audio-video synchronization between the voice and lip movements is not taken into account. This is a major weakness of current deep fake video generation algorithms.

The present project aims to implement and train deep fake detection algorithms personalized to individuals for whom many real and falsified audio-video sequences can be obtained and/or fabricated. Building on state-of-the-art audio and video components, the project will focus on the temporal evolution of audio-visual signals and on their coherence, for both generation and detection. We aim to demonstrate that, by using audio and video simultaneously and by focusing on a specific person during training and detection, effective detectors can be designed even against generators not yet catalogued. Such tools will make it possible to scan the web for possible deep-faked videos of important French public figures (the President, journalists, the Chief of the Defence Staff, etc.) as soon as they are published.

Working context: The work will be carried out at IRCAM within the Sound Analysis and Synthesis team, supervised by Nicolas Obin and Axel Roebel (SU, CNRS, IRCAM). It may be conducted partially remotely, but participation in project progress meetings is required. IRCAM is a non-profit association, associated with the Centre National d'Art et de Culture Georges Pompidou, whose missions comprise research, creation, and education around 20th-century music and its relations with science and technology. Within the joint research unit UMR 9912 STMS (Sciences and Technologies of Music and Sound), shared by IRCAM, Sorbonne Université, the CNRS, and the French Ministry of Culture, specialized teams conduct research and software development in acoustics, audio signal processing, cognitive science, interaction technologies, computer music, and musicology.

IRCAM is located in central Paris, near the Centre Georges Pompidou, at 1 Place Stravinsky, 75004 Paris.

Required experience and skills: We are looking for a candidate specialized in deep neural network learning and in automatic speech processing or computer vision, preferably with experience in deep fakes. The candidate must hold a Ph.D. in computer science in the field of deep learning, with publications in recognized conferences and journals in the field. The ideal candidate will have:

• solid expertise in machine learning, in particular deep neural networks;

• good experience in automatic speech processing or computer vision, preferably in deep fakes;

• proficiency in digital audio-video signal processing;

• an excellent command of the Python programming language, the TensorFlow environment for training neural networks, and distributed computation on GPU servers;

• an excellent command of spoken and written scientific English;

• autonomy, teamwork, productivity, rigor, and methodology.

Salary: according to education and professional experience

Applications: Please send a cover letter and a CV detailing your level of experience/expertise in the areas mentioned above (plus any other relevant information) to Nicolas.Obin@ircam.fr and Axel.Roebel@ircam.fr. Application deadline: May 31, 2023

Back  Top

6-7(2023-05-09) PhD position @ IMT Brest France and Instituto Superior Técnico Lisbon, Portugal

PhD Title: SUMMA-Sound: SUMMarization of Activities of daily living using Sound-based activity recognition

Partnership: IMT Atlantique. Campus: Brest. Laboratory: Lab-STICC. Doctoral school: SPIN. Funding: IMT Atlantique, co-tutelle with Instituto Superior Técnico.

Context: IMT Atlantique, internationally recognised for the quality of its research, is a leading general engineering school under the aegis of the French Ministry of Industry and Digital Technology, ranked in the three main international rankings (THE, Shanghai, QS). Located on three campuses (Brest, Nantes and Rennes), IMT Atlantique aims to combine digital technology and energy to transform society and industry through training, research and innovation, and to be the leading French higher education and research institution in this field on an international scale. With 290 researchers and permanent lecturers, 1000 publications and €18M of contracts, it supervises 2300 students each year, and its training courses are based on cutting-edge research carried out within 6 joint research units: GEPEA, IRISA, LATIM, Lab-STICC, LS2N and SUBATECH. The proposed thesis is part of the research activities of the RAMBO team (Robot interaction, Ambient systems, Machine learning, Behaviour, Optimization), the Lab-STICC laboratory, and the Computer Science department of IMT Atlantique.

Scientific context: The objective of this thesis is to develop a method for collecting and summarizing domestic health-related data relevant for medical diagnosis, in a non-intrusive manner, using audio information. This research addresses the lack of practical tools for providing medical staff with succinct, high-level information on the evolution of the patients they follow for diagnostic purposes. It is based on the assumption that valuable diagnostic data can be collected by observing short- and long-term lifestyle changes and behavioural anomalies, and it relies on the latest advances in audio-based activity recognition, summarization of human activity, and health diagnosis.

Research on health diagnosis in domestic environments has already explored a variety of sensors and modalities for gathering data on human health indicators [5]. Nevertheless, audio-based activity recognition is notable for its less intrusive nature. Employing state-of-the-art sound-based activity recognition models [2] to monitor domestic human activity, the thesis will investigate and develop methods for summarizing human activity [3] in human-understandable language, in order to produce data easily interpretable by the doctors who remotely monitor their patients [4]. This work continues the research of the RAMBO team at IMT Atlantique on ambient systems enabling ageing well at home for elderly or dependent populations [1]. We expect this thesis to provide technology likely to relieve the burden on gerontologists and elderly-care facilities and to alleviate the caregiver shortage, by offering automatic support for monitoring elderly or handicapped people so that they can age at home while still being followed by medical specialists using automated means.

Expected contributions of the thesis:
Scientific goals: (1) determine the set of human activities relevant for health diagnosis; (2) implement a state-of-the-art model for audio-based activity recognition and have its function validated by clinicians; (3) develop a model for summarizing the evolution of human activity over time intervals of arbitrary duration (typically spanning from days to months, and possibly years).
Expected outcomes of the PhD: (1) a model for semantic summarization of human activity, based on sound recognition of activities of daily living; (2) a proof of concept for this model.

Candidate profile and required skills:
- Master degree in Computer Science (or equivalent)
- Programming and software engineering skills (Python, Git, software architecture design)
- Data science skills
- Machine learning skills
- English speaking and writing skills

References:
[1] Damien Bouchabou. 'Human activity recognition in smart homes: tackling data variability using context-dependent deep learning, transfer learning and data synthesis'. PhD thesis, Ecole nationale supérieure Mines-Télécom Atlantique, May 2022. url: https://theses.hal.science/tel-03728064
[2] Detection and Classification of Acoustic Scenes and Events (DCASE). url: https://dcase.community/challenge2022/task-soundevent-detection-in-domestic-environments (visited on 07/01/2022)
[3] P. Durga et al. 'When less is better: A summarization technique that enhances clinical effectiveness of data'. In: Proceedings of the 2018 International Conference on Digital Health, 2018, pp. 116-120
[4] Akshay Jain et al. 'Linguistic summarization of in-home sensor data'. In: Journal of Biomedical Informatics 96 (2019), p. 103240. issn: 1532-0464
[5] Mostafa Haghi Kashani et al. 'A systematic review of IoT in healthcare: Applications, techniques, and trends'. In: Journal of Network and Computer Applications 192 (2021), p. 103164

Work plan: The thesis will be organised in the following steps: (1) definition of pertinent sounds and activities for health diagnosis; (2) hardware set-up; (3) dataset constitution; (4) activity recognition; (5) diarization of activities; (6) summarization; (7) validation in a real environment.

Application: To apply for this position, please send an email with your curriculum vitae, a document with your academic results (if possible), and a couple of lines describing your motivation to pursue a PhD to mihai[dot]andries[at]imt-atlantique[dot]fr before 16 May 2023.

Additional information:
- Application deadline: 16 May 2023
- Start date: Fall 2023
- Contract duration: 36 months
- Location: Brest (France) and Lisbon (Portugal)
- Contacts: Mihai ANDRIES (mihai[dot]andries[at]imt-atlantique.fr), Plinio Moreno (plinio[at]isr.tecnico.ulisboa.pt)
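The linguistic-summarization step in the work plan can be sketched minimally. This is a hypothetical illustration: `summarize_day` and its `(hour, activity)` event format are invented here; in the project, the events would come from a sound-event detector such as a DCASE-style model [2]:

```python
from collections import Counter

def summarize_day(events):
    """Turn a list of (hour, activity) detections into a one-line textual summary.

    Hypothetical sketch of the summarization step, not the thesis's actual model.
    """
    counts = Counter(act for _, act in events)
    first = {}
    for hour, act in sorted(events):   # record the earliest occurrence per activity
        first.setdefault(act, hour)
    return "; ".join(f"{act}: {n}x, first at {first[act]:02d}:00"
                     for act, n in counts.most_common())

# Example day: kettle and dishes in the morning, dishes again at lunch, TV at night.
day = [(8, "kettle"), (8, "dishes"), (13, "dishes"), (20, "tv")]
summary = summarize_day(day)
```

A real system would summarize over arbitrary intervals (days to years) and phrase the output in clinically meaningful language, but the principle of aggregating detections into interpretable text is the same.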

Back  Top

6-8(2023-05-11) PhD position @ ISIR and IRCAM Paris France

Multimodal behavior generation and style transfer for virtual agent animation

Catherine Pelachaud, Nicolas Obin (catherine.pelachaud@isir.upmc.fr, nicolas.obin@ircam.fr)

 

Humans communicate through speech but also through a wide range of multimodal signals: hand gestures, body posture, facial expression, gaze, touch, speech prosody, etc. Verbal and non-verbal behavior play a crucial role in conveying and perceiving new information in human-human interaction. Depending on the context of communication and the audience, a person continuously adapts their style during interaction. This stylistic adaptation involves verbal and nonverbal modalities such as language, speech prosody, facial expressions, hand gestures, and body posture. Virtual agents, also called Embodied Conversational Agents (ECAs; see [B] for an overview), are entities that can communicate verbally and nonverbally with human interlocutors. Their roles vary depending on the application: they can act as a tutor, an assistant, or even a companion. Matching an agent's behavior style to its interaction context ensures better engagement and adherence of human users. A large number of generative models have been proposed in the past few years for synthesizing ECA gestures. Lately, style modeling and transfer have been receiving increasing attention as a way to adapt the behavior of the ECA to its context and audience. The latest research proposes neural architectures comprising a content encoder, a style encoder, and a decoder conditioned to generate the ECA's gestural behavior matching the content and rendered in the desired style. While the first attempts focused on modeling the style of a single speaker [4, 5, 7], there is a rapidly increasing effort towards multi-speaker and multi-style modeling and transfer [1, 2]. In particular, few-shot style transfer architectures attempt to generate gestural behavior in a given style from a minimal amount of data in that style, with minimal further training or fine-tuning.

Objectives and methodology:

The aim of this PhD is to generate human-like gestural behavior in order to empower virtual agents to communicate verbally and nonverbally with different styles, extending the previous thesis by Mireille Fares [A]. We view behavioral style as pervasive while speaking: it colors the communicative behaviors, while content is carried by multimodal signals but mainly expressed through text semantics. The objective is to generate ultra-realistic verbal and nonverbal behaviors (text style, prosody, facial expressions, body gestures and poses) corresponding to a given content (mostly driven by text and speech), and to adapt it to a specific style. This raises methodological and fundamental challenges in machine learning and human-computer interaction: 1) How should content and style be defined? Which modalities are involved, and in what proportion, in the gestural expression of content and style? 2) How do we implement efficient neural architectures to disentangle content and style information from multimodal human behavior (text, speech, gestures)? The proposed directions will leverage cutting-edge research in neural networks, such as multimodal modeling and generation [8], information disentanglement [6], and text-prompt generation as popularized by DALL-E or ChatGPT [9].
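The content/style encoder-decoder architecture mentioned above can be sketched with toy linear maps standing in for the neural modules. All names and dimensions here are hypothetical; a real system would train deep encoders with disentanglement objectives of the kind this proposal lists:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear stand-ins for the neural content/style encoders and the decoder.
d_in, d_content, d_style = 16, 8, 4
W_c = rng.standard_normal((d_content, d_in))            # content encoder
W_s = rng.standard_normal((d_style, d_in))              # style encoder
W_d = rng.standard_normal((d_in, d_content + d_style))  # decoder

def encode_content(x):
    return W_c @ x

def encode_style(x):
    return W_s @ x

def decode(content, style):
    # The decoder is conditioned on both codes: concatenate and project back.
    return W_d @ np.concatenate([content, style])

# Style transfer: behavior content from speaker A rendered in speaker B's style.
x_a = rng.standard_normal(d_in)   # e.g. features of speaker A's gesture sequence
x_b = rng.standard_normal(d_in)   # e.g. features of speaker B's gesture sequence
y = decode(encode_content(x_a), encode_style(x_b))
```

The point of the architecture is exactly this factorization: because content and style are encoded separately, swapping the style code changes how the behavior is rendered without changing what is communicated.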

The research questions can be summarized as follows:

·       What is a multimodal style? What are the style cues in each modality (verbal, prosody, and nonverbal behavior)? How can the cues from each modality be fused to build a multimodal style?

 

·       How to control the generation of verbal and nonverbal cues using a multimodal style? How to transfer a multimodal style into generative models? How to integrate style-oriented prompts/instructions into multimodal generative models while preserving the underlying intentions to be conveyed by the agent?

 

·       How to evaluate the generation? How to measure content preservation and style transfer? How to design evaluation protocols with real users?

The PhD candidate will elaborate contributions in the field of neural multimodal behavior generation of virtual agents with a particular focus on multimodal style generation and controls:

 

·       Learning disentangled content and style encodings from multimodal human behavior using adversarial learning, bottleneck learning, and cross-entropy / mutual information formalisms.

·       Generating expressive multimodal behavior using prompt-tuning, VAE-GAN, and stable diffusion algorithms.

To accomplish these objectives, we propose the following steps:

·       Analyzing corpus to identify style and content cues in different modalities.

·       Proposing generative models for multimodal style transfer according to different control levels (human mimicking or prompts/instructions)

 

·       Evaluating the proposed models with a dedicated corpus (e.g. PATS) and with real users. Different criteria will be evaluated: content preservation, style transfer, and coherence of the ECA across modalities. When evaluating with human users, we envision measuring the users' engagement, knowledge memorization, and preferences.

 

Supervision team

Catherine Pelachaud is director of research CNRS at ISIR, working on embodied conversational agents, affective computing, and human-machine interaction.
[A] M. Fares, C. Pelachaud, N. Obin (2022). Transformer Network for Semantically-Aware and Speech-Driven Upper-Face Generation. In EUSIPCO
[B] C. Pelachaud, C. Busso, D. Heylen (2021). Multimodal behavior modeling for socially interactive agents. The Handbook on Socially Interactive Agents: 20 Years of Research on Embodied Conversational Agents, Intelligent Virtual Agents, and Social Robotics. Volume 1: Methods, Behavior, Cognition
Nicolas Obin is associate professor at Sorbonne Université and research scientist at Ircam, working on human speech generation, vocal deep fakes, and multimodal generation.
[C] L. Benaroya, N. Obin, A. Roebel (2023). Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations. In Entropy 25 (2), 375
[D] F. Bous, L. Benaroya, N. Obin, A. Roebel (2022). Voice Reenactment with F0 and Timing Constraints and Adversarial Learning of Conversions

The supervision team regularly publishes in top venues in machine learning (e.g., AAAI, ICLR, DMKD), natural language processing and information access (e.g., EMNLP, SIGIR), agents (e.g., AAMAS), and speech (Interspeech).

Required Experiences and Skills

·       Master's or engineering degree in Computer Science or Applied Mathematics, with knowledge of deep learning

·       Very proficient in Python (NumPy, SciPy), the TensorFlow/PyTorch environment, and distributed computation (GPU)

·       High productivity, capacity for methodical and autonomous work, good communication skills.


Environment

The PhD will be hosted by two laboratories, ISIR and IRCAM, experts in the fields of machine learning, natural language/speech/human behavior processing, and virtual agents, with the support of the Sorbonne Center for Artificial Intelligence (SCAI). The PhD candidate is expected to publish in the most prominent conferences and journals in the domain (such as ICML, EMNLP, AAAI, IEEE TAC, AAMAS, IVA, ICMI, etc.). SCAI is equipped with a cluster of 30 nodes: 100 GPU cards and 1800 TFLOPS / FP32 of processing power. The candidate can also use the Jean Zay cluster hosted by CNRS-IDRIS.

References 

[1] C. Ahuja, D. Won Lee, and L.-P. Morency. 2022. Low-Resource Adaptation for Personalized Co-Speech Gesture Generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] S. Alexanderson, G. Eje Henter, T. Kucherenko, and J. Beskow. 2020. Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows. In Computer Graphics Forum, Vol. 39. 487–496. 
[3] P. Bordes, E. Zablocki, L. Soulier, B. Piwowarski, P. Gallinari (2019). Incorporating Visual Semantics into Sentence Representations within a Grounded Space. In EMNLP/IJCNLP
[4] D. Cudeiro, T. Bolkart, C. Laidlaw, A. Ranjan, A., and M.J. Black. (2019). Capture, learning, and synthesis of 3D speaking styles. In IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10101–10111 
[5] S. Ginosar, A. Bar, G. Kohavi, C. Chan, A. Owens and J. Malik. 2019. Learning individual styles of conversational gesture. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
[6] S. Subramanian, G. Lample, E.M. Smith, L. Denoyer, M.'A. Ranzato, Y-L. Boureau (2018). Multiple-Attribute Text Style Transfer. CoRR abs/1811.00552 
[7] T. Karras, T. Aila, S. Laine, A. Herva, and J. Lehtinen. 2017. Audio-driven facial animation by joint end-to-end learning of pose and emotion. ACM Transactions on Graphics (TOG) 36, 1–12
[8] C. Rebuffel, M. Roberti, L. Soulier, G. Scoutheeten, R. Cancelliere, P. Gallinari (2022). Controlling hallucinations at word level in data-to-text generation. In Data Min. Knowl. Discov. 36(1): 318-354 
[9] L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, Y. Shao, W. Zhang, M-H Yang, B Cui (2022). Diffusion Models: A Comprehensive Survey of Methods and Applications. CoRR abs/2209.00796
Back  Top

6-9(2023-05-11) PhD-student position in artificial intelligence, human-robot interaction @ KTH, Stockholm, Sweden

We are looking for a PhD student who is interested in Artificial Intelligence, Machine Learning, Natural Language Processing and Human-Robot Interaction.  The doctoral student will work in a newly funded project at the  Department of Speech, Music and Hearing within the School of Electrical Engineering and Computer Science at KTH. The project is financed by the Swedish AI-program WASP (Wallenberg AI, Autonomous Systems and Software Program) which offers a graduate school with research visits, partner universities, and visiting lecturers.

 

The newly started project is titled 'Anticipatory Control in Conversational Human-Robot Interaction'. The aim of the project is to use self-supervised learning to develop generic language models for human-robot interaction and explore how such models can be used in real time to predict and anticipate human behavior and thereby improve the interaction. Whereas traditional language models in NLP (such as BERT, GPT) have focused on written language, we want to model multi-modal conversation, where aspects such as engagement, turn-taking, and incremental processing are of importance. This means that the models will have to process text, audio, and video, including aspects such as how the human users move and their facial expressions.

In collaboration with industry and other projects, we will then explore how such models can be used for social robotic applications. Another important focus will be on model analysis and visualization.

 

For more information about the position, see https://www.kth.se/en/om/work-at-kth/lediga-jobb/what:job/jobID:623390/where:4/

 

If you have any questions, don’t hesitate to contact Prof. Gabriel Skantze

(skantze@kth.se)

Back  Top

6-10(2023-05-15) Post-doctoral and engineer positions@ LORIA-INRIA, Nancy, France
Automatic speech recognition for non-native speakers in a noisy environment

Post-doctoral and engineer positions
Starting date: July-September of 2023
Duration: 24 months for a post-doc position and 12 months for an engineer position
Supervisors: Irina Illina, Associate Professor, HDR, Lorraine University, LORIA-INRIA Multispeech Team, illina@loria.fr
Emmanuel Vincent, Senior Research Scientist & Head of Science, INRIA Multispeech Team, emmanuel.vincent@inria.fr
http://members.loria.fr/evincent/
Constraint: the application must meet the requirements of the French Directorate General of Armament (Direction générale de l'armement, DGA). 
 
Context
When a person has their hands busy performing a task like driving a car or piloting an airplane, voice is a fast and efficient way to achieve interaction. In aeronautical communications, the English language is most often compulsory. Unfortunately, a large proportion of pilots are not native English speakers and speak with an accent that depends on their native language, influenced by its pronunciation mechanisms. Inside an aircraft cockpit, the non-native voice of the pilots and the surrounding noises are the most difficult challenges to overcome in order to have efficient automatic speech recognition (ASR). The problems of non-native speech are numerous: incorrect or approximate pronunciations, errors of agreement in gender and number, use of non-existent words, missing articles, grammatically incorrect sentences, etc. The acoustic environment adds a disturbing component to the speech signal. Much of the success of speech recognition relies on the ability to take different accents and ambient noises into account in the models used by ASR.
Automatic speech recognition has made great progress thanks to the spectacular development of deep learning. In recent years, end-to-end automatic speech recognition, which directly optimizes the probability of the output character sequence based on the input acoustic characteristics, has made great progress [Chan et al., 2016; Baevski et al., 2020; Gulati, et al., 2020].

Objectives
The recruited person will have to develop methodologies and tools to obtain high-performance non-native automatic speech recognition in the aeronautical context and more specifically in a (noisy) aircraft cockpit.
This project will be based on an end-to-end automatic speech recognition system [Shi et al., 2021] using wav2vec 2.0 [Baevski et al., 2020]. This model is one of the most efficient of the current state of the art. This wav2vec 2.0 model enables self-supervised learning of representations from raw audio data (without transcription).
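For intuition, the wav2vec 2.0 objective mentioned above is contrastive: at each masked time step, the model must identify the true quantized latent among a set of distractors. A minimal self-contained NumPy sketch of this per-frame contrastive (InfoNCE-style) loss, with an illustrative function name, toy vectors, and temperature value (none of which come from the actual implementation), could look like this:

```python
import numpy as np

def wav2vec2_contrastive_loss(context, positive, distractors, temperature=0.1):
    """InfoNCE-style loss for one masked frame: the context vector should be
    closer (in cosine similarity) to the true quantized latent than to the
    distractors. Index 0 of the candidate list is the true target."""
    candidates = np.vstack([positive[None, :], np.asarray(distractors)])
    # Cosine similarity between the context vector and each candidate
    sims = candidates @ context / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(context) + 1e-9)
    logits = sims / temperature
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))

rng = np.random.default_rng(0)
c = rng.standard_normal(16)
distractors = rng.standard_normal((5, 16))
# Easy case: the true latent equals the context vector -> near-zero loss
loss_easy = wav2vec2_contrastive_loss(c, c, distractors)
# Hard case: a random "positive" is barely distinguishable from distractors
loss_hard = wav2vec2_contrastive_loss(c, rng.standard_normal(16), distractors)
```

In the real model, the similarity is computed between the transformer's context outputs and quantized feature-encoder outputs, distractors are sampled from other masked positions of the same utterance, and the loss is combined with a codebook-diversity term; this sketch only conveys the contrastive principle.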

How to apply: Interested candidates are encouraged to contact Irina Illina (illina@loria.fr) with the required documents (CV, transcripts, motivation letter, and recommendation letters).

Requirements & skills:
- Ph.D. degree in speech/audio processing, computer vision, machine learning, or in a related field,
- ability to work independently as well as in a team,
- solid programming skills (Python, PyTorch), and deep learning knowledge,
- good level of written and spoken English.

References
[Baevski et al., 2020] A. Baevski, H. Zhou, A. Mohamed, and M. Auli. Wav2vec 2.0: A framework for self-supervised learning of speech representations, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), 2020.
[Chan et al., 2016] W. Chan, N. Jaitly, Q. Le and O. Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 4960-4964, 2016.
[Chorowski et al., 2017] J. Chorowski, N. Jaitly. Towards better decoding and language model integration in sequence to sequence models. Interspeech, 2017.
[Houlsby et al., 2019] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, S. Gelly. Parameter-efficient transfer learning for NLP. International Conference on Machine Learning, PMLR, pp. 2790–2799, 2019.
[Gulati et al., 2020] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang. Conformer: Convolution-augmented transformer for speech recognition. Interspeech, 2020.
[Shi et al., 2021] X. Shi, F. Yu, Y. Lu, Y. Liang, Q. Feng, D. Wang, Y. Qian, and L. Xie. The accented english speech recognition challenge 2020: open datasets, tracks, baselines, results and methods. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6918–6922, 2021.
 
Back  Top

6-11(2023-05-18) PhD position @ GIPSA Lab, Grenoble, France

We are offering a PhD position on the acoustics, aerodynamics, and mechatronics of speech, within the ANR project AVATARS ('Artificial Voice production: control of bio-inspired port-HAmilToniAn numeRical and mechatronic modelS', 2023-2027). The topic is the 'Characterization of human vocal behavior in speech and singing on a robotized mechatronic bench. Application to the development of biomimetic vocal folds.'

For more information and to apply:
https://www.gipsa-lab.grenoble-inp.fr/~nathalie.henrich/docs/PhDposition_ANR-AVATARS_GIPSA_2023.pdf

Back  Top

6-12(2023-05-20) Fixed-term teaching and research position in Phonetics, Grenoble, France

For the 2023-2024 academic year, we are looking for someone for a fixed-term contract (CDD), 50% teaching and research, in Phonetics.

Details can be found here:
https://emploi.univ-grenoble-alpes.fr/offres/enseignants-enseignants-chercheurs-contractuels/enseignant-e-chercheur-e-en-phonetique-1240660.kjsp?RH=TLK_CDD_ENS

Back  Top

6-13(2023-05-22) PhD position @Lille and Grenoble, France

We are looking for a candidate for a PhD thesis on the computational modeling of the
perception-production link in speech. The thesis will be co-supervised by A. Basirat
(https://scalab.univ-lille.fr) and J. Diard (https://lpnc.univ-grenoble-alpes.fr/).
Information about the project, the expected skills, and how to apply is available at
the address below:

https://emploi.cnrs.fr/Offres/Doctorant/UMR9193-ANABAS-001/Default.aspx

Back  Top

6-14(2023-05-22) PhD Causal Machine Learning Applied to NLP and the Study of Large Language Models, Grenoble, France

Job Offer: PhD Causal Machine Learning Applied to NLP and the Study of Large Language
Models.
Starting date: November 1st, 2023 (flexible)
Application deadline: From now until the position is filled
Interviews (tentative): beginning of June, and later if the position is still open
Salary: ~2000€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)
Place of work (no remote): Laboratoire d'Informatique de Grenoble, CNRS, Grenoble, France

Keywords: natural language processing, causal machine learning, interpretability,
analysis, robustness, large language models, controllability

Description:
Natural language processing (NLP) has undergone a paradigm shift in recent years, owing
to the remarkable breakthroughs achieved by large language models (LLMs). Despite being
purely 'correlation machines' [CorrelationMachine], these models have completely altered
the landscape of NLP by demonstrating impressive results in language modeling,
translation, and summarization. Nonetheless, the use of LLMs has also surfaced crucial
questions regarding their reliability and transparency. As a result, there is now an
urgent need to gain a deeper understanding of the mechanisms governing the behavior of
LLMs, to interpret their decisions and outcomes in principled and scientifically grounded
ways.

A promising direction for carrying out such analysis comes from the fields of causal
analysis and causal inference [CausalAbstraction]. Examining the causal relationships
between the inputs, outputs, and hidden states of LLMs can help to build scientific
theories about the behavior of these complex systems. Furthermore, causal inference
methods can help uncover the underlying causal mechanisms behind the complex
computations of LLMs, giving hope to better interpret their decisions and understand
their limitations [Rome].

Thus, the use of causal analysis in the study of LLMs is a promising research direction
to gain deeper insights into the workings of these models.
As a Ph.D. student working on this project, you will be expected to develop a strong
understanding of the principles of causal inference and their application to machine
learning; see for example the invariant language model framework [InvariantLM]. You will
have the opportunity to work on cutting-edge research projects in NLP, contributing to
the development of more reliable and interpretable LLMs. The Ph.D. research project
should be aligned with your interests and expertise: the precise direction of the
research can and will be influenced by your personal taste and research goals, and you
are encouraged to bring your unique perspective and ideas to the table.

SKILLS
Master's degree in Natural Language Processing, computer science, or data science.
Mastery of Python programming and deep learning frameworks.
Experience in causal inference or working with LLMs.
Very good communication skills in English (French is not needed).

SCIENTIFIC ENVIRONMENT
The thesis will be conducted within the Getalp team of the LIG laboratory
(https://lig-getalp.imag.fr/). The GETALP team has strong expertise and a solid track
record in Natural Language Processing. The recruited person will be welcomed within the
team, which offers a stimulating, multinational, and pleasant working environment.
The means to carry out the PhD will be provided, both in terms of missions in France and
abroad and in terms of equipment. The candidate will have access to the LIG's GPU
cluster. Furthermore, access to the national supercomputer Jean Zay will make it
possible to run large-scale experiments.
The Ph.D. position will be co-supervised by Maxime Peyrard and François Portet.
Additionally, the Ph.D. student will also be working with external academic collaborators
at EPFL and Idiap (e.g., Robert West and Damien Teney).

INSTRUCTIONS FOR APPLYING
Applications must contain: CV + letter/message of motivation + master's transcripts;
candidates should also be ready to provide letter(s) of recommendation. Applications
should be addressed to Maxime Peyrard (maxime.peyrard@epfl.ch) and François Portet
(francois.Portet@imag.fr)


[InvariantLM] Peyrard, Maxime and Ghotra, Sarvjeet and Josifoski, Martin and Agarwal,
Vidhan and Patra, Barun and Carignan, Dean and Kiciman, Emre and Tiwary, Saurabh and
West, Robert, 'Invariant Language Modeling' Conference on Empirical Methods in Natural
Language Processing (2022): 5728–5743

[CorrelationMachine] Feder, Amir and Keith, Katherine A. and Manzoor, Emaad and Pryzant,
Reid and Sridhar, Dhanya and Wood-Doughty, Zach and Eisenstein, Jacob and Grimmer, Justin
and Reichart, Roi and Roberts, Margaret E. and Stewart, Brandon M. and Veitch, Victor and
Yang, Diyi, 'Causal Inference in Natural Language Processing: Estimation, Prediction,
Interpretation and Beyond' Transactions of the Association for Computational Linguistics
(2022), 10:1138–1158.

[CausalAbstraction] Geiger, Atticus and Wu, Zhengxuan and Lu, Hanson and Rozner, Josh and
Kreiss, Elisa and Icard, Thomas and Goodman, Noah and Potts, Christopher, 'Inducing
Causal Structure for Interpretable Neural Networks' Proceedings of Machine Learning
Research (2022): 7324-7338.

[Rome] Meng, Kevin, et al. 'Locating and Editing Factual Associations in GPT.' Advances
in Neural Information Processing Systems 35 (2022): 17359-17372.

Back  Top

6-15(2023-05-27) PhD position@ EECS-KTH, Stockholm, Sweden
The School of Electrical Engineering and Computer Science (EECS) at the KTH Royal Institute of Technology has an open Ph.D position in Social Robotics at the division of Robotics, Perception and Learning (RPL).
 
ABOUT KTH
 
KTH Royal Institute of Technology in Stockholm has grown to become one of Europe’s leading technical and engineering universities, as well as a key center of intellectual talent and innovation. We are Sweden’s largest technical research and learning institution and home to students, researchers and faculty from around the world. Our research and education cover a wide area including the natural sciences and all branches of engineering, as well as architecture, industrial management, urban planning, history and philosophy.
 
PROJECT DESCRIPTION
 
This project addresses the challenge of how to enable robots to learn in a scalable and cost-efficient manner by gradually acquiring new knowledge from non-expert, semi-situated teachers. To achieve this, computational methods will be developed for robots to query the semi-situated teachers (e.g. crowd workers) and incorporate the newly acquired knowledge into their existing decision-making to further use in situ. This project is funded by the Swedish Foundation for Strategic Research.
 
The starting date for the position is flexible, but preferably during the fall of 2023. 
 
QUALIFICATIONS
 
The candidate must have a degree in Computer Science or related fields. Documented written and spoken English and programming skills are required. Experience with robotics, human-robot interaction, human-computer interaction, multimodal interaction and machine learning is important.
 
HOW TO APPLY
 
The application should include:
 
1. Curriculum vitae. 
2. Transcripts from University/College. 
3. Brief description of why the applicant wishes to become a doctoral student. 
 
The application documents should be uploaded using KTH's recruitment system. More information here:
 
 
 

The application deadline is ** June 2, 2023 **

Back  Top

6-16(2023-05-28) Ph.D. position: Automatic speech recognition for non-native speakers in a noisy environment, LORIA-INRIA, Nancy, France
Automatic speech recognition for non-native speakers in a noisy environment

Ph.D. position
Starting date: September-October 2023
Duration: 36 months 
Supervisors: Irina Illina, Associate Professor, HDR, Lorraine University, LORIA-INRIA Multispeech Team, illina@loria.fr
Emmanuel Vincent, Senior Research Scientist & Head of Science, INRIA Multispeech Team, emmanuel.vincent@inria.fr
http://members.loria.fr/evincent/
Constraint: the application must meet the requirements of the French Directorate General of Armament (Direction générale de l'armement, DGA). 
Context
When a person has their hands busy performing a task like driving a car or piloting an airplane, voice is a fast and efficient way to achieve interaction. In aeronautical communications, the English language is most often compulsory. Unfortunately, a large proportion of pilots are not native English speakers and speak with an accent that depends on their native language, influenced by its pronunciation mechanisms. Inside an aircraft cockpit, the non-native voice of the pilots and the surrounding noises are the most difficult challenges to overcome in order to have efficient automatic speech recognition (ASR). The problems of non-native speech are numerous: incorrect or approximate pronunciations, errors of agreement in gender and number, use of non-existent words, missing articles, grammatically incorrect sentences, etc. The acoustic environment adds a disturbing component to the speech signal. Much of the success of speech recognition relies on the ability to take different accents and ambient noises into account in the models used by ASR.
Automatic speech recognition has made great progress thanks to the spectacular development of deep learning. In recent years, end-to-end automatic speech recognition, which directly optimizes the probability of the output character sequence based on the input acoustic characteristics, has made great progress [Chan et al., 2016; Baevski et al., 2020; Gulati, et al., 2020].

Objectives
The recruited person will have to develop methodologies and tools to obtain high-performance non-native automatic speech recognition in the aeronautical context and more specifically in a (noisy) aircraft cockpit.
This project will be based on an end-to-end automatic speech recognition system [Shi et al., 2021] using wav2vec 2.0 [Baevski et al., 2020]. This model is one of the most efficient of the current state of the art. This wav2vec 2.0 model enables self-supervised learning of representations from raw audio data (without transcription).

How to apply: Interested candidates are encouraged to contact Irina Illina (illina@loria.fr) with the required documents (CV, transcripts, motivation letter, and recommendation letters).

Requirements & skills:
- Master's degree in speech/audio processing, computer vision, machine learning, or in a related field,
- ability to work independently as well as in a team,
- solid programming skills (Python, PyTorch), and deep learning knowledge,
- good level of written and spoken English.

References
[Baevski et al., 2020] A. Baevski, H. Zhou, A. Mohamed, and M. Auli. Wav2vec 2.0: A framework for self-supervised learning of speech representations, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), 2020.
[Chan et al., 2016] W. Chan, N. Jaitly, Q. Le and O. Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 4960-4964, 2016.
[Chorowski et al., 2017] J. Chorowski, N. Jaitly. Towards better decoding and language model integration in sequence to sequence models. Interspeech, 2017.
[Houlsby et al., 2019] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, S. Gelly. Parameter-efficient transfer learning for NLP. International Conference on Machine Learning, PMLR, pp. 2790–2799, 2019.
[Gulati et al., 2020] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang. Conformer: Convolution-augmented transformer for speech recognition. Interspeech, 2020.
[Shi et al., 2021] X. Shi, F. Yu, Y. Lu, Y. Liang, Q. Feng, D. Wang, Y. Qian, and L. Xie. The accented english speech recognition challenge 2020: open datasets, tracks, baselines, results and methods. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6918–6922, 2021.
 
Back  Top

6-17(2023-05-30) PhD position in NLP, @ Jozef International postgraduate School (Slovenia) and La Rochelle University (France).

We are looking for candidates for a fully funded research Ph.D. position in the field of news analysis. The candidate will be enrolled in a joint doctoral programme between Jozef Stefan International Postgraduate School (Slovenia) and La Rochelle University (France). The candidate will be co-supervised by asst. Prof. Dr. Senja Pollak and Prof. Dr. Antoine Doucet. Call open until: June 12, 2023.


The candidate will:

  • Stay 18 months in Slovenia and 18 months in France

  • Receive a full fellowship for 3 years

  • Be part of research groups at Jožef Stefan Institute (Dept. of Knowledge Technologies) and L3i (La Rochelle University)


Possible topics:

  • News analysis

  • Opinion mining 

  • Historical document processing

  • Diachronic analysis

  • Cross-lingual analysis

  • Other related topics


The doctoral candidate will benefit from the context of two active research groups, involving several other PhD students and postdoctoral fellows on each site. The collaboration builds on two recent Horizon 2020 projects coordinated in Ljubljana and La Rochelle (Embeddia and NewsEye, respectively). 


The application should contain:

  • A CV, including a list of past publications (if available), grades from MSc studies, and computational skills, including natural language processing experience (if available)

  • Motivation letter (1 A4 page)

  • Contact of 2 referees

  • Transcript of MSc grades


The candidates are expected to have:

  • Excellent knowledge of the English language

  • Completed second-cycle university study programme or comparable education by September 1st 2023

  • Interest in scientific research

  • Good programming skills

  • Experience in natural language processing would be a plus

Apply by June 12, 2023 by email to senja.pollak@ijs.si and antoine.doucet@univ-lr.fr with the subject “Slovenian-French PhD fellowship”

Back  Top

6-18(2023-06-01) PhD position @ Computer Science Lab in Bordeaux, France (LaBRI) and the LORIA (Nancy, France)

In the framework of the PEPR Santé numérique “Autonom-Health” project (Health, behaviors and autonomous digital technologies), the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) and the LORIA (Nancy, France) are looking for candidates for a fully funded PhD position (36 months).  


The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS.  The abstract of the « Autonom-Health » project can be found at the end of this email.  

The missions that will be addressed by the retained candidate are among these tasks, according to the profile of the candidate: 
- Data collection tasks:
  - Definition of scenarios for collecting spontaneous speech using Socially Interactive Agents (SIAs)
  - Collection of patient/doctor interactions during clinical interviews
- ASR-related tasks:
  - Evaluate and improve the performance of our end2end ESPnet-based ASR system on French real-world spontaneous data recorded from healthy subjects and patients
  - Adaptation of the ASR system to the clinical interview domain
  - Automatic phonetic transcription / alignment using end2end architectures
  - Adapting ASR transcripts to be used with the semantic analysis tools developed at LORIA
- Speech analysis tasks:
  - Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, and research into new biomarkers targeted at specific diseases.

The position is to be hosted at LaBRI but, depending on the profile of the candidate, close collaboration is expected with the LORIA teams « Multispeech » (contact: Emmanuel Vincent, emmanuel.vincent@inria.fr) and/or « Sémagramme » (contact: Maxime Amblard, maxime.amblard@loria.fr).

Gross salary: approx. 2044 €/month 
Starting date: October 2023
Required qualifications: Master's degree in signal processing / speech analysis / computer science.
Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, excellent command of French (interactions with French patients and clinicians), good level of scientific English.
Know-how: familiarity with the ESPnet toolbox and/or deep learning frameworks, knowledge of automatic speech processing system design.
Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.

Applications: 
To apply, please send by email to jean-luc.rouas@labri.fr a single PDF file containing a full CV, a cover letter (describing your personal qualifications, research interests and motivation for applying), contact information for two referees, and academic certificates (Master's, Bachelor's certificates).


—— 
Abstract of the « Autonom-Health » project:


Western populations face an increase in longevity, which mechanically increases the number of chronic disease patients to manage. Current healthcare strategies will not make it possible to maintain a high level of care at a controlled cost in the future, and e-health can optimize the management and costs of our healthcare systems. Healthy behaviors contribute to the prevention and optimized management of chronic diseases, but their implementation is still a major challenge. Digital technologies could help their implementation through numeric behavioral medicine programs to be developed in complement to (and not in substitution for) existing care, in order to focus human interventions on the most severe cases demanding medical interventions. 

However, to do so, we need to develop digital technologies which should be: i) Ecological (related to real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii)  Personalized (at initiation and adapted over the course of treatment) ; iv) Longitudinal (implemented over long periods of time) ; v) Interoperated (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and generating trustability).

The above-mentioned challenges will be disentangled with the following specific goals: Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activities, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients.  This will require new autonomous digital technologies (i.e. virtual Socially Interactive Agents SIAs, smartphones, wearable sensors). Goal 2:  Optimize clinical phenotyping by collecting and analyzing non-intrusive data (i.e. voice, geolocalisation, body motion, smartphone footprints, ...) which will potentially complement clinical data and biomarkers data from patient cohorts. Goal 3: Better understand psychological, economical and socio-cultural factors driving acceptance and engagement with the autonomous digital technologies and the proposed numeric behavioral interventions. Goal 4:  Improve interaction modalities of digital technologies to personalize and optimize long-term engagement of users. Goal 5: Organize large scale data collection, storage and interoperability with existing and new data sets (i.e, biobanks, hospital patients cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment.

Each goal will be addressed by expert teams through complementary work-packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testings), will be performed through this project. A second phase funded via ANR calls will allow to recruit new teams for large scale testing phase.

This project will rely on population-based interventions in existing numeric cohorts (i.e. KANOPEE) where virtual agents interact with patients at home on a regular basis. Pilot hospital departments will also be involved for data management, supervised by information and decision systems coordinating autonomous digital cognitive behavioral interventions based on our virtual agents. The global solution, based on empathic human-computer interactions, will help target, diagnose and treat subjects suffering from dysfunctional behaviors (i.e. sleep deprivation, substance use...) but also sleep and mental disorders. The expected benefits of such a solution will be increased adherence to treatment, strong self-empowerment to improve autonomy, and finally a reduction of long-term risks for the subjects and patients using this system. Our program should massively improve healthcare systems and allow strong technological transfer to information systems / digital health companies and the pharma industry.
 

Back  Top

6-19(2023-06-02) Open faculty position at KU Leuven, Belgium: junior professor in Synergistic Processing of Multisensory Data for Audio-Visual Understanding

Open faculty position at KU Leuven, Belgium: junior professor in Synergistic Processing
of Multisensory Data for Audio-Visual Understanding

KU Leuven's Faculty of Engineering Science has an open position for a junior professor
(tenure track) in the area of audiovisual understanding. The successful candidate will
conduct research on synergetic processing of multisensory data for audio-visual
understanding, teach courses in the Master of Engineering Science and supervise students
in the Master and PhD programs. The candidate will be embedded in the PSI research
division of the Department of Electrical Engineering. More information is available at
https://www.kuleuven.be/personeel/jobsite/jobs/60193566?hl=en&lang=en . The deadline for
applications is September 29, 2023.

KU Leuven is committed to creating a diverse environment. It explicitly encourages
candidates from groups that are currently underrepresented at the university to submit
their applications.

Back  Top

6-20(2023-06-04) PhD in ML/NLP @ Dauphine Université PSL, Paris and Université Grenoble Alpes, France

PhD in ML/NLP – Fairness and self-supervised learning for speech processing
Starting date: October 1st, 2023 (flexible)
Application deadline: June 9th, 2023
Interviews (tentative): June 14th, 2023

Salary: ~2000€ gross/month (social security included)


Mission: research oriented (teaching possible but not mandatory)

 

Keywords: speech processing, fairness, bias, self-supervised learning, evaluation metrics 

 

CONTEXT

This thesis takes place in the context of the ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive and Innovative Speech Technologies). Self-supervised learning (SSL) has recently emerged as one of the most promising artificial intelligence (AI) methods, since it makes it feasible to take advantage of the colossal amounts of existing unlabeled data to significantly improve performance on various speech processing tasks.

 

PROJECT OBJECTIVES

Speech technologies are widely used in our daily lives and are expanding the scope of our actions through decision-making systems, including in critical areas such as health or legal matters. In these societal applications, the use of such tools raises the issue of possible discrimination against people according to criteria for which society requires equal treatment, such as gender, origin, religion or disability. Recently, the machine learning community has been confronted with the need to address the possible biases of its algorithms, and many works have shown that the search for the best performance is not the only goal to pursue [1]. For instance, recent evaluations of ASR systems have shown that performance can vary according to gender, and that these variations depend both on the data used for training and on the models [2]. Such systems are therefore increasingly scrutinized for being biased, while trustworthy speech technologies definitely represent a crucial expectation.


 
Both the question of bias and the concept of fairness have now become important aspects of AI, and we have to find the right trade-off between accuracy and measures of fairness. Unfortunately, these notions of fairness and bias are challenging to define, and their meanings can greatly differ [3].


 
The goals of this PhD position are threefold:

- First, survey the many definitions of robustness, fairness and bias, with the aim of coming up with definitions and metrics fit for speech SSL models;

- Then, gather speech datasets with a large amount of well-described metadata;

- Finally, set up an evaluation protocol for SSL models and analyze the results.
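
One concrete instance of such a fairness metric, given here as a minimal illustrative sketch (the `wer` and `wer_gap` helpers below are hypothetical and not project deliverables; the project will define its own metrics), is the word error rate (WER) gap between speaker groups:

```python
# Hedged sketch: a simple group-fairness metric for ASR, namely the
# word error rate (WER) gap between speaker groups (e.g. by gender).
# Function names and the grouping scheme are illustrative assumptions.

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over word tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def wer_gap(samples):
    """samples: iterable of (group, reference, hypothesis) triples.
    Returns (gap, per-group mean WER), where gap is the spread between
    the best- and worst-served group."""
    by_group = {}
    for group, ref, hyp in samples:
        by_group.setdefault(group, []).append(wer(ref, hyp))
    means = {g: sum(v) / len(v) for g, v in by_group.items()}
    return max(means.values()) - min(means.values()), means
```

On two illustrative utterances, a hypothesis with one substituted word out of three yields a WER of 1/3 for its group, and the gap is the spread of mean WER across groups.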

 

SKILLS

  • Master's degree (M2) in natural language processing, speech processing, computer science or data science.

  • Good command of Python programming and of a deep learning framework.

  • Previous experience with bias in machine learning would be a plus.

  • Very good communication skills in English.

  • Good command of French would be a plus but is not mandatory.

 

SCIENTIFIC ENVIRONMENT

The PhD position will be co-supervised by Alexandre Allauzen (Dauphine Université PSL, Paris) and Solange Rossato and François Portet (Université Grenoble Alpes). Joint meetings are planned on a regular basis, and the student is expected to spend time in both places. Moreover, two other PhD positions are open within this project, and the students and partners will closely collaborate: for instance, specific SSL models and evaluation criteria will be developed by the other PhD students. The recruited student will thus work with several team members involved in the project, in particular the two other PhD candidates and the partners from LIA, LIG and Dauphine Université PSL. The means to carry out the PhD will be provided, both in terms of missions in France and abroad and in terms of equipment. The candidate will have access to the GPU clusters of both the LIG and Dauphine Université PSL; furthermore, access to the national supercomputer Jean-Zay will make it possible to run large-scale experiments.

 

INSTRUCTIONS FOR APPLYING

Applications must contain: a CV, a letter/message of motivation and Master's grade transcripts; candidates should also be ready to provide letter(s) of recommendation. Applications should be addressed to Alexandre Allauzen (alexandre.allauzen@espci.psl.eu), Solange Rossato (Solange.Rossato@imag.fr) and François Portet (francois.Portet@imag.fr). We celebrate diversity and are committed to creating an inclusive environment for all employees.

 

REFERENCES:

[1] Mengesha, Z., Heldreth, C., Lahav, M., Sublewski, J. & Tuennerman, E. “I don’t Think These Devices are Very Culturally Sensitive.”—Impact of Automated Speech Recognition Errors on African Americans. Frontiers in Artificial Intelligence 4 (2021). https://www.frontiersin.org/article/10.3389/frai.2021.725911

[2] Garnerin, M., Rossato, S. & Besacier, L. Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech. In Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing, 86–92 (2021).

[3] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. A Survey on Bias and Fairness in Machine Learning. ACM Comput. Surv. 54 (2021). https://doi.org/10.1145/3457607

Back  Top

6-21(2023-06-06) Postdoc in recognition and translation @LABRI, Bordeaux, France

In the framework of the European FETPROACT « Fvllmonti » project and the PEPR Santé numérique “Autonom-Health” project, the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) is looking for candidates for a 24-month postdoctoral position.


The « Fvllmonti » project is a collaborative project on new transistor architectures applied to speech recognition and machine translation between IMS, LaBRI, LAAS, INL, EPFL, GTS and Namlab. More information on the project is available at www.fvllmonti.eu 

The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS. The abstract of the « Autonom-Health » project can be found at the end of this email.  

The missions addressed by the selected candidate will be chosen among the following tasks, according to his/her profile:
- Data collection tasks:
  - Definition of scenarios for collecting spontaneous speech using Socially Interactive Agents (SIAs)
- ASR-related tasks:
  - Evaluation and improvement of the performance of our end-to-end ESPnet-based ASR system on French real-world spontaneous data recorded from healthy subjects and patients
  - Automatic phonetic transcription / alignment using end-to-end architectures
- Speech analysis tasks:
  - Automatic recognition of social affects/emotions/attitudes in speech samples
  - Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, and search for new biomarkers targeted at specific diseases
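
To illustrate the kind of vocal biomarker involved, here is a minimal sketch of a speech pause ratio, a feature commonly related to sleepiness. This is an illustrative stand-in, not LaBRI's actual pipeline; the frame length and threshold are arbitrary assumptions:

```python
# Hedged sketch: a speech-pause-ratio biomarker computed from frame
# energies. Frame length (160 samples, i.e. 10 ms at 16 kHz) and the
# relative silence threshold are arbitrary illustrative assumptions.

def frame_energies(samples, frame_len=160):
    """Mean squared energy of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def pause_ratio(samples, frame_len=160, rel_threshold=0.1):
    """Fraction of frames whose energy falls below a fraction of the
    maximum frame energy (a crude silence detector)."""
    energies = frame_energies(samples, frame_len)
    if not energies:
        return 0.0
    threshold = rel_threshold * max(energies)
    silent = sum(1 for e in energies if e < threshold)
    return silent / len(energies)
```

A real biomarker pipeline would use proper voice activity detection; the relative-threshold detector here only illustrates the principle of summarizing a recording into one scalar feature.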

The position is hosted at LaBRI but, depending on the profile of the candidate, close collaboration is expected with the « Multispeech » team (contact: Emmanuel Vincent) and/or the « Sémagramme » team (contact: Maxime Amblard) at LORIA.

Gross salary: approx. 2686 €/month
Starting date: as soon as possible
Required qualifications: PhD in signal processing / speech analysis / computer science / language sciences
Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, good command of French (interactions with French patients and clinicians), good level of scientific English.
Know-how: familiarity with the ESPnet toolkit and/or deep learning frameworks, knowledge of automatic speech processing system design.
Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.

Applications:  
To apply, please send by email to jean-luc.rouas@labri.fr a single PDF file containing a full CV (including publication list), a cover letter (describing your personal qualifications, research interests and motivation for applying), evidence of software development experience (active GitHub/GitLab profile or similar), two of your key publications, contact information for two referees, and academic certificates (PhD, Diploma/Master, Bachelor).

 

 


—— 
Abstract of the « Autonom-Health » project:


Western populations face an increase in longevity which mechanically increases the number of chronic disease patients to manage. Current healthcare strategies will not make it possible to maintain a high level of care at a controlled cost in the future, and e-health can optimize the management and costs of our healthcare systems. Healthy behaviors contribute to the prevention and optimized management of chronic diseases, but their implementation is still a major challenge. Digital technologies could support their implementation through digital behavioral medicine programs, developed as a complement to (and not a substitute for) existing care, in order to focus human interventions on the most severe cases demanding medical attention.

 
However, to do so, we need to develop digital technologies which are: i) ecological (related to the real-life, real-time behavior of individuals and to social/environmental constraints); ii) preventive (from healthy subjects to patients); iii) personalized (at initiation and adapted over the course of treatment); iv) longitudinal (implemented over long periods of time); v) interoperable (multiscale, multimodal and high-frequency); vi) highly acceptable (protecting users’ privacy and generating trust).

The above-mentioned challenges will be tackled through the following specific goals. Goal 1: implement large-scale diagnostic evaluations (clinical and biomarker-based) and behavioral interventions (physical activity, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients; this will require new autonomous digital technologies (virtual Socially Interactive Agents (SIAs), smartphones, wearable sensors). Goal 2: optimize clinical phenotyping by collecting and analyzing non-intrusive data (voice, geolocation, body motion, smartphone footprints...) which will potentially complement clinical and biomarker data from patient cohorts. Goal 3: better understand the psychological, economic and socio-cultural factors driving acceptance of, and engagement with, the autonomous digital technologies and the proposed digital behavioral interventions. Goal 4: improve the interaction modalities of digital technologies to personalize and optimize the long-term engagement of users. Goal 5: organize large-scale data collection, storage and interoperability with existing and new data sets (biobanks, hospital patient cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment.

Each goal will be addressed by expert teams through complementary work packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testing) will be performed through this project. A second phase, funded via ANR calls, will make it possible to recruit new teams for a large-scale testing phase.

This project will rely on population-based interventions in existing digital cohorts (e.g. KANOPEE) in which virtual agents interact with patients at home on a regular basis. Pilot hospital departments will also be involved in data management, supervised by information and decision systems coordinating autonomous digital cognitive-behavioral interventions based on our virtual agents. The global solution, based on empathic human-computer interaction, will help target, diagnose and treat subjects suffering from dysfunctional behaviors (e.g. sleep deprivation, substance use...) as well as from sleep and mental disorders. The expected benefits of such a solution are increased adherence to treatment, strong self-empowerment to improve autonomy and, finally, a reduction of long-term risks for the subjects and patients using the system. Our program should substantially improve healthcare systems and enable strong technology transfer to information systems / digital health companies and the pharma industry.

 

 

 In the framework of the European FETPROACT « Fvllmonti » project and the PEPR Santé numérique “Autonom-Health” project, the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) is looking for candidates for a 24-months post-doctoral position. 


The « Fvllmonti » project is a collaborative project on new transistor architectures applied to speech recognition and machine translation between IMS, LaBRI, LAAS, INL, EPFL, GTS and Namlab. More information on the project is available at www.fvllmonti.eu 

The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS. The abstract of the « Autonom-Health » project can be found at the end of this email.  

The missions that will be addressed by the retained candidate are among these selected tasks, according to the profile of the candidate:
- Data collection tasks:
- Definition of scenarii for collecting spontaneous speech using Social Interactive Agents (SIAs)
- ASR-related tasks
- Evaluate and improve the performances of our end2end ESPNET-based ASR system for French real-world spontaneous data recorded from healthy subjects and patients,
- Automatic phonetic transcription / alignment using end2end architectures
- Speech analysis tasks:
- Automatic social affect/emotion/attitudes recognition on speech samples 
- Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, research of new biomarkers targeted to specific diseases.

The position is to be hosted at LaBRI, but depending on the profile of the candidate, close collaboration is expected either with the « Multispeech » (contact: Emmanuel Vincent) and/or the « Sémagramme » (contact: Maxime Amblard) teams at LORIA.  

Gross salary: approx. 2686 €/month
Starting data: As soon as possible 
Required qualifications: PhD in Signal processing / speech analysis / computer science / language sciences 
Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, good command of French (interactions with French patients and clinicians), good level of scientific English. 
Know-how: Familiarity with the ESPNET toolbox and/or deep learning frameworks, knowledge of automatic speech processing system design. 
Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.

Applications:  
To apply, please send by email at jean-luc.rouas@labri.fr a single PDF file containing a full CV (including publication list), cover letter (describing your personal qualifications, research interests and motivation for applying), evidence for software development experience (active Github/Gitlab profile or similar), two of your key publications, contact information of two referees and academic certificates (PhD, Diploma/Master, Bachelor certificates).

 

 


—— 
Abstract of the « Autonom-Health » project:


Western populations face an increase of longevity which mechanically increases the number of chronic disease patients to manage. Current healthcare strategies will not allow to maintain a high level of care with a controlled cost in the future and E health can optimize the management and costs of our health care systems. Healthy behaviors contribute to prevention and optimization of chronic diseases management, but their implementation is still a major challenge. Digital technologies could help their implementation through numeric behavioral medicine programs to be developed in complement (and not substitution) to the existing care in order to focus human interventions on the most severe cases demanding medical interventions. 

 
However, to do so, we need to develop digital technologies which should be: i) Ecological (related to real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii)  Personalized (at initiation and adapted over the course of treatment) ; iv) Longitudinal (implemented over long periods of time) ; v) Interoperated (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and generating trustability).

The above-mentioned challenges will be disentangled with the following specific goals: Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activities, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients.  This will require new autonomous digital technologies (i.e. virtual Socially Interactive Agents SIAs, smartphones, wearable sensors). Goal 2:  Optimize clinical phenotyping by collecting and analyzing non-intrusive data (i.e. voice, geolocalisation, body motion, smartphone footprints, ...) which will potentially complement clinical data and biomarkers data from patient cohorts. Goal 3: Better understand psychological, economical and socio-cultural factors driving acceptance and engagement with the autonomous digital technologies and the proposed numeric behavioral interventions. Goal 4:  Improve interaction modalities of digital technologies to personalize and optimize long-term engagement of users. Goal 5: Organize large scale data collection, storage and interoperability with existing and new data sets (i.e, biobanks, hospital patients cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment.

Each goal will be addressed by expert teams through complementary work-packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testings), will be performed through this project. A second phase funded via ANR calls will allow to recruit new teams for large scale testing phase.

This project will rely on population-based interventions in existing numeric cohorts (i.e KANOPEE) where virtual agents interact with patients at home on a regular basis. Pilot hospital departments will also be involved for data management supervised by information and decision systems coordinating autonomous digital Cognitive Behavioral interventions based on our virtual agents. The global solution based on empathic Human-Computer Interactions will help targeting, diagnose and treat subjects suffering from dysfunctional behavioral (i.e. sleep deprivation, substance use...) but also sleep and mental disorders. The expected benefits from such a solution will be an increased adherence to treatment, a strong self-empowerment to improve autonomy and finally a reduction of long-term risks for the subjects and patients using this system. Our program should massively improve healthcare systems and allow strong technological transfer to information systems / digital health companies and the pharma industry.

 

 

 In the framework of the European FETPROACT « Fvllmonti » project and the PEPR Santé numérique “Autonom-Health” project, the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) is looking for candidates for a 24-months post-doctoral position. 


The « Fvllmonti » project is a collaborative project on new transistor architectures applied to speech recognition and machine translation between IMS, LaBRI, LAAS, INL, EPFL, GTS and Namlab. More information on the project is available at www.fvllmonti.eu 

The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS. The abstract of the « Autonom-Health » project can be found at the end of this email.  

The missions that will be addressed by the retained candidate are among these selected tasks, according to the profile of the candidate:
- Data collection tasks:
- Definition of scenarii for collecting spontaneous speech using Social Interactive Agents (SIAs)
- ASR-related tasks
- Evaluate and improve the performances of our end2end ESPNET-based ASR system for French real-world spontaneous data recorded from healthy subjects and patients,
- Automatic phonetic transcription / alignment using end2end architectures
- Speech analysis tasks:
- Automatic social affect/emotion/attitudes recognition on speech samples 
- Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, research of new biomarkers targeted to specific diseases.

The position is to be hosted at LaBRI, but depending on the profile of the candidate, close collaboration is expected either with the « Multispeech » (contact: Emmanuel Vincent) and/or the « Sémagramme » (contact: Maxime Amblard) teams at LORIA.  

Gross salary: approx. 2686 €/month
Starting data: As soon as possible 
Required qualifications: PhD in Signal processing / speech analysis / computer science / language sciences 
Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, good command of French (interactions with French patients and clinicians), good level of scientific English. 
Know-how: Familiarity with the ESPNET toolbox and/or deep learning frameworks, knowledge of automatic speech processing system design. 
Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.

Applications:  
To apply, please send by email at jean-luc.rouas@labri.fr a single PDF file containing a full CV (including publication list), cover letter (describing your personal qualifications, research interests and motivation for applying), evidence for software development experience (active Github/Gitlab profile or similar), two of your key publications, contact information of two referees and academic certificates (PhD, Diploma/Master, Bachelor certificates).

 

 


—— 
Abstract of the « Autonom-Health » project:


Western populations face an increase of longevity which mechanically increases the number of chronic disease patients to manage. Current healthcare strategies will not allow to maintain a high level of care with a controlled cost in the future and E health can optimize the management and costs of our health care systems. Healthy behaviors contribute to prevention and optimization of chronic diseases management, but their implementation is still a major challenge. Digital technologies could help their implementation through numeric behavioral medicine programs to be developed in complement (and not substitution) to the existing care in order to focus human interventions on the most severe cases demanding medical interventions. 

 
However, to do so, we need to develop digital technologies which should be: i) Ecological (related to real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii)  Personalized (at initiation and adapted over the course of treatment) ; iv) Longitudinal (implemented over long periods of time) ; v) Interoperated (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and generating trustability).

The above-mentioned challenges will be disentangled with the following specific goals: Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activities, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients.  This will require new autonomous digital technologies (i.e. virtual Socially Interactive Agents SIAs, smartphones, wearable sensors). Goal 2:  Optimize clinical phenotyping by collecting and analyzing non-intrusive data (i.e. voice, geolocalisation, body motion, smartphone footprints, ...) which will potentially complement clinical data and biomarkers data from patient cohorts. Goal 3: Better understand psychological, economical and socio-cultural factors driving acceptance and engagement with the autonomous digital technologies and the proposed numeric behavioral interventions. Goal 4:  Improve interaction modalities of digital technologies to personalize and optimize long-term engagement of users. Goal 5: Organize large scale data collection, storage and interoperability with existing and new data sets (i.e, biobanks, hospital patients cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment.

Each goal will be addressed by expert teams through complementary work-packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testings), will be performed through this project. A second phase funded via ANR calls will allow to recruit new teams for large scale testing phase.

This project will rely on population-based interventions in existing numeric cohorts (i.e KANOPEE) where virtual agents interact with patients at home on a regular basis. Pilot hospital departments will also be involved for data management supervised by information and decision systems coordinating autonomous digital Cognitive Behavioral interventions based on our virtual agents. The global solution based on empathic Human-Computer Interactions will help targeting, diagnose and treat subjects suffering from dysfunctional behavioral (i.e. sleep deprivation, substance use...) but also sleep and mental disorders. The expected benefits from such a solution will be an increased adherence to treatment, a strong self-empowerment to improve autonomy and finally a reduction of long-term risks for the subjects and patients using this system. Our program should massively improve healthcare systems and allow strong technological transfer to information systems / digital health companies and the pharma industry.

 

 

 In the framework of the European FETPROACT « Fvllmonti » project and the PEPR Santé numérique “Autonom-Health” project, the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) is looking for candidates for a 24-months post-doctoral position. 


The « Fvllmonti » project is a collaborative project on new transistor architectures applied to speech recognition and machine translation between IMS, LaBRI, LAAS, INL, EPFL, GTS and Namlab. More information on the project is available at www.fvllmonti.eu 

The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS. The abstract of the « Autonom-Health » project can be found at the end of this email.  

The missions that will be addressed by the retained candidate are among these selected tasks, according to the profile of the candidate:
- Data collection tasks:
- Definition of scenarii for collecting spontaneous speech using Social Interactive Agents (SIAs)
- ASR-related tasks
- Evaluate and improve the performances of our end2end ESPNET-based ASR system for French real-world spontaneous data recorded from healthy subjects and patients,
- Automatic phonetic transcription / alignment using end2end architectures
- Speech analysis tasks:
- Automatic social affect/emotion/attitudes recognition on speech samples 
- Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, research of new biomarkers targeted to specific diseases.

The position is to be hosted at LaBRI, but depending on the profile of the candidate, close collaboration is expected either with the « Multispeech » (contact: Emmanuel Vincent) and/or the « Sémagramme » (contact: Maxime Amblard) teams at LORIA.  

Gross salary: approx. 2686 €/month
Starting data: As soon as possible 
Required qualifications: PhD in Signal processing / speech analysis / computer science / language sciences 
Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, good command of French (interactions with French patients and clinicians), good level of scientific English. 
Know-how: Familiarity with the ESPNET toolbox and/or deep learning frameworks, knowledge of automatic speech processing system design. 
Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.

Applications:  
To apply, please send by email at jean-luc.rouas@labri.fr a single PDF file containing a full CV (including publication list), cover letter (describing your personal qualifications, research interests and motivation for applying), evidence for software development experience (active Github/Gitlab profile or similar), two of your key publications, contact information of two referees and academic certificates (PhD, Diploma/Master, Bachelor certificates).

 

 


—— 
Abstract of the « Autonom-Health » project:


Western populations face an increase of longevity which mechanically increases the number of chronic disease patients to manage. Current healthcare strategies will not allow to maintain a high level of care with a controlled cost in the future and E health can optimize the management and costs of our health care systems. Healthy behaviors contribute to prevention and optimization of chronic diseases management, but their implementation is still a major challenge. Digital technologies could help their implementation through numeric behavioral medicine programs to be developed in complement (and not substitution) to the existing care in order to focus human interventions on the most severe cases demanding medical interventions. 

 
However, to do so, we need to develop digital technologies which should be: i) Ecological (related to real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii)  Personalized (at initiation and adapted over the course of treatment) ; iv) Longitudinal (implemented over long periods of time) ; v) Interoperated (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and generating trustability).

The above-mentioned challenges will be disentangled with the following specific goals: Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activities, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients.  This will require new autonomous digital technologies (i.e. virtual Socially Interactive Agents SIAs, smartphones, wearable sensors). Goal 2:  Optimize clinical phenotyping by collecting and analyzing non-intrusive data (i.e. voice, geolocalisation, body motion, smartphone footprints, ...) which will potentially complement clinical data and biomarkers data from patient cohorts. Goal 3: Better understand psychological, economical and socio-cultural factors driving acceptance and engagement with the autonomous digital technologies and the proposed numeric behavioral interventions. Goal 4:  Improve interaction modalities of digital technologies to personalize and optimize long-term engagement of users. Goal 5: Organize large scale data collection, storage and interoperability with existing and new data sets (i.e, biobanks, hospital patients cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment.

Each goal will be addressed by expert teams through complementary work-packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testings), will be performed through this project. A second phase funded via ANR calls will allow to recruit new teams for large scale testing phase.

This project will rely on population-based interventions in existing numeric cohorts (i.e KANOPEE) where virtual agents interact with patients at home on a regular basis. Pilot hospital departments will also be involved for data management supervised by information and decision systems coordinating autonomous digital Cognitive Behavioral interventions based on our virtual agents. The global solution based on empathic Human-Computer Interactions will help targeting, diagnose and treat subjects suffering from dysfunctional behavioral (i.e. sleep deprivation, substance use...) but also sleep and mental disorders. The expected benefits from such a solution will be an increased adherence to treatment, a strong self-empowerment to improve autonomy and finally a reduction of long-term risks for the subjects and patients using this system. Our program should massively improve healthcare systems and allow strong technological transfer to information systems / digital health companies and the pharma industry.

Back  Top

6-22(2023-06-02) Transcribers for ELDA, Paris, France

ELDA (Evaluations and Language resources Distribution Agency) is looking for full-time/part-time transcribers for the transcription of phone calls in the financial domain.

Location: ELDA (Paris-France)

Latest starting date: July 2023

Languages and mission details

 

  • German - 21h of speech to be transcribed + phonetisation task;
  • Spanish - 1h of speech to be transcribed;
  • Italian - 1h of speech to be transcribed;
  • Japanese - 21h of speech to be transcribed + phonetisation task;
  • Polish - the number of hours varies as it is still being tested.


Profile

  • Native expertise in the selected language with a very good level of spelling and grammar;
  • Good knowledge of French and/or English;
  • Good computer skills;
  • Ability to integrate and scrupulously follow transcription rules.


Salary and duration

  • Starting from SMIC depending on skills;
  • Approximately 3 months (FT for the longer assignments).


Application

  • Send your CV to <lucille@elda.org> or <gabriele@elda.org>;
  • Include [Transcription *language*] in the subject field.

6-23(2023-06-08) Postdoc @ ENS,Paris, France

DRhyaDS

A new framework for understanding the Dynamic Rhythms and Decoding of Speech

Job Title - Postdoctoral Researcher

Disciplines and Areas of Research - Speech science, Psycholinguistics, Psychoacoustics

Contract Duration - 1 Year

Research Overview: The DRhyaDS project aims to develop a new framework for understanding the dynamic rhythms and decoding of speech. It focuses on exploring the temporal properties of speech and their contribution to speech perception. The project challenges the conventional view that speech rhythm perception relies on a one-to-one association between specific modulation frequencies in the speech signal and linguistic units. One of the key objectives of the project is to investigate the impact of language-specific temporal characteristics on speech dynamics. The project team will analyze two corpora of semi-spontaneous speech data from French and German, representing syllable-timed and stress-timed languages, respectively. Various acoustic analyses will be conducted on these speech corpora to explore the variability of slow temporal modulations in speech at an individual level. This comprehensive acoustic exploration will involve extracting and analyzing prosody, spectral properties, temporal dynamics, and rhythmic patterns. By examining these acoustic parameters, the project aims to uncover intricate details about the structure and variation of speech signals across languages and speakers, contributing to a more nuanced understanding of the dynamic nature of spoken language and its role in human communication.

Environment: The selected candidate will be an integral part of an international research team and will work in a collaborative and stimulating lab environment. The project brings together a Franco-German team of experts in linguistics, psychoacoustics and cognitive neuroscience, led by Dr. Léo Varnet (CNRS, ENS Paris) and Dr. Alessandro Tavano (Max Planck Institute, Goethe University Frankfurt). The successful candidate will work under the supervision of Dr. Léo Varnet at the Laboratoire des Systèmes Perceptifs (ENS Paris).

Job description: This is a one-year postdoctoral contract position, offering a net salary in accordance with French legislation (~2500€/month + social and medical benefits). Women and minorities are strongly encouraged to apply. The successful candidate will participate in research activities, collaborate with team members, and contribute to scientific publications and communications. Additionally, they will have the autonomy to suggest and implement their own analysis techniques and approaches. Their responsibilities will include:
- Taking a lead role in collecting a comprehensive corpus of French speech data, adhering to a rigorous data collection protocol
- Collaborating closely with the German team to leverage the existing German speech corpus for comparative analysis and cross-linguistic investigations
- Conducting in-depth acoustic analysis of the corpora, employing advanced techniques to investigate the variability and dynamics of slow temporal modulations in speech
- Actively participating in team meetings, workshops, and conferences to present research progress, exchange ideas, and contribute to the intellectual growth of the project
- Engaging in science outreach activities to promote the project's research outcomes and facilitate public understanding of speech perception and language processing.

Qualifications:
- A recently obtained PhD in a relevant field (e.g., linguistics, psychology, neuroscience, computational sciences)
- Strong expertise in linguistics, speech perception, acoustic analysis, and statistical methods
- Proficiency in programming languages commonly used in speech research. Knowledge of MATLAB would be particularly valuable for data processing and analysis within the project.
- Strong written and verbal communication skills in English. Proficiency in French and/or German would be particularly appreciated, as it would enable a deeper understanding of the linguistic characteristics of the respective corpora.

Application process: To apply for this position, please submit a CV and a cover letter (in French or English), along with the names and contact information of 2 referees, to Léo Varnet (leo.varnet@cnrs.fr). The application deadline is 31 July 2023. Interviews will be conducted in September. The ideal start date is October-November 2023, with some flexibility allowed. Feel free to get in touch informally to discuss this position.


6-24(2023-06-16) PhD funded position@ INRIA France

Inria is opening a fully funded PhD position on multimodal speech
anonymization. For details and to apply, see:
https://jobs.inria.fr/public/classic/en/offres/2023-06410

Applications will be reviewed on a continuous basis until July 2.


6-25(2023-06-16) Post doc @ IMAG, Grenoble, France

Call for postdoc applications in Natural Language Processing for the automatic detection of gender stereotypes in the French media (Grenoble Alps University, France)

Starting date:                flexible,  November 30, 2023, at the latest

Duration:                       full-time position for 12 months

Salary:                         according to experience (up to 4142€/ month)

Application Deadline: Open until filled

Location:               The position will be based in Grenoble, France. This is not a remote position.

Keywords:             natural language processing, gender stereotype bias, corpus analysis, language models, transfer learning, deep learning

*Context* The University of Grenoble Alps (UGA) has an open position for a highly motivated postdoc researcher to join the multidisciplinary GenderedNews project. Natural Language Processing models trained on large amounts of on-line content have quickly opened new perspectives for processing large amounts of on-line content to measure gender bias on a daily basis (see our project https://gendered-news.imag.fr/ ). Regarding research on stereotypes, most recent works have studied Language Models (LMs) from a stereotype perspective by providing specific corpora such as StereoSet (Nadeem et al., 2020) or CrowS-Pairs (Nangia et al. 2020). However, these studies focus on quantifying bias in the LM predictions rather than bias in the original data (Choenni et al., 2021). Furthermore, most of these studies ignore named entities (Deshpande et al., 2022), which account for an important part of the referents and speakers in news. In this project, we intend to build corpora, methods and NLP tools to qualify the differences between the language used to describe groups of people in French news.
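
As an illustration of the kind of bias metric used by benchmarks such as CrowS-Pairs, the sketch below scores minimal sentence pairs (stereotyped vs. anti-stereotyped) and reports how often the model prefers the stereotyped variant. The "language model" here is a toy Laplace-smoothed unigram model built from a tiny invented corpus, standing in for a real pretrained LM; all data and names are hypothetical.

```python
import math
from collections import Counter

def make_unigram_lm(corpus_tokens):
    """Toy unigram 'language model' standing in for a real pretrained LM.
    Returns a function mapping a sentence to its total log-probability."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts)
    def log_prob(sentence):
        # Laplace-smoothed unigram log-likelihood of the sentence.
        return sum(math.log((counts[w] + 1) / (total + vocab))
                   for w in sentence.lower().split())
    return log_prob

def stereotype_preference_rate(pairs, log_prob):
    """CrowS-Pairs-style metric: fraction of minimal pairs where the model
    assigns higher likelihood to the stereotyped sentence. 0.5 = unbiased."""
    preferred = sum(1 for stereo, anti in pairs
                    if log_prob(stereo) > log_prob(anti))
    return preferred / len(pairs)

# Hypothetical mini-corpus in which 'she' co-occurs with 'nurse' more often.
corpus = "the nurse said she was tired she smiled the engineer said he was tired".split()
lm = make_unigram_lm(corpus)
pairs = [("the nurse said she was late", "the nurse said he was late")]
rate = stereotype_preference_rate(pairs, lm)   # 1.0: the toy LM prefers the stereotype
```

A real study would replace the unigram scorer with pseudo-log-likelihoods from a masked LM, but the aggregation step is the same.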

*Main Tasks*

The successful postdoc will be responsible for the day-to-day running of the research project, under the supervision of François Portet (Prof. UGA at LIG) and Gilles Bastin (Prof. UGA at PACTE). Regular meetings will take place every two weeks.

- Defining the dimensions of stereotypes to be investigated and the possible metrics that can be computed from a machine learning perspective.

- Exploring, managing and curating news corpora in French for stereotype investigation, with a view to making them widely available to the community to foster reproducible research and comparison.

- Studying and developing new computational models to process large numbers of texts and reveal stereotype bias in news, making use of pretrained models for the task.

- Evaluating the methods on the curated corpus, applying them to the unseen real longitudinal corpus, and analyzing the results with the team.

- Preparing articles for submission to peer-reviewed conferences and journals.

- Organizing progress meetings and liaising between members of the team. 

The hired person will interact with PhD students, interns and researchers who are part of the GenderedNews project. Depending on their background and interests, and in accordance with the project's objectives, the hired person will have the possibility to orient the research in different directions.

*Scientific Environment*

The recruited person will be hosted within the GETALP team of the LIG laboratory (https://lig-getalp.imag.fr/), which offers a dynamic, international, and stimulating environment for conducting high-level multidisciplinary research. The person will have access to large datasets of French news and GPU servers, to support for missions, as well as to the scientific activities of the lab. The team is housed in a modern building (IMAG) located on a 175-hectare landscaped campus that was ranked as the eighth most beautiful campus in Europe by the Times Higher Education magazine in 2018.

The person will also work closely with Gilles Bastin (PACTE, a sociology lab in Grenoble) and Ange Richard (PhD student at LIG and PACTE). The project also includes an informal collaboration with 'Prenons la une' (https://prenonslaune.fr/), a journalists’ association which promotes fair representation of women in the media.

*Requirements*

The candidate must have a PhD degree in Natural Language Processing or computer science, or be in the process of obtaining it. The successful candidate should have:

-  Good knowledge of Natural Language Processing
-  Experience in corpus collection/formatting and manipulation
-  Good programming skills in Python
-  Publication record in a close field of research
-  Willingness to work in multidisciplinary and international teams
-  Good communication skills
-  Good mastery of French is required

*Instructions for applying*

Applications will be considered on a rolling basis and must be addressed to François Portet (Francois.Portet@imag.fr). It is therefore advisable to apply as soon as possible. The application file should contain:

-   A curriculum vitae
-   References for potential letter(s) of recommendation
-   A one-page summary of research background and interests for the position
-   Publications demonstrating expertise in the aforementioned areas
-   Pre-defense reports and defense minutes; or a summary of the thesis with the date of defense for those currently in doctoral studies

*References*

Deshpande et al. (2022). StereoKG: Data-Driven Knowledge Graph Construction for Cultural Knowledge and Stereotypes. arXiv preprint arXiv:2205.14036.

Choenni et al. (2021). Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you? arXiv preprint arXiv:2109.10052.

Nadeem et al. (2020) StereoSet: Measuring stereotypical bias in pretrained language models. ArXiv.

Nangia et al. (2020) CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. In EMNLP2020.


6-26(2023-06-16) 6 Post-doc positions in 'Education and AI in the 21st century. Technology-enabled innovations in subject-specific teaching settings (PostdocTEIFUN)' , Tübingen and Stuttgart, Germany

 

the new Postdoc-Kolleg 'Education and AI in the 21st century. Technology-enabled innovations in subject-specific teaching settings (PostdocTEIFUN)' of the Tübingen School of Education (TüSE) and Professional School of Education Stuttgart-Ludwigsburg (PSE) will start in 2024.

It is offering six postdoc positions funded for six years (100% TVL E14) to conduct interdisciplinary research in the field of education and AI.

The official announcement can be found here:
https://uni-tuebingen.de/universitaet/stellenangebote/newsfullview-stellenangebote/article/6-akademische-ratsstellen-m-w-d-e-14-tv-l-100/

Among the various potential research fields, the following could be of interest here:

'Development of didactically informed Intelligent Language Tutoring approaches targeting spoken language learning for the English classroom. Potential directions include but are not limited to the realization of perceptual training, input enhancement, automatic corrective feedback, or prosodic complexity analysis.'

Interested? Please contact us at:
Sabine Zerbian, sabine.zerbian@ifla.uni-stuttgart.de
Detmar Meurers, detmar.meurers@uni-tuebingen.de

Note that there is a tight deadline for applications: 30.06.2023


6-27(2023-06-17) PhD @ NTNU, Trondheim, Norway
The announcement is here:
Deadline is 2023-07-31.

6-28(2023-06-20) PhD position @ University of Applied Sciences, Hochschule Hannover, Germany

 

 

In a joint research project *VidQA*, our collaboration partner from the University of Applied Sciences Hannover (Prof. Christian Wartena, HsH Hochschule Hannover) offers a full-time position (three years, starting as soon as possible) for a PhD student:

 

*VidQA* is a joint research project of the Institute for Applied Data Science Hannover (DATA|H) of the HsH, the Research Center L3S of the Leibniz University and the TIB – Leibniz Information Centre for Science and Technology. The goal of the project is the development and evaluation of new methods for the semi-automatic generation of questions and answers for learning videos. Here we pursue several research questions, including, among others, the multimodality of video-based learning media, the generation of distractors ('wrong answers') for multiple-choice questions, and the automatic evaluation of answers for open-ended question formats.

 

*What you can expect*:

-  Development and implementation of procedures for generating (multiple-choice) questions, answers, and distractors

-  Development and implementation of procedures for scoring free-text answers

-  Collaboration in the development and evaluation of a system for examining comprehension of learning videos

-  Publication of research and development results at conferences and in professional journals

-  Participation in the organization of the project
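
To illustrate the kind of task involved, the sketch below builds a fill-in-the-blank question and picks distractors by surface similarity to the correct answer. This is only a toy baseline with invented example data; the project itself targets neural generation methods, and the similarity measure here (character overlap) merely stands in for a semantic one.

```python
import difflib

def make_cloze(sentence, answer):
    """Turn a sentence into a fill-in-the-blank question by masking the answer."""
    assert answer in sentence
    return sentence.replace(answer, "_____", 1)

def pick_distractors(answer, candidates, k=3):
    """Choose k 'wrong answers' that look similar to the correct one
    (similar-looking distractors are harder to rule out). Similarity is a
    simple character-overlap ratio standing in for semantic similarity."""
    scored = sorted(
        (c for c in candidates if c != answer),
        key=lambda c: difflib.SequenceMatcher(None, answer, c).ratio(),
        reverse=True,
    )
    return scored[:k]

question = make_cloze("Backpropagation computes gradients of the loss.", "gradients")
distractors = pick_distractors("gradients",
                               ["gradient", "weights", "granites", "loss", "graders"])
```

A learner would then be shown `question` with the correct answer shuffled among `distractors`.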

 

*Requirements*

Research Field: Computer science

Education Level: Master Degree or equivalent

Skills/Qualifications

-  Master's degree (or equivalent) in computer science or computational linguistics

-  In-depth knowledge in the field of artificial intelligence and machine learning

-  Proven knowledge in Natural Language Processing (NLP)

-  Gender and diversity skills

 

Languages:     ENGLISH

Level:     Excellent

 

Languages:    GERMAN

Level:     Good

 

Website for additional job details: https://karriere.hs-hannover.de/bewerbung/beschreibung-900000104-10057.html

 

Where to apply: https://karriere.hs-hannover.de/bewerbung/beschreibung-900000104-10057.html

*First Contact*

************************************

Prof. Dr. Christian Wartena

University of Applied Sciences (HsH Hochschule Hannover)

Institute for Applied Data Science Hannover (DATA|H)

E-Mail: christian.wartena@hs-hannover.de

 

Postal address

City:    Hannover

Website:     https://www.hs-hannover.de/forschung/forschungsaktivitaeten/forschungscluster/smart-data-analytics

Street:     Expo Plaza 12

Postal Code:     30539


6-29(2023-06-26) Fully funded CIFRE PhD thesis, IMAG and Eloquant, Grenoble, France
Call for a fully funded CIFRE PhD thesis

 

Topic: PRESAC – Customer Satisfaction Prediction from massive voice-call data

 

Partners: Laboratoire d’Informatique de Grenoble, Eloquant

Contract duration: 36-month fixed-term contract (CDD)

Official language: French

Start date: September 2023

Application deadline: open until the position is filled

Salary: from €2,000 gross/month at the start of the thesis to €2,300 gross/month at the end

Benefits: health insurance, meal vouchers, partial reimbursement of public transport costs, ...

Duties: research and development (teaching possible but not compulsory)

Keywords: customer satisfaction, massive remote voice-call data, large language models


Environment

Grenoble is among the most dynamic research ecosystems in France. Recognized for its scientific and technological excellence as well as its innovation potential, the Grenoble university site has been awarded the 'Initiative d'excellence' label, reserved for about ten university sites in France [1].

The recruited person will be hosted alternately (according to the needs of the project) within the GETALP team [2] of the Laboratoire d'Informatique de Grenoble (LIG) and within the research division of the company Eloquant. The GETALP team is housed in a modern building (IMAG), located on a 175-hectare landscaped campus that was ranked the eighth most beautiful campus in Europe by the Times Higher Education magazine in 2018. Eloquant is also located in the immediate vicinity of this campus.


Industrial context

Eloquant [3] is an IT company of about 110 people, specialized in customer relations since 2001. Eloquant is today the only player in France offering a global Software-as-a-Service solution designed to facilitate dialogue with and listening to customers. It aims to provide its clients with smoother and faster processes, supplying them with relevant information while reducing the burden on end customers through post-call satisfaction surveys.


Academic context

The GETALP team is a multidisciplinary team (computer science, language, phonetics, translation, signal processing, etc.) whose objective is to address all theoretical, methodological and practical aspects of multilingual (written or spoken) communication and information processing. GETALP's working methodology relies on continuous back-and-forth between data collection, fundamental investigations, development of operational systems, applications and experimental evaluations. The team is also known for having contributed to the development of the first large language models for French, for text (HANG et al., 2020) and for speech (EVAIN et al., 2021a,b).


Thesis objectives

A first step of this thesis consists in carrying out a state of the art of existing techniques for predicting customer satisfaction from phone conversations. This study will be followed by a statistical analysis of Eloquant's proprietary data, with the goal of building a corpus for training and evaluating inference methods. Once the data corpus has been built, the work will consist in modeling customer satisfaction from the linguistic and acoustic cues present in the recordings. Tools for extracting language information will rely both on statistical approaches (e.g., prosodic and probabilistic descriptors applied to items) and on large language models, such as BERT (DEVLIN et al., 2019) for text and Whisper (RADFORD et al., 2022) or Wav2Vec 2.0 (BAEVSKI et al., 2020) for speech, notably for French (EVAIN et al., 2021a,b). We also plan to fine-tune these models on the proprietary data to optimize performance. The resulting descriptors will be used to evaluate different approaches for inferring customer satisfaction on the previously identified data, with neural approaches such as attention-based recurrent models (BAHDANAU et al., 2015). An important part of the work will also consist in evaluating the robustness of the system on the task, in particular on unlabeled instances via a human evaluation task.
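
The inference step described above (an utterance-level prediction computed from frame-level descriptors through an attention mechanism) can be sketched as follows. This is a minimal NumPy illustration with random weights and random "features", not the project's actual pipeline; the shapes, weights, and the logistic output head are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(frames, w):
    """Attention pooling: score each frame, softmax the scores, and return
    the weighted average of the frames -- a fixed-size utterance embedding."""
    scores = frames @ w                      # (T,) relevance score per frame
    scores = scores - scores.max()           # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ frames                    # (D,) pooled embedding

def predict_satisfaction(frames, w_att, w_out, b_out):
    """Logistic regression on the pooled embedding -> satisfaction in (0, 1)."""
    pooled = attention_pool(frames, w_att)
    return 1.0 / (1.0 + np.exp(-(pooled @ w_out + b_out)))

T, D = 50, 16   # e.g. 50 frames of 16-dim features from a pretrained encoder
frames = rng.normal(size=(T, D))
w_att = rng.normal(size=D)
w_out = rng.normal(size=D)
score = predict_satisfaction(frames, w_att, w_out, 0.0)
```

In a real system the frame features would come from a pretrained speech encoder such as Wav2Vec 2.0, and all weights would be learned from labeled calls.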


Expected skills and expertise

Holder of a Research Master's degree (or an equivalent degree conferring the grade of Master, with research experience) with strong knowledge of Natural Language Processing (NLP) and deep learning.

Since the work involves studying sociolinguistic phenomena of French, candidates must be native French speakers or have C2-level proficiency in French.


Profile sought

  • Good knowledge of machine learning, in particular deep learning;

  • Good knowledge of audio signal processing;

  • Experimental methodologies for evaluation;

  • Good knowledge of Python;

  • Motivation for working in a team and in an interdisciplinary environment;

  • Good communication skills;

  • Very good writing skills


Instructions for applying

  • Applications must be sent to all of the following people: Fabien Ringeval (fabien.ringeval@imag.fr), Marco Dinarelli (marco.dinarelli@univ-grenoble-alpes.fr), Ruslan Kalitvianski (ruslan.kalitvianski@eloquant.com), Mathieu Ruhlmann (mathieu.ruhlmann@eloquant.com), as well as to drh@eloquant.com, and must contain:

  • Detailed transcripts of the Master's degree (or equivalent), 1st and 2nd year;

  • An up-to-date curriculum vitae;

  • A one-page summary of experience and interests related to the position;

  • One or more letters of recommendation (optional)


References

BAEVSKI, Alexei, MOHAMED, Abdelrahman, AULI, Michael. Wav2Vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. In Neural Information Processing Systems (NeurIPS), 2020.

BAHDANAU, Dzmitry, CHO, Kyunghyun, BENGIO, Yoshua. Neural Machine Translation by Jointly Learning to Align and Translate. In International Conference on Learning Representation (ICLR), 2015.

DEVLIN, Jacob, CHANG, Ming-Wei, LEE, Kenton, et al. BERT: Pre-training of deep bidirectional transformers for language understanding. Association for Computational Linguistics (ACL), 2019.

EVAIN, Solène, NGUYEN, Ha, LE, Hang, et al. Task Agnostic and Task Specific Self-Supervised Learning from Speech with LeBenchmark. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS), 2021.

EVAIN, Solène, NGUYEN, Ha, LE, Hang, et al. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech. In Proceedings of Interspeech, 2021.

HANG, Le, VIAL, Loïc, FREJ, Jibril, et al. FlauBERT: Unsupervised Language Model Pre-training for French. In Language Resources and Evaluation Conference (LREC), 2020.

RADFORD, Alec, KIM, Jong Wook, XU, Tao, et al. Robust Speech Recognition via Large-Scale Weak Supervision. ArXiv preprint abs/2212.04356, 2022


[1] https://www.univ-grenoble-alpes.fr/recherche/

[2] https://lig-getalp.imag.fr/

[3] https://www.eloquant.com/

 

6-30(2023-06-27) Research Associate in Integrated Multitask Neural Speech Labelling, University of Sheffield, UK
Deadline July 13th, 2023

6-31(2023-07-01) PhD offer in 'Deep learning for speaker identification and speech separation', CNRS

For more information, see:
https://emploi.cnrs.fr/Offres/Doctorant/UMR7020-RICMAR-001/Default.aspx

To apply, fill in the following form: https://forms.gle/DdaV8BhN3E4DuXnA8

The application deadline is 10 July 2023.


6-32(2023-07-19) Engineer / project manager for language resources and technologies, INRIA, Nancy, France

Engineer / project manager for language resources and technologies

Inria centre: CRI Nancy - Grand Est

City: Nancy, France
Desired start date: 2023-10-01
Contract type: 4-year fixed-term contract (CDD)
Required degree: Master's (BAC+5) or equivalent
Desired experience: 3 to 5 years

To apply: https://jobs.inria.fr/public/classic/fr/offres/2023-06574
For more information, contact: Slim.Ouni@loria.fr

Full job description:
https://jobs.inria.fr/public/classic/fr/offres/2023-06574



Position: Engineer / project manager for language resources and technologies

CONTEXT

This position is part of the Inria COLaF Challenge (Corpus et Outils pour les Langues de France – corpora and tools for the languages of France), a collaboration between the ALMAnaCH and MULTISPEECH teams. The goal of the Challenge is to develop and make available digital language technologies for the French-speaking world and the languages of France, by contributing to the creation of inclusive data corpora, models, and software building blocks. The ALMAnaCH team focuses on text and the MULTISPEECH team on multimodal speech. The two main objectives of this project are:


(1) Collecting massive, inclusive French-language data corpora: the aim is to build very large text and speech corpora with rich metadata to improve the robustness of models to linguistic variation, with a particular focus on geographic and dialectal variation in the French-speaking context; part of the data may be multimodal (audio, image, video) or specific to French Sign Language (LSF). Multimodal speech data will cover, among others, dialects, accents, the speech of the elderly, of children and teenagers, LSF, and other languages widely spoken in France.

Corpus collection will be based primarily on existing data. These (multimodal speech) data may come from the archives of INA and regional or foreign broadcasters, but rarely in a directly usable form, or from specialists, but as small scattered corpora. The difficulty consists, on the one hand, in identifying and pre-processing the relevant data to obtain homogeneous corpora, and on the other hand, in clarifying (and if possible easing) the legal constraints and financial terms governing their use, to ensure the widest possible impact. When legal constraints do not allow the use of existing data, an additional data-collection effort will be needed. This will probably be the case for children (educational applications) and the elderly (health applications). Depending on the situation, this effort will be subcontracted to field linguists or will lead to a large-scale campaign. This will be carried out in collaboration with Le VoiceLab and the DGLFLF.

(2) Developing and making available inclusive language technologies: the language technologies considered in this project by the MULTISPEECH team are speech recognition, speech synthesis, and sign language generation. Many such technologies are already commercialized; the aim is therefore not to reinvent these tools, but to make the modifications needed for them to exploit the inclusive corpora created. The technologies used in this project concern, including but not limited to, the following tasks:
• (Semi-)automatic identification and pre-processing of relevant data within masses of existing data. This includes named entity detection and replacement for anonymization purposes.
• Neural architectures and approaches suited to low-resource scenarios (data augmentation, transfer learning, weakly supervised/unsupervised learning, active learning, and combinations of these various forms of learning)
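
As a minimal illustration of one low-resource technique mentioned here (data augmentation for speech), the sketch below applies speed perturbation to a waveform by linear resampling. This is only a toy example under stated assumptions (a synthetic sine-wave "recording", linear interpolation instead of a proper band-limited resampler); real toolkits use dedicated resampling routines.

```python
import numpy as np

def speed_perturb(signal, factor):
    """Speed perturbation, a common speech data-augmentation trick:
    resample the waveform by `factor` via linear interpolation
    (factor 0.9 -> slower/longer, 1.1 -> faster/shorter)."""
    n_out = int(round(len(signal) / factor))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)   # 1 s of a synthetic 440 Hz tone
slow = speed_perturb(tone, 0.9)      # ~1.11 s version of the same utterance
fast = speed_perturb(tone, 1.1)      # ~0.91 s version
```

Training on the original plus the perturbed copies effectively multiplies the amount of speech data, which is why this trick is standard in low-resource settings.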
 

DUTIES

The project engineer will have two main duties:

• Project management and practical coordination of the MULTISPEECH team's contribution to the Inria Challenge. The project engineer will work closely with a junior engineer, a researcher and two PhD students, all working on this project. They will closely supervise the junior engineer and interact very frequently with the researcher and the PhD students. They will also be in contact with the members of the MULTISPEECH team, and will certainly coordinate and collaborate closely with their counterpart in the ALMAnaCH team.
• Data collection and creation of multimodal speech corpora (this includes certain dialects, accents, the elderly, children and teenagers, LSF, and certain languages other than French that are widely spoken in France). A large part of the data collection will be carried out with speakers' associations, content producers and any partner relevant for data recovery. The project engineer will be expected to discuss matters, in particular legal aspects, with these contacts.

MAIN ACTIVITIES

• Defining the different types of corpora to collect (identifying potentially usable corpora, establishing priorities and a collection schedule)
• Collecting speech corpora from content producers or any other partner (ensuring that the data meet quality norms and standards)
• Negotiating data-use contracts, with attention to legal aspects (negotiating the conditions of use of the data with content producers or partners, ensuring that intellectual property rights are respected and legal aspects are taken into account)
• Creating and making available language technologies for processing these corpora: once collected, the data must be analyzed and processed so as to extract useful information. The project engineer must propose existing technologies and tools needed for this analysis and ensure that they are accessible to users.
• Close supervision of the junior engineer: support and advice on technical and strategic development choices.
• Coordinating and leading exchanges between project members: (1) with the researcher and the two PhD students (reflections and exchanges on the data and their suitability for the Challenge); (2) coordination with the project members within the ALMAnaCH team.
• Technology watch, in particular in the field of this Challenge.
• Writing and presenting technical documentation.
Note: this is an indicative list of activities, which may be adapted within the scope of the duties as described above.


COMPÉTENCES

PROFIL RECHERCHÉ :
• Diplômé en informatique, linguistique ou toute autre formation relevant du domaine du traitement automatique de la parole ou des langues
• Expérience confirmée en gestion de projet et en communication
• Connaissance approfondie des technologies linguistiques
• Capacité à travailler en équipe et à respecter les délais
• Bonne connaissance de l'anglais
 
SAVOIRS
• Capacité à rédiger, à publier et à présenter en français et en anglais
• Maitrise des techniques de conduite des projets et de négociation
• Bases juridiques (données personnelles, propriété intellectuelle, droit des affaires)

KNOW-HOW
• Analytical, writing and synthesis skills
• Ability to support and advise
• Ability to develop a professional network
• Ability to manage several projects at the same time
• Negotiation skills
 
INTERPERSONAL SKILLS
• Sense of responsibility and autonomy
• People skills and a taste for teamwork
• Rigour, sense of priorities and of reporting
• Interpersonal qualities (listening, diplomacy, persuasiveness)
• Appetite for negotiation (Le VoiceLab, DGLFLF, etc.)
• Ability to anticipate
• Initiative and intellectual curiosity
 
ADDITIONAL INFORMATION
Full-time position, to be filled as soon as possible.
Remuneration according to experience.
Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.

BENEFITS
• Subsidised meals
• Partial reimbursement of public transport costs
• Leave: 7 weeks of annual leave + 10 extra days off (full-time basis) + possibility of exceptional leave (e.g. sick children, moving house)
• Professional equipment available (videoconferencing, loan of computer equipment, etc.)
• Social, cultural and sports services (Association de gestion des œuvres sociales d'Inria)
• Access to vocational training
• Social security coverage
 
REMUNERATION
2765€ gross/month (depending on experience)


ABOUT INRIA

Inria is the French national institute for research in digital science and technology. World-class research, technological innovation and entrepreneurial risk form its DNA. Within 200 project teams, most of them shared with major research universities, more than 3,500 researchers and engineers explore new paths, often in an interdisciplinary way and in collaboration with industrial partners, to meet ambitious challenges. Inria supports a wide diversity of innovation pathways, from open-source software publishing to the creation of technology (deeptech) startups.

ABOUT THE INRIA NANCY – GRAND EST CENTRE

The Inria Nancy – Grand Est centre is one of Inria's eight centres, with 400 staff spread across 22 research teams and 8 research support services. All of these research teams are shared with academic partners, and three of them are based in Strasbourg.

This research centre is a major, recognised player in digital science. It lies at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large groups, startups, incubators and accelerators, competitiveness clusters, research and higher-education institutions, and technological research institutes.

WORK ENVIRONMENT

The project lead engineer will work within the MULTISPEECH project team at the Inria Nancy research centre. MULTISPEECH's research focuses on multimodal speech, in particular its analysis and generation in the context of human-machine interaction. A central theme of this work is the design of machine-learning models and techniques to extract information about the linguistic content, the speaker's identity and states, and the speech environment, and to synthesise multimodal speech using limited amounts of labelled data.


  
To apply:
Back  Top

6-33(2023-09-29) PhD thesis offer at IRCAM, Paris, France
 PhD thesis offer on neural voice conversion, funded within the ANR project EVA 'Explicit Voice Attributes'
 

6-34(2023-08-10) Postdoc in Natural Language Processing for the automatic detection of gender stereotypes, Grenoble Alps University, Grenoble, France

Call for postdoc applications in Natural Language Processing for the automatic detection of gender stereotypes in the French media (Grenoble Alps University, France)

Starting date:                flexible,  November 30, 2023, at the latest

Duration:                       full-time position for 12 months

Salary:                         according to experience (up to 4142€/ month)

Application Deadline: Open until filled

Location:               The position will be based in Grenoble, France. This is not a remote position.

Keywords:             natural language processing, gender stereotype bias, corpus analysis, language models, transfer learning, deep learning

*Context* The University of Grenoble Alps (UGA) has an open position for a highly motivated postdoc researcher to join the multidisciplinary GenderedNews project. Natural Language Processing models trained on large amounts of online content have quickly opened new perspectives for measuring gender bias in the media on a daily basis (see our project https://gendered-news.imag.fr/ ). Regarding research on stereotypes, most recent works have studied Language Models (LMs) from a stereotype perspective by providing specific corpora such as StereoSet (Nadeem et al., 2020) or CrowS-Pairs (Nangia et al., 2020). However, these studies focus on quantifying bias in the LM predictions rather than bias in the original data (Choenni et al., 2021). Furthermore, most of these studies ignore named entities (Deshpande et al., 2022), which account for an important share of the referents and speakers in news. In this project, we intend to build corpora, methods and NLP tools to qualify the differences between the language used to describe groups of people in French news.
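As a rough, self-contained illustration of the kind of corpus-level measure such a project computes (the word lists and the "masculinity rate" definition below are simplified assumptions, not the actual GenderedNews pipeline, which relies on named-entity recognition and much richer lexicons):

```python
# Toy corpus-level gender-mention measure (illustrative only).
import re

# Assumed toy lexicons of French gendered words (simplified assumption).
MASCULINE = {"il", "monsieur", "homme", "hommes"}
FEMININE = {"elle", "madame", "femme", "femmes"}

def masculinity_rate(text: str) -> float:
    """Share of gendered tokens that are masculine (0.5 = balanced)."""
    tokens = re.findall(r"\w+", text.lower())
    m = sum(t in MASCULINE for t in tokens)
    f = sum(t in FEMININE for t in tokens)
    if m + f == 0:
        return float("nan")  # no gendered mention found
    return m / (m + f)

# 1 masculine token ("il") out of 3 gendered tokens -> about 0.33
print(masculinity_rate("Il a dit que madame la ministre et elle viendraient"))
```

Tracked daily over a news stream, such a ratio gives the kind of longitudinal bias indicator the project publishes, before moving to the finer stereotype dimensions described above.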

*Main Tasks*

The successful postdoc will be responsible for the day-to-day running of the research project, under the supervision of François Portet (Prof UGA at LIG) and Gilles Bastin (Prof UGA at PACTE). Regular meetings will take place every two weeks.

- Defining the dimensions of stereotypes to be investigated and the possible metrics that can be processed from a machine learning perspective.

- Exploring, managing and curating news corpora in French for stereotypes investigation, with a view to making them widely available to the community to favor reproducible research and comparison.

- Studying and developing new computational models to process large numbers of texts to reveal stereotype bias in news, making use of pretrained models for the task.

- Evaluating the methods on a curated, focused corpus, applying them to the unseen real longitudinal corpus, and analyzing the results with the team.

- Preparing articles for submission to peer-reviewed conferences and journals.

- Organizing progress meetings and liaising between members of the team. 

The hired person will interact with PhD students, interns and researchers involved in the GenderedNews project. Depending on their background and interests, and in accordance with the project's objectives, they will have the possibility to orient the research in different directions.

*Scientific Environment*

The recruited person will be hosted within the GETALP team of the LIG laboratory (https://lig-getalp.imag.fr/), which offers a dynamic, international, and stimulating environment for conducting high-level multidisciplinary research. The person will have access to large datasets of French news and GPU servers, to support for missions, as well as to the scientific activities of the lab. The team is housed in a modern building (IMAG) located in a 175-hectare landscaped campus that was ranked the eighth most beautiful campus in Europe by the Times Higher Education magazine in 2018.

The person will also work closely with Gilles Bastin (PACTE, a sociology lab in Grenoble) and Ange Richard (PhD student at LIG and PACTE). The project also includes an informal collaboration with 'Prenons la une' (https://prenonslaune.fr/), a journalists' association which promotes a fair representation of women in the media.

*Requirements*

The candidate must have (or be about to obtain) a PhD degree in Natural Language Processing or computer science. The successful candidate should have:

- Good knowledge of Natural Language Processing
- Experience in corpus collection/formatting and manipulation
- Good programming skills in Python
- A publication record in a closely related field of research
- Willingness to work in multidisciplinary and international teams
- Good communication skills
- A good command of French (required)

*Instructions for applying*

Applications will be reviewed as they are received and must be addressed to François Portet (Francois.Portet@imag.fr). It is therefore advisable to apply as soon as possible. The application file should contain:

- Curriculum vitae
- References for potential letter(s) of recommendation
- One-page summary of research background and interests for the position
- Publications demonstrating expertise in the aforementioned areas
- Pre-defense reports and defense minutes; or summary of the thesis with the date of defense for those currently in doctoral studies

*References*

Deshpande et al. (2022). StereoKG: Data-Driven Knowledge Graph Construction for Cultural Knowledge and Stereotypes. arXiv preprint arXiv:2205.14036.

Choenni et al. (2021). Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you? arXiv preprint arXiv:2109.10052.

Nadeem et al. (2020) StereoSet: Measuring stereotypical bias in pretrained language models. ArXiv.

Nangia et al. (2020) CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. In EMNLP2020.


6-35(2023-09-04) Project Manager@ELDA, Paris, France

The Evaluations and Language resources Distribution Agency (ELDA), a company specialized in Human Language Technologies within an international context is currently seeking to fill an immediate vacancy for the permanent position: Project Manager - Intellectual Property, Personal Data Protection for AI and Language Technologies.

Job description

Under the CEO’s supervision, the Project Manager will handle legal issues related to the compilation, use and distribution of language datasets in a European and international environment. This yields excellent opportunities for creative and motivated candidates wishing to participate actively in the Language Engineering field.

 

Their main tasks will consist of:

 

  • Analysing the legal status of language datasets and language models;
  • Implementing data protection requirements in the processing and distribution of language data;
  • Analysing data collection legal issues;
  • Drafting and negotiating distribution contracts for language datasets to be added to an online catalogue;
  • Implementing evaluation procedures for IPR clearance of digital data;
  • Assessing ethical practices within our activities.

 

The position is based in Paris.

Salary: Commensurate with qualifications and experience (between 40-60K€).

Other benefits: complementary health insurance and meal vouchers.

Required profile:

 

  • PhD (or Master’s degree + at least 3 years’ experience) in IT Law, with good understanding of intellectual property and data protection;
  • Fluent English, with advanced writing and analytical skills;
  • Experience implementing data protection regulations;
  • Knowledge of the European digital regulatory environment (e.g. Digital Services Act, Digital Market Act, Data Act, Digital Governance Act, Open Data Directive, AI Act);
  • Familiarity with public licensing schemes (CC, GPL, etc.);
  • Involvement in European or international projects related to IPR and other legal aspects;
  • Dynamic, communicative, flexible and capable of working independently as well as in a team;
  • EU citizenship, or a residence permit allowing work in France.

About

 

ELDA is an SME established in 1995 to promote the development and exploitation of Language Resources (LRs). Language Resources include all data necessary for language engineering, such as monolingual and multilingual lexica, text corpora, speech databases and terminology. ELDA’s role is to produce LRs, to collect and to validate them and, foremost, make them available to users in compliance with applicable regulations and ethical requirements.

For further information about ELDA, visit: http://www.elda.org  

Applicants should email a cover letter addressing the points listed above together with a curriculum vitae to:

ELDA
9 rue des Cordelières
75013 Paris
FRANCE
Email: job@elda.org


6-36(2023-09-08) 2 Research and Teaching Associates – PhD Positions –Signal Processing and Speech Communication Laboratory (TU Graz), Austria

The Signal Processing and Speech Communication Laboratory (https://www.spsc.tugraz.at) of
Graz University of Technology (TU Graz) is looking for

    2 Research and Teaching Associates
            – PhD Positions –
    in Signal Processing and Speech Communication

with appointments planned for November 2023.

Both associates are expected to perform excellent research towards a PhD degree (often in cooperation with international partners) under the guidance of professors Gernot Kubin and Barbara Schuppler. Furthermore, the associates will co-advise Bachelor’s and Master’s student projects and develop and teach laboratory and problem classes on various aspects of signal processing. Fluency in English is a must, knowledge of German is an asset. A strong background in signal processing and/or speech communication as well as an excellent Master’s degree in Electrical Engineering, Information Engineering, or similar are required.

Entry-level gross yearly salaries are about EUR 45.900,- for 40 hrs per week and initial contract durations may span up to 4 years.

The Signal Processing and Speech Communication Laboratory was the main organizer of INTERSPEECH 2019 in Graz (https://www.interspeech2019.org) and takes the lead in building a Graduate School of Excellence in European Speech and Language Technologies together with seven partners of the Unite! University Alliance (https://www.unite-university.eu). TU Graz is ranked #2 of all universities in the German speaking countries (https://www.umultirank.org).

Graz (https://www.graztourismus.at/en) is the second largest city of Austria located in the south-eastern province of Styria at the cross-roads of major continental European cultures. It enjoys a vibrant student life with eight universities and excellent leisure and sports opportunities in the larger Alps-Adriatic region. UNESCO has included the historic centre of Graz in its World Heritage List and Graz has been a Cultural Capital of Europe.

For further information, please contact the two advisors
Gernot Kubin at gernot.kubin@tugraz.at  and Barbara Schuppler at b.schuppler@tugraz.at.

Applications are due by September 27, 2023, and must be submitted electronically at
https://jobs.tugraz.at/en/jobs/b645ee47-c3aa-712a-29b3-64d61edea5e6.




6-37(2023-09-10) PhD position, MIAI, Université de Grenoble, France

Job Offer: PhD – Self-supervised models for transcribing the spontaneous speech of 3- to 6-year-old children in French

Starting date: between October 1st and December 1st, 2023 (flexible). Application deadline: from now until the position is filled. Interviews: from September, or later if the position is still open.

Salary: ~2000€ gross/month (social security included)

Mission: research oriented (teaching possible but not mandatory)

Place of work: Laboratoire d'Informatique de Grenoble, CNRS, Grenoble, France

Keywords: deep learning, natural language processing, speech recognition for children's voices, documentation of language development

Description: As part of the Artificial Intelligence & Language Chair at the Multidisciplinary Institute in Artificial Intelligence (https://miai.univ-grenoble-alpes.fr/research/chairs/perception-interaction/artificialintelligence-language-850480.kjsp?RH=6499588038450843), we offer a PhD thesis topic devoted to the enriched automatic transcription of the spontaneous speech of 3- to 6-year-old children using an architecture based on self-supervised models [1]. These methods have emerged as one of the most successful approaches in artificial intelligence (AI), as they make it possible to exploit a colossal amount of existing unlabeled data and thus achieve significantly higher performance in many domains. As part of the DyLNet project (Language dynamics, linguistic learning, and sociability at preschool: benefits of wireless proximity sensors in collecting big data; https://dylnet.univ-grenoble-alpes.fr/), coordinated by A. Nardy, children's speech was collected in a socially mixed preschool over a period of two and a half years [2]. Each year, around 100 children wore a box fitted with microphones that continuously recorded their speech. These boxes were worn for one week a month. We thus collected ~30,000 hours of recordings, 815 of which were transcribed and annotated by linguists. This unprecedentedly large corpus of children's spoken French will make it possible to meet the technical challenges associated with automatic speech processing. While continuous and unsupervised collection methods are now available, another challenge is the automatic transcription of children's voices, made difficult by their acoustic characteristics. The aim of the thesis is to design a transcription system for researchers as well as child development professionals (teachers, speech therapists, etc.).
The aims of the thesis are therefore to:
- review state-of-the-art models and the performance achieved by automatic transcription tools for children's voices;
- implement processes to exploit the mass of audio data collected and the associated metadata (sociodemographic information on participants, contexts of utterance, interlocutors, etc.);
- design and develop a system for transcribing children's speech using self-supervised tools, as proposed by SpeechBrain [3]; the best system obtained will be made available to the language acquisition research community and to child development professionals;
- set up a system evaluation protocol based on the transcribed data;
- propose tools for automating some of the linguistic analyses in order to enrich the obtained transcriptions and document oral language development in 3- to 6-year-old children.

Skills:
- Master's degree in Computer Science, Artificial Intelligence or Data Science
- Mastery of Python programming and deep learning frameworks; experience in natural language processing will be highly appreciated
- Excellent communication skills in French or, failing that, in English

Scientific environment: The PhD will be co-supervised by Benjamin Lecouteux, Solange Rossato (LIG, Univ. Grenoble Alpes) and Aurélie Nardy (Lidilem, Univ. Grenoble Alpes). The recruited person will be part of the GETALP team of the LIG laboratory (https://lig-getalp.imag.fr/), which has extensive expertise and experience in the field of Natural Language Processing. The GETALP team offers a stimulating, multinational working environment and provides the resources needed to complete the thesis in terms of equipment and scientific exchanges. Regular meetings with the three supervisors will take place throughout the thesis.
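One of the stated aims is an evaluation protocol based on the transcribed data; the standard metric for that is word error rate (WER). A minimal sketch of WER as edit distance over word sequences (a generic illustration, not the project's actual tooling):

```python
# Word error rate via Levenshtein distance over whitespace-tokenised words.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("le petit chat dort", "le chat dors"))  # 1 deletion + 1 substitution over 4 words = 0.5
```

In practice toolkits such as SpeechBrain ship their own scorers, but the metric reduces to this computation.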

Instructions for applying: applications must contain a CV, a letter/message of motivation, the master's degree certificate and transcripts; candidates should also be ready to provide letter(s) of recommendation. Applications should be addressed to Benjamin Lecouteux (benjamin.lecouteux@univ-grenoble-alpes.fr), Solange Rossato (solange.rossato@univ-grenoble-alpes.fr) and Aurélie Nardy (aurelie.nardy@univ-grenoble-alpes.fr).

[1] Evain, S., Nguyen, H., Le, H., Boito, M. Z., Mdhaffar, S., Alisamir, S., ... & Besacier, L. (2021). LeBenchmark: A reproducible framework for assessing self-supervised representation learning from speech. https://doi.org/10.48550/arXiv.2104.11462

[2] Nardy, A., Bouchet, H., Rousset, I., Liégeois, L., Buson, L., Dugua, C., Chevrot, J.-P. (2021). Variation sociolinguistique et réseau social : constitution et traitement d’un corpus de données orales massives. Corpus, 22 [en ligne]. https://doi.org/10.4000/corpus.5561

[3] Ravanelli, M., Parcollet, T., Plantinga, P., Rouhe, A., Cornell, S., Lugosch, L., ... & Bengio, Y. (2021). SpeechBrain: A general-purpose speech toolkit. https://arxiv.org/abs/2106.0462


6-38(2023-09-12) Assistant Professor (tenure track) position, Rochester Institute of Technology, Rochester, NY, USA

Anticipated Start Date: mid-August 2024

DETAILED JOB DESCRIPTION:     

The Department of Psychology at the Rochester Institute of Technology (RIT; www.rit.edu/psychology) invites candidates to apply for a tenure-track Assistant Professor position starting in August 2024.  We are seeking an energetic and enthusiastic psychologist who will serve as an instructor, researcher, and mentor to students in our undergraduate (Psychology, Neuroscience) and graduate programs (Masters in Experimental Psychology, Ph.D. in Cognitive Science). We are particularly looking to build a cohort of faculty who can contribute to the interdisciplinary Ph.D. program in Cognitive Science and contribute to research, mentoring, and teaching using computational and laboratory methods. Candidates should have expertise in an area of Cognitive Science such as cognitive or behavioral neuroscience, AI, computational/psycho-linguistics, cognitive psychology, comparative psychology, or related areas. We are particularly interested in individuals whose area of research expertise expands the current expertise of the faculty. Candidates who can teach courses in natural language processing or computational modeling courses are especially encouraged to apply.  The Department of Psychology at RIT serves a rapidly expanding student population at a technical university. The position requires a strong commitment to teaching and mentoring, active research and publication, and a strong potential to attract external funding. Teaching and research are priorities for faculty at RIT, and all faculty are expected to mentor students through advising, research and in-class experiences. The successful candidate will be able to teach courses in our undergraduate cognitive psychology track (Memory & Attention, Language & Thought, Decision Making, Judgement & Problem Solving), will be expected to teach research methods/statistics courses at the undergraduate and graduate level, and teach and mentor students in our graduate programs. 
In addition, candidates must be able to do research and work effectively within the department’s existing lab space. RIT provides many opportunities for collaborative research across the institute in many diverse disciplines such as AI, Digital Humanities, Human-Centered Computing, and Cybersecurity.

We are seeking individuals who have the ability and interest to contribute to a community committed to student-centeredness; professional development and scholarship; integrity and ethics; respect, diversity and pluralism; innovation and flexibility; and teamwork and collaboration. See RIT's core values, honor code, and statement of diversity.

THE COLLEGE/ DEPARTMENT:    

The Department of Psychology at RIT offers B.S., M.S. degrees, Advanced Certificates, minors, immersions, electives, and a new interdisciplinary Ph.D. degree program in Cognitive Science. The B.S. degree provides a general foundation in psychology with specialized training in one of five tracks: biopsychology, clinical psychology, cognitive psychology, social psychology, and developmental psychology.  The M.S. degree is in Experimental Psychology, with an Advanced Certificate offered in Engineering Psychology. We offer accelerated BS/MS programs with AI, Sustainability, and Experimental Psychology. The Ph.D. degree is in Cognitive Science and the program is broadly interdisciplinary with several partner units across the university. We also offer joint B.S. degrees in Human Centered Computing and Neuroscience. 

The College of Liberal Arts is one of nine colleges within Rochester Institute of Technology. The College has over 150 faculty in 13 departments in the arts, humanities and social sciences. The College currently offers fourteen undergraduate degree programs and five Master degrees, serving over 800 students.

THE UNIVERSITY:

Founded in 1829, Rochester Institute of Technology is a diverse and collaborative community of engaged, socially conscious, and intellectually curious minds. Through creativity and innovation, and an intentional blending of technology, the arts and design, we provide exceptional individuals with a wide range of academic opportunities, including a leading research program and an internationally recognized education for deaf and hard-of-hearing students. Beyond our main campus in Rochester, New York, RIT has international campuses in China, Croatia, Dubai, and Kosovo. And with more than 19,000 students and more than 125,000 graduates from all 50 states and over 100 nations, RIT is driving progress in industries and communities around the world. Find out more at www.rit.edu . 

REQUIRED MINIMUM QUALIFICATIONS:

  • Have a Ph.D., or expect the Ph.D. by July 1, 2024, in cognitive psychology or a cognitive-science-related specialty (e.g. linguistics);
  • Have demonstrated ability to conduct independent research in psychology or closely related fields;
  • Have consistently and recently published;
  • Have demonstrated teaching ability and have taught college courses independently beyond TA;
  • Have demonstrated ability to supervise student research;
  • Demonstrate external research grant attainment potential;
  • Demonstrate expertise in research and teaching in cognitive science;
  • Show a career trajectory that emphasizes a balance between teaching and research;
  • Show a fit with the Department of Psychology’s general mission, teaching, research, and resources.
  • Ability to contribute in meaningful ways to the college’s continuing commitment to cultural diversity, pluralism, and individual differences.

HOW TO APPLY:

Apply online at http://careers.rit.edu/faculty; search openings, then Keyword Search 8262BR. Please submit your application, curriculum vitae, cover letter addressing the listed qualifications and upload the following attachments: 

  • A brief teaching philosophy
  • A research statement that includes information about previous grant work, the potential for future grants, and information about one-on-one supervision of student research
  • The names, addresses and phone numbers for three references 
  • Contribution to Diversity Statement

You can contact the chair of the search committee, Caroline DeLong, Ph.D. with questions on the position at: cmdgsh@rit.edu.

Review of applications will begin October 1, 2023 and will continue until an acceptable candidate is found.

6-39(2023-09-15) Postdoc at IRISA, Rennes, France
Team Expression at IRISA is hiring a postdoc for 18 months in speech synthesis. For more information, follow this link: 

6-40(2023-10-05) Professor at Saarland University, Saarbrücken, Germany
The Department of Language Science and Technology of Saarland University
seeks to hire a Professor of Speech Science (W2 with tenure track to
W3). For details see
<https://www.uni-saarland.de/fileadmin/upload/verwaltung/stellen/Wissenschaftler/W2283_W2TTW3_Speech_Science.pdf>.
 

6-41(2023-09-23) Assistant or Associate Professor of Computer Science, UTEP, El Paso, Texas

Assistant or Associate Professor of Computer Science, UTEP, El Paso, Texas

 

The University of Texas at El Paso (UTEP) invites applications for a tenured/tenure-track Associate Professor position and three tenure-track Assistant Professor Positions in Computer Science (CS) starting in fall 2024. We invite applicants from all areas of CS.  For three of the positions (including the Associate Professor position), preference will be given to those with demonstrated expertise in Artificial Intelligence.

 

Candidates are expected to have a record of high-quality scholarship and should be able to demonstrate the potential for excellence in both research and teaching. The department values both interdisciplinary research and industry/government experience. 

 

More information and application instructions are at https://www.utep.edu/cs/news/news-2023/assistantassociateprofessor.html

 


6-42(2023-09-25) Professor (W2 with Tenure Track to W3) for Speech Science (m|f|x) at Saarland University, Saarbrücken, Germany
Saarland University is a campus-based university with a strong international focus and a research-oriented profile. Numerous research institutes on campus and the systematic promotion of collaborative projects make Saarland University an ideal environment for innovation and technology transfer. To further strengthen this excellence in research and teaching, the Department of Language Science and Technology seeks to hire a

Professor (W2 with Tenure Track to W3) for Speech Science (m|f|x)
Reference n° W2283

Six-year tenure track position, starting April 2025, with the possibility of promotion to a permanent professorship (W3).
We are looking for a highly motivated researcher in the field of phonetics, speech science, and speech technology, with extensive knowledge of speech production, perception and acoustics. The successful candidate is expected to have expertise in experimental and computational approaches to research on spoken language. A focus on spoken dialog and conversational speech and/or multimodal aspects of communication is particularly welcome.
The Department of Language Science and Technology is internationally recognized for collaborative and interdisciplinary research, and the successful candidate is expected to contribute to relevant joint research initiatives. A demonstrated ability to attract external funding of research projects is therefore highly desired. Phonetics, speech science and speech technology are core elements of our study programs on the M.Sc. and B.Sc./B.A. level, and the successful candidate is expected to teach the associated courses within these programs.

What we can offer you:

Tenure track professors (W2) have faculty status at Saarland University, including the right to supervise Bachelor’s, Master’s and PhD students. The successful candidate will focus on carrying out world-class research, will lead their own research group, and will undertake teaching and supervision responsibilities. Tenure track professors (W2) with outstanding performance will receive tenure as a full professor (W3) provided a positive tenure evaluation is made. Decisions regarding tenure are made no later than six years after taking up the tenure track position.
The position offers excellent working conditions in a lively and international scientific community. Saarland University is one of the leading centers for language science and computational linguistics in Europe, and offers a dynamic and stimulating research environment. The Department of Language Science and Technology organizes about 100 research staff in nine research groups in the fields of computational linguistics, psycholinguistics, phonetics and speech science, speech processing, and corpus linguistics (https://www.uni-saarland.de/en/department/lst.html). The department serves as the focal point of the Collaborative Research Center 1102 'Information Density and Linguistic Encoding' (http://www.sfb1102.uni-saarland.de). It is part of the Saarland Informatics Campus (https://saarland-informatics-campus.de/en), which brings together 800 researchers and 2,000 students from 81 countries and collaborates closely with world-class research institutions on campus, such as the Max Planck Institute for Informatics, the Max Planck Institute for Software Systems, and the German Research Center for Artificial Intelligence (DFKI).

Qualifications:

The appointment will be made in accordance with the general provisions of German public sector employment law.  Applicants will have a PhD or doctorate in an appropriate subject and will have demonstrated a proven track record of independent academic research (e.g. as a junior or assistant professor, or by having completed an advanced, post-doctoral research degree (Habilitation) or equivalent academic activity at a university or research institution). They will typically have completed a period of postdoctoral research and have teaching experience at the university level. They must have demonstrated outstanding research capabilities and have the potential to successfully lead their own research group.
The successful candidate will be expected to actively contribute to departmental research and teaching, including introductory lectures in phonetics and phonology, speech science, as well as more advanced lectures. The teaching language is English (in the MSc programs) and German (in the BSc/BA programs). We expect that the successful candidate has, or is willing to acquire within an appropriate period, sufficient proficiency to teach in both languages.

Your Application:

Applications should be submitted online at www.uni-saarland.de/berufungen. No additional paper copy is required. The application must contain:
    • a cover letter and curriculum vitae (including phone number and email address)
    • a full list of publications
    • a full list of third-party funding (indicating your own share of each grant)
    • your proposed research plan (2-5 pages)
    • a teaching statement (1 page)
    • copies of your degree certificates
    • full-text copies of your 5 most important publications
    • a list of 3 academic references (including email addresses), at least one of whom must be a person who is outside the group of your current or former supervisors or colleagues.

Applications must be received no later than 12 October 2023. The search committee will decide at its first meeting whether to consider late applications. Please include the job reference number W2283 when you apply. Please contact crocker@lst.uni-saarland.de if you have any questions.

Saarland University regards internationalization as an institution-wide process spanning all aspects of university life and it therefore encourages applications that align with its internationalization strategy. Members of the university's professorial staff are therefore expected to engage in activities that promote and foster further internationalization. Special support will be provided for projects that continue with or expand on collaborative interactions within existing international cooperative networks, e.g. projects with partners in the European University Alliance Transform4Europe (www.transform4europe.eu) or the University of the Greater Region (www.uni-gr.eu).

Saarland University is an equal opportunity employer. In accordance with its affirmative action policy, Saarland University is actively seeking to increase the proportion of women in this field. Qualified women candidates are therefore strongly encouraged to apply. Preferential consideration will be given to applications from disabled candidates of equal eligibility.

When you submit a job application to Saarland University you will be transmitting personal data. Please refer to our privacy notice (https://www.uni-saarland.de/verwaltung/datenschutz/) for information on how we collect and process personal data in accordance with Art. 13 of the General Data Protection Regulation (GDPR). By submitting your application, you confirm that you have taken note of the information in the Saarland University privacy notice.


6-43(2023-10-02) PhD position at IMT Atlantique, Brest, France

PhD Title: Summarization of activities of daily living using sound-based activity recognition

We are seeking candidates for a PhD position in co-tutelle between IMT Atlantique (Brest, France) and Instituto Superior Técnico (Lisbon, Portugal) on the topic 'Summarization of activities of daily living using sound-based activity recognition'.

Please find details in the attached PDF file.
Starting date: before the end of 2023
Closing date for applications: Oct 15, 2023


6-44(2023-10-04) Czech-language transcribers @ELDA, Paris, France

As part of its language resource production activities, ELDA is looking for native Czech-speaking transcribers (f/m), full-time or part-time, to transcribe 1,500 hours of audio recordings and/or revise existing transcriptions. The total number of hours to transcribe or revise will be adapted to the candidate's availability.

The work will take place on ELDA's premises (Paris, 13th arrondissement) or remotely via a secure workspace. The assignment can start immediately.

Desired profile
• Native speaker of Czech with an excellent command of spelling and grammar;
• Good knowledge of French and/or English;
• Good command of computer tools;
• Ability to learn and scrupulously follow transcription guidelines.

Remuneration and duration
• Starting from the French minimum hourly wage (SMIC), depending on profile;
• The project is expected to end in September 2024;
• Assignments and contract terms according to availability.

Application:
Send a CV to <gabriele@elda.org> and <dylan@elda.org>

ELDA (Agence pour la Distribution des ressources Linguistiques et l'Evaluation)
9, rue des Cordelières
75013 Paris

www.elda.org


6-45(2023-10-04) Estonian-language transcribers @ELDA, Paris, France

As part of its language resource production activities, ELDA is looking for native Estonian-speaking transcribers (f/m), full-time or part-time, to transcribe 1,500 hours of audio recordings and/or revise existing transcriptions. The total number of hours to transcribe or revise will be adapted to the candidate's availability.

The work will take place on ELDA's premises (Paris, 13th arrondissement) or remotely via a secure workspace. The assignment can start immediately.

Desired profile
• Native speaker of Estonian with an excellent command of spelling and grammar;
• Good knowledge of French and/or English;
• Good command of computer tools;
• Ability to learn and scrupulously follow transcription guidelines.

Remuneration and duration
• Starting from the French minimum hourly wage (SMIC), depending on profile;
• The project is expected to end in September 2024;
• Assignments and contract terms according to availability.

Application:
Send a CV to <gabriele@elda.org> and <dylan@elda.org>

ELDA (Agence pour la Distribution des ressources Linguistiques et l'Evaluation)
9, rue des Cordelières
75013 Paris

www.elda.org
