ISCA - International Speech
Communication Association



ISCApad #310

Tuesday, April 09, 2024 by Chris Wellekens

6 Jobs
6-1(2023-10-02) PhD position at IMT Atlantique, Brest, France

PhD Title: Summarization of activities of daily living using sound-based activity recognition

We are seeking candidates for a PhD position in co-tutelle between IMT Atlantique (Brest, France) and Instituto Superior Técnico (Lisbon, Portugal) on the topic 'Summarization of activities of daily living using sound-based activity recognition'.

Please find details in the attached PDF file.
Starting date: before the end of 2023
Closing date for applications: Oct 15, 2023


6-2(2023-10-04) Czech-language transcribers @ ELDA, Paris, France

As part of its language resource production activities, ELDA is looking for transcribers (f/m) who are native speakers of Czech, full-time or part-time, for the transcription of 1,500 hours of audio recordings and/or the revision of the transcriptions. The total number of hours to be transcribed or revised will be adapted to the candidate's availability.

The work will take place on ELDA's premises (13th arrondissement of Paris) or remotely via a secure workspace, and can start immediately.

Desired profile
• Native speaker (f/m) of Czech with an excellent command of spelling and grammar;
• Good knowledge of French and/or English;
• Good command of computer tools;
• Ability to assimilate and scrupulously follow transcription guidelines.

Remuneration and duration
• From the French hourly minimum wage (SMIC), depending on profile;
• The project is expected to end in September 2024;
• Assignments and contract according to availability.

Application:
Send a CV to <gabriele@elda.org> and <dylan@elda.org>

ELDA (Agence pour la Distribution des ressources Linguistiques et l'Evaluation)
9, rue des Cordelières
75013 Paris

www.elda.org


6-3(2023-10-04) Estonian-language transcribers @ ELDA, Paris, France

As part of its language resource production activities, ELDA is looking for transcribers (f/m) who are native speakers of Estonian, full-time or part-time, for the transcription of 1,500 hours of audio recordings and/or the revision of the transcriptions. The total number of hours to be transcribed or revised will be adapted to the candidate's availability.

The work will take place on ELDA's premises (13th arrondissement of Paris) or remotely via a secure workspace, and can start immediately.

Desired profile
• Native speaker (f/m) of Estonian with an excellent command of spelling and grammar;
• Good knowledge of French and/or English;
• Good command of computer tools;
• Ability to assimilate and scrupulously follow transcription guidelines.

Remuneration and duration
• From the French hourly minimum wage (SMIC), depending on profile;
• The project is expected to end in September 2024;
• Assignments and contract according to availability.

Application:
Send a CV to <gabriele@elda.org> and <dylan@elda.org>

ELDA (Agence pour la Distribution des ressources Linguistiques et l'Evaluation)
9, rue des Cordelières
75013 Paris

www.elda.org


6-4(2023-10-05) Professor at Saarland University, Saarbrücken, Germany
The Department of Language Science and Technology of Saarland University
seeks to hire a Professor of Speech Science (W2 with tenure track to
W3). For details see
<https://www.uni-saarland.de/fileadmin/upload/verwaltung/stellen/Wissenschaftler/W2283_W2TTW3_Speech_Science.pdf>.
 

6-5(2023-10-14) Internship @ Orange, Lannion, France
As part of its work on developing voice technologies for sub-Saharan Africa, the DATA&AI entity of Orange Innovation is offering a 6-month research internship to second-year Master's students.
The aim of this internship is to develop end-to-end spoken language understanding systems for African languages (End-to-End Spoken Language Understanding systems in Sub-Saharan African Languages); it can start as early as January 2024. The internship will take place at the Orange Innovation premises in Lannion.
For more information about the offer and to apply, visit https://orange.jobs/jobs/v3/offers/129806?lang=fr

6-6(2023-10-16) Research engineer (ingénieur d'études), DDL, Université de Lyon, France

The DDL laboratory at the Université de Lyon is looking for a research engineer for a 4-month contract.

The offer is available at the following link:

 https://emploi.cnrs.fr/Offres/CDD/UMR5596-MELCAN-001/Default.aspx


6-7(2023-10-16) Fully funded PhD positions @ University of Colorado Boulder, CO,USA
The Human Bio-Behavioral Signals (HUBBS) Lab at University of Colorado Boulder is seeking outstanding candidates for fully funded positions within the Ph.D. program in Computer Science, specializing in affective computing, human-computer interaction, biomedical health analytics, and human-centered machine learning. 

 

Ideal candidates should possess the following qualifications: 

- Proficiency in data analytics and/or machine learning 

- Prior research experience in machine learning or human-computer interaction, preferably with prior publication(s) and/or a Master's (with thesis).

- Prior experience in human-centered applications 

 

Interested candidates should apply using the following link and list Dr. Theodora Chaspari as a potential Ph.D. advisor:
https://www.colorado.edu/bb/grad-apply 

 

Join the Human Bio-Behavioral Signals (HUBBS) Lab at CU Boulder! 

The goal of the HUBBS Lab is to make fundamental contributions to human-centered machine intelligence and to promote scientific advancement in trustworthy artificial intelligence (AI). This endeavor draws heavily on interdisciplinary collaborations in health and psychological sciences, social sciences, and learning sciences, and leads to interdisciplinary scientific contributions. Main research areas include trustworthy and responsible human-centered AI, including dimensions of privacy preservation, explainability, and fairness; human-AI teaming and collaborative decision-making; intelligent assistive interfaces for personalized user feedback in education and health; assistive computational technologies for combatting racism and bias; and multimodal data analytics.

 

Join Our PhD Program in Computer Science at CU Boulder! 

We are committed to providing students with the best possible resources and support throughout their PhD journey.

Our Ph.D. offer includes: 

  • At least five years of full funding (tuition, fees, health insurance, dental insurance) plus a stipend of approximately $3K/month for the 9-month academic year (additionally, most students receive the stipend during summer). There are 6 weeks of parental leave and sick leave. 
  • A first-year fellowship of $4K to help with relocation and support during the first semester, and a $1K early career professional development fellowship to attend conferences 
  • GRE is not required 

The Computer Science Department and CU Boulder have consistently risen in rankings over the years. We are the fastest-growing and largest department within the College of Engineering and Applied Science. The department is home to 50 tenured and tenure-track faculty plus 25 instructional faculty and now hosts nearly 1,700 undergraduate and 500 graduate students. 

 

Highlights  

  • Our faculty members are renowned for their exceptional contributions to academia and research. These include the Alan T. Waterman Award, given for notable achievements in science and engineering; multiple NSF CAREER awards, granted to promising researchers in the early stages of their careers; the Sloan Research Fellowship, providing support and recognition to early-career scientists and scholars; and more.
  • Many of our students are funded by NSF's Graduate Research Fellowship Program (GRFP), the NASA GRO Fellowship, the Meta Research PhD Fellowship, the Google PhD Fellowship, and Amazon Science, among others.

We are committed to fostering a diverse, inclusive, and academically excellent community. At the college level, our undergraduate student population included 30% who self-identified as women, 27% who self-identified as persons of color, 15% who were first-generation college students, and 14% who were international students. Our graduate student body included 32% who self-identified as women, 16% who self-identified as persons of color, and 29% international students. In 2022, 26% of our tenured/tenure-track faculty self-identified as women, and 22% self-identified as people of color.

 

Location, location, location! 

Boulder is located at the base of the Rocky Mountains in north-central Colorado. Boulder is consistently ranked as one of the happiest and healthiest places to live in the United States, known for its active, outdoorsy lifestyle and a strong sense of community. The city enjoys 300 days of sunshine annually and has over 150 miles of hiking and biking trails, along with easy access to skiing, rock climbing, and other outdoor activities. 

Major companies like Google, Microsoft, Ball Aerospace, and Lockheed Martin have offices in Boulder. The Denver metropolitan area, a 30-minute drive from Boulder, is a world-class city with an international airport with direct flights to Tokyo, Paris, London, and Mexico City. 

 



6-8(2023-10-18) Post doc researcher @ University of Glasgow, UK
We are looking to recruit a postdoctoral researcher (up to 3 years) with experience in deep learning and multimodal information processing (vision + audio/text). The researcher will contribute to an exciting interdisciplinary project that will develop human-centric AI models for analysing complex, audiovisual data to understand diversity and inclusion in screen media.

Please find details and link to apply for the post here: https://www.jobs.ac.uk/job/DDG225/research-assistant-associate
 
Information queries can be sent to tanaya.guha@glasgow.ac.uk

6-9(2023-10-15) Fully funded PhD positions at CUBES, University of Memphis, TN, USA

The CUBES (Computational Understanding of Behaviors, Experiences, and Subjective states) Lab at the University of Memphis is seeking outstanding candidates for fully funded positions within the PhD program in Computer Science. We are looking for students interested in research at the intersection of machine learning, psychology, and signal processing and specializing in affective computing, human-computer interaction, and ethical human-centered machine learning.

 

Ideal candidates should have experience in:

-      Data analytics, machine learning, and/or trustworthy and fair AI

-      Human-centered applications either in research or industry

-      Research communication (e.g., publications, presentations, master’s thesis)

 

Interested candidates should apply to the PhD program at UofM and list Dr. Brandon Booth as a potential PhD advisor. Priority screening of applications for the Spring 2024 term will begin on November 1st, 2023, and applications will be accepted through December 1st, 2023. We particularly welcome candidates from groups that are historically underrepresented in computer science.

 

Join the CUBES Lab at UofM!

The goal of CUBES Lab is to use multimodal AI to understand, track, and promote beneficial mental states/experiences/behaviors in real-life settings without perpetuating group biases. We conduct basic research to realize informative and interpretable representations of experiences, and we aim to close the loop in interactive experience monitoring, scale up experience modeling systems, and find unique ways for human-AI teams to succeed where either alone may fail.  Our work is inclusive of new ideas and directions, especially in interdisciplinary and translational research in education, health, procedural fairness, and other prosocial domains.

 

Join our PhD Program in Computer Science at UofM!

The University of Memphis is a top-tier research university with a Carnegie R1 designation, and we are committed to engaging PhD students in cutting-edge research in their area of interest.  Our PhD program builds both breadth (through core graduate courses) and depth (via a rich selection of advanced courses and requiring participation in research projects), and it provides opportunities to work with interdisciplinary teams on federally funded research collaborations. For example, CS faculty lead the NIH-funded mDOT Biomedical Technology Resource Center and the Center for Information Assurance (CfIA). In addition, CS faculty work closely with multidisciplinary centers at the university such as the Institute for Intelligent Systems (IIS).

 

Memphis Highlights

Known as America's number 1 logistics hub, Memphis has been ranked one of the “World's Greatest Places” by TIME, America's 4th best city for jobs by Glassdoor, and 4th in “Best Cost of Living”. The Memphis metropolitan area has a population of 1.3 million. It boasts a vibrant culture and a pleasant climate with an average temperature of 63 degrees.

 

Brandon Booth, PhD (He/Him/His)
Assistant Professor
Department of Computer Science
The University of Memphis

392 Dunn Hall
Memphis, TN 38152
brandon.m.booth@memphis.edu
memphis.edu/brandon-booth

6-10(2023-10-23) Post Doc position: Natural Language Processing, Saarland University, Germany

Post Doc  position: Natural Language Processing, Saarland University, Germany

=============================================================

(Computer Science, Computational Linguistics  or similar)

The research group focuses on gaining a deeper understanding of how modern deep learning methods can be applied to natural languages. Our recent achievements include a best paper award at COLING 2022 and a best theme paper award at ACL 2023. We offer a PostDoc position that is topically open but should have a strong focus on applying machine learning techniques to natural language data. The research should both connect to the ongoing research of PhD students and pursue a clear new direction.

The ideal candidate for the position would have:

   1. Solid experience in natural language processing

   2. Excellent knowledge of machine learning and deep learning

   3. Involvement, knowledge and generosity in scientific discussions with all group members

   4. Excellent programming skills

   5. Doctoral degree in Computer Science, Computational Linguistics or similar

Salary: The PostDoc position will be 100% of full time on the German E13 scale (TV-L), which is about €4,188 per month before tax and social security contributions. The appointment will be for two years with a possible extension.

About the department: The department of Language Science and Technology is one of the leading departments in the speech and language area in Europe. The flagship project at the moment is the CRC on Information Density and Linguistic Encoding. It also runs a significant number of European and nationally funded projects. In total, it has seven faculty and around 50 postdoctoral researchers and PhD students. The department is part of the Saarland Informatics Campus. With 900 researchers, two Max Planck institutes and the German Research Center for Artificial Intelligence, it is one of the leading locations for Informatics in Germany and Europe.

How to apply: Please send us a letter of motivation, a research plan (max one page), your CV, your transcripts, a list of publications, and the names and contact information of at least two references, as a single PDF or a link to a PDF if the file size is more than 5 MB.

Please apply by November 20th, 2023, at the latest. Earlier applications are welcome and will be processed as they come in.

Contact: Applications and any further inquiries regarding the project should be directed to dietrich.klakow at lsv.uni-saarland.de 


6-11(2023-11-15) Master 2 Internship, LPC Marseille, France

Master 2 Internship Proposal

Advisors: Jules Cauzinille, Benoît Favre, Arnaud Rey

November 2023

Deep transfer knowledge from speech to primate vocalizations

Keywords: Computational bioacoustics, deep learning, self-supervised learning, knowledge transfer, efficient fine-tuning, primate vocalizations

1 Context. This internship is part of a multidisciplinary research project aimed at bridging the gap between state-of-the-art deep learning methods developed for speech processing and computational bioacoustics. Computational bioacoustics is a relatively new research field which proposes to tackle the study of animal acoustic communication with computational approaches [Stowell, 2022]. Recently, bioacousticians have been showing increasing interest in the deep learning revolution embodied in transformer architectures and self-supervised pre-trained models, but much investigation still needs to be carried out. We propose to test the viability of self-supervision and knowledge transfer as bioacoustic tools by pre-training models on speech and using them for primate vocalization analysis.

2 Problem statement. Speech-based models are able to reach convincing performance on primate-related tasks, including segmentation, individual identification and call-type classification [Sarkar and Doss, 2023], as they do on many other downstream tasks (such as vocal emotion recognition [Wang et al., 2021]). We have tested publicly available models such as HuBERT [Hsu et al., 2021] and Wav2Vec2 [Schneider et al., 2019], two self-supervised speech-based architectures, on some of these tasks with gibbon vocalizations. Our method involves probing and traditional fine-tuning of these models.

To ensure true knowledge transfer from the pre-training speech datasets to the downstream classification tasks, the goal of this internship will be to implement efficient fine-tuning methods in a similar fashion. These make it possible to limit and control the amount of information lost in the fine-tuning process. Depending on the interests of the candidate, the methods can include prompt tuning [Lester et al., 2021], attention prompting [Gao et al., 2023], low-rank adaptation [Hu et al., 2021] or adversarial reprogramming [Elsayed et al., 2018]. The candidate will also be free to explore other methods relevant to the question at hand, either on gibbons or on other species' datasets currently being collected.
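As a rough illustration of the probing baseline mentioned above (not the internship's prescribed method), here is a minimal sketch assuming the Hugging Face transformers library, an illustrative wav2vec2-base checkpoint, and a hypothetical five-class call-type task:

import torch
from transformers import Wav2Vec2Model

# Freeze the pretrained encoder: probing trains only the classifier on top.
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False

# Hypothetical probe: 5 call types (an assumption for the example).
probe = torch.nn.Linear(encoder.config.hidden_size, 5)

def probe_logits(waveform: torch.Tensor) -> torch.Tensor:
    # waveform: (batch, samples) float tensor sampled at 16 kHz
    with torch.no_grad():
        hidden = encoder(waveform).last_hidden_state  # (batch, frames, dim)
    return probe(hidden.mean(dim=1))  # mean-pool over time, then classify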

3 Profile. The intern will propose and implement the efficient fine-tuning solutions on an array of (preferably self-supervised) acoustic models pre-trained on speech or general sound, such as HuBERT, Wav2Vec2, WavLM, VGGish, etc. Exploring adversarial reprogramming of models pre-trained on other modalities (images, videos, etc.) could also be carried out. The work will be implemented using PyTorch. The candidate must have the following qualities:

• Excellent knowledge of deep learning methods

• Extensive experience with PyTorch models

• An interest in processing bioacoustic data

• An interest in reading and writing scientific papers, as well as some curiosity for research challenges

The internship will last 6 months at the LIS and LPC laboratories in Marseille during spring 2024.

The candidate will work in close collaboration with Jules Cauzinille as part of his thesis on “Self-supervised learning for primate vocalization analysis”. The candidate will also be in contact with the research community of the ILCB.

4 Contact. Please send a CV, transcripts and a letter of application to jules.cauzinille@lis-lab.fr, benoit.favre@lis-lab.fr, and arnaud.rey@cnrs.fr. Do not hesitate to contact us if you have any questions (or if you want to hear what our primates sound like).

References

Gamaleldin F. Elsayed, Ian Goodfellow, and Jascha Sohl-Dickstein. Adversarial reprogramming of neural networks, 2018.

Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, and Yu Qiao. Llama-adapter v2: Parameter-efficient visual instruction model, 2023.

Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. Hubert: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Transactions on Audio, Speech, and Language Processing, PP:1–1, 2021. doi: 10.1109/TASLP.2021.3122291.

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models, 2021.

Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning, 2021.

Eklavya Sarkar and Mathew Magimai Doss. Can Self-Supervised Neural Networks Pre-Trained on Human Speech distinguish Animal Callers?, May 2023. arXiv:2305.14035 [cs, eess].

Steffen Schneider, Alexei Baevski, Ronan Collobert, and Michael Auli. wav2vec: Unsupervised Pre-Training for Speech Recognition. In Proc. Interspeech 2019, pages 3465–3469, 2019. doi: 10.21437/Interspeech.2019-1873.

Dan Stowell. Computational bioacoustics with deep learning: a review and roadmap. PeerJ, 10:e13152, 2022. ISSN 2167-8359. doi: 10.7717/peerj.13152. URL https://peerj.com/articles/13152.

Yingzhi Wang, Abdelmoumene Boumadane, and Abdelwahab Heba. A fine-tuned wav2vec 2.0/hubert benchmark for speech emotion recognition, speaker verification and spoken language understanding. CoRR, abs/2111.02735, 2021. doi: 10.48550/arXiv.2111.02735.


6-12(2023-11-23) Post-doctoral research position - L3i - La Rochelle France


Title: Emotion detection by semantic analysis of the text in comics speech balloons

 

The L3i laboratory has one open post-doc position in computer science, in the specific field of natural language processing in the context of digitised documents.

 

Duration: 12 months (an extension of 12 months will be possible)

Position available from: as soon as possible

Salary: approximately €2,100/month (net)

Place: L3i lab, University of La Rochelle, France

Specialty: Computer Science/ Document Analysis/ Natural Language Processing

Contact: Jean-Christophe BURIE (jcburie [at] univ-lr.fr) / Antoine Doucet (antoine.doucet [at] univ-lr.fr)

 

Position Description

The L3i is a research lab of the University of La Rochelle. La Rochelle, a city on the Atlantic coast in the south-west of France, is one of the most attractive and dynamic cities in the country. The L3i has worked for several years on document analysis and has developed well-known expertise in the analysis, indexing and understanding of “bande dessinée”, manga and comics.

The post-doc's work will take place in the context of SAiL (Sequential Art Image Laboratory), a joint laboratory involving the L3i and a private company. The objective is to create innovative tools to index and interact with digitised comics. The work will be done in a team of 10 researchers and engineers.

The team has developed different methods to extract and recognise the text of the speech balloons. The specific task of the recruited researcher will be to use Natural Language Processing strategies to analyse the text in order to identify emotions expressed by a character (reacting to the utterance of another speaking character) or caused by it (talking to another character). The datasets will be collections of comics in French and English.

 

Qualifications

Candidates must have a completed PhD and research experience in natural language processing. Some knowledge and experience in deep learning is also recommended.

 

General Qualifications

• Good programming skills mastering at least one programming language like Python, Java, C/C++

• Good teamwork skills

• Good writing skills and proficiency in written and spoken English or French

 

Applications

Candidates should send a CV and a motivation letter to jcburie [at] univ-lr.fr and antoine.doucet [at] univ-lr.fr.


6-13(2023-11-25) M2 Master Internship, Nancy, France

M2 Master Internship

Automatic Alsatian speech recognition 

1 Supervisors

Name: Emmanuel Vincent

Team and lab: Multispeech team, Inria research center at Université de Lorraine, Nancy

Email: emmanuel.vincent@inria.fr

Name: Pascale Erhart

Team and lab: Language/s and Society team, LiLPa, Strasbourg

Email: pascale.erhart@unistra.fr

2 Motivation and context. This internship is part of the Inria COLaF project (Corpora and tools for the languages of France), whose objective is to develop and disseminate inclusive language corpora and technologies for the regional languages (Alsatian, Breton, Corsican, Occitan, Picard, etc.), overseas languages and non-territorial immigration languages of France. With few exceptions, these languages are largely ignored by language technology providers [1]. However, such technologies are key to the protection, promotion and teaching of these languages. Alsatian is the second most widely spoken regional language in France, with 46% of Alsace residents saying they speak Alsatian fairly well or very well [2]. However, it remains an under-resourced language in terms of data and language technologies. Attempts at machine translation have been made, as well as data collection [3].

3 Objectives. The objective of the internship is to design an automatic speech recognition system for Alsatian based on sound archives (radio, television, web, etc.). This raises two challenges: i) Alsatian is not a homogeneous language but a continuum of dialectal varieties which are not always written in a standardized way; ii) the textual transcription is often unavailable or differs from the words actually pronounced (transcription errors, subtitles, etc.). Solutions will be based on i) finding a suitable methodology for choosing and preparing data, and ii) designing an automatic speech recognition system using end-to-end neural networks, which can rely on the adaptation of an existing multilingual system like Whisper [4], in a self-supervised manner from a number of untranscribed recordings [5], in a supervised manner from a smaller number of transcribed recordings, or even from text-only data [6]. The work will be based on datasets collected by LiLPa and the COLaF project's engineers, which include the television shows Sunndi's Kater [7] and Kùmme Mit [8], whose dialogues are scripted, some radio broadcasts from the 1950s–1970s with their typescripts [9], as well as untranscribed radio broadcasts from France Bleu Elsass. Dictionaries of Alsatian such as the Wörterbuch der elsässischen Mundarten, which can be consulted via the Woerterbuchnetz portal [10], or phonetization initiatives [11] could be exploited, for example using the Orthal spelling [12]. The internship opens the possibility of pursuing a PhD thesis funded by the COLaF project.
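As a rough sketch of the supervised adaptation route (not the project's prescribed method), the following assumes the Hugging Face transformers library and an illustrative whisper-small checkpoint; data loading and the Alsatian corpus itself are omitted:

import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def training_step(waveform, transcript):
    # waveform: 16 kHz mono float array; transcript: its reference text
    inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
    labels = processor.tokenizer(transcript, return_tensors="pt").input_ids
    out = model(input_features=inputs.input_features, labels=labels)
    out.loss.backward()  # cross-entropy over decoder tokens
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()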

4 Bibliography

[1] DGLFLF, Rapport au Parlement sur la langue française 2023, https://www.culture.gouv.fr/Media/Presse/Rapport-au-Parlement-surla-langue-francaise-2023

[2] https://www.alsace.eu/media/5491/cea-rapport-esl-francais.pdf

[3] D. Bernhard, A-L Ligozat, M. Bras, F. Martin, M. Vergez-Couret, P. Erhart, J. Sibille, A. Todirascu, P. Boula de Mareüil, D. Huck, “Collecting and annotating corpora for three under-resourced languages of France: Methodological issues”, Language Documentation & Conservation, 2021, 15, pp.316-357.

[4] A. Radford, J.W. Kim, T. Xu, G. Brockman, C. McLeavey, I. Sutskever, “Robust speech recognition via large-scale weak supervision”, in 40th International Conference on Machine Learning, 2023, pp. 28492-28518.

[5] A. Bhatia, S. Sinha, S. Dingliwal, K. Gopalakrishnan, S. Bodapati, K. Kirchhoff, “Don't stop self-supervision: Accent adaptation of speech representations via residual adapters”, in Interspeech, 2023, pp. 3362-3366.

[6] N. San, M. Bartelds, B. Billings, E. de Falco, H. Feriza, J. Safri, W. Sahrozi, B. Foley, B. McDonnell, D. Jurafsky, “Leveraging supplementary text data to kick-start automatic speech recognition system development with limited transcriptions”, in 6th Workshop on Computational Methods for Endangered Languages, 2023, pp. 1-6.

[7] https://www.france.tv/france-3/grand-est/sunndi-s-kater/
[8] https://www.france.tv/france-3/grand-est/kumme-mit/toutes-les-videos/
[9] https://www.ouvroir.fr/cpe/index.php?id=1511
[10] https://woerterbuchnetz.de/?sigle=ElsWB#0
[11] doi:10.5281/zenodo.1174213
[12] https://orthal.fr/

5 Profile. MSc in speech processing, natural language processing, computational linguistics, or computer science. Strong programming skills in Python/PyTorch. Knowledge of Alsatian and/or German is a plus, but in no way a prerequisite.


6-14(2023-11-26) Internship @ Université du Mans, Le Mans, France

Evaluation of speech synthesis systems in a noisy environment

Topic. Perceptual evaluation is crucial in many areas of speech technology, including speech synthesis. It assesses the quality of the synthesis subjectively, by asking a listening panel [5] to rate the quality of a synthesized speech stimulus [1, 2]. Recent work has produced artificial intelligence models [3, 4] that predict the subjective rating of a synthesized speech segment, making it possible to do without a panel test. The major problem with this kind of evaluation is the interpretation of the word “quality”: some listeners may base their judgment on characteristics intrinsic to the speech (such as timbre, speech rate, punctuation, etc.), while others may base it on characteristics of the audio signal (such as the presence or absence of distortion). The subjective evaluation of speech can thus be biased by the listeners' interpretation of the instructions, and the artificial intelligence models mentioned above may consequently be built on biased measurements.

The goal of this project is to carry out exploratory work to evaluate the quality of speech synthesis in a more robust way than has been proposed so far. We start from the hypothesis that the quality of synthesized speech can be estimated through its detection in a real environment; in other words, a signal synthesized to perfectly reproduce human speech should not be detectable in an everyday environment. Based on this hypothesis, we propose to set up a speech perception experiment in a noisy environment. Sound-field reproduction methods make it possible to simulate an existing environment over headphones; their advantage is that a recording of a real environment can be played over headphones while adding signals as if they had been present in the recorded sound scene. This implies, first, an acoustic measurement campaign in noisy everyday environments (public transport, open-plan offices, cafeterias, etc.). Synthesized speech will then have to be generated, taking the context of the recordings into account; it will also be relevant to vary the parameters of the synthesized speech while keeping the semantics unchanged. The everyday recordings will then be mixed with the synthesized speech signals in order to evaluate the detection of the latter. We will use the percentage of times the synthesized speech is detected as a quality indicator. These detection percentages will then be compared with the predictions of the artificial intelligence model mentioned above. This will allow us to conclude (1) whether the two methods are equivalent or complementary, and (2) which parameter(s) of the synthesized speech lead to its detection in a noisy environment.
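As an illustration of the stimulus-generation step, a minimal sketch that mixes a synthesized utterance into a real-life background recording at a chosen signal-to-noise ratio (the file names and the 10 dB SNR are assumptions for the example; the actual experiment would use binaural reproduction):

import numpy as np
import soundfile as sf

def mix_at_snr(speech, background, snr_db):
    # Scale the background so that the speech-to-background power
    # ratio equals the requested SNR, then sum the two signals.
    background = background[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(background ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * background

speech, sr = sf.read("synthetic_utterance.wav")
background, _ = sf.read("cafeteria_recording.wav")
sf.write("stimulus.wav", mix_at_snr(speech, background, snr_db=10.0), sr)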

Additional information:

• Supervision: the internship will be co-supervised by Aghilas Sini, associate professor at the Laboratoire d'Informatique de l'Université du Mans (aghilas.sini@univ-lemans.fr), and Thibault Vicente, associate professor at the Laboratoire d'Acoustique de l'Université du Mans (thibault.vicente@univ-lemans.fr)

• Required level: M2 research internship

• Expected period: 6 months (February to July 2024)

• Location: Le Mans Université

• Keywords: synthesized speech, binaural sound synthesis, listening-panel tests

References

[1] Y.-Y. Chang. Evaluation of TTS systems in intelligibility and comprehension tasks. In Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing (ROCLING 2011), pages 64–78, 2011.

[2] J. Chevelu, D. Lolive, S. Le Maguer, and D. Guennec. Se concentrer sur les différences : une méthode d'évaluation subjective efficace pour la comparaison de systèmes de synthèse (Focus on differences: a subjective evaluation method to efficiently compare TTS systems). In Actes de la conférence conjointe JEP-TALN-RECITAL 2016, volume 1: JEP, pages 137–145, 2016.

[3] C.-C. Lo, S.-W. Fu, W.-C. Huang, X. Wang, J. Yamagishi, Y. Tsao, and H.-M. Wang. MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion. In Proc. Interspeech 2019, pages 1541–1545, 2019.

[4] G. Mittag and S. Möller. Deep learning based assessment of synthetic speech naturalness. arXiv preprint arXiv:2104.11673, 2021.

[5] M. Wester, C. Valentini-Botinhao, and G. E. Henter. Are we using enough listeners? No! An empirically-supported critique of Interspeech 2014 TTS evaluations. In 16th Annu. Conf. Int. Speech Commun. Assoc., 2015.


6-15(2023-11-27) Internship @ Telecom Paris, Paris, France

ANR Project «REVITALISE»

Automatic speech analysis of public talks.

 

Description. Today, humanity has reached a stage at which extremely important activities (such as information exchange) depend not only on so-called hard skills, but also on soft skills. One such important skill is public speaking. Like many forms of interaction between people, the assessment of public speaking depends on many factors (often subjectively perceived). The goal of our project is to create an automatic system which can take these different factors into account and evaluate the quality of the performance. This requires understanding which elements can be assessed objectively and which vary depending on the listener [Hemamou, Wortwein, Chollet21]. Such an analysis requires examining public speaking at various levels: high-level (audio, video, text), intermediate (voice monotony, auto-gestures, speech structure, etc.) and low-level (fundamental frequency, action units, POS tags, etc.) [Barkar].

This internship offers an opportunity to analyze the audio component of a public speech. The student is asked to solve two main problems. The engineering task is to create an automatic speech transcription system that detects speech disfluencies; to this end, the student will collect a bibliography on the topic and devise an engineering solution. The second, research task is to use audio cues to automatically analyze the success of a talk. The internship thus offers the chance to solve an engineering problem as well as to learn more about research approaches; by the end, you will have expertise in audio processing and in machine learning methods for multimodal analysis. If the internship is successfully completed, an article may be published. Funding for PhD positions on Social Computing will be available in the team at the end of the internship (at Inria).
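As a naive starting point for the engineering task (an assumption for illustration, not the project's prescribed pipeline), one could transcribe the talk with an off-the-shelf model and flag candidate fillers against a hand-written lexicon; real disfluency detection would need a dedicated approach (see the references below):

import whisper

FILLERS = {"euh", "hum", "ben", "bah"}  # hypothetical French filler lexicon

model = whisper.load_model("base")
result = model.transcribe("talk.wav", language="fr")

for segment in result["segments"]:
    words = [w.strip(",.?!").lower() for w in segment["text"].split()]
    hits = [w for w in words if w in FILLERS]
    if hits:
        print(f"{segment['start']:.1f}s-{segment['end']:.1f}s: {hits}")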

Registration & Organisation.
Name of organization: Institut Polytechnique de Paris, Telecom-Paris
Website: https://www.telecom-paris.fr
Department: IDS/LTCI
Address: Palaiseau, France

Supervision. Supervision will include weekly meetings with the main supervisor and regular meetings (every 2-3 weeks) with co-supervisors.
Supervisor: Alisa BARKAR
Co-supervisors: Chloé Clavel, Mathieu Chollet, Béatrice BIANCARDI
Contact details: alisa.barkar@telecom-paris.fr

Duration & Planning. The internship is planned as a 5-6 month full-time internship during the spring semester of 2024. The 6-month option corresponds to 24 weeks, covering the following list of activities:

● ACTIVITY 1(A1): Problem description and integration to the working environment

● ACTIVITY 2(A2): Bibliography overview

● ACTIVITY 3(A3): Implementation of the automatic transcription with disfluency detection

● ACTIVITY 4(A4): Evaluation of the automatic transcription

● ACTIVITY 5(A5): Application of the developed methods to the existing data

● ACTIVITY 6(A6): Analysis of the importance of para-verbal features for the performance perception

● ACTIVITY 7(A7): Writing the report

Selected references of the team.

1. [Hemamou] L. Hemamou, G. Felhi, V. Vandenbussche, J.-C. Martin, C. Clavel, HireNet: a Hierarchical Attention Model for the Automatic Analysis of Asynchronous Video Job Interviews. in AAAI 2019, to appear

2. [Ben-Youssef] Atef Ben-Youssef, Chloé Clavel, Slim Essid, Miriam Bilac, Marine Chamoux, and Angelica Lim. Ue-hri: a new dataset for the study of user engagement in spontaneous human-robot interactions. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 464–472. ACM, 2017.

3. [Wortwein] Torsten Wörtwein, Mathieu Chollet, Boris Schauerte, Louis-Philippe Morency, Rainer Stiefelhagen, and Stefan Scherer. 2015. Multimodal Public Speaking Performance Assessment. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (ICMI '15). Association for Computing Machinery, New York, NY, USA, 43–50.

4. [Chollet21] Chollet, M., Marsella, S., & Scherer, S. (2021). Training public speaking with virtual social interactions: effectiveness of real-time feedback and delayed feedback. Journal on Multimodal User Interfaces, 1-13.

5. [Barkar] Alisa Barkar, Mathieu Chollet, Beatrice Biancardi, and Chloe Clavel. 2023. Insights Into the Importance of Linguistic Textual Features on the Persuasiveness of Public Speaking. In Companion Publication of the 25th International Conference on Multimodal Interaction (ICMI '23 Companion). Association for Computing Machinery, New York, NY, USA, 51–55. https://doi.org/10.1145/3610661.3617161

Other references.

1. Dinkar, T., Vasilescu, I., Pelachaud, C. and Clavel, C., 2020, May. How confident are you? Exploring the role of fillers in the automatic prediction of a speaker’s confidence. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8104-8108). IEEE.

2. Whisper: Robust Speech Recognition via Large-Scale Weak Supervision, Radford A. et al., 2022, url: https://arxiv.org/abs/2212.04356

3. Romana, Amrit and Kazuhito Koishida. “Toward A Multimodal Approach for Disfluency Detection and Categorization.” ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023): 1-5.

4. Radhakrishnan, Srijith et al. “Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition.” ArXiv abs/2310.06434 (2023): n. pag.

5. Wu, Xiao-lan et al. “Explanations for Automatic Speech Recognition.” ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023): 1-5.

6. Min, Zeping and Jinbo Wang. “Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study.” ArXiv abs/2307.06530 (2023): n. pag.

7. Ouhnini, Ahmed et al. “Towards an Automatic Speech-to-Text Transcription System: Amazigh Language.” International Journal of Advanced Computer Science and Applications (2023): n. pag.

8. Bigi, Brigitte. “SPPAS: a tool for the phonetic segmentations of Speech.” (2023).

9. Rekesh, Dima et al. “Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition.” ArXiv abs/2305.05084 (2023): n. pag.

10. Arisoy, Ebru et al. “Bidirectional recurrent neural network language models for automatic speech recognition.” 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015): 5421-5425.

11. Padmanabhan, Jayashree and Melvin Johnson. “Machine Learning in Automatic Speech Recognition: A Survey.” IETE Technical Review 32 (2015): 240 - 251.

12. Berard, Alexandre et al. “End-to-End Automatic Speech Translation of Audiobooks.” 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018): 6224-6228.

13. Kheir, Yassine El et al. “Automatic Pronunciation Assessment - A Review.” ArXiv abs/2310.13974 (2023): n. pag.


6-16(2023-11-30) Internship @ Université du Mans, Le Mans, France

Title: Predictive Modeling of Subjective Disagreement in Speech Annotation/Evaluation

Host laboratory: LIUM

Location: Le Mans

Supervisors: Meysam Shamsi, Anthony Larcher

Beginning of internship: February 2024

Application deadline: 10/01/2024

Keywords: Subjective Disagreement Modeling, Synthetic Speech Quality Evaluation, Speech Emotion Recognition

In the context of modeling subjective tasks, where diverse opinions, perceptions, and judgments exist among individuals, such as in speech quality assessment or speech emotion recognition, addressing the challenge of defining ground truth and annotating a training set becomes crucial. The current practice of aggregating all annotations into a single label for modeling a subjective task is neither fair nor efficient [1]. The variability in annotations or evaluations can stem from various factors [2], broadly categorized into those associated with corpus quality and those intrinsic to the samples themselves. In the first case, the delicate definition of a subjective task introduces sensitivity into the annotation process, potentially leading to more errors, especially where the annotation tools and platform lack precision or annotators experience fatigue. In the second case, the inherent ambiguity in defining a subjective task and differing perceptions may result in varying annotations and disagreements. Developing a predictive model of annotator/evaluator disagreement is crucial for engaging in discussions related to ambiguous samples and refining the definition of subjective concepts. Furthermore, such a model can serve as a valuable tool for assessing the confidence of automatic evaluations [3,4]. This modeling approach will contribute to the automatic evaluation of corpus annotations, the identification of ambiguous samples for reconsideration or re-annotation, the automatic assessment of subjective models, and the detection of underrepresented samples and biases in the dataset. The proposed research involves utilizing a speech dataset such as MSP-Podcast [5], SOMOS [6], or VoiceMOS [7], for a subjective task with multiple annotations per sample. The primary objective is to predict the variation in assigned labels, measured through disagreement scores, entropy, or distribution.
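To make the prediction target concrete, here is a minimal sketch (assuming categorical labels and several annotators per sample) that turns raw annotations into a per-sample disagreement score, here the entropy of the empirical label distribution:

import numpy as np

def disagreement_entropy(annotations, n_classes):
    # annotations: integer labels given to one sample by the annotators
    counts = np.bincount(annotations, minlength=n_classes)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())  # 0 bits = full agreement

# e.g. five annotators, four emotion classes
print(disagreement_entropy([0, 0, 1, 0, 2], n_classes=4))  # ~1.37 bits

A model for this task would then regress such scores (or the full label distribution) from the speech signal.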

References:

[1]. Davani, A. M., Díaz, M., & Prabhakaran, V. (2022). Dealing with disagreements: Looking beyond the majority vote in subjective annotations. Transactions of the Association for Computational Linguistics, 10, 92-110.

[2]. Kreiman, J., Gerratt, B. R., & Ito, M. (2007). When and why listeners disagree in voice quality assessment tasks. The Journal of the Acoustical Society of America, 122(4), 2354-2364.

[3]. Wu, W., Chen, W., Zhang, C., & Woodland, P. C. (2023). It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation. arXiv preprint arXiv:2310.00486.

[4]. Han, J., Zhang, Z., Schmitt, M., Pantic, M., & Schuller, B. (2017, October). From hard to soft: Towards more human-like emotion recognition by modelling the perception uncertainty. In Proceedings of the 25th ACM international conference on Multimedia (pp. 890-897).

[5]. Lotfian, R., & Busso, C. (2017). Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings. IEEE Transactions on Affective Computing, 10(4), 471-483.

[6]. Maniati, G., Vioni, A., Ellinas, N., Nikitaras, K., Klapsas, K., Sung, J.S., Jho, G., Chalamandaris, A., Tsiakoulis, P. (2022). SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis. Proc. Interspeech 2022, 2388-2392.

[7]. Cooper, E., Huang, W. C., Tsao, Y., Wang, H. M., Toda, T., & Yamagishi, J. (2023). The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains. arXiv preprint arXiv:2310.02640.

Applicant profile: candidates motivated by artificial intelligence, enrolled in a Master's program in Computer Science or a related field.

To apply: send a CV and cover letter to meysam.shamsi@univ-lemans.fr or anthony.larcher@univ-lemans.fr before 10/01/2024.


6-17(2023-12-02) Senior Data Scientist at the University of Chicago, IL, USA

Senior Data Scientist at the University of Chicago

 

Please apply at https://uchicago.wd5.myworkdayjobs.com/External/job/Chicago-IL/Sr-Data-Scientist_JR24587

 

About the Department
 

The TMW Center for Early Learning + Public Health (TMW Center) develops science-based interventions, tools, and technologies to help parents and caregivers interact with young children in ways that maximize brain development. A rich language environment is critical to healthy brain development; however, few tools exist to measure the quality or quantity of these environments. Access to this type of data allows caregivers to enhance interactions in real time and gives policy-makers insight into how best to build policies that have a population-level impact.

The wearable team within TMW Center is building a low-cost wearable device that can reliably and accurately measure a child’s early language environment vis-à-vis the conversational turns between a child and caregiver. The goal is to provide accurate, real-time feedback that empowers parents and caregivers to create the best language environment for their children.


Job Summary
 

The position works independently to perform a variety of activities relating to software support and/or development. It analyzes, designs, develops, debugs, and modifies computer code for end-user applications, beta and general releases, and production support; guides the development and implementation of applications, web pages, and user interfaces using a variety of software applications, techniques, and tools; and solves complex problems in the administration, maintenance, integration, and troubleshooting of code and the application ecosystem currently in production.

We are searching for a strategic and inquisitive senior data scientist to develop and optimize innovative AI-based models focused on speech/audio processing. The senior data scientist is expected to outline requirements, brainstorm ideas and solutions with leadership, manage data integrity and conduct experiments, assign tasks to junior staff, and monitor performance of the team.

 

Responsibilities

  • Formulates, suggests, and manages data-driven projects to support the development of audio algorithms and use cases.
  • Analyzes data from various entities for later use by junior data scientists.
  • Assesses scope and timelines, prioritize goals, and prepare project plans to meet product and research objectives.
  • Delegates tasks to junior data scientists and provide coaching to improve quality of work.
  • Continuously trains and nurtures data scientists to take on bigger assignments.
  • Provides leadership in advancing the science of TMW Center interventions by generating new ideas and collaborating with the research analysis team.
  • In collaboration with CTO, selects and guides decisions on statistical procedures and model selections, including conducting exploratory experiments to develop proof of concept.
  • Cross-validates models to ensure generalization and predictability.
  • Stays informed about developments in Data Science and adjacent fields to ensure most relevant methods and outputs are being leveraged.
  • Ensures data governance is in place to comply with regulations and privacy standards and maintain documentation of methodologies, coding, and results.
  • Designs new systems, features, and tools. Solves complex problems and identifies opportunities for technical improvement and performance optimization. Reviews and tests code to ensure appropriate standards are met.
  • Utilizes technical knowledge of existing and emerging technologies, including public cloud offerings from Amazon Web Services, Microsoft Azure, and Google Cloud.
  • Acts as a technical consultant and resource for faculty research, teaching, and/or administrative projects.
  • Performs other related work as needed.


Minimum Qualifications
 

Education:

Minimum requirements include a college or university degree in a related field.

---
Work Experience:

Minimum requirements include knowledge and skills developed through 5-7 years of work experience in a related job discipline.

---
Certifications:

---

Preferred Qualifications

Education:

  • Master’s degree in Computer Science, Statistics, Mathematics, or Economics with a focus on computer science.

Experience:

  • Experience with Machine Learning and LLMs.
  • Experience working on audio or speech data.
  • Experience implementing edge models using TensorFlow Micro, TensorFlow Lite, and corresponding quantization techniques.
  • Experience building audio classification models or speech to text models.
  • Experience using the latest pre-trained models such as Whisper and wav2vec.
  • Proven experience taking an idea or user need and translating it into fully realized applications.
  • Ability to relay insights in layman’s terms to inform business decisions. 
  • 3+ years leading and managing junior data scientists.

Technical Skills or Knowledge:

  • Proficiency in Python, PyTorch, TensorFlow, TinyML, Pandas and NumPy.
  • Experience with cloud environments such as AWS, Azure or GCloud.
  • Experience with command line interfaces (Linux, SSH).
  • Experience processing large datasets with Spark, Dask or Ray.

Application Documents

  • Resume (required)
  • Cover Letter (preferred)


When applying, the document(s) MUST be uploaded via the My Experience page, in the section titled Application Documents of the application.

 


6-18(2023-12-05) Post-doc and research engineer positions within the ANR-JCJC RESSAC project, LPNC, Grenoble, France

 

 

6-19(2023-12-05) Postdoctoral Scholar, Penn State University, PA, USA

Postdoctoral Scholar | Data Sciences and Artificial Intelligence at Penn State University

The Data Sciences and Artificial Intelligence (DS/AI) group at Penn State invites applications for a Postdoctoral Scholar position, set to commence in Fall 2024. This role is centered on cutting-edge research at the nexus of machine learning, deep learning, computer vision, psychology, and biology, with foci on psychology-inspired AI and addressing significant biological questions using AI.

To Apply: https://psu.wd1.myworkdayjobs.com/en-US/PSU_Academic/job/Postdoctoral-Scholar---College-of-IST-Data-Sciences-and-Artificial-Intelligence_REQ_0000050584-1

Qualifications:

  • Ph.D. in computer science, A.I., data science, physics, or neuroscience with an emphasis on machine learning, or a closely related field. To qualify, candidates must possess a Ph.D. or terminal degree before their employment starts at Penn State.

  • A strong record of publications in high-impact journals or premier peer-reviewed international conferences.

  • Prior experience in conducting interdisciplinary/multidisciplinary research is a plus.

 

About the position:

The successful candidate will be designated as a Postdoctoral Scholar at the College of Information Sciences and Technology (IST) of The Pennsylvania State University. The initial term of the position is one year, with the possibility of renewal depending on performance and funding availability. The scholar will be engaged in two interdisciplinary projects funded by the National Science Foundation, receiving mentorship from Professors James Wang (IST), Brad Wyble (Psychology), and Charles Anderson (Biology). The scholar will collaborate with highly motivated and talented graduate students and benefit from strong career development support, which includes training in teaching, grant proposal writing, and other collaborative work. Qualified candidates will have the opportunity to teach in IST after successfully completing one semester, with approval from college leadership.

 

To apply:

  • Please submit a CV, research statement (max 3 pages), and other pertinent documents in a single PDF document with the application.

  • Deadline: February 29, 2024, for full consideration. Late applications are accepted but given secondary priority.

  • Only shortlisted candidates will be contacted to provide reference letters.

  • For inquiries, please email Professor James Wang at jwang@ist.psu.edu with the subject line “postdoc”, or visit the lab website http://wang.ist.psu.edu.

 

COMMITMENT TO DIVERSITY:

The College of IST is strongly committed to a diverse community and to providing a welcoming and inclusive environment for faculty, staff and students of all races, genders, and backgrounds. The College of IST is committed to making good faith efforts to recruit, hire, retain, and promote qualified individuals from underrepresented minority groups including women, persons of color, diverse gender identities, individuals with disabilities, and veterans. We invite applicants to address their engagement in or commitment to inclusion, equity, and diversity issues as they relate to broadening participation in the disciplines represented in the college as well as aligning with the mission of the College of IST in a separate statement.

 

CAMPUS SECURITY CRIME STATISTICS:

Pursuant to the Jeanne Clery Disclosure of Campus Security Policy and Campus Crime Statistics Act and the Pennsylvania Act of 1988, Penn State publishes a combined Annual Security and Annual Fire Safety Report (ASR). The ASR includes crime statistics and institutional policies concerning campus security, such as those concerning alcohol and drug use, crime prevention, the reporting of crimes, sexual assault, and other matters. The ASR is available for review here.

 

Employment with the University will require successful completion of background check(s) in accordance with University policies. 

 

EEO IS THE LAW

Penn State is an equal opportunity, affirmative action employer, and is committed to providing employment opportunities to all qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability or protected veteran status. If you are unable to use our online application process due to an impairment or disability, please contact 814-865-1473.

 

6-20(2023-12-07) Post-doc @ Université du Mans, Le Mans, France

Post-doc offer – Linguistics / computational linguistics

 

Duration:         9 months

Start:            January or February 2024; a start in March 2024 is negotiable

Location:         LIUM – Le Mans Université

Net salary:       approximately €2,000/month, depending on skills

Contact:          jane.wottawa@univ-lemans.fr, richard.dufour@univ-nantes.fr

Application:      cover letter, CV (3 pages maximum)

 

 

As part of the DIETS project, which focuses on evaluation metrics for automatic speech recognition systems, a post-doc position is available to:

a) Conduct a linguistic and grammatical analysis of the errors in the output of automatic speech recognition systems

b) Run human evaluation tests for different types of errors

c) Compare the choices made in the evaluation tests with the assessments produced by automatic metrics

d) Publish the results (conferences, journals)

 

 

The DIETS project

 

L'un des problèmes majeurs des mesures d'évaluation du traitement des langues est qu'elles sont conçues pour mesurer globalement une solution proposée par rapport à une référence considérée, l'objectif principal étant de pouvoir comparer les systèmes entre eux. Le choix des mesures d'évaluation utilisées est très souvent crucial puisque les recherches entreprises pour améliorer ces systèmes sont basées sur ces mesures. Alors que les systèmes automatiques, comme la transcription de la parole, s'adressent à des utilisateurs finaux, ils sont finalement peu étudiés : l'impact de ces erreurs automatiques sur les humains, et la manière dont elles sont perçues au niveau cognitif, n'ont pas été étudiés, puis finalement intégrés dans le processus d'évaluation.

 

The DIETS project, funded by the French National Research Agency (ANR, 2021-2024) and led by the Laboratoire Informatique d'Avignon, focuses on the diagnosis/evaluation of end-to-end automatic speech recognition (ASR) systems based on deep neural network architectures, by integrating the human reception of transcription errors from a cognitive point of view. The challenge here is twofold:

 

    1) Finely analyse ASR errors on the basis of their human reception.

 

    2) Understand and detect how these errors arise in an end-to-end ASR framework, whose design is inspired by the workings of the human brain.

 

The DIETS project aims to push back the current limits in the understanding of end-to-end ASR systems, and to initiate new research integrating a cross-disciplinary approach (computer science, linguistics, cognitive science...) by putting humans back at the centre of the development of automatic systems.

 

 

Required skills

 

The position requires the following skills: a good command of French spelling and grammar, needed to categorise in an informed way the errors made by different transcription systems, and digital skills, since the data will need to be retrieved from a server. A background in linguistics or computational linguistics is desirable.

Experience in organising, running and analysing behavioural tests is a plus.

 

Host laboratory

 

The host laboratory is the LIUM, the computer science laboratory of Le Mans Université, located in Le Mans. Regular presence at the laboratory is required throughout the post-doc. The LIUM is made up of two teams. The post-doc will take place in the LST team, which develops its research activities in the field of natural language processing for both text and speech. The team works with data-driven approaches and is also specialised in deep learning applied to language processing. It currently comprises a project manager, 11 faculty members (computer scientists, acousticians, linguists), 4 PhD students and two apprentice Master's students.

 

 
Back  Top

6-21(2023-12-07) Internships (M1, M2, final-year engineering projects) @ IRIT, Toulouse, France

The SAMoVA team of IRIT in Toulouse is offering several internships (M1, M2, final-year engineering projects) in 2024 on the following topics (non-exhaustive list):

 

- Automatic Generation of Musical Scores in the Choro Style

- Speech Understanding and AI for Sensory Analysis
- Characterisation of Eating Behaviour through Video and Multimodal Analysis
- Adaptation of Automatic Speech Recognition Systems to Pathological Contexts
- Signal Processing and AI to Reveal Articulatory Disorders in Atypical Speech Production
- End-to-End Speech Recognition for Assessing Comprehension Skills of Children Learning to Read
- Active Learning for Speaker Diarization
- Automatic Modelling of Speech Rhythm
- Transcription of Verbalisations for Discourse Analysis in Virtual Reality Scenarios
- Implementation of a Comparative Speech Recognition Prototype Applied to Oral Language Learning
 
All details (topics, contacts) are available in the 'Jobs' section of the team's website:
https://www.irit.fr/SAMOVA/site/jobs/
Back  Top

6-22(2023-12-07) Internship at INA, Bry-sur-Marne, France

We are offering a research internship (five-year degree level, Bac+5) in the Research Department of the Institut National de l'Audiovisuel (INA). The internship focuses on voice activity detection in audiovisual corpora using self-supervised representations.
The detailed internship offer is attached.

Other internships are also offered at INA; the full list of topics is available on the following page: https://www.ina.fr/institut-national-audiovisuel/equipe-recherche/stages.

 

Voice activity detection in audiovisual corpora using self-supervised representations. Final-year engineering or Master 2 internship, academic year 2023-2024

 

Keywords: deep learning, machine learning, self-supervised models, voice activity detection, speech activity detection, wav2vec 2.0

Context

The Institut National de l'Audiovisuel (INA) is a French public industrial and commercial institution (EPIC) whose main mission is to preserve and promote the French audiovisual heritage through the sale of archives and the management of the legal deposit. To this end, the Institute continuously captures 180 television and radio channels and stores more than 25 million hours of audiovisual content. INA also carries out training, production and scientific research missions. For more than 20 years, INA's research department has conducted research on the automatic indexing and description of these collections across all modalities: text, sound and image. The department takes part in numerous collaborative research projects at both national and European level, and hosts Master's interns as well as PhD students co-supervised with leading national laboratories. This internship is offered within the research team (https://recherche.ina.fr) and is part of a collaborative project funded by the ANR: Gender Equality Monitor (GEM). Other internship topics are also available in the team: https://www.ina.fr/institut-national-audiovisuel/equipe-recherche/stages

Internship objectives

Voice Activity Detection (VAD) is an audio analysis task that aims to identify the portions of a recording containing human speech, distinguishing them from the parts of the signal containing silence, background noise or music. Often regarded as a pre-processing step, this method is used upstream of automatic speech, speaker or emotion recognition. While existing VAD tools achieve excellent results on news programmes and studio shows [Dou18a, Bre23], recent research at INA has shown that the performance of state-of-the-art systems drops on a large number of materials that are under-represented in annotated speech corpora. These contents, which have been the object of an internal annotation campaign, include music shows, cartoons, sport, fiction, TV game shows and documentaries. The objective of the internship is to develop VAD models based on the self-supervised learning paradigm and on transformer architectures such as wav2vec 2.0 [Bae20]. Models based on these architectures achieve state-of-the-art results on many speech processing tasks with limited amounts of annotated examples: transcription, understanding, translation, emotion detection, speaker recognition, language detection, etc. [Li22, Huh23, Par23]. Several recent studies have demonstrated the effectiveness of self-supervised approaches for VAD [Gim21, Kun23], but to date these models have been trained and evaluated on data that does not reflect the diversity of audiovisual content. The proposed internship aims to exploit the millions of hours of audiovisual content preserved at INA to train and improve such models. The resulting models will be integrated into the open-source software inaSpeechSegmenter, which is used, among other things, to measure the speaking time of women and men in broadcast programmes for research or regulation purposes [Dou18b, Arc23].
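To give a concrete flavour of the intended approach (a minimal sketch, not INA's actual implementation: the checkpoint name is illustrative and the classification head below is untrained), frame-level representations from a pretrained wav2vec 2.0 encoder can be fed to a lightweight speech/non-speech classifier:

    import torch
    import torch.nn as nn
    from transformers import Wav2Vec2Model

    # Pretrained self-supervised encoder (illustrative checkpoint).
    encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
    # Frame-level speech / non-speech head, to be trained on annotated data.
    vad_head = nn.Linear(encoder.config.hidden_size, 2)

    waveform = torch.randn(1, 16000)  # placeholder for 1 s of 16 kHz audio
    with torch.no_grad():
        frames = encoder(waveform).last_hidden_state  # (1, n_frames, hidden)
    logits = vad_head(frames)              # per-frame speech scores
    speech_frames = logits.argmax(dim=-1)  # 1 = speech, once the head is trained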

Dissemination of the results

Several strategies for disseminating the work will be considered, depending on its maturity and on the directions envisaged for follow-up work:

● Release of the resulting models under an open-source licence on HuggingFace and/or INA's GitHub repository: https://github.com/ina-foss

● Writing of scientific publications

Internship conditions

The internship will last 4 to 6 months, within INA's Research Department, on the Bry 2 site at 28 Avenue des frères Lumière, 94360 Bry-sur-Marne. The intern will be supervised by Valentin Pelloin and David Doukhan. A computer equipped with a GPU will be provided, as well as access to the Institute's computing cluster. Allowance: €760 gross/month + 50% of the Navigo travel pass.

Remote work: possible one day per week

Contact

To apply for this internship, or to request further information, please send your CV and cover letter by e-mail to: vpelloin@ina.fr and ddoukhan@ina.fr.

Candidate profile

● Final-year student of a five-year degree (Bac+5) in computer science or AI

● Strong interest in academic research

● Interest in automatic speech processing

● Proficiency in Python and experience with ML libraries

● Ability to carry out bibliographic research

● Rigour, synthesis skills, autonomy, ability to work in a team

References

[Arc23] ARCOM (2023). “La représentation des femmes à la télévision et à la radio - Rapport sur l'exercice 2022” [online].

[Bae20] A. Baevski, H. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations,” Neural Information Processing Systems, Jun. 2020.

[Bre23] Bredin, H. (2023). pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe, in INTERSPEECH 2023, ISCA, pp. 1983–1987.

[Dou18a] Doukhan, D., Carrive, J., Vallet, F., Larcher, A., & Meignier, S. (2018, April). An open-source speaker gender detection framework for monitoring gender equality. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5214-5218). IEEE.

[Dou18b] Doukhan, D., Poels, G., Rezgui, Z., & Carrive, J. (2018). Describing gender equality in french audiovisual streams with a deep learning approach. VIEW Journal of European Television History and Culture, 7(14), 103-122.

[Gim21] P. Gimeno, A. Ortega, A. Miguel, and E. Lleida, “Unsupervised Representation Learning for Speech Activity Detection in the Fearless Steps Challenge 2021,” in Interspeech 2021, ISCA, Aug. 2021, pp. 4359–4363.

[Huh23] Huh, J., Brown, A., Jung, J. W., Chung, J. S., Nagrani, A., Garcia-Romero, D., & Zisserman, A. (2023). Voxsrc 2022: The fourth voxceleb speaker recognition challenge. arXiv preprint arXiv:2302.10248.

[Kun23] M. Kunešová and Z. Zajíc, “Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0,” in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2023, pp. 1–5.

[Li22] Li, M., Xia, Y., & Lin, F. (2022, December). Incorporating VAD into ASR System by Multi-task Learning. In 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) (pp. 160-164). IEEE.

[Par23] Parcollet, T., Nguyen, H., Evain, S., Boito, M. Z., Pupier, A., Mdhaffar, S., ... & Besacier, L. (2023). LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech. arXiv preprint arXiv:2309.05472.

Back  Top

6-23(2023-12-11) Postdoctoral researcher, Aalto University, Finland

Postdoctoral researcher in Speech Recognition and Language Modelling

 

The speech recognition group at Aalto University, Finland, focuses on new machine learning methods in automatic speech recognition (ASR) and language modelling. The group began developing state-of-the-art unlimited-vocabulary ASR systems as early as the 1980s and 90s, led by Academician Prof. Kohonen. Since 2000, under the lead of Prof. Mikko Kurimo, the group has done pioneering work in unlimited-vocabulary language modelling using unsupervised subword units. One of its top achievements is winning the 3rd Multi-Genre Broadcast ASR challenge, in which the top research groups in the field were challenged to build a recognizer for an under-resourced language using machine learning methods. The group has also won several ComParE paralinguistics challenges in 2022 and 2023. Recently, the group prepared and released two new large-scale open speech datasets for training and benchmarking ASR systems: nearly 4000 hrs of transcribed audiovisual data from Parliament sessions 2008-2020 and nearly 4000 hrs of spontaneous speech from over 250,000 voluntary donations by Finnish speakers: https://www.kielipankki.fi/donate-speech/ and https://www.kielipankki.fi/corpora/fi-parliament-asr/

 

The speech recognition group led by Prof. Kurimo consists of 2 research fellows, a postdoc and 8 PhD students who bring together expertise in speech and language processing, deep learning and toolkits such as Kaldi, PyTorch and SpeechBrain. We are working on a wide variety of topics, ranging from core ASR and LLMs to their applications. We operate in a well-connected academic environment with excellent GPU and CPU computing facilities (including access to Europe's fastest supercomputer, LUMI) and have well-equipped office space on the Aalto University Otaniemi campus, which is only a 10-minute metro ride from downtown Helsinki.

 

We are now looking for a postdoc for 1-3 years, starting in early 2024, to work on any of these projects (see the sketch after this list):

- large-scale ASR for Finnish based on a combination of self-supervised pre-training for transformers, supervised fine-tuning with up to 5000 hours of transcribed speech data on general topics, and fine-tuning with a small amount of task-specific speech data and language models

- multimodal ASR using various forms of attention to speech, audio and video

- ASR for games in second language learning and assessment
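To give a flavour of the toolkits mentioned above, transcribing a file with a pretrained SpeechBrain model takes only a few lines. This is a generic sketch with an illustrative English LibriSpeech checkpoint (not the group's Finnish system), and the audio file name is invented:

    from speechbrain.pretrained import EncoderDecoderASR

    # Download and cache a pretrained end-to-end ASR model (illustrative checkpoint).
    asr = EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-crdnn-rnnlm-librispeech",
        savedir="pretrained_asr",
    )
    # Fine-tuning on task-specific data would start from a checkpoint like this one.
    print(asr.transcribe_file("example.wav"))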

 

The position requires a relevant doctoral degree in CS or EE, the skills to do excellent research in an (English-speaking) group, and outstanding research experience in at least one of the research themes, along with programming experience in one of the toolkits mentioned above. The candidate is expected to perform high-quality research and participate in the supervision of talented MSc and PhD students. The application, CV, list of publications, references and requests for further information should be sent by email to Prof. Mikko Kurimo (mikko.kurimo at aalto.fi). The deadline for applications is January 31, but applications will be processed as they arrive, and the position may be filled before the deadline.

 

Aalto University is a new university created in 2010 from the merger of the Helsinki University of Technology, Helsinki School of Economics and the University of Art and Design Helsinki. The University’s cornerstones are its strengths in education and research, with 20,000 basic degree and graduate students. In addition to a decent salary, the contract includes occupational health benefits, and Finland has a comprehensive social security system. The Helsinki Metropolitan area forms a world-class information technology hub, attracting leading scientists and researchers in various fields of ICT and related disciplines. Moreover, as the birthplace of Linux, and the home base of Nokia Bell Labs, F-Secure, Rovio, Supercell, Slush (the biggest annual startup event in Europe) and numerous other technologies and innovations, Helsinki is fast becoming one of the leading technology startup hubs in Europe. See more e.g. at http://www.investinfinland.fi/. As a living and working environment, Finland consistently ranks high in quality of life, and Helsinki, the capital of Finland, is regularly ranked as one of the most livable cities in the world. See more at https://finland.fi

 


Back  Top

6-24(2023-12-15) PhD candidate in speech sciences, Université de Mons, Belgium

 

Metrology and Language Sciences Department, Phonetics Laboratory, Faculty of Psychology and Educational Sciences, UMONS, Belgium

_____________________________________________________________________

 

The Metrology and Language Sciences Department (web.umons.ac.be/smsl/) of the University of Mons is looking for candidates to take up a post of PhD candidate (M/F) from January 29, 2024.

 

CANDIDATE PROFILE (M/F) :

 

- Entry level: 'Bac +5' (Master's, 300 ECTS credits) at least;

- Initial training allowing access to doctoral studies organised by the Faculty of Psychology and Educational Sciences (Psychology, Educational Sciences, Speech Therapy, Linguistics) or by the Faculty of Medicine (in particular: ENT and neurology);

- Solid skills in the field of speech and language sciences, as well as in statistical data processing and research methodology;

- Good command of scientific English (oral and written); sufficient command of French;

- Good teamwork skills, creativity, autonomy, rigour, scientific curiosity;

- Additional assets: programming skills (knowledge of a language such as Python or R), clinical experience with patients with motor speech disorders, possession of a driving licence and a private vehicle.

 

JOB PROFILE :

 

The post holder (M/F) will contribute to the Department's research efforts in the area covered by the ARC EvalDY project described below. He/she will prepare a doctoral thesis related to this project and may be asked to play a minor role in the department's teaching supervision activities.

 

Full-time research grant for a period of three years, renewable in one-year increments, with a starting date of 29 January 2024 at the earliest.

 

RECRUITMENT PROCEDURE:

 

Interested candidates are requested to submit, by January 5, 2024 at the latest, an application including :

- a letter of motivation

- a curriculum vitae (including e-mail address and contact telephone number),

- transcripts of each year of higher education,

- any other relevant documents,

all in a single pdf file sent to the following address: veronique.delvaux@umons.ac.be

 

After an initial assessment of applications based on the application file, a sub-set of candidates will be selected for a second phase involving a selection interview. Successful candidates will be notified by e-mail and/or telephone. Interviews will take place on January 12, 2024.

 

 

PROJECT: Evaluation of voice and speech disorders in dysarthria: EvalDy

 

The general aim of the project is to contribute to the characterisation and assessment of voice and speech disorders in dysarthria. The objective assessment (via acoustic and articulatory measurements) of pathological speech production is a rapidly expanding field of research, particularly in the French-speaking world, and there are many challenges to be met.

 

In the first phase, the project aims to document the speech production of a large number of French-speaking Belgian dysarthric patients, both men and women, with diverse profiles in terms of the type of dysarthria and associated aetiology (Parkinson's disease, Wilson's disease, Huntington's disease, Friedreich's ataxia, multiple sclerosis, amyotrophic lateral sclerosis, Kennedy's disease, dysarthria after stroke or head trauma) and the degree of severity of the dysarthria (mild, moderate, severe).

 

The acoustic recordings concern all the participants, who will be asked to produce the 8 modules of the MonPaGe 2.0.s protocol (repetition of pseudowords, intelligibility task, pneumo-phonatory module, reading of text, spontaneous speech, production of verbal diadochokinesis, automatic series and sentences with varied prosodic contours), to which 3 additional modules will be added (specifically targeting nasal phenomena, glides and phonetic flexibility skills). Several sub-groups of participants will be invited to carry out some of the modules in an experimental setting that will enable acoustic measurements to be combined with physiological measurements in order to study certain specific phenomena (acoustics and nasometry for nasality; acoustics, electroglottography and aerodynamics for coordination between the laryngeal and supra-laryngeal systems; acoustics and ultrasound imaging for articulatory precision; acoustics and imaging by nasofibroscopy and stroboscopy for voice quality).

Analysis of this large data set, in particular analysis of the relationships between acoustic and articulatory measurements, will aim to reduce the multiple acoustic measurements to a smaller number of reliable, robust indicators that can be used to characterise all the dimensions of dysarthric speech: laryngeal functioning, pneumo-phonatory behaviour (including intensity control), fluency, articulatory precision and gestural coordination, organisation of the vowel system, and aptitude for phonetic flexibility.
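As a small illustration of the kind of objective measurements involved, pitch and intensity statistics can be extracted with the Praat-based parselmouth library. This is a minimal sketch: the file name is invented, and only two of the many candidate indicators are shown:

    import numpy as np
    import parselmouth

    snd = parselmouth.Sound("patient_reading.wav")  # illustrative file name

    # Fundamental frequency (F0) contour: mean and variability are classic
    # indicators of laryngeal functioning in dysarthric speech.
    pitch = snd.to_pitch()
    f0 = pitch.selected_array["frequency"]
    f0 = f0[f0 > 0]  # keep voiced frames only
    print("F0 mean (Hz):", f0.mean(), "F0 sd (Hz):", f0.std())

    # Intensity contour: related to pneumo-phonatory behaviour (intensity control).
    intensity = snd.to_intensity()
    print("Mean intensity (dB):", intensity.values.mean())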

 

In a second phase, the project aims to use the acoustic indicators thus isolated to develop (i.e. design, operationalise, then assess the psychometric qualities and finally adapt) several assessment tools, each of which will be dedicated to meeting a more precise objective, defined either in relation to a research question or to a need identified in clinical practice.

 

The first objective concerns the sub-clinical signs of dysarthria in Parkinson's disease, and the possibility of using certain acoustic indices such as vocal biomarkers to assist clinicians in the early diagnosis of the disease. The second objective is to contribute to differential diagnosis, using a tool for acoustic assessment of speech production to distinguish between different subtypes of dysarthria, as well as between dysarthria and apraxia of speech. The third clinical objective concerns the temporal dynamics of the disease, viewed from an intra-individual perspective. The aim is to propose a tool that is suitable for longitudinal monitoring of dysarthric patients, once the diagnosis has been made. The fourth objective relates to a fundamental research question, that of characterising the evolution of dysarthria as a function of the degree of severity in the context of the retrogenesis hypothesis. The fifth objective concerns intelligibility. The aim is to produce a tool for assessing the intelligibility of dysarthric speech, which can be used in future work on the link between intelligibility, communicative efficiency and quality of life in dysarthric patients.

 

Prof. Véronique Delvaux, PhD

Chercheur qualifié FNRS à l'UMONS

Chargée de cours UMONS & ULB

Service de Métrologie et Sciences du Langage SMSL

Institut de Recherche en Sciences et Technologies du Langage IRSTL

Local –1.7, Place du Parc, 18, 7000 Mons

+3265373140

https://web.umons.ac.be/smsl/veronique_delvaux/

https://trends.levif.be/canal-z/entreprendre/z-science-14-06-23/

 

Back  Top

6-25(2023-12-18) Senior Project Manager @ELDA, Paris, France

The European Language Resources Distribution Agency (ELDA), a company specialized in Human Language Technologies within an international context, is currently seeking to fill an immediate vacancy for a permanent Senior Project Manager position, specialised in Speech Technologies.

Under the supervision of the CEO, the Senior Project Manager specialised in Speech Technologies will be in charge of conducting the activities related to the production of language resources and the coordination of R&D projects. Responsibilities include the design/specification of language resources, the setup of production frameworks and platforms, quality control and assessment, and the recruitment and management of project-dedicated team members. They will also contribute to improving and updating the current language resource production workflows. This is an excellent opportunity for qualified, creative and motivated candidates wishing to participate actively in the Language Engineering field.

The position is based in Paris (13th).

Required profile:

  • PhD in computer science with a specialisation in speech technologies. A proven research background (scientific publications) will be a strong plus
  • At least 3 years of experience in speech technologies (speech recognition, synthesis, language modelling) and in the commonly used tools for producing and collecting data and assessing quality
  • Ability to experiment with various techniques for improving or building tools (e.g., transcription and annotation tools)
  • Contribution to international projects
  • Good knowledge of Linux and open source software
  • Proficiency in the Python programming language
  • Good knowledge of scripting languages: bash, R, Perl
  • Experience and ability to supervise members of a multidisciplinary team
  • Dynamic and communicative, flexible enough to combine and work on different tasks
  • Proficiency in English, with the ability to write user guides, administration documentation and reports, and a good command of French. Knowledge of other languages would be a plus.
  • Citizenship (or residency papers) of a European Union country

Salary: Commensurate with qualifications and experience (between 40-50K€).
Other benefits: complementary health insurance and meal vouchers.

About
ELDA is an SME established in 1995 to promote the development and exploitation of Language Resources (LRs). Language Resources include all data necessary for language engineering, such as monolingual and multilingual lexica, text corpora, speech databases and terminology. ELDA’s role is to produce LRs, to collect and to validate them and, foremost, make them available to users in compliance with applicable regulations and ethical requirements.

For further information about ELDA, visit: http://www.elda.org

Applicants should email a cover letter addressing the points listed above together with a curriculum vitae to:

ELDA
9, rue des Cordelières
75013 Paris
FRANCE
Email: job@elda.org 

Back  Top

6-26(2023-12-23) PhD position @ Saarland University, Germany


PhD position: Machine Learning for Natural Languages and Other Sequence Data, Saarland University, Germany

(Computer Science, Computational Linguistics, Physics or similar)

The research group strives to better understand how modern deep learning methods can be applied to natural languages and other sequential data. Our recent achievements include the best paper award at COLING 2022 and the best theme paper award at ACL 2023. We offer a PhD position that is open with respect to topic but is expected to focus strongly on applying machine learning techniques to natural language data or other sequence data (e.g., string representations of chemical compounds).

The ideal candidate for the position would have:

   1. Excellent knowledge of machine learning and deep learning

   2. Excellent programming skills

   3. A Master's degree in computer science, computational linguistics, physics or similar

Salary: The PhD position will be at 75% of full time on the German E13 (TV-L) scale, i.e. approximately €3,144 per month before taxes and social contributions. Appointments will be for three years, with a possible extension at 50%.

About the department: The Department of Language Science and Technology is one of the leading departments in the area of speech and language in Europe. Its current flagship project is the CRC on Information Density and Linguistic Encoding. It also runs a significant number of projects funded at European and national level. In total, it counts seven professors and around fifty postdoctoral researchers and PhD students. The department is part of the Saarland Informatics Campus which, with 900 researchers, two Max Planck Institutes and the German Research Center for Artificial Intelligence, is one of the leading computer science sites in Germany and Europe.

How to apply: Please send us a cover letter, a research plan (one page maximum), your CV, your transcripts, a list of publications if available, and the names and contact details of at least two references, as a single PDF or a link to a PDF if the file is larger than 5 MB.

Please apply no later than November 20, 2023. Earlier applications are welcome and will be processed as they arrive.

Contact: Applications and any other enquiries about the project should be sent to

 


Back  Top

6-27(2024-01-06) PhD Research Assistant @ DFKI Berlin, Germany
PhD Research Assistant for Multimodal Fake-News and Disinformation Detection at DFKI Berlin

The German Research Center for Artificial Intelligence (DFKI) has operated as a non-profit, Public-Private-Partnership (PPP) since 1988. DFKI combines scientific excellence and commercially-oriented value creation with social awareness and is recognized as a major 'Center of Excellence' by the international scientific community. In the field of artificial intelligence, DFKI as Germany’s biggest public and independent organisation dedicated to AI research and development, has focused on the goal of human-centric AI for more than 30 years. Research is committed to essential, future-oriented areas of application and socially relevant topics.

We are looking for a highly motivated research assistant to work on a project focused on fake-news and disinformation detection from speech and multimedia data. Content authenticity verification of speech, combined with other modalities such as text, visuals or metadata, will be a central part. Explainable AI (xAI) and bias analysis are also highly relevant aspects of the position.


The successful candidate will work closely with high-impact partners in this field, e.g. Technical University of Berlin, RBB (Berlin TV and news broadcaster), Deutsche Welle (Germany's broadcaster abroad), and 5 other partners. 

Responsibilities will include developing and testing different AI/NLP techniques, analyzing the performance of machine learning models in the context of practical fake-news and disinformation countermeasures for journalists, and communicating project progress and results to relevant stakeholders. The position offers opportunities for pursuing a doctorate and publishing research results in scientific journals and conferences.

Qualified candidates will have a completed university degree in (technical) computer science or computational linguistics, excellent programming skills in Python, and a strong background in machine learning/AI and signal processing or NLP. Previous experience in the field of fake-news or spoofing / authenticity detection of multimedia data is an advantage.

DFKI offers an agile and lively international and interdisciplinary environment for working in a self-determined manner. If you are interested in contributing to cutting-edge research and working with a dynamic team, please apply!

Application deadline: Jan 23, 2024.

If you have any questions, please don't hesitate to contact tim.polzehl@dfki.de
Back  Top

6-28(2024-01-07) PhD position @ Laboratoire Bordelais de Recherche en Informatique (LaBRI), Talence, France

In the framework of the PEPR Santé numérique “Autonom-Health” project (Health, behaviors and autonomous digital technologies), the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) and the LORIA (Nancy, France) are looking for candidates for a fully funded PhD position (36 months).  

 
The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS.  The abstract of the « Autonom-Health » project can be found at the end of this email.  
 
The missions addressed by the successful candidate will be chosen among the following tasks, according to their profile:
- Data collection tasks:
- Definition of scenarios for collecting spontaneous speech using Social Interactive Agents (SIAs)
- Collection of patient/doctor interactions during clinical interviews
- ASR-related tasks (see the sketch after this list)
- Evaluate and improve the performance of our end-to-end ESPnet-based ASR system on French real-world spontaneous data recorded from healthy subjects and patients
- Adaptation of the ASR system to the clinical interview domain
- Automatic phonetic transcription / alignment using end-to-end architectures
- Adapting ASR transcripts for use with the semantic analysis tools developed at LORIA
- Speech analysis tasks
- Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, research into new biomarkers targeted at specific diseases.
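For the evaluation task, word error rate is the standard starting point. A minimal sketch with the jiwer package (the reference and hypothesis strings are invented examples):

    import jiwer

    reference = "bonjour je voudrais prendre un rendez-vous"
    hypothesis = "bonjour je voudrai prendre rendez-vous"

    # Word and character error rates between reference and ASR hypothesis;
    # tracking both helps when errors are inflections rather than whole words.
    print("WER:", jiwer.wer(reference, hypothesis))
    print("CER:", jiwer.cer(reference, hypothesis))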

The position will be hosted at LaBRI but, depending on the profile of the candidate, close collaboration is expected with the LORIA teams « Multispeech » (contact: Emmanuel Vincent, emmanuel.vincent@inria.fr) and/or « Sémagramme » (contact: Maxime Amblard, maxime.amblard@loria.fr).

Gross salary: approx. 2044 €/month 
Starting date: October 2024
Required qualifications: Master in Signal processing / speech analysis / computer science 
Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, excellent command of French (interactions with French patients and clinicians), good level of scientific English. 
Know-how: Familiarity with the ESPNET toolbox and/or deep learning frameworks, knowledge of automatic speech processing system design. 
Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.

Applications: 
To apply, please send by email at jean-luc.rouas@labri.fr a single PDF file containing a full CV, cover letter (describing your personal qualifications, research interests and motivation for applying), contact information of two referees and academic certificates (Master, Bachelor certificates).


—— 
Abstract of the « Autonom-Health » project:


Western populations face an increase in longevity which mechanically increases the number of chronic disease patients to manage. Current healthcare strategies will not allow a high level of care to be maintained at a controlled cost in the future, and e-health can optimize the management and costs of our healthcare systems. Healthy behaviors contribute to the prevention and optimized management of chronic diseases, but their implementation is still a major challenge. Digital technologies could support their implementation through numeric behavioral medicine programs, developed in complement to (and not in substitution of) existing care, in order to focus human interventions on the most severe cases demanding medical attention.
 
However, to do so, we need to develop digital technologies which should be: i) Ecological (related to real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii)  Personalized (at initiation and adapted over the course of treatment) ; iv) Longitudinal (implemented over long periods of time) ; v) Interoperated (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and generating trustability).

The above-mentioned challenges will be disentangled with the following specific goals: Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activities, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients.  This will require new autonomous digital technologies (i.e. virtual Socially Interactive Agents SIAs, smartphones, wearable sensors). Goal 2:  Optimize clinical phenotyping by collecting and analyzing non-intrusive data (i.e. voice, geolocalisation, body motion, smartphone footprints, ...) which will potentially complement clinical data and biomarkers data from patient cohorts. Goal 3: Better understand psychological, economical and socio-cultural factors driving acceptance and engagement with the autonomous digital technologies and the proposed numeric behavioral interventions. Goal 4:  Improve interaction modalities of digital technologies to personalize and optimize long-term engagement of users. Goal 5: Organize large scale data collection, storage and interoperability with existing and new data sets (i.e, biobanks, hospital patients cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment.

Each goal will be addressed by expert teams through complementary work-packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testings), will be performed through this project. A second phase funded via ANR calls will allow to recruit new teams for large scale testing phase.

This project will rely on population-based interventions in existing numeric cohorts (i.e. KANOPEE) where virtual agents interact with patients at home on a regular basis. Pilot hospital departments will also be involved for data management, supervised by information and decision systems coordinating autonomous digital cognitive behavioral interventions based on our virtual agents. The global solution, based on empathic human-computer interactions, will help target, diagnose and treat subjects suffering from dysfunctional behaviors (i.e. sleep deprivation, substance use...) as well as sleep and mental disorders. The expected benefits of such a solution are increased adherence to treatment, strong self-empowerment to improve autonomy and, finally, a reduction of long-term risks for the subjects and patients using the system. Our program should massively improve healthcare systems and allow strong technological transfer to information systems / digital health companies and the pharma industry.
 

Back  Top

6-29(2024-01-11) PhD student @ LPL, Aix-Marseille University, France

PhD position – Language Sciences

 

We are looking for a PhD student to work on the PROSOLANG project (“using PROSOdy to improve foreign LANGuage learning”) in the fields of phonetics, language teaching and psycholinguistics. The position is at the Laboratoire Parole et Langage of Aix Marseille Université in Aix-en-Provence. The project aims to provide computer-based cognitive training programs in the context of English phonetics classes to help francophone learners to overcome the difficulties they have with the perception of non-native melodic cues. The person working on this project will be responsible for designing, carrying out and testing the training programs as well as disseminating the results.

 

This 3-year position is fully funded by the A*midex foundation. The PhD student will be co-supervised by Amandine Michelas and Sophie Herment and will interact closely with Sophie Dufour as well as the S2S research team and the prosody group of the Laboratoire Parole et Langage.

The planned start date is 1 September 2024.

 

Applicants must hold a master’s degree in English, language sciences, cognitive science or psychology or a related discipline at the beginning of the PhD. The candidate must have a strong and documented interest in phonetics and language teaching. Previous experience with English teaching is advantageous but is not a prerequisite. Previous experience in scientific research in a laboratory is also a plus. Fluency in oral and written English and French is required.

 

Interested candidates are requested to submit, by March 31, 2024 at the latest, an application including:

- a letter of motivation

- a curriculum vitae (including e-mail address and contact telephone number),

- any other relevant documents,

all in a single pdf file sent to the following address: amandine.michelas@univ-amu.fr.

 

After an initial assessment of applications based on the application file, a sub-set of candidates will be selected for a second phase involving a selection interview. Interviews will take place during the month of April 2024. Successful candidates will be notified by e-mail and/or telephone.

For any questions or further information, please email amandine.michelas@univ-amu.fr.

 


Back  Top

6-30(2024-01-15) Full-time (100%) Research Assistant / Ph.D. Student position, Bielefeld University, Germany
The Social Cognitive Systems Group at Bielefeld University is seeking applications for a 
 
** Full-time (100%) Research Assistant / Ph.D. Student position **
 
to work in a newly established project on multimodal creativity in AI-based co-speech gesture 
generation. The project is part of a newly established Collaborative Research Center (CRC 
1646) on “Linguistic Creativity in Communication” funded by the German Research Agency 
(DFG) for 4 years. The goal of the project is to investigate how co-speech gestures are employed 
to support both speaker and listener when new linguistic constructions are invented to solve
a challenging situation in communication (e.g. referring to an entity for which no conventionalized
term is available and ordinary language productivity does not suffice). It will be carried out by the
Social Cognitive Systems Group (Prof. Stefan Kopp) in collaboration with the Psycholinguistics 
Group (Prof. Joana Cholin) at Bielefeld University, and will encompass both experimental studies 
with human speakers as well as the development of computational models (using machine learning
techniques) of speech-gesture use in such situations. 
 
The announced position will be working for the computational part under supervision of Prof. 
Stefan Kopp. The main task will be to extend the currently popular data-based accounts that predict
gestures from (prosodic and textual) information in a given speech input, to models that are able
to generate novel gestures that (1) meet communicative demands that are not met by the given 
simultaneous speech or (2) mark and support the use of non-conventionalized creative language. 
We will build on the group’s long-standing previous work on cognitive and linguistic models of
speech-gesture generation, as well as deep machine learning-based  accounts of speech-driven 
gesture synthesis. In addition, the research assistant/PhD student will carry out interdisciplinary 
work with the psycholinguistic part of the project. 
 
The duration of the position is about 3,5 years (until end of 2027). Salary is 100% TVL-E13 scale
(about 4.000,- EUR per month before taxes, depending on relevant work experience).
 
Bielefeld is the vibrant center of the region of East Westphalia and Germany’s greenest big city 
with a lot of cultural, entertainment, and recreational opportunities. It is located in the center of
Germany, surrounded by beautiful forests, and connected to Germany’s high-speed rail system. 
Bielefeld University is a strong research-oriented university with more than 20.000 students and
a famous commitment to interdisciplinary research. It hosts major research centers such as the
Center for Cognitive Interaction Technology (CITEC) or the Center for Interdisciplinary Research 
(ZiF).
 
Application deadline is 25th of January, but later applications will be considered too until the
position has been filled. 
 
If you are interested in learning more about the position, please get in touch with Stefan Kopp
 
For information on how to apply please refer to:
Back  Top

6-31(2024-01-20) Internship in ANR Project Revitalise, Telecom-Paris, France

ANR Project «REVITALISE»

Automatic speech analysis of public talks.

Description. Today, important aspects of human activity, such as information exchange, depend not only on so-called hard skills but also on soft skills. One such important skill is public speaking. Like many forms of interaction between people, the assessment of public speaking depends on many, often subjectively perceived, factors. The goal of our project is to create an automatic system that can take these different factors into account and evaluate the quality of a performance. This requires understanding which elements can be assessed objectively and which vary depending on the listener [Hemamou, Wortwein, Chollet21]. Such an analysis must consider public speaking at several levels: high-level (audio, video, text), intermediate (voice monotony, auto-gestures, speech structure, etc.) and low-level (fundamental frequency, action units, POS tags, etc.) [Barkar]. This internship offers an opportunity to analyse the audio component of a public speech. The student is asked to solve two main problems. The engineering task is to create an automatic speech transcription system that detects speech disfluencies; for this, the student will collect a bibliography on the topic and propose an engineering solution (a rough starting point is sketched below). The second, research task is to use audio cues to automatically analyse how successful a talk is. This internship is an opportunity to solve an engineering problem as well as to learn more about research approaches; by the end, you will have expertise in audio processing and in machine learning methods for multimodal analysis. If the internship is successfully completed, an article may be published. PhD position funding on Social Computing will be available in the team at the end of the internship (at INRIA).
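As a rough starting point for the engineering task (a sketch only, assuming the open-source openai-whisper package; the file name and filler list are illustrative), one can transcribe a talk and flag common filled pauses in the output:

    import re
    import whisper

    model = whisper.load_model("base")     # small pretrained checkpoint
    result = model.transcribe("talk.wav")  # illustrative file name

    # Naive disfluency spotting: flag common filled pauses in each segment.
    # Note that Whisper tends to omit disfluencies from its transcripts,
    # which is precisely the engineering challenge of this internship;
    # a real system would also handle repetitions, false starts, etc.
    fillers = re.compile(r"\b(um+|uh+|erm?)\b", re.IGNORECASE)
    for seg in result["segments"]:
        hits = fillers.findall(seg["text"])
        if hits:
            print(f'{seg["start"]:.1f}s: {seg["text"].strip()} <- fillers: {hits}')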

Registration & Organisation.

Name of organization: Institut Polytechnique de Paris, Telecom-Paris
Website of organization: https://www.telecom-paris.fr
Department: IDS/LTCI
Address: Palaiseau, France

Supervision.

Supervision will include weekly meetings with the main supervisor and regular meetings (every 2-3 weeks) with the co-supervisors.

Name of supervisor: Alisa BARKAR
Names of co-supervisors: Chloe Clavel, Mathieu Chollet, Béatrice BIANCARDI
Contact details: alisa.barkar@telecom-paris.fr

Duration & Planning.

The internship is planned as a full-time, 5-6 month internship during the spring semester of 2024. Six months correspond to 24 weeks, which will cover the following list of activities:

● ACTIVITY 1(A1): Problem description and integration to the working environment

● ACTIVITY 2(A2): Bibliography overview

● ACTIVITY 3(A3): Implementation of the automatic transcription with detected discrepancies

● ACTIVITY 4(A4): Evaluation of the automatic transcription

● ACTIVITY 5(A5): Application of the developed methods to the existing data

● ACTIVITY 6(A6): Analysis of the importance of para-verbal features for the performance perception

● ACTIVITY 7(A7): Writing the report

Selected references of the team.

1. [Hemamou] L. Hemamou, G. Felhi, V. Vandenbussche, J.-C. Martin, C. Clavel, HireNet: a Hierarchical Attention Model for the Automatic Analysis of Asynchronous Video Job Interviews. in AAAI 2019, to appear

2. [Ben-Youssef] Atef Ben-Youssef, Chloé Clavel, Slim Essid, Miriam Bilac, Marine Chamoux, and Angelica Lim. Ue-hri: a new dataset for the study of user engagement in spontaneous human-robot interactions. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 464–472. ACM, 2017.

3. [Wortwein] Torsten Wörtwein, Mathieu Chollet, Boris Schauerte, Louis-Philippe Morency, Rainer Stiefelhagen, and Stefan Scherer. 2015. Multimodal Public Speaking Performance Assessment. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (ICMI '15). Association for Computing Machinery, New York, NY, USA, 43–50.

4. [Chollet21] Chollet, M., Marsella, S., & Scherer, S. (2021). Training public speaking with virtual social interactions: effectiveness of real-time feedback and delayed feedback. Journal on Multimodal User Interfaces, 1-13.

5. [Barkar] Alisa Barkar, Mathieu Chollet, Beatrice Biancardi, and Chloe Clavel. 2023. Insights Into the Importance of Linguistic Textual Features on the Persuasiveness of Public Speaking. In Companion Publication of the 25th International Conference on Multimodal Interaction (ICMI '23 Companion). Association for Computing Machinery, New York, NY, USA, 51–55. https://doi.org/10.1145/3610661.3617161

 

Other references.

1. Dinkar, T., Vasilescu, I., Pelachaud, C. and Clavel, C., 2020, May. How confident are you? Exploring the role of fillers in the automatic prediction of a speaker’s confidence. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8104-8108). IEEE.

2. Whisper: Robust Speech Recognition via Large-Scale Weak Supervision, Radford A. et al., 2022, url: https://arxiv.org/abs/2212.04356

3. Romana, Amrit and Kazuhito Koishida. “Toward A Multimodal Approach for Disfluency Detection and Categorization.” ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023): 1-5.

4. Radhakrishnan, Srijith et al. “Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition.” ArXiv abs/2310.06434 (2023): n. pag.

5. Wu, Xiao-lan et al. “Explanations for Automatic Speech Recognition.” ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023): 1-5.

6. Min, Zeping and Jinbo Wang. “Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study.” ArXiv abs/2307.06530 (2023): n. pag.

7. Ouhnini, Ahmed et al. “Towards an Automatic Speech-to-Text Transcription System: Amazigh Language.” International Journal of Advanced Computer Science and Applications (2023): n. pag.

8. Bigi, Brigitte. “SPPAS: a tool for the phonetic segmentations of Speech.” (2023).

9. Rekesh, Dima et al. “Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition.” ArXiv abs/2305.05084 (2023): n. pag.

10. Arisoy, Ebru et al. “Bidirectional recurrent neural network language models for automatic speech recognition.” 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015): 5421-5425.

11. Padmanabhan, Jayashree and Melvin Johnson. “Machine Learning in Automatic Speech Recognition: A Survey.” IETE Technical Review 32 (2015): 240 - 251.

12. Berard, Alexandre et al. “End-to-End Automatic Speech Translation of Audiobooks.” 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018): 6224-6228.

13. Kheir, Yassine El et al. “Automatic Pronunciation Assessment - A Review.” ArXiv abs/2310.13974 (2023): n. pag.

Back  Top

6-32(2024-01-20) Professor position in Computer Science, GETALP, Université Grenoble Alpes, France

A full professor (PR) position in computer science (CNU section 27) will open in 2024 at Université Grenoble Alpes. The successful candidate will join the Laboratoire d'Informatique de Grenoble, where GETALP (https://lig-getalp.imag.fr) is a host team.

The profile is still being finalised, but the associated keywords will be:

generative AI, symbolic AI, knowledge bases, with a strong focus on large language models.

The position will be attached to the UFR Informatique, mathématiques et mathématiques appliquées (IM2AG -- https://im2ag.univ-grenoble-alpes.fr/).

Do not hesitate to contact me for more information about the research profile and to discuss a potential integration into GETALP.

This message may be freely redistributed within your networks.

Best regards,

François Portet

--

Back  Top

6-33(2024-01-24) Investigator specialized in audio analysis, with expertise in automatic speech and audio processing (M/F), BEA, Paris, France

 

Bureau d'enquêtes et d'analyses pour la sécurité de l'aviation civile (BEA)

Position to be filled: Investigator specialized in audio analysis, with expertise in automatic speech and audio processing (M/F)

Job title: Specialized investigator, 'AUDIO'

Required level: PhD / Master's degree / engineering degree

Location: BEA, Le Bourget Airport

In France, the BEA is the authority responsible for safety investigations in civil aviation. It also takes part in many investigations conducted abroad. The sole objective of a safety investigation is to prevent accidents and incidents in civil aviation. It includes the collection and analysis of information, the statement of conclusions, including the determination of causes and/or contributing factors and, where appropriate, the issuing of safety recommendations. Founded in 1946, the BEA reports to the French ministry in charge of transport.

 

Description of the proposed mission

As part of the exploitation of factual information, we are looking for an investigator specialized in audio analysis, with strong expertise in automatic speech and audio processing. The mission combines participation in safety investigations with the development of tools for the BEA's audio laboratory.

Within the Technical Department, as a specialized investigator you will take part in the BEA's safety investigations following accidents and incidents involving civil aircraft in France or abroad, as the investigator in charge of exploiting data from audio and video recordings. Building on your complementary skills, you will also be responsible for maintaining the laboratory's computing resources and for leading existing and new development projects addressing the specific needs of the audio analysis laboratory. These needs require maintaining and establishing collaborations with academic and industrial partners on topics such as automatic speech transcription (French and English), audio segmentation, objective assessment of audio quality, low-level speech analysis and automatic identification of transients.

Your main activities will be the following (a short illustrative code sketch follows the list):

- Readout of flight recorders ('black boxes') and leading of listening sessions,

- Acoustic analysis of audio recordings, including the identification of acoustic signatures and their comparison with reference sound samples,

- Production of transcriptions of communications for safety investigations,

- Drafting of technical documents and contribution to investigation reports,

- Enrichment and curation of the laboratory's audio database,

- Contribution to the maintenance and evolution of the tools for detecting and locating underwater acoustic beacons,

- Maintenance and improvement of the audio analysis laboratory's software tools,

- Coordination of the audio laboratory's scientific network,

- Writing and managing research projects in collaboration with academic partners.
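
As a purely illustrative aside, the Python sketch below shows the kind of elementary time-frequency inspection that underlies acoustic-signature work. Everything in it is an assumption for illustration: the synthetic signal, the sample rate, and the 37.5 kHz tone (chosen because it is the standard underwater locator beacon frequency); it is not BEA tooling.

    import numpy as np
    from scipy.signal import spectrogram

    # Toy signal: a 37.5 kHz tone burst (standard underwater locator beacon
    # frequency) buried in noise; real analyses rely on dedicated lab tools.
    fs = 192000                                       # Hz, comfortably above Nyquist for 37.5 kHz
    t = np.arange(int(0.5 * fs)) / fs
    x = 0.1 * np.random.randn(t.size)
    x += np.sin(2 * np.pi * 37500 * t) * (t < 0.01)   # 10 ms ping at the start

    # Locate the strongest time-frequency component
    f, seg_t, Sxx = spectrogram(x, fs=fs, nperseg=2048, noverlap=1024)
    i_f, i_t = np.unravel_index(np.argmax(Sxx), Sxx.shape)
    print(f"Strongest component: {f[i_f]:.0f} Hz at t = {seg_t[i_t]:.3f} s")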

You will take part in the analysis of CVR (Cockpit Voice Recorder) recordings in major investigations in France and abroad. In this capacity, you will need to demonstrate rigour and discretion, and apply the BEA's confidentiality rules strictly. You will be expected to become autonomous quickly, so as to become one of the department's technical referents in the field of audio data analysis. You will also take part in industrial consultations on the development of audio-video analysis tools dedicated to investigation bureaus. You may occasionally be required to travel at short notice, in France and abroad, in the event of an accident.

Ideal profile

You hold a five-year (Bac+5) or doctoral-level (Bac+8) degree, with a specialization and experience in modern techniques of automatic speech and audio processing. You have the following skills:

- Speech processing and voice analysis,

- Management of projects involving academic research,

- Programming (Python, object-oriented or similar languages) and machine learning/AI,

- Signal processing, data filtering,

- Spectral analysis using dedicated software,

- Basic knowledge of digital and analogue electronics,

- Writing skills in English and French,

- Hearing acuity suited to the position,

- Proficiency in English (read, written and spoken).

You will be supported through training in acquiring some of these skills. You are dynamic, rigorous, curious and creative, and you wish to contribute to developing the skills and technical resources of the BEA's audio analysis laboratory. You work well in a team, with the interpersonal and adaptation skills needed to lead activities in an international context. Your analytical and synthesis skills enable you to communicate your results effectively.

Application

Please send your application, consisting of a CV, a cover letter and any supporting documents (e.g. recommendations), by email to: recrutement-tec@bea.aero

Back  Top

6-34(2024-01-21) Associate professor (maître de conférences) at IRIT, Université Toulouse 3 - Paul Sabatier, France
An associate professor (MCF) position in computer science (CNU section 27) will be published on Galaxie in the coming days.
 
The IRIS research team at IRIT encourages applications and invites interested candidates to present their work at a seminar early this year.
 
The profile is close to:
    - Research: large language models (LLMs), information production and access
    - Teaching: algorithms, programming, databases (cf. the national curriculum)
 
This MCF (section 27) position is assigned to Université Toulouse 3 - Paul Sabatier:
    - Research: IRIT, Data Management department, IRIS team (other teams are also associated)
    - Teaching: IUT, computer science department (where I teach)
 

IRIT and the IUT are a 10-minute walk apart, on a green campus bordered by the Canal du Midi and connected to the city by metro.

 
For more information, contact Guillaume Cabanac.
Back  Top

6-35(2024-02-01) Several academic positions @ LORIA, Nancy, France

Seven MCF (associate professor) positions and two PR (full professor) positions in section 27 are open at Université de Lorraine, assigned to Loria. Among the targeted themes, automatic speech and language processing plays a major role: in research, through the teams of Loria's D4 department, and in teaching, through the NLP Master's programme at IDMC and the future Bachelor-level NLP programme at IDMC opening in autumn 2024.

Candidates are strongly encouraged to contact the laboratory and the teaching departments.

- 2 PR positions at IDMC and at the IUT de Metz: open to all Loria teams.

- 2 MCF positions at the Faculté des Sciences et Technologies: open to teams of departments D1 'Algorithms, computation, image and geometry', D2 'Formal methods' and D3 'Networks, systems and services'.

- 1 MCF position at IDMC: open to teams of departments D3 'Networks, systems and services', D4 'Natural language and knowledge processing' and D5 'Complex systems, artificial intelligence and robotics'.

- 2 MCF positions at Mines de Nancy and Polytech Nancy: research profile in security and safety of computer systems.

- 1 MCF position at INSPÉ: research profile in artificial intelligence, within the teams of department D4 'Natural language and knowledge processing' or the BIRD team of department D5 'Complex systems, artificial intelligence and robotics'.

- 1 MCF position at Telecom Nancy: research profile in artificial intelligence and data science.

More information at https://www.loria.fr/fr/emplois/

Back  Top

6-36(2024-02-02) Four-year funded PhD studentships at the University of Edinburgh, UK

Four-year funded PhD studentships in Designing Responsible Natural Language Processing at the University of Edinburgh

 

The UKRI AI Centre for Doctoral Training (CDT) in Responsible and Trustworthy in-the-world Natural Language Processing (NLP) is inviting applications for fully-funded PhD studentships starting in September 2024 for our new Designing Responsible NLP integrated PhD training programme.

 

Natural Language Processing (NLP) is an area of AI operating at the intersections of computer science, linguistics, and interaction design that has rapidly jumped from the research lab to routine deployment in-the-world. Mature NLP systems offer powerful capabilities to create new products, services, and interactive experiences grounded in natural language, and underpin much of the current excitement around generative AI. However, they also bring significant challenges to responsible and trustworthy design, adoption and deployment.

 

Our students will gain the skills, knowledge and experience to study and design real-world applications of NLP that are responsible and trustworthy by design, in a highly interdisciplinary training environment hosted by the new Edinburgh Futures Institute. The training programme brings together world leading researchers at the University of Edinburgh in informatics, design, linguistics, speech science, psychology, law, philosophy, information science, and digital humanities, who will supervise students and guide them in their training and learning.

 

The CDT is seeking to fund up to 12 studentships starting next academic year. We are looking for applicants with a background in or related to:

• Computer science, informatics and artificial intelligence

• Design, human-computer interaction and human-centred computing

• Language, linguistics and speech sciences

• Law, governance and regulation

• Digital humanities and information science

 

These areas are indicative only: we are interested in applicants from any background or discipline with relevant skills and expertise connecting to our five Training Areas. Our ambition is to recruit a diverse cohort of students from different disciplines and backgrounds who are excited by the prospect of working with each other on real-world applications of NLP.

 

The deadline for applications is midnight (GMT) 11th March 2024.

 

For more information on the programme, the funding available and its benefits, see the CDT website: https://www.responsiblenlp.org/

 

Details on how to apply can be found here: https://www.responsiblenlp.org/application-documents/

 

You can also register for our applicant webinars on 12 and 13 February here: https://www.responsiblenlp.org/applicant-webinars/ ; more dates will likely be added later in February.

Back  Top

6-37(2024-02-05) Four postdoctoral positions, Inria, Paris, France
Within the speech team of Inria Défense & Sécurité, we offer four postdoctoral/junior researcher positions in speech processing. The links are given at the end of this message.
The core of the team is located at the Inria Paris centre (near the Gare de Lyon, and due to move shortly to Place d'Italie).
It is nevertheless possible to be attached to one of the other Inria centres across the country where the team is already present.
 
The work is carried out in collaboration with the French Ministry of the Armed Forces, on the theme of information processing for intelligence.
It is based on public data, following an open-science approach that allows easy publication of results.
 
Do not hesitate to contact me for more information on Inria Défense & Sécurité, on the speech team and/or on the positions offered.
 
Best regards,
JF Bonastre
PS1: each offer is posted in two variants to match candidates' seniority. Further offers are planned between now and September.
PS2: several PhD offers will be published shortly, either in collaboration with other academic teams or directly within Inria Défense & Sécurité (do not hesitate to contact me now).
PS3: hosting Master's internships is also possible.
 
 
--
______________________________________________________________________
Jean-François BONASTRE
Directeur de Recherche, Inria Défense & Sécurité
Associate member of LIA and Professor, Avignon Université
Honorary member of the IUF
Tel: +33/0 490843514
@jfbonastre
Back  Top

6-38(2024-02-10) Associate professor in AI, LIUM, Le Mans Université, France

Associate professor (maître de conférences) position at Le Mans Université

**Teaching**

The teaching duties of the recruited person may take place:
- In the Bachelor's programme in computer science (and the dual mathematics-computer science Bachelor's), covering fundamental computer science subjects such as algorithms and programming, databases, web programming, software engineering, networks, etc.
- At Bachelor's level in other departments of the Faculty of Science and Technology, for introductory programming
- In the Master's programme in computer science, Artificial Intelligence track, which includes modules on probabilistic and neural methods, software and hardware infrastructures for AI, big data management and exploitation techniques, cloud computing, and written and spoken natural language processing.


The recruited person will be expected to get involved in the design and coordination of these programmes.

The university offers a 96-hour teaching reduction, which can be spread over two or three years, to ease the recruited person's integration by allowing them to develop their research project and prepare their courses.

Teaching department: Computer Science
Location: Le Mans
Head of department: Dominique Py
Tel.: 02 43 83 38 55
Email: Dominique.Py@univ-lemans.fr
URL: https://sciences.univ-lemans.fr/fr/filieres/informatique.html
 
Contact for the teaching profile: Dominique Py (head of department) - 02 43 83 38 55 - Dominique.Py@univ-lemans.fr

**Research**

The Laboratoire d'Informatique de l'Université du Mans (LIUM) is seeking an associate professor to join the Language and Speech Technology (LST) team.
Within this thematic framework, the team has developed cross-cutting research axes that we wish to strengthen:
- Interpretability and explainability of systems;
- Multimodal learning (text, speech, image, graphs);
- Representation learning;
- Lifelong learning;
- Automatic processing of multilingual corpora and low-resource languages;
- Creation of corpora and evaluation methods.

The LST team's work addresses several applied tasks in natural language processing (NLP) and speech processing:
- Speech transcription and speaker characterization;
- Opinion detection;
- Machine translation (text-to-text or speech-to-speech);
- Knowledge extraction from written or spoken language;
- Topic segmentation of text and audio corpora;
- Speech synthesis.

The team wishes to recruit a candidate with strong expertise in machine learning and a motivation for the team's research themes that will enable them to work with the members of the LST team.

The priority criteria for evaluating applications on the research side, from the initial screening onwards, are:
- a personalized research project suited to the team's themes;
- research skills and experience;
- the list of publications;
- the ability to commit to, lead and build collaborative activities, in particular through projects of various kinds (European, ANR, industrial research contracts, CIFRE theses, etc.).

Candidates are invited to contact the head of the LST team so as to tailor their research project to the team's ongoing work.

Contact for the research profile: Antoine Laurent (head of the LST team) - 02 43 83 38 30 - antoine.laurent@univ-lemans.fr

Location: Le Mans
Laboratory director: Sébastien George
Tel.: 02 43 59 49 16
Email: sebastien.george@univ-lemans.fr
Laboratory URL: http://lium.univ-lemans.fr


**Laboratory description**

The Laboratoire d'Informatique de l'Université du Mans (LIUM), founded about forty years ago, brings together most of the computer science faculty of Le Mans Université. It currently comprises twenty-eight faculty members and twenty PhD students and non-permanent researchers. LIUM consists of two teams:
- a team specialized in technology-enhanced learning environments (EIAH engineering),
- a team specialized in language processing (Language and Speech Technology, LST).

The laboratory and the team offer attractive working conditions:
- A dedicated computing cluster, in addition to access to the regional and national clusters;
- A young and dynamic team with many recent recruitments;
- A human-scale team and laboratory with collegial governance;
- An active portfolio of projects (ANR, European, JSALT) and industrial collaborations (Meta, MMA, SNCF, Airbus, Orange, etc.).

--
Back  Top

6-39(2024-02-15) 2 Postdoctoral Researchers in Multimodal Interaction @Trinity College Dublin, Ireland.

Two postdoctoral researchers in multimodal interaction are sought at Trinity College Dublin, Ireland.

 

Two postdoc positions are available in the lab of Prof. Naomi Harte at Trinity College Dublin in Ireland. Both positions are in multimodal interaction and are part of a larger project offering a multidisciplinary exploration of speech-based interaction. One postdoc focuses on understanding the nature of multimodal interaction (https://www.adaptcentre.ie/careers/postdoctoral-researcher-in-understanding-multimodal-interaction-eenh_rf01/); that person will have a background in an area such as psycholinguistics, cognitive science or linguistics. The second post is aimed at an engineer with experience in ASR or conversational analysis, who will develop better neural architectures to exploit a deeper understanding of multimodality in speech (https://www.adaptcentre.ie/careers/postdoctoral-researcher-in-audio-visual-neural-architecture-eenh_rf02-2/). Both are 3-year posts. Ideally both posts would begin in June 2024, but some flexibility may be possible.

 

Prof. Harte can be contacted by email (nharte@tcd.ie) for additional details about the posts after reading the links above; formal applications must be made via those links.

 

Back  Top

6-40(2024-02-21) Academic positions at Avignon University, Avignon, France
Below are the profiles of the section 27 positions open at Avignon Université for the 2024 intake:
 
- 1 MCF position in section 27, attached to the IUT data science department for teaching and to LIA for research (possible integration into the 'Speech and Language Group' team)
 
- 3 ATER (temporary teaching and research) positions: teaching in the computer science Bachelor's and Master's programmes, research at LIA (possible integration into the 'Speech and Language Group' team)
 
 
Do not hesitate to contact the people concerned (contact information is given in the position descriptions).
Back  Top

6-41(2024-02-25) Professor in computer science (speech, AI), Université Grenoble Alpes, France

A full professor (PR) position in computer science (section 27) will open in 2024 at
Université Grenoble Alpes. The successful candidate will join the Laboratoire
d'Informatique de Grenoble, where GETALP (https://lig-getalp.imag.fr) is a host team.

The profile of PR position no. 253 is available here:
https://emploi.univ-grenoble-alpes.fr/concours/enseignants-chercheurs-/postes-pr-2024-1197762.kjsp?RH=1633530438110

Keywords: generative AI, symbolic AI, knowledge bases, with a strong focus on large
language models.

The position will be attached to the UFR Informatique, mathématiques et mathématiques
appliquées (IM2AG -- https://im2ag.univ-grenoble-alpes.fr/).

Do not hesitate to contact me for more information on the research profile and to
discuss a potential integration into GETALP.

Back  Top

6-42(2024-03-10) PhD position @ CEREMA, Strasbourg, France

PhD proposal, 2024-2027

 

Diagnosing room acoustics through signal processing and machine learning

 

Keywords: acoustics, buildings, machine learning, inverse methods

Context: Noise is cited by the population as the leading source of annoyance and constitutes a major health and social issue, contributing in particular to stress, attention deficits in the classroom, and tinnitus. The annoyance is often linked to the poor acoustic quality of a room caused by excessive reverberation (canteens, swimming pools, nurseries...). When acoustically renovating a room, proposing a solution requires good knowledge of the geometric and acoustic characteristics of the existing space (room dimensions, absorption and diffusion of its various surface materials). To estimate these unknown parameters, field acousticians rely on sound-field measurements combined with prior geometric and acoustic knowledge of the site and of the equipment used (sources and microphones). The estimation is typically performed by manually and iteratively tuning the input parameters of analytical or numerical acoustic models until they fit the measurements. The complete diagnostic process is therefore long, costly and sometimes imprecise, depending on the models used. Against this backdrop, developing so-called inverse methods that automatically recover the acoustic parameters of interest from audio measurements alone would constitute a major breakthrough for building acoustics, paving the way for simpler, faster and more reliable tools for acousticians.

Objective: The goal of the thesis is to develop a system that, from a small number of acoustic measurements (e.g. room impulse responses, 'RIRs') and known room characteristics (e.g. its approximate dimensions), can automatically recover from the measurements the remaining unknown characteristics that shaped the sound field (e.g. wall absorption and diffusion, source power...). The thesis aims at methodological breakthroughs on these open and difficult inverse problems by combining novel approaches from signal processing and machine learning. It will tackle three key challenges.
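
To make the raw material of these inverse problems concrete, here is a minimal Python sketch (assuming NumPy and a single-channel RIR array; illustrative only, not the project's method) that estimates a room's reverberation time from an impulse response via Schroeder backward integration, the classical computation on which such diagnostics build.

    import numpy as np

    def rt60_from_rir(rir, fs):
        """Estimate RT60 via Schroeder backward integration and a T20 line fit."""
        energy = rir.astype(float) ** 2
        edc = np.cumsum(energy[::-1])[::-1]           # energy decay curve
        edc_db = 10.0 * np.log10(edc / edc[0])
        i5 = np.argmax(edc_db <= -5.0)                # fit the -5 dB..-25 dB range
        i25 = np.argmax(edc_db <= -25.0)
        t = np.arange(len(rir)) / fs
        slope, _ = np.polyfit(t[i5:i25], edc_db[i5:i25], 1)
        return -60.0 / slope                          # time for a 60 dB decay

    # Synthetic exponentially decaying noise standing in for a measured RIR
    fs = 16000
    t = np.arange(fs) / fs
    rir = np.random.randn(fs) * np.exp(-3.0 * t)
    print(f"Estimated RT60: {rt60_from_rir(rir, fs):.2f} s")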

Challenge 1: Our first results produced inverse optimization methods that, under idealized conditions, estimate the wall absorption of a room of known geometry [1] or, conversely, estimate the geometry of a room with ideal walls [2]. The main remaining challenge is the generalization to more realistic settings, integrating a fine model of the equipment response and wall properties (dependence on frequency and angle of incidence, acoustic diffusion...) as well as uncertainty about the geometry.

Challenge 2: Our current work splits into two approaches: data-driven methods, which learn a neural network from annotated simulated data (e.g. [3]), and physics-driven methods, which solve an inverse optimization problem based on an idealized acoustic model (e.g. [1,2]). An important challenge is to hybridize them. This will involve increasing the realism of RIR simulators and of theoretical acoustic models, possibly using self-supervised techniques on unannotated data [4] and unrolling techniques that correct the underlying physical models through learning [5].

Challenge 3: The last challenge is the transition from simulated RIRs to actually measured RIRs, which will require adapting the learning and optimization methods developed for the first two challenges.

[1] S. Dilungana, A. Deleforge, C. Foy, S. Faisan. Geometry-informed estimation of surface absorption profiles from impulse responses. EUSIPCO, 30th European Signal Processing Conference, Belgrade, Serbia, 2022.

[2] T. Sprunck, Y. Privat, C. Foy, A. Deleforge. Gridless 3D recovery of image sources from room impulse responses. Preprint, 2022.

[3] S. Dilungana, A. Deleforge, C. Foy, S. Faisan. Learning-based estimation of individual absorption profiles from a single room impulse response with known positions of source, sensor and surfaces. INTER-NOISE and NOISE-CON Congress and Conference Proceedings, vol. 263, no. 1, pp. 5623-5630.

[4] A. Jaiswal, A. R. Babu, M. Z. Zadeh, D. Banerjee, F. Makedon. A survey on contrastive self-supervised learning. Technologies, 2020, 9(1):2.

[5] V. Monga, Y. Li, Y. C. Eldar. Algorithm unrolling: interpretable, efficient deep learning for signal and image processing. IEEE Signal Processing Magazine, 38(2):18-44, 2021.

Practical aspects: The PhD student will be supervised by Antoine Deleforge (TONUS team*, Inria Strasbourg), Sylvain Faisan (ICube**, Télécom Physique Strasbourg) and Cédric Foy (UMRAE***, Cerema Strasbourg). The student will be based at the Cerema in Strasbourg (11 rue Jean Mentelin) but may occasionally travel to the two other laboratories. * https://www.inria.fr/fr/tonus, ** https://icube.unistra.fr/, *** https://www.umrae.fr/ Contacts: antoine.deleforge@inria.fr or cedric.foy@cerema.fr

Back  Top

6-43(2024-03-19) Research position in speech synthesis: massive generation of TTS for deepfake detection @ IRISA, Lannion, France

 

CONTEXT

The Expression team at IRISA is hiring a computer science engineer on a full-time 12-month contract (renewable). The Expression research team is at the heart of the AI revolution, as it studies and generates human language in different modalities, i.e. text, speech and sign. In particular, the team takes part in a project targeting the development and evaluation of deepfake speech detection systems. To this end, we have to implement a large variety of speech synthesis systems, including voice cloning and voice conversion systems. The engineer will work on the massive generation of synthesized speech in the context of deepfake detection.

Team webpage: https://www-expression.irisa.fr/fr/

JOB DESCRIPTION

Mission: development of speech synthesis systems covering a large variety of technologies, including voice cloning and voice conversion (a brief illustrative sketch follows the skills list below):

• Data preparation for different languages;

• Setting up a global framework for text-to-speech (TTS) synthesis;

• Implementing different TTS systems for different languages;

• Contributing to the development of deepfake detection systems.

Environment: the recruited person will be integrated into the research team and will collaborate with the partner company.

Required diploma: PhD in computer science, or a Master's degree in machine learning or in speech and language processing.

Required skills: software engineering (C++, Python); machine learning methods and tools (TensorFlow, PyTorch, Keras); automatic speech and language processing; CI/CD.
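
As announced in the mission list, here is a minimal voice-cloning sketch assuming the open-source Coqui TTS package; the model name and API calls follow its public documentation and should be treated as assumptions rather than project specifics. Samples generated this way are precisely the kind of spoofed material a deepfake detector is trained on.

    # pip install TTS   (Coqui TTS; model name and API assumed from its docs)
    from TTS.api import TTS

    # A multilingual voice-cloning model from the Coqui catalogue (assumed)
    tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")

    # Clone the voice of 'reference.wav' (hypothetical file) onto new text
    tts.tts_to_file(
        text="This sentence was never spoken by the reference speaker.",
        speaker_wav="reference.wav",
        language="en",
        file_path="cloned_output.wav",
    )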

GENERAL INFO

Where: IRISA Lab in Lannion, France

When: As soon as possible (May 2024)

Duration: 12 months (may be extended)

Salary: depending on experience

Contacts: damien.lolive@irisa.fr, arnaud.delhay@irisa.fr, vincent.barreaud@irisa.fr

Back  Top

6-44(2024-03-21) PhD position @ LIUM, University Le Mans, France

Title: Optimizing Human Intervention for Synthetic Speech Quality Evaluation: Active Learning for Adaptability

Keywords: Active Learning, Synthetic Speech Quality Evaluation, Subjective Quality Modeling, Training Set Design for Domain Adaptation

Context: The primary objective of text-to-speech (TTS), voice conversion and speech-to-speech translation systems is to synthesize or generate a high-quality speech signal. Typically, the quality of synthetic speech is evaluated subjectively by human listeners, through listening tests that assess how similar the signal is to human speech rather than machine-like speech. The main challenge in assessing synthetic speech quality lies in finding a balance between the cost and the reliability of the evaluation: when the cost of conducting a human listening test is too high, an automatic quality evaluation may be used instead, but it is less reliable. Additionally, quality can be defined from different perspectives [7]: the quality of TTS output can be described in terms of intelligibility, naturalness, expressiveness, or the presence of noise. Furthermore, fine differences between two signals cannot be precisely tracked through Mean Opinion Score (MOS) ratings [1].

Moreover, the evolution of TTS systems has altered the nature of quality evaluation. Synthetic speech quality has improved significantly over the last decade [2], and while the emphasis in speech synthesis used to be on intelligibility, today the focus is more on expressiveness. Recent efforts towards the automatic evaluation of synthesized speech [4] have demonstrated the success of objective metrics when the domain, language and system are restricted. Beyond the evolution of TTS quality over time, studies such as [10] and [8] have emphasized the need for new data collection and annotation for domain and language adaptation.

Objective: The main objective of this thesis is to propose an active learning approach [9], in which human intervention is kept to a minimum, for a subjective task such as the automatic evaluation of synthetic speech quality. The core of this framework is an objective model serving as a synthetic quality predictor, which requires a diverse and efficient set of training samples. The main goal is to collect and query data efficiently, in order to improve the precision of synthetic quality prediction or to adapt the quality predictor to new domains and new generations of systems. It is essential to address the different aspects of quality, domain-specific requirements and linguistic variation through the acquisition of new data, or by retraining models with a specific emphasis on targeted sample sets. Data collection and querying should minimize information gaps, ensuring a comprehensive dataset for adaptation and maximizing the performance improvement.

The main adaptations investigated in this project will be language (adapting a trained quality predictor to a new language) and expressive speech synthesis (adapting a trained naturalness predictor into an expressive speech quality predictor). This adaptation could potentially extend to different listeners and system types, e.g. systems with different acoustic models or vocoders. In this context, data collection (synthesizing new samples) is cost-effective, which allows the work to focus on query optimization alone, i.e. identifying the most informative samples. As a secondary objective, we will focus on modelling listeners' disagreement in quality evaluation. This objective addresses the diverse perspectives on the perception of TTS quality and works towards personalized quality prediction based on listeners' individual definitions of quality. Finally, analysing challenging scripts can reveal the remaining challenges in the text-to-speech field.
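
To fix ideas, here is a minimal, self-contained Python sketch of the uncertainty-sampling loop at the heart of active learning, with a random-forest ensemble standing in for a MOS predictor; the data, features and names are synthetic placeholders, not the thesis design.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Toy pool of synthetic utterances as feature vectors; the hidden 'MOS'
    # labels are revealed only when an item is sent to listeners (queried).
    rng = np.random.default_rng(0)
    X_pool = rng.normal(size=(500, 16))
    y_pool = 3.0 + 0.5 * X_pool[:, 0] + rng.normal(scale=0.2, size=500)

    labeled = list(rng.choice(len(X_pool), size=20, replace=False))  # seed set

    for round_idx in range(5):
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(X_pool[labeled], y_pool[labeled])
        # Uncertainty taken as the disagreement among the ensemble's trees
        per_tree = np.stack([tree.predict(X_pool) for tree in model.estimators_])
        uncertainty = per_tree.std(axis=0)
        uncertainty[labeled] = -np.inf           # never re-query labeled items
        query = int(np.argmax(uncertainty))      # most informative sample
        labeled.append(query)                    # 'listening test' reveals its label
        print(f"round {round_idx}: queried item {query}")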

References:

[1] Joshua Camp et al. “MOS vs. AB: Evaluating Text-to-Speech Systems Reliably Using Clustered Standard Errors”. In: Interspeech. 2023, pp. 1090–1094.

[2] Erica Cooper and Junichi Yamagishi. “How do Voices from Past Speech Synthesis Challenges Compare Today?” In: Proc. 11th ISCA Speech Synthesis Workshop (SSW 11). 2021, pp. 183–188. doi: 10.21437/SSW.2021-32.

[3] Erica Cooper et al. “Generalization ability of MOS prediction networks”. In: ICASSP. IEEE. 2022, pp. 8442–8446.

[4] Wen Chin Huang et al. “The VoiceMOS Challenge 2022”. In: Interspeech. 2022, pp. 4536–4540. doi: 10.21437/Interspeech.2022-970.

[5] Georgia Maniati et al. “SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis”. In: Interspeech. 2022, pp. 2388–2392. doi: 10.21437/Interspeech.2022-10922.

[6] Felix Saget et al. “LIUM-TTS entry for Blizzard 2023”. In: Blizzard Challenge Workshop. 2023. https://hal.science/hal-04188761.

[7] Fritz Seebauer et al. “Re-examining the quality dimensions of synthetic speech”. In: Proc. 12th ISCA Speech Synthesis Workshop (SSW2023). 2023, pp. 34–40. doi: 10.21437/SSW.2023-6.

[8] Thibault Sellam et al. “SQuId: Measuring speech naturalness in many languages”. In: ICASSP. IEEE. 2023, pp. 1–5.

[9] Burr Settles. “Active learning literature survey”. Technical report, University of Wisconsin-Madison, 2009.

[10] Wei-Cheng Tseng, Wei-Tsung Kao, and Hung-yi Lee. “DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores”. In: Interspeech. 2022, pp. 4541–4545.

Host laboratory: LIUM
Location: Le Mans, France
Supervisors: Anthony Larcher, Meysam Shamsi

Applicant profile: candidates motivated by artificial intelligence, holding a Master's degree in computer science, signal processing, speech analysis or a related field.

Instructions for application: send a CV, a letter/message of motivation and your Master's transcript to: meysam.shamsi@univ-lemans.fr and anthony.larcher@univ-lemans.fr

Back  Top

6-45(2024-03-23) Post-doc @LIUM, University of Le Mans, France

The LIUM is looking to recruit a post-doc to work on the development of interpretable transformer architectures.
The work will be carried out at LIUM, with a net salary of around €2,200/month, for 12 months.
More information on our website:


https://lium.univ-lemans.fr/en/contrat-post-doctoral-systemes-de-classifications-interpretables-de-bout-en-bout/

Back  Top

6-46(2024-03-22) Text corpus annotators @ ELDA, Paris, France

As part of its language resource production activities, ELDA (Evaluations and Language resources Distribution Agency) is looking for full-time native French-speaking annotators (M/F) for the named-entity annotation of text documents. The work will take place at ELDA's premises (Paris, 13th arrondissement) and can start immediately.


Desired profile:

• Native French speaker of French nationality with an excellent command of grammar;

• Good command of computer tools;

• Ability to absorb and scrupulously follow annotation guidelines;

• Good general knowledge, in order to recognize the entities.


Contract terms:

• Pay: French minimum wage (SMIC) + 20%, i.e. €13.98 per hour;

• The project is expected to end in September 2024, so availability in July-August is required;

• On-site work is mandatory;

• Two-month fixed-term contract (CDD), renewable.


Application:

• Send a CV to <dylan@elda.org>


Back  Top

6-47(2024-03-27) Two funded PhD positions @Trinity College, Dublin, Ireland

Two funded PhD positions starting in September 2024 are offered at Trinity College Dublin. Both focus on multimodal speech-based interaction.

 

One is in the AI space:

PhD Studentship in Neural Architectures for Multimodal Speech Recognition · ADAPT, the SFI Research Centre for AI-Driven Digital Content Technology (adaptcentre.ie)

 

while the other is more in the psycholinguistics space:

PhD Studentship in Multimodality in Action in Real Conversations · ADAPT, the SFI Research Centre for AI-Driven Digital Content Technology (adaptcentre.ie)

 

The positions are fully funded for 4 years for both EU and non-EU students, and are part of a larger project here focused on multimodal interaction. Please share this with students you know who may be interested. Applications are made via the links above; the closing date is March 29th.

 

Back  Top

6-48(2024-03-31) PhD and internship positions @ Inria Défense & Sécurité
Two PhD offers and one internship offer (other internship topics are possible) at Inria Défense & Sécurité:
* Spoken language detection and clustering
* Explainable and frugal automatic description of audio scenes
https://jobs.inria.fr/public/classic/clas/offres/2024-07410
Back  Top

6-49(2024-04-02) PhD proposal @ Inria Nancy, France

Inria Nancy offers a PhD position on the generation of sign language from speech, at Université de Lorraine (in Nancy).

 
For more details and to apply, see:
https://jobs.inria.fr/public/classic/fr/offres/2024-07443
Back  Top

6-50(2024-04-03) Two post-docs @ Afeka Center of Language Processing (ACLP), Tel-Aviv, Israel

Two open postdoctoral positions on the topic of spoofing-robust speaker verification.

The aim of the research is to combine time-domain and frequency-domain information to achieve better generalization and robustness to new attacks.
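
As a purely illustrative sketch of the general idea (feature-level fusion of time-domain and frequency-domain views of an utterance; all names and choices here are assumptions, not the actual research architecture):

    import numpy as np
    from scipy.signal import stft

    def frequency_view(x, fs):
        """Log-magnitude spectrogram statistics (frequency-domain evidence)."""
        _, _, Z = stft(x, fs=fs, nperseg=512)
        logmag = np.log(np.abs(Z) + 1e-9)
        return np.concatenate([logmag.mean(axis=1), logmag.std(axis=1)])

    def time_view(x):
        """Crude raw-waveform statistics (time-domain evidence)."""
        return np.array([x.std(),
                         np.abs(np.diff(x)).mean(),
                         np.mean(np.abs(x) > 3 * x.std())])

    # Feature-level fusion: concatenate both views before any classifier
    fs = 16000
    x = np.random.randn(fs)                      # placeholder utterance
    features = np.concatenate([frequency_view(x, fs), time_view(x)])
    print(features.shape)                        # one fused vector per utterance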

The positions are open at the Afeka Center of Language Processing (ACLP), Tel-Aviv, Israel, to work with Prof. Itshak Lapidot and his colleagues from Ben-Gurion University.

Candidates can contact Prof. Lapidot:

Prof. Itshak Lapidot | Researcher | ACLP - Afeka Center for Language Processing |
Afeka Academic College of Engineering | Mobile: 052-8902471 | Tel: +972-3-7688793 | itshakl@afeka.ac.il

Back  Top


