ISCApad #306 |
Saturday, December 09, 2023 by Chris Wellekens |
6-1 | (2023-06-01) PhD position @ Computer Science Lab in Bordeaux, France (LaBRI) and the LORIA (Nancy, France)
In the framework of the PEPR Santé numérique “Autonom-Health” project (Health, behaviors and autonomous digital technologies), the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) and the LORIA (Nancy, France) are looking for candidates for a fully funded PhD position (36 months). The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS. The abstract of the « Autonom-Health » project can be found at the end of this email.
The tasks assigned to the selected candidate will be drawn from the following, depending on the candidate's profile:
- Data collection tasks:
  - Definition of scenarios for collecting spontaneous speech using Social Interactive Agents (SIAs)
  - Collection of patient/doctor interactions during clinical interviews
- ASR-related tasks:
  - Evaluate and improve the performance of our end2end ESPNET-based ASR system for French real-world spontaneous data recorded from healthy subjects and patients
  - Adaptation of the ASR system to the clinical interview domain
  - Automatic phonetic transcription / alignment using end2end architectures
  - Adapting ASR transcripts to be used with semantic analysis tools developed at LORIA
- Speech analysis tasks:
  - Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, research of new biomarkers targeted to specific diseases.
The position is to be hosted at LaBRI, but depending on the profile of the candidate, close collaboration is expected with the LORIA teams « Multispeech » (contact: Emmanuel Vincent, emmanuel.vincent@inria.fr) and/or « Sémagramme » (contact: Maxime Amblard, maxime.amblard@loria.fr).
Gross salary: approx. 2044 €/month
Starting date: October 2023
Required qualifications: Master in Signal processing / speech analysis / computer science
Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, excellent command of French (interactions with French patients and clinicians), good level of scientific English.
Know-how: Familiarity with the ESPNET toolbox and/or deep learning frameworks, knowledge of automatic speech processing system design.
Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.
Applications: To apply, please send by email to jean-luc.rouas@labri.fr a single PDF file containing a full CV, cover letter (describing your personal qualifications, research interests and motivation for applying), contact information of two referees and academic certificates (Master, Bachelor certificates).
Abstract of the « Autonom-Health » project:
Western populations face an increase in longevity, which mechanically increases the number of chronic disease patients to manage. Current healthcare strategies will not make it possible to maintain a high level of care at a controlled cost in the future, and e-health can optimize the management and costs of our healthcare systems. Healthy behaviors contribute to the prevention and optimized management of chronic diseases, but their implementation is still a major challenge. Digital technologies could help their implementation through digital behavioral medicine programs developed as a complement (not a substitute) to existing care, in order to focus human interventions on the most severe cases demanding medical interventions.
However, to do so, we need to develop digital technologies which should be: i) Ecological (related to real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii) Personalized (at initiation and adapted over the course of treatment); iv) Longitudinal (implemented over long periods of time); v) Interoperable (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and generating trust).
The above-mentioned challenges will be addressed through the following specific goals:
Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activities, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients. This will require new autonomous digital technologies (e.g. virtual Socially Interactive Agents (SIAs), smartphones, wearable sensors).
Goal 2: Optimize clinical phenotyping by collecting and analyzing non-intrusive data (e.g. voice, geolocation, body motion, smartphone footprints...) which will potentially complement clinical data and biomarker data from patient cohorts.
Goal 3: Better understand the psychological, economic and socio-cultural factors driving acceptance of and engagement with the autonomous digital technologies and the proposed digital behavioral interventions.
Goal 4: Improve interaction modalities of digital technologies to personalize and optimize long-term engagement of users.
Goal 5: Organize large-scale data collection, storage and interoperability with existing and new data sets (e.g. biobanks, hospital patient cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment.
Each goal will be addressed by expert teams through complementary work packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testing) will be performed through this project. A second phase, funded via ANR calls, will allow new teams to be recruited for a large-scale testing phase. This project will rely on population-based interventions in existing digital cohorts (e.g. KANOPEE) where virtual agents interact with patients at home on a regular basis.
Pilot hospital departments will also be involved for data management supervised by information and decision systems coordinating autonomous digital cognitive behavioral interventions based on our virtual agents. The global solution, based on empathic Human-Computer Interactions, will help target, diagnose and treat subjects suffering from dysfunctional behaviors (e.g. sleep deprivation, substance use...) as well as sleep and mental disorders. The expected benefits of such a solution are increased adherence to treatment, strong self-empowerment to improve autonomy and, finally, a reduction of long-term risks for the subjects and patients using this system. Our program should massively improve healthcare systems and allow strong technology transfer to information systems / digital health companies and the pharma industry.
| ||||||
6-2 | (2023-06-02) Open faculty position at KU Leuven, Belgium: junior professor in Synergistic Processing of Multisensory Data for Audio-Visual Understanding
| ||||||
6-3 | (2023-06-04) PhD in ML/NLP @ Dauphine Université PSL, Paris and Université Grenoble Alpes, France
PhD in ML/NLP: Fairness and self-supervised learning for speech processing
Salary: ~2000€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)
Keywords: speech processing, fairness, bias, self-supervised learning, evaluation metrics
CONTEXT
This thesis is in the context of the ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive and Innovative Speech Technologies). Self-supervised learning (SSL) has recently emerged as one of the most promising artificial intelligence (AI) methods, as it is now feasible to take advantage of the colossal amounts of existing unlabeled data to significantly improve the performance of various speech processing tasks.
PROJECT OBJECTIVES
Speech technologies are widely used in our daily lives and are expanding the scope of our actions through decision-making systems, including in critical areas such as health or law. In these societal applications, the use of these tools raises the issue of possible discrimination against people according to criteria for which society requires equal treatment, such as gender, origin, religion or disability. Recently, the machine learning community has been confronted with the need to work on the possible biases of algorithms, and many works have shown that the search for the best performance is not the only goal to pursue [1]. For instance, recent evaluations of ASR systems have shown that performance can vary according to gender, but that these variations depend both on the data used for training and on the models [2]. Such systems are therefore increasingly scrutinized for bias, while trustworthy speech technologies represent a crucial expectation.
- First, survey the many definitions of robustness, fairness and bias, with the aim of coming up with definitions and metrics fit for speech SSL models
- Then, gather speech datasets with a large amount of well-described metadata
- Set up an evaluation protocol for SSL models and analyze the results.
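As an illustration of the kind of metric such an evaluation protocol might start from, here is a minimal sketch that computes the word error rate (WER) separately for each metadata group and reports the largest gap between groups. This is not the project's protocol: the function names, the group labels and the toy transcripts are all assumptions made for the example, and the WER gap is only one of many possible fairness indicators.

```python
from collections import defaultdict


def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Return {group: WER}, pooling errors and reference words within each group."""
    errors = defaultdict(int)
    ref_lengths = defaultdict(int)
    for group, reference, hypothesis in samples:
        ref_words = reference.split()
        errors[group] += word_edit_distance(ref_words, hypothesis.split())
        ref_lengths[group] += len(ref_words)
    return {g: errors[g] / ref_lengths[g] for g in errors if ref_lengths[g] > 0}


def wer_gap(group_wer):
    """Largest absolute WER difference between any two groups."""
    values = list(group_wer.values())
    return max(values) - min(values)


# Toy data invented for illustration: (group label, reference, ASR hypothesis).
samples = [
    ("female", "the cat sat on the mat", "the cat sat on the mat"),
    ("female", "good morning doctor", "good morning doctor"),
    ("male", "the cat sat on the mat", "the cat sat on a mat"),
    ("male", "good morning doctor", "good morning"),
]
group_wer = wer_by_group(samples)
gap = wer_gap(group_wer)
print(group_wer, gap)
```

Pooling errors over all utterances of a group, rather than averaging per-utterance WER, keeps short utterances from dominating the score; either choice would need to be justified in an actual protocol.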
SKILLS
SCIENTIFIC ENVIRONMENT
The PhD position will be co-supervised by Alexandre Allauzen (Dauphine Université PSL, Paris) and Solange Rossato and François Portet (Université Grenoble Alpes). Joint meetings are planned on a regular basis and the student is expected to spend time in both places. Two other PhD positions are open in this project, and the students will collaborate closely with each other and with the partners. For instance, specific SSL models along with evaluation criteria will be developed by the other PhD students. The PhD student will also collaborate with several team members involved in the project, in particular the partners from LIA, LIG and Dauphine Université PSL, Paris. The means to carry out the PhD will be provided, both in terms of missions in France and abroad and in terms of equipment. The candidate will have access to the GPU clusters of both the LIG and Dauphine Université PSL. Furthermore, access to the national supercomputer Jean-Zay will make it possible to run large-scale experiments.
INSTRUCTIONS FOR APPLYING
Applications must contain a CV, a letter/message of motivation and Master's grade transcripts; candidates should be ready to provide letter(s) of recommendation. Applications should be addressed to Alexandre Allauzen (alexandre.allauzen@espci.psl.eu), Solange Rossato (Solange.Rossato@imag.fr) and François Portet (francois.Portet@imag.fr). We celebrate diversity and are committed to creating an inclusive environment for all employees.
REFERENCES: [1] Mengesha, Z., Heldreth, C., Lahav, M., Sublewski, J. & Tuennerman, E. “I don’t Think These Devices are Very Culturally Sensitive.”—Impact of Automated Speech Recognition Errors on African Americans. Frontiers in Artificial Intelligence 4. issn: 2624-8212. https://www.frontiersin.org/article/10.3389/frai.2021.725911 (2021). [2] Garnerin, M., Rossato, S. & Besacier, L. Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech in Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing (2021), 86–92.
| ||||||
6-4 | (2023-06-06) Postdoc in recognition and translation @LABRI, Bordeaux, France
In the framework of the European FETPROACT « Fvllmonti » project and the PEPR Santé numérique “Autonom-Health” project, the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) is looking for candidates for a 24-month post-doctoral position.
The « Fvllmonti » project is a collaborative project on new transistor architectures applied to speech recognition and machine translation between IMS, LaBRI, LAAS, INL, EPFL, GTS and Namlab. More information on the project is available at www.fvllmonti.eu
The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS. The abstract of the « Autonom-Health » project can be found at the end of this email.
The tasks assigned to the selected candidate will be drawn from the following, depending on the candidate's profile:
- Data collection tasks:
  - Definition of scenarios for collecting spontaneous speech using Social Interactive Agents (SIAs)
- ASR-related tasks:
  - Evaluate and improve the performance of our end2end ESPNET-based ASR system for French real-world spontaneous data recorded from healthy subjects and patients
  - Automatic phonetic transcription / alignment using end2end architectures
- Speech analysis tasks:
  - Automatic social affect/emotion/attitudes recognition on speech samples
  - Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, research of new biomarkers targeted to specific diseases.
The position is to be hosted at LaBRI, but depending on the profile of the candidate, close collaboration is expected with the « Multispeech » (contact: Emmanuel Vincent) and/or the « Sémagramme » (contact: Maxime Amblard) teams at LORIA.
Gross salary: approx. 2686 €/month
Starting date: As soon as possible
Required qualifications: PhD in Signal processing / speech analysis / computer science / language sciences
Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, good command of French (interactions with French patients and clinicians), good level of scientific English.
Know-how: Familiarity with the ESPNET toolbox and/or deep learning frameworks, knowledge of automatic speech processing system design.
Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.
Applications: To apply, please send by email to jean-luc.rouas@labri.fr a single PDF file containing a full CV (including publication list), cover letter (describing your personal qualifications, research interests and motivation for applying), evidence of software development experience (active Github/Gitlab profile or similar), two of your key publications, contact information of two referees and academic certificates (PhD, Diploma/Master, Bachelor certificates).
However, to do so, we need to develop digital technologies which should be: i) Ecological (related to real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii) Personalized (at initiation and adapted over the course of treatment); iv) Longitudinal (implemented over long periods of time); v) Interoperable (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and generating trust).
The above-mentioned challenges will be addressed through the following specific goals:
Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activities, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients. This will require new autonomous digital technologies (e.g. virtual Socially Interactive Agents (SIAs), smartphones, wearable sensors).
Goal 2: Optimize clinical phenotyping by collecting and analyzing non-intrusive data (e.g. voice, geolocation, body motion, smartphone footprints...) which will potentially complement clinical data and biomarker data from patient cohorts.
Goal 3: Better understand the psychological, economic and socio-cultural factors driving acceptance of and engagement with the autonomous digital technologies and the proposed digital behavioral interventions.
Goal 4: Improve interaction modalities of digital technologies to personalize and optimize long-term engagement of users.
Goal 5: Organize large-scale data collection, storage and interoperability with existing and new data sets (e.g. biobanks, hospital patient cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment.
Each goal will be addressed by expert teams through complementary work packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testing) will be performed through this project. A second phase, funded via ANR calls, will allow new teams to be recruited for a large-scale testing phase. This project will rely on population-based interventions in existing digital cohorts (e.g. KANOPEE) where virtual agents interact with patients at home on a regular basis.
Pilot hospital departments will also be involved for data management supervised by information and decision systems coordinating autonomous digital cognitive behavioral interventions based on our virtual agents. The global solution, based on empathic Human-Computer Interactions, will help target, diagnose and treat subjects suffering from dysfunctional behaviors (e.g. sleep deprivation, substance use...) as well as sleep and mental disorders. The expected benefits of such a solution are increased adherence to treatment, strong self-empowerment to improve autonomy and, finally, a reduction of long-term risks for the subjects and patients using this system. Our program should massively improve healthcare systems and allow strong technology transfer to information systems / digital health companies and the pharma industry.
In the framework of the European FETPROACT « Fvllmonti » project and the PEPR Santé numérique “Autonom-Health” project, the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) is looking for candidates for a 24-months post-doctoral position. The « Fvllmonti » project is a collaborative project on new transistor architectures applied to speech recognition and machine translation between IMS, LaBRI, LAAS, INL, EPFL, GTS and Namlab. More information on the project is available at www.fvllmonti.eu The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS. The abstract of the « Autonom-Health » project can be found at the end of this email. The missions that will be addressed by the retained candidate are among these selected tasks, according to the profile of the candidate: - Data collection tasks:
- Definition of scenarii for collecting spontaneous speech using Social Interactive Agents (SIAs)
- ASR-related tasks
- Evaluate and improve the performances of our end2end ESPNET-based ASR system for French real-world spontaneous data recorded from healthy subjects and patients,
- Automatic phonetic transcription / alignment using end2end architectures
- Speech analysis tasks:
- Automatic social affect/emotion/attitudes recognition on speech samples
- Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, research of new biomarkers targeted to specific diseases.
The position is to be hosted at LaBRI, but depending on the profile of the candidate, close collaboration is expected either with the « Multispeech » (contact: Emmanuel Vincent) and/or the « Sémagramme » (contact: Maxime Amblard) teams at LORIA. Gross salary: approx. 2686 €/month Starting data: As soon as possible
Required qualifications: PhD in Signal processing / speech analysis / computer science / language sciences
Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, good command of French (interactions with French patients and clinicians), good level of scientific English.
Know-how: Familiarity with the ESPNET toolbox and/or deep learning frameworks, knowledge of automatic speech processing system design.
Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.
Applications: To apply, please send by email at jean-luc.rouas@labri.fr a single PDF file containing a full CV (including publication list), cover letter (describing your personal qualifications, research interests and motivation for applying), evidence for software development experience (active Github/Gitlab profile or similar), two of your key publications, contact information of two referees and academic certificates (PhD, Diploma/Master, Bachelor certificates).
However, to do so, we need to develop digital technologies which should be: i) Ecological (related to real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii) Personalized (at initiation and adapted over the course of treatment) ; iv) Longitudinal (implemented over long periods of time) ; v) Interoperated (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and generating trustability).
The above-mentioned challenges will be disentangled with the following specific goals: Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activities, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients. This will require new autonomous digital technologies (i.e. virtual Socially Interactive Agents SIAs, smartphones, wearable sensors). Goal 2: Optimize clinical phenotyping by collecting and analyzing non-intrusive data (i.e. voice, geolocalisation, body motion, smartphone footprints, ...) which will potentially complement clinical data and biomarkers data from patient cohorts. Goal 3: Better understand psychological, economical and socio-cultural factors driving acceptance and engagement with the autonomous digital technologies and the proposed numeric behavioral interventions. Goal 4: Improve interaction modalities of digital technologies to personalize and optimize long-term engagement of users. Goal 5: Organize large scale data collection, storage and interoperability with existing and new data sets (i.e, biobanks, hospital patients cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment. Each goal will be addressed by expert teams through complementary work-packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testings), will be performed through this project. A second phase funded via ANR calls will allow to recruit new teams for large scale testing phase. This project will rely on population-based interventions in existing numeric cohorts (i.e KANOPEE) where virtual agents interact with patients at home on a regular basis. 
Pilot hospital departments will also be involved for data management supervised by information and decision systems coordinating autonomous digital Cognitive Behavioral interventions based on our virtual agents. The global solution based on empathic Human-Computer Interactions will help targeting, diagnose and treat subjects suffering from dysfunctional behavioral (i.e. sleep deprivation, substance use...) but also sleep and mental disorders. The expected benefits from such a solution will be an increased adherence to treatment, a strong self-empowerment to improve autonomy and finally a reduction of long-term risks for the subjects and patients using this system. Our program should massively improve healthcare systems and allow strong technological transfer to information systems / digital health companies and the pharma industry.
In the framework of the European FETPROACT « Fvllmonti » project and the PEPR Santé numérique “Autonom-Health” project, the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) is looking for candidates for a 24-months post-doctoral position. The « Fvllmonti » project is a collaborative project on new transistor architectures applied to speech recognition and machine translation between IMS, LaBRI, LAAS, INL, EPFL, GTS and Namlab. More information on the project is available at www.fvllmonti.eu The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS. The abstract of the « Autonom-Health » project can be found at the end of this email. The missions that will be addressed by the retained candidate are among these selected tasks, according to the profile of the candidate: - Data collection tasks:
- Definition of scenarii for collecting spontaneous speech using Social Interactive Agents (SIAs)
- ASR-related tasks
- Evaluate and improve the performances of our end2end ESPNET-based ASR system for French real-world spontaneous data recorded from healthy subjects and patients,
- Automatic phonetic transcription / alignment using end2end architectures
- Speech analysis tasks:
- Automatic social affect/emotion/attitudes recognition on speech samples
- Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, research of new biomarkers targeted to specific diseases.
The position is to be hosted at LaBRI, but depending on the profile of the candidate, close collaboration is expected either with the « Multispeech » (contact: Emmanuel Vincent) and/or the « Sémagramme » (contact: Maxime Amblard) teams at LORIA. Gross salary: approx. 2686 €/month Starting data: As soon as possible
Required qualifications: PhD in Signal processing / speech analysis / computer science / language sciences
Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, good command of French (interactions with French patients and clinicians), good level of scientific English.
Know-how: Familiarity with the ESPNET toolbox and/or deep learning frameworks, knowledge of automatic speech processing system design.
Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.
Applications: To apply, please send by email at jean-luc.rouas@labri.fr a single PDF file containing a full CV (including publication list), cover letter (describing your personal qualifications, research interests and motivation for applying), evidence for software development experience (active Github/Gitlab profile or similar), two of your key publications, contact information of two referees and academic certificates (PhD, Diploma/Master, Bachelor certificates).
However, to do so, we need to develop digital technologies which should be: i) Ecological (related to real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii) Personalized (at initiation and adapted over the course of treatment) ; iv) Longitudinal (implemented over long periods of time) ; v) Interoperated (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and generating trustability).
The above-mentioned challenges will be disentangled with the following specific goals: Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activities, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients. This will require new autonomous digital technologies (i.e. virtual Socially Interactive Agents SIAs, smartphones, wearable sensors). Goal 2: Optimize clinical phenotyping by collecting and analyzing non-intrusive data (i.e. voice, geolocalisation, body motion, smartphone footprints, ...) which will potentially complement clinical data and biomarkers data from patient cohorts. Goal 3: Better understand psychological, economical and socio-cultural factors driving acceptance and engagement with the autonomous digital technologies and the proposed numeric behavioral interventions. Goal 4: Improve interaction modalities of digital technologies to personalize and optimize long-term engagement of users. Goal 5: Organize large scale data collection, storage and interoperability with existing and new data sets (i.e, biobanks, hospital patients cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment. Each goal will be addressed by expert teams through complementary work-packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testings), will be performed through this project. A second phase funded via ANR calls will allow to recruit new teams for large scale testing phase. This project will rely on population-based interventions in existing numeric cohorts (i.e KANOPEE) where virtual agents interact with patients at home on a regular basis. 
Pilot hospital departments will also be involved for data management supervised by information and decision systems coordinating autonomous digital Cognitive Behavioral interventions based on our virtual agents. The global solution based on empathic Human-Computer Interactions will help targeting, diagnose and treat subjects suffering from dysfunctional behavioral (i.e. sleep deprivation, substance use...) but also sleep and mental disorders. The expected benefits from such a solution will be an increased adherence to treatment, a strong self-empowerment to improve autonomy and finally a reduction of long-term risks for the subjects and patients using this system. Our program should massively improve healthcare systems and allow strong technological transfer to information systems / digital health companies and the pharma industry.
In the framework of the European FETPROACT « Fvllmonti » project and the PEPR Santé numérique “Autonom-Health” project, the speech and language research group at the Computer Science Lab in Bordeaux, France (LaBRI) is looking for candidates for a 24-months post-doctoral position. The « Fvllmonti » project is a collaborative project on new transistor architectures applied to speech recognition and machine translation between IMS, LaBRI, LAAS, INL, EPFL, GTS and Namlab. More information on the project is available at www.fvllmonti.eu The « Autonom-Health » project is a collaborative project on digital health between SANPSY, LaBRI, LORIA, ISIR and LIRIS. The abstract of the « Autonom-Health » project can be found at the end of this email. The missions that will be addressed by the retained candidate are among these selected tasks, according to the profile of the candidate: - Data collection tasks:
- Definition of scenarii for collecting spontaneous speech using Social Interactive Agents (SIAs)
- ASR-related tasks
- Evaluate and improve the performances of our end2end ESPNET-based ASR system for French real-world spontaneous data recorded from healthy subjects and patients,
- Automatic phonetic transcription / alignment using end2end architectures
- Speech analysis tasks:
- Automatic social affect/emotion/attitudes recognition on speech samples
- Analysis of vocal biomarkers for different diseases: adaptation of our biomarkers defined for sleepiness, research of new biomarkers targeted to specific diseases.
The position is to be hosted at LaBRI, but depending on the profile of the candidate, close collaboration is expected either with the « Multispeech » (contact: Emmanuel Vincent) and/or the « Sémagramme » (contact: Maxime Amblard) teams at LORIA. Gross salary: approx. 2686 €/month Starting data: As soon as possible
Required qualifications: PhD in Signal processing / speech analysis / computer science / language sciences
Skills: Python programming, statistical learning (machine learning, deep learning), automatic signal/speech processing, good command of French (interactions with French patients and clinicians), good level of scientific English.
Know-how: Familiarity with the ESPNET toolbox and/or deep learning frameworks, knowledge of automatic speech processing system design.
Social skills: good ability to integrate into multi-disciplinary teams, ability to communicate with non-experts.
Applications: To apply, please send by email to jean-luc.rouas@labri.fr a single PDF file containing a full CV (including publication list), a cover letter (describing your personal qualifications, research interests and motivation for applying), evidence of software development experience (active Github/Gitlab profile or similar), two of your key publications, contact information of two referees and academic certificates (PhD, Diploma/Master, Bachelor certificates).
However, to do so, we need to develop digital technologies which should be: i) Ecological (related to real-life and real-time behavior of individuals and to social/environmental constraints); ii) Preventive (from healthy subjects to patients); iii) Personalized (at initiation and adapted over the course of treatment); iv) Longitudinal (implemented over long periods of time); v) Interoperable (multiscale, multimodal and high-frequency); vi) Highly acceptable (protecting users’ privacy and fostering trust).
The above-mentioned challenges will be addressed through the following specific goals:
Goal 1: Implement large-scale diagnostic evaluations (clinical and biomarkers) and behavioral interventions (physical activities, sleep hygiene, nutrition, therapeutic education, cognitive behavioral therapies...) on healthy subjects and chronic disease patients. This will require new autonomous digital technologies (e.g. virtual Socially Interactive Agents (SIAs), smartphones, wearable sensors).
Goal 2: Optimize clinical phenotyping by collecting and analyzing non-intrusive data (e.g. voice, geolocation, body motion, smartphone footprints, ...) which could complement clinical data and biomarker data from patient cohorts.
Goal 3: Better understand the psychological, economic and socio-cultural factors driving acceptance of and engagement with the autonomous digital technologies and the proposed digital behavioral interventions.
Goal 4: Improve interaction modalities of digital technologies to personalize and optimize long-term engagement of users.
Goal 5: Organize large-scale data collection, storage and interoperability with existing and new data sets (e.g. biobanks, hospital patient cohorts and epidemiological cohorts) to generate future multidimensional predictive models for diagnosis and treatment.
Each goal will be addressed by expert teams through complementary work packages developed sequentially or in parallel. A first modeling phase (based on development and experimental testing) will be carried out within this project. A second phase, funded via ANR calls, will make it possible to recruit new teams for a large-scale testing phase. This project will rely on population-based interventions in existing digital cohorts (e.g. KANOPEE) where virtual agents interact with patients at home on a regular basis.
Pilot hospital departments will also be involved for data management supervised by information and decision systems coordinating autonomous digital Cognitive Behavioral interventions based on our virtual agents. The global solution, based on empathic Human-Computer Interactions, will help target, diagnose and treat subjects suffering from dysfunctional behaviors (e.g. sleep deprivation, substance use...) as well as sleep and mental disorders. The expected benefits of such a solution are increased adherence to treatment, strong self-empowerment to improve autonomy and, finally, a reduction of long-term risks for the subjects and patients using this system. Our program should massively improve healthcare systems and allow strong technological transfer to information systems / digital health companies and the pharma industry.
| ||||||
6-5 | (2023-06-02) Transcribers for ELDA, Paris, France
ELDA (Evaluations and Language resources Distribution Agency) is looking for full-time or part-time transcribers to transcribe phone calls in the financial domain. Location: ELDA (Paris, France) Latest starting date: July 2023 Languages and mission details
| ||||||
6-6 | (2023-06-08) Postdoc @ ENS, Paris, France
DRhyaDS: A new framework for understanding the Dynamic Rhythms and Decoding of Speech
Job Title: Postdoctoral Researcher
Disciplines and Areas of Research: Speech science, Psycholinguistics, Psychoacoustics
Contract Duration: 1 Year
Research Overview: The DRhyaDS project aims to develop a new framework for understanding the dynamic rhythms and decoding of speech. It focuses on exploring the temporal properties of speech and their contribution to speech perception. The project challenges the conventional view that speech rhythm perception relies on a one-to-one association between specific modulation frequencies in the speech signal and linguistic units. One of the key objectives of the project is to investigate the impact of language-specific temporal characteristics on speech dynamics. The project team will analyze two corpora of semi-spontaneous speech data from French and German, representing syllable-timed and stress-timed languages, respectively. Various acoustic analyses will be conducted on these speech corpora to explore the variability of slow temporal modulations in speech at an individual level. This comprehensive acoustic exploration will involve extracting and analyzing prosody, spectral properties, temporal dynamics, and rhythmic patterns. By examining these acoustic parameters, the project aims to uncover intricate details about the structure and variation of speech signals across languages and speakers, contributing to a more nuanced understanding of the dynamic nature of spoken language and its role in human communication.
Environment: The selected candidate will be an integral part of an international research team and will work in a collaborative and stimulating lab environment. The project brings together a Franco-German team of experts in linguistics, psychoacoustics and cognitive neuroscience, led by Dr. Léo Varnet (CNRS, ENS Paris) and Dr.
Alessandro Tavano (Max Planck Institute, Goethe University Frankfurt). The successful candidate will work under the supervision of Dr Léo Varnet, at the Laboratoire des Systèmes Perceptifs (ENS Paris). Job description: This is a one-year postdoctoral contract position, offering a net salary in accordance with French legislation (~2500€/month + social and medical benefits). Women and minorities are strongly encouraged to apply. The successful candidate will participate in research activities, collaborate with team members, and contribute to scientific publications and communications. Additionally, they will have the autonomy to suggest and implement their own analysis techniques and approaches. Their responsibilities will include: - Taking a lead role in collecting a comprehensive corpus of French speech data, adhering to a rigorous data collection protocol - Collaborating closely with the German team to leverage the existing German speech corpus for comparative analysis and cross-linguistic investigations - Conducting in-depth acoustic analysis of the corpora, employing advanced techniques to investigate the variability and dynamics of slow temporal modulations in speech - Actively participating in team meetings, workshops, and conferences to present research progress, exchange ideas, and contribute to the intellectual growth of the project - Engaging in science outreach activities to promote the project's research outcomes and facilitate public understanding of speech perception and language processing. Qualifications: - A recently obtained PhD in a relevant field (e.g., linguistics, psychology, neuroscience, computational sciences) - Strong expertise in linguistics, speech perception, acoustic analysis, and statistical methods - Proficiency in programming languages commonly used in speech research. Knowledge of MATLAB would be particularly valuable for data processing and analysis within the project. - Strong written and verbal communication skills in English. 
Candidates proficient in French and/or German would be particularly appreciated, as this would enable a deeper understanding of the linguistic characteristics of the respective corpora.
Application process: To apply for this position, please submit a CV and a cover letter (in French or English) along with the names and contact information of 2 referees to Léo Varnet (leo.varnet@cnrs.fr). The application deadline is 31st July 2023. Interviews will be conducted in September. The ideal start date is October-November 2023, with some flexibility allowed. Feel free to get in touch informally to discuss this position.
| ||||||
6-7 | (2023-06-16) PhD funded position@ INRIA France Inria is opening a fully funded PhD position on multimodal speech
| ||||||
6-8 | (2023-06-16) Post doc @ IMAG, Grenoble, France
Call for postdoc applications in Natural Language Processing for the automatic detection of gender stereotypes in the French media (Grenoble Alps University, France)
Starting date: flexible, November 30, 2023, at the latest
Duration: full-time position for 12 months
Salary: according to experience (up to 4142€/month)
Application Deadline: Open until filled
Location: The position will be based in Grenoble, France. This is not a remote position.
Keywords: natural language processing, gender stereotypes bias, corpus analysis, language models, transfer learning, deep learning
*Context*
The University of Grenoble Alps (UGA) has an open position for a highly motivated postdoc researcher to join the multidisciplinary GenderedNews project. Natural Language Processing models trained on large amounts of online content have quickly opened new perspectives for processing large amounts of online content to measure gender bias on a daily basis (see our project https://gendered-news.imag.fr/ ). Regarding research on stereotypes, most recent works have studied Language Models (LM) from a stereotype perspective by providing specific corpora such as StereoSet (Nadeem et al., 2020) or CrowS-Pairs (Nangia et al., 2020). However, these studies focus on quantifying bias in the LM predictions rather than bias in the original data (Choenni et al., 2021). Furthermore, most of these studies ignore named entities (Deshpande et al., 2022), which account for an important part of the referents and speakers in news. In this project, we intend to build corpora, methods and NLP tools to qualify the differences between the language used to describe groups of people in French news.
*Main Tasks*
The successful postdoc will be responsible for the day-to-day running of the research project, under the supervision of François Portet (Prof. UGA at LIG) and Gilles Bastin (Prof. UGA at PACTE). Regular meetings will take place every two weeks.
- Defining the dimensions of stereotypes to be investigated and the possible metrics that can be processed from a machine learning perspective.
- Exploring, managing and curating news corpora in French for stereotype investigation, with a view to making them widely available to the community to favor reproducible research and comparison.
- Studying and developing new computational models to process large numbers of texts to reveal stereotype bias in news, making use of pretrained models for the task.
- Evaluating the methods on a curated, focused corpus, applying them to the unseen real longitudinal corpus, and analyzing the results with the team.
- Preparing articles for submission to peer-reviewed conferences and journals.
- Organizing progress meetings and liaising between members of the team.
The hired person will interact with PhD students, interns and researchers who are part of the GenderedNews project. Depending on his/her background and interests, and in accordance with the project's objectives, the hired person will have the possibility to orient the research in different directions.
*Scientific Environment*
The recruited person will be hosted within the GETALP team of the LIG laboratory (https://lig-getalp.imag.fr/), which offers a dynamic, international, and stimulating environment for conducting high-level multidisciplinary research. The person will have access to large datasets of French news, GPU servers, and support for missions, as well as to the scientific activities of the labs. The team is housed in a modern building (IMAG) located on a 175-hectare landscaped campus that was ranked as the eighth most beautiful campus in Europe by the Times Higher Education magazine in 2018. The person will also work closely with Gilles Bastin (PACTE, a sociology lab in Grenoble) and Ange Richard (PhD at LIG and PACTE).
The project also includes an informal collaboration with 'Prenons la une' (https://prenonslaune.fr/), a journalists’ association which promotes a fair representation of women in the media.
*Requirements*
The candidate must have a PhD degree in Natural Language Processing or computer science, or be in the process of acquiring one. The successful candidate should have:
- Good knowledge of Natural Language Processing
- Experience in corpus collection/formatting and manipulation
- Good programming skills in Python
- A publication record in a closely related field of research
- Willingness to work in multidisciplinary and international teams
- Good communication skills
- A good command of French (required)
*Instructions for applying*
Applications will be considered as they arrive and must be addressed to François Portet (Francois.Portet@imag.fr). It is therefore advisable to apply as soon as possible. The application file should contain:
- Curriculum vitae
- References for potential letter(s) of recommendation
- One-page summary of research background and interests for the position
- Publications demonstrating expertise in the aforementioned areas
- Pre-defense reports and defense minutes; or summary of the thesis with the date of defense for those currently in doctoral studies
*References*
Deshpande et al. (2022). StereoKG: Data-Driven Knowledge Graph Construction for Cultural Knowledge and Stereotypes. arXiv preprint arXiv:2205.14036.
Choenni et al. (2021). Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you? arXiv preprint arXiv:2109.10052.
Nadeem et al. (2020). StereoSet: Measuring stereotypical bias in pretrained language models. arXiv.
Nangia et al. (2020). CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. In EMNLP 2020.
| ||||||
6-9 | (2023-06-16) 6 Post-doc positions in 'Education and AI in the 21st century. Technology-enabled innovations in subject-specific teaching settings (PostdocTEIFUN)' , Tübingen and Stuttgart, Germany
The new Postdoc-Kolleg 'Education and AI in the 21st century. Technology-enabled innovations in subject-specific teaching settings (PostdocTEIFUN)' of the Tübingen School of Education (TüSE) and Professional School of Education Stuttgart-Ludwigsburg (PSE) will start in 2024.
It is offering six Post-doc positions funded for six years (100% TVL E14) to conduct interdisciplinary research in the field of education and AI.
The official announcement can be found here:
Among the various potential research fields, the following could be of interest here:
'Development of didactically informed Intelligent Language Tutoring approaches targeting spoken language learning for the English classroom. Potential directions include but are not limited to the realization of perceptual training, input enhancement, automatic corrective feedback, or prosodic complexity analysis.'
Interested? Please contact us at: Sabine Zerbian, sabine.zerbian@ifla.uni-stuttgart.de Detmar Meurers, detmar.meurers@uni-tuebingen.de
Note that there is a tight deadline for applications: 30.06.2023
| ||||||
6-10 | (2023-06-17) PhD @ NTNU, Trondheim, Norway The announcement is here:
Deadline is 2023-07-31.
| ||||||
6-11 | (2023-06-20) PhD position @ University of Applied Sciences, Hochschule Hannover, Germany
In a joint research project *VidQA*, our collaboration partner from University of Applied Sciences (Prof. Christian Wartena, HsH Hochschule Hannover) offers a full position (three years, as soon as possible) for a PhD student:
*VidQA* is a joint research project of the Institute for Applied Data Science Hannover (DATA|H) of the HsH, the Research Center L3S of the Leibniz University and the TIB – Leibniz Information Centre for Science and Technology. The goal of the project is the development and evaluation of new methods for semi-automatic generation of questions and answers for learning videos. Here, we pursue several research questions, such as, among others, the aspect of multimodality of video-based learning media, the generation of distractors ('wrong answers') for multiple-choice questions, and the automatic evaluation of answers for open-ended question formats.
*What you can expect*:
- Development and implementation of procedures for generating (multiple-choice) questions, answers, and distractors
- Development and implementation of procedures for scoring free-text answers
- Collaboration in the development and evaluation of a system for examining comprehension of learning videos
- Publication of research and development results at conferences and in professional journals
- Participation in the organization of the project
*Requirements* Research Field: Computer science Education Level: Master Degree or equivalent Skills/Qualifications - Master's degree (or equivalent) in computer science or computational linguistics - In-depth knowledge in the field of artificial intelligence and machine learning - Proven knowledge in Natural Language Processing (NLP) - Gender and diversity skills
Languages: ENGLISH Level: Excellent
Languages: GERMAN Level: Good
Website for additional job details: https://karriere.hs-hannover.de/bewerbung/beschreibung-900000104-10057.html
Where to apply: https://karriere.hs-hannover.de/bewerbung/beschreibung-900000104-10057.html *First Contact* ************************************ Prof. Dr. Christian Wartena University of Applied Sciences (HsH Hochschule Hannover) Institute for Applied Data Science Hannover (DATA|H) E-Mail: christian.wartena@hs-hannover.de
Postal address Street: Expo Plaza 12 Postal Code: 30539 City: Hannover Website: https://www.hs-hannover.de/forschung/forschungsaktivitaeten/forschungscluster/smart-data-analytics
| ||||||
6-12 | (2023-06-26) Fully funded CIFRE PhD thesis, IMAG and Eloquant, Grenoble, France
| ||||||
6-13 | (2023-06-27) Research Associate in Integrated Multitask Neural Speech Labelling, University of Sheffield, UK Deadline July 13th, 2023
| ||||||
6-14 | (2023-07-01) PhD offer on 'Deep learning for speaker identification and speech separation', CNRS
For more information see:
| ||||||
6-15 | (2023-07-19) Project lead engineer for language resources and technologies, INRIA, Nancy, France
Project lead engineer for language resources and technologies
City: Nancy, France
Desired start date: 2023-10-01
Contract type: fixed-term contract (CDD), 4 years
Required degree level: BAC+5 (Master's level) or equivalent
Desired level of experience: 3 to 5 years
To apply: https://jobs.inria.fr/public/classic/fr/offres/2023-06574
For more information, contact: Slim.Ouni@loria.fr
Full job description: https://jobs.inria.fr/public/classic/fr/offres/2023-06574
Position: Project lead engineer for language resources and technologies
CONTEXT
This position is part of the Inria Challenge (Défi Inria) COLaF (Corpus et Outils pour les Langues de France), a collaboration between the ALMAnaCH and MULTISPEECH teams. The aim of the Challenge is to develop and make available digital language technologies for the French-speaking world and the languages of France, by contributing to the creation of inclusive data corpora, models and software components. The ALMAnaCH team focuses on text and the MULTISPEECH team on multimodal speech. The two main objectives of this project are:
(1) Collecting massive, inclusive French-language data corpora: the goal is to build very large text and speech corpora with rich metadata, in order to improve the robustness of models to linguistic variation, with particular attention to geographic and dialectal variation across the French-speaking world. Part of these corpora may be multimodal (audio, image, video), or even specific to French Sign Language (LSF). The multimodal speech data will cover, among others, dialects, accents, the speech of elderly people, children and teenagers, LSF, and the other languages widely spoken in France. Corpus collection will rely primarily on existing data.
Such (multimodal speech) data can be obtained from the archives of INA and of regional or foreign radio and television broadcasters, but rarely in a directly usable form, or from specialists, but in the form of small, scattered corpora. The difficulty lies on the one hand in identifying and pre-processing the relevant data to obtain homogeneous corpora, and on the other hand in clarifying (and if possible relaxing) the legal constraints and financial compensation governing their use, in order to ensure the widest possible impact. When legal constraints do not allow existing data to be used, an additional data collection effort will be needed. This will probably be the case for children (applications in education) and elderly people (applications in health). Depending on the situation, this effort will be subcontracted to field linguists or will lead to a large-scale campaign. This will be carried out in collaboration with Le VoiceLab and the DGLFLF.
(2) Developing and making available inclusive language technologies: the language technologies considered in this project by the MULTISPEECH team are speech recognition, speech synthesis, and sign language generation. Many technologies are already commercially available. The aim is therefore not to reinvent these tools, but to make the modifications needed for them to exploit the inclusive corpora created. The technologies used in this project cover, but are not limited to, the following tasks:
• (Semi-)automatic identification and pre-processing of relevant data within large volumes of existing data. This includes detecting and replacing named entities for anonymization purposes.
• Neural architectures and approaches suited to low-resource scenarios (data augmentation, transfer learning, weakly supervised/unsupervised learning, active learning, and combinations of these forms of learning)
MISSIONS
The project lead engineer will have two main missions:
• Project management and practical coordination of the MULTISPEECH team's contribution to the Inria Challenge. The project lead engineer will work in close collaboration with a “junior” engineer, a researcher and two PhD students, all working within this project. He or she will closely supervise the “junior” engineer and interact very frequently with the researcher and the PhD students. He or she will also be in contact with the members of the MULTISPEECH team. Consultation and strong collaboration with his or her counterpart in the ALMAnaCH team are also expected.
• Data collection and creation of multimodal speech corpora (this includes certain dialects, accents, elderly people, children and teenagers, LSF, and certain languages other than French that are widely spoken in France). A large part of the data collection will be carried out with speaker associations, content producers and any partner relevant to obtaining data. The project lead engineer will be expected to hold discussions with these contacts, in particular on legal aspects.
MAIN ACTIVITIES
• Defining the different types of corpora to collect (identifying potentially usable corpora, setting priorities and a collection schedule)
• Collecting speech corpora from content producers or any other partner (ensuring that the data meet quality norms and standards)
• Negotiating data usage contracts while complying with legal requirements (negotiating the terms of use of the data with content producers or partners, ensuring that intellectual property rights are respected and that legal aspects are taken into account)
• Creating and making available language technologies for processing these corpora: once collected, the data must be analyzed and processed to extract useful information. The project lead engineer must propose existing technologies and tools needed for this analysis and make sure they are accessible to users.
• Close supervision of the junior engineer: support and advice on technical and strategic development choices
• Consultation and facilitation of exchanges between project members: (1) with the researcher and the two PhD students (reflection and discussion on the data and their suitability for the Challenge); (2) coordination with the project members in the ALMAnaCH team
• Technology watch, in particular in the field of this Challenge
• Writing and presenting technical documentation
Note: This is an indicative list of activities, which may be adapted in keeping with the mission as described above.
SKILLS
PROFILE SOUGHT:
• Degree in computer science, linguistics or any other field related to automatic speech or language processing
• Proven experience in project management and communication
• In-depth knowledge of language technologies
• Ability to work in a team and meet deadlines
• Good knowledge of English
KNOWLEDGE
• Ability to write, publish and present in French and English
• Command of project management and negotiation techniques
• Legal basics (personal data, intellectual property, business law)
KNOW-HOW
• Analytical, writing and summarizing skills
• Ability to support and advise
• Ability to build a professional network
• Ability to run several projects at the same time
• Negotiation skills
• Sense of responsibility and autonomy
• Interpersonal ease and a taste for teamwork
• Rigor, sense of priorities and reporting
• Interpersonal qualities (listening, diplomacy, persuasiveness)
• Appetite for negotiation (Le VoiceLab, DGLFLF, etc.)
• Ability to anticipate
• Initiative and intellectual curiosity
Full-time position, to be filled as soon as possible. Salary depending on experience. Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
BENEFITS
• Subsidized catering
• Partially reimbursed public transport
• Leave: 7 weeks of annual leave + 10 days of RTT (reduced working time days, full-time basis) + possibility of exceptional leave of absence (e.g. sick children, moving house)
• Professional equipment available (videoconferencing, loan of computer equipment, etc.)
• Social, cultural and sports benefits (Association de gestion des œuvres sociales d'Inria)
• Access to vocational training
• Social security
2765€ gross/month (depending on experience)
ABOUT INRIA
Inria is the French national research institute for digital science and technology. World-class research, technological innovation and entrepreneurial risk are its DNA. Within 200 project teams, most of them shared with major research universities, more than 3,500 researchers and engineers explore new paths, often in an interdisciplinary manner and in collaboration with industrial partners, to meet ambitious challenges. Inria supports the diversity of innovation pathways: from open-source software publishing to the creation of technology startups (Deeptech).
ABOUT THE INRIA NANCY – GRAND EST CENTRE
The Inria Nancy – Grand Est centre is one of Inria's eight centres, with 400 people spread across 22 research teams and 8 research support services. All of these research teams are shared with academic partners, and three of them are based in Strasbourg. This research centre is a major, recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large groups, start-ups, incubators and accelerators, competitiveness clusters, research and higher education players, and technological research institutes.
WORKING ENVIRONMENT
The project lead engineer will work within the MULTISPEECH project team at the Inria Nancy research centre. MULTISPEECH's research focuses on multimodal speech, in particular its analysis and generation in the context of human-machine interaction. A central aspect of this work is the design of machine learning models and techniques to extract information about the linguistic content, the speaker's identity and states, and the speech environment, and to synthesize multimodal speech using limited amounts of labeled data.
To apply:
| ||||||
6-16 | (2023-09-29) PhD offer at IRCAM, Paris, France
PhD offer on neural voice conversion, funded within the ANR project EVA “Explicit Voice Attributes”
| ||||||
6-17 | (2023-08-10) Postdocs in Natural Language Processing for the automatic detection of gender, Alps University, Grenoble, France) Call for postdoc applications in Natural Language Processing for the automatic detection of gender stereotypes in the French media (Grenoble Alps University, France) Starting date: flexible, November 30, 2023, at the latest Duration: full-time position for 12 months Salary: according to experience (up to 4142€/ month) Application Deadline: Open until filled Location: The position will be based in Grenoble, France. This is not a remote work. Keywords: natural language processing, gender stereotypes bias, corpus analysis, language models, transfer learning, deep learning *Context* The University of Grenoble Alps (UGA) has an open position for a highly motivated postdoc researcher to joint the multidisciplinary GenderedNews project. Natural Language Processing models trained on large amount of on-line content, have quickly opened new perspectives to process on-line large amount of on-line content for measuring gender bias in a daily basis (see our project https://gendered-news.imag.fr/ ). Regarding research on stereotypes, most recent works have studied Language Models (LM) from a stereotype perspective by providing specific corpora such as StereoSet (Nadeem et al., 2020) or CrowS-Pairs (Nangia et al. 2020). However, these studies are focusing on the quantifying of bias in the LM predictions rather than bias in the original data (Choenni et al., 2021). Furthermore, most of these studies ignore named entities (Deshpande et al., 2022) which account for an important part of the referents and speakers in news. In this project, we intend to build corpora, methods and NLP tools to qualify the differences between the language used to describe groups of people in French news. 
*Main Tasks* The successful postdoc will be responsible for day-to-day running of the research project, under the supervision of François Portet (Prof UGA at LIG) and Gilles Bastin (prof UGA at PACTE). Regular meetings will take place every two weeks. - Defining the dimensions of stereotypes to be investigated and the possible metrics that can be processed from a machine learning perspective. - Exploring, managing and curating news corpora in French for stereotypes investigation, with a view to making them widely available to the community to favor reproducible research and comparison. - Studying and developing new computational models to process large number of texts to reveal stereotype bias in news. Make use of pretrained models for the task. - Evaluate the methods on curated focused corpus and apply it to the unseen real longitudinal corpus and analyze the results with the team. - Preparing articles for submission to peer-reviewed conferences and journals. - Organizing progress meetings and liaising between members of the team. The hired person will interact with PhD students, interns and researchers being part of the GenderedNews project. According to his/her background his/her own interests and in accordance with the project's objective, the hired person will have the possibility to orient the research in different directions. *Scientific Environment* The recruited person will be hosted within the GETALP teams of the LIG laboratory (https://lig-getalp.imag.fr/), which offers a dynamic, international, and stimulating environment for conducting high-level multidisciplinary research. The person will have access to large datasets of French news, GPU servers, to support for missions as well as to the scientific activities of the labs. The team is housed in a modern building (IMAG) located in a 175-hectare landscaped campus that was ranked as the eighth most beautiful campus in Europe by the Times Higher Education magazine in 2018. 
The person will also work closely with Gilles Bastin (PACTE, a sociology lab in Grenoble) and Ange Richard (PhD student at LIG and PACTE). The project also includes an informal collaboration with 'Prenons la une' (https://prenonslaune.fr/), a journalists' association which promotes a fair representation of women in the media. *Requirements* The candidate must have a PhD degree in Natural Language Processing or computer science, or be in the process of obtaining one. The successful candidate should have - Good knowledge of Natural Language Processing - Experience in corpus collection/formatting and manipulation - Good programming skills in Python - A publication record in a closely related field of research - Willingness to work in multidisciplinary and international teams - Good communication skills - A good command of French (required) *Instructions for applying* Applications will be considered as they arrive and must be addressed to François Portet (Francois.Portet@imag.fr). It is therefore advisable to apply as soon as possible. The application file should contain - Curriculum vitae - References for potential letter(s) of recommendation - One-page summary of research background and interests for the position - Publications demonstrating expertise in the aforementioned areas - Pre-defense reports and defense minutes; or a summary of the thesis with the date of defense for those currently in doctoral studies *References* Deshpande et al. (2022). StereoKG: Data-Driven Knowledge Graph Construction for Cultural Knowledge and Stereotypes. arXiv preprint arXiv:2205.14036. Choenni et al. (2021). Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you? arXiv preprint arXiv:2109.10052. Nadeem et al. (2020). StereoSet: Measuring stereotypical bias in pretrained language models. arXiv. Nangia et al. (2020). CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. In EMNLP 2020.
| ||||||
6-18 | (2023-09-04) Project Manager@ELDA, Paris, France The Evaluations and Language resources Distribution Agency (ELDA), a company specialized in Human Language Technologies within an international context, is currently seeking to fill an immediate vacancy for the permanent position of Project Manager - Intellectual Property, Personal Data Protection for AI and Language Technologies. Job description Under the CEO’s supervision, the Project Manager will handle legal issues related to the compilation, use and distribution of language datasets in a European and international environment. This offers excellent opportunities for creative and motivated candidates wishing to participate actively in the Language Engineering field.
Their main tasks will consist of:
The position is based in Paris. Salary: Commensurate with qualifications and experience (between 40-60K€). Other benefits: complementary health insurance and meal vouchers. Required profile:
About
ELDA is an SME established in 1995 to promote the development and exploitation of Language Resources (LRs). Language Resources include all data necessary for language engineering, such as monolingual and multilingual lexica, text corpora, speech databases and terminology. ELDA’s role is to produce LRs, to collect and to validate them and, foremost, make them available to users in compliance with applicable regulations and ethical requirements. For further information about ELDA, visit: http://www.elda.org Applicants should email a cover letter addressing the points listed above together with a curriculum vitae to: ELDA
| ||||||
6-19 | (2023-09-08) 2 Research and Teaching Associates – PhD Positions –Signal Processing and Speech Communication Laboratory (TU Graz), Austria The Signal Processing and Speech Communication Laboratory (https://www.spsc.tugraz.at) of
| ||||||
6-20 | (2023-09-10) PhD position, MIAI, Université de Grenoble, France Job Offer: PhD Self-supervised models for transcribing the spontaneous speech of 3- to 6-year-old children in French Starting date: between October 1st and December 1st, 2023 (flexible) Application deadline: From now until the position is filled Interviews: from September, or later if the position is still open Salary: ~2000€ gross/month (social security included) Mission: research oriented (teaching possible but not mandatory) Place of work: Laboratoire d'Informatique de Grenoble, CNRS, Grenoble, France Keywords: deep learning, natural language processing, speech recognition for children's voices, documentation of language development Description As part of the Artificial Intelligence & Language Chair at the Multidisciplinary Institute in Artificial Intelligence (MIAI; https://miai.univ-grenoble-alpes.fr/research/chairs/perception-interaction/artificialintelligence-language-850480.kjsp?RH=6499588038450843), we offer a PhD thesis topic devoted to the enriched automatic transcription of the spontaneous speech of 3- to 6-year-old children using an architecture based on self-supervised models [1]. These methods have emerged as one of the most successful approaches in artificial intelligence (AI), as they make it possible to exploit a colossal amount of existing unlabeled data and so achieve significantly higher performance in many domains. As part of the DyLNet project (Language dynamics, linguistic learning, and sociability at preschool: benefits of wireless proximity sensors in collecting big data; https://dylnet.univ-grenoblealpes.fr/), coordinated by A. Nardy, a children's speech collection was carried out in a socially mixed preschool over a period of two and a half years [2]. Each year, around 100 children wore a box fitted with microphones that continuously recorded their speech. These boxes were worn for one week a month.
We thus collected ~30,000 hours of recordings, 815 of which were transcribed and annotated by linguists. This unprecedentedly large corpus of children's spoken French will make it possible to meet the technical challenges associated with automatic speech processing. While continuous and unsupervised collection methods are now available, another challenge is the automatic transcription of children's voices, made difficult by their acoustic characteristics. The aim of the thesis is to design a transcription system for researchers as well as child development professionals (teachers, speech therapists, etc.). The aim of the thesis is therefore to: - review the state-of-the-art models and the performances achieved by automatic transcription tools for children's voices - implement processes to exploit the mass of audio data collected, and the associated metadata (sociodemographic information on participants, contexts of enunciation, interlocutors, etc.) - design and develop a system for transcribing children's speech using self-supervised tools, as proposed by SpeechBrain [3]. The best system obtained will be made available to the language acquisition research community and child development professionals. - set up a system evaluation protocol based on transcribed data - propose tools for automating some of the linguistic analyses to enrich the obtained transcriptions and document oral language development in 3- to 6-year-old children. Skills: Master's degree in Computer Science, Artificial Intelligence or Data Science. Proficiency in Python programming and deep learning frameworks. Experience in automatic natural language processing will be highly appreciated. Excellent communication skills in French or, failing that, in English. Scientific environment: The PhD position will be co-supervised by Benjamin Lecouteux, Solange Rossato (LIG, Univ. Grenoble Alpes) and Aurélie Nardy (Lidilem, Univ. Grenoble Alpes).
The recruited person will be part of the GETALP team of the LIG laboratory (https://lig-getalp.imag.fr/), which has extensive expertise and experience in the field of Natural Language Processing. The GETALP team offers a stimulating, multinational working environment, and provides the resources needed to complete the thesis in terms of equipment and scientific exchanges. Regular meetings with the three supervisors will take place throughout the thesis. Instructions for applying Applications must contain: CV + cover letter/message + Master's grade transcripts; be ready to provide letter(s) of recommendation. They should be addressed to Benjamin Lecouteux (benjamin.lecouteux@univ-grenoble-alpes.fr), Solange Rossato (solange.rossato@univ-grenoble-alpes.fr) and Aurélie Nardy (aurelie.nardy@univ-grenoble-alpes.fr). [1] Evain, S., Nguyen, H., Le, H., Boito, M. Z., Mdhaffar, S., Alisamir, S., ... & Besacier, L. (2021). LeBenchmark: A reproducible framework for assessing self-supervised representation learning from speech. https://doi.org/10.48550/arXiv.2104.11462 [2] Nardy, A., Bouchet, H., Rousset, I., Liégeois, L., Buson, L., Dugua, C., Chevrot, J.-P. (2021). Variation sociolinguistique et réseau social : constitution et traitement d’un corpus de données orales massives. Corpus, 22 [online]. https://doi.org/10.4000/corpus.5561 [3] Ravanelli, M., Parcollet, T., Plantinga, P., Rouhe, A., Cornell, S., Lugosch, L., ... & Bengio, Y. (2021). SpeechBrain: A general-purpose speech toolkit. https://arxiv.org/abs/2106.0462
| ||||||
6-21 | (2023-09-12) Assistant Professor (tenure track) position, Rochester Institute of Technology, Rochester, NY, USA Anticipated Start Date: (Mid August, 2024) DETAILED JOB DESCRIPTION: The Department of Psychology at the Rochester Institute of Technology (RIT; www.rit.edu/psychology) invites candidates to apply for a tenure-track Assistant Professor position starting in August 2024. We are seeking an energetic and enthusiastic psychologist who will serve as an instructor, researcher, and mentor to students in our undergraduate (Psychology, Neuroscience) and graduate programs (Masters in Experimental Psychology, Ph.D. in Cognitive Science). We are particularly looking to build a cohort of faculty who can contribute to the interdisciplinary Ph.D. program in Cognitive Science and contribute to research, mentoring, and teaching using computational and laboratory methods. Candidates should have expertise in an area of Cognitive Science such as cognitive or behavioral neuroscience, AI, computational/psycho-linguistics, cognitive psychology, comparative psychology, or related areas. We are particularly interested in individuals whose area of research expertise expands the current expertise of the faculty. Candidates who can teach courses in natural language processing or computational modeling are especially encouraged to apply. The Department of Psychology at RIT serves a rapidly expanding student population at a technical university. The position requires a strong commitment to teaching and mentoring, active research and publication, and a strong potential to attract external funding. Teaching and research are priorities for faculty at RIT, and all faculty are expected to mentor students through advising, research and in-class experiences.
The successful candidate will be able to teach courses in our undergraduate cognitive psychology track (Memory & Attention, Language & Thought, Decision Making, Judgement & Problem Solving), will be expected to teach research methods/statistics courses at the undergraduate and graduate level, and teach and mentor students in our graduate programs. In addition, candidates must be able to do research and work effectively within the department’s existing lab space. RIT provides many opportunities for collaborative research across the institute in many diverse disciplines such as AI, Digital Humanities, Human-Centered Computing, and Cybersecurity. We are seeking individuals who have the ability and interest in contributing to a community committed to student-centeredness; professional development and scholarship; integrity and ethics; respect, diversity and pluralism; innovation and flexibility; and teamwork and collaboration. THE COLLEGE/ DEPARTMENT: The Department of Psychology at RIT offers B.S., M.S. degrees, Advanced Certificates, minors, immersions, electives, and a new interdisciplinary Ph.D. degree program in Cognitive Science. The B.S. degree provides a general foundation in psychology with specialized training in one of five tracks: biopsychology, clinical psychology, cognitive psychology, social psychology, and developmental psychology. The M.S. degree is in Experimental Psychology, with an Advanced Certificate offered in Engineering Psychology. We offer accelerated BS/MS programs with AI, Sustainability, and Experimental Psychology. The Ph.D. degree is in Cognitive Science and the program is broadly interdisciplinary with several partner units across the university. We also offer joint B.S. degrees in Human Centered Computing and Neuroscience. The College of Liberal Arts is one of nine colleges within Rochester Institute of Technology.
The College has over 150 faculty in 13 departments in the arts, humanities and social sciences. The College currently offers fourteen undergraduate degree programs and five Master degrees, serving over 800 students. THE UNIVERSITY: Founded in 1829, Rochester Institute of Technology is a diverse and collaborative community of engaged, socially conscious, and intellectually curious minds. Through creativity and innovation, and an intentional blending of technology, the arts and design, we provide exceptional individuals with a wide range of academic opportunities, including a leading research program and an internationally recognized education for deaf and hard-of-hearing students. Beyond our main campus in Rochester, New York, RIT has international campuses in China, Croatia, Dubai, and Kosovo. And with more than 19,000 students and more than 125,000 graduates from all 50 states and over 100 nations, RIT is driving progress in industries and communities around the world. Find out more at www.rit.edu . REQUIRED MINIMUM QUALIFICATIONS:
HOW TO APPLY: Apply online at http://careers.rit.edu/faculty; search openings, then Keyword Search 8262BR. Please submit your application, curriculum vitae, cover letter addressing the listed qualifications and upload the following attachments:
You can contact the chair of the search committee, Caroline DeLong, Ph.D. with questions on the position at: cmdgsh@rit.edu. Review of applications will begin October 1, 2023 and will continue until an acceptable candidate is found.
| ||||||
6-22 | (2023-09-15) Postdoc at IRISA, Rennes, France Team Expression at IRISA is hiring a post-doc for 18 months in Speech synthesis. For more information, just follow this link:
| ||||||
6-23 | (2023-10-05) Professor at Saarland University, Saarbrücken, Germany The Department of Language Science and Technology of Saarland University
seeks to hire a Professor of Speech Science (W2 with tenure track to W3). For details see <https://www.uni-saarland.de/fileadmin/upload/verwaltung/stellen/Wissenschaftler/W2283_W2TTW3_Speech_Science.pdf>.
| ||||||
6-24 | (2023-09-23) Assistant or Associate Professor of Computer Science, UTEP, El Paso, Texas
The University of Texas at El Paso (UTEP) invites applications for a tenured/tenure-track Associate Professor position and three tenure-track Assistant Professor Positions in Computer Science (CS) starting in fall 2024. We invite applicants from all areas of CS. For three of the positions (including the Associate Professor position), preference will be given to those with demonstrated expertise in Artificial Intelligence.
Candidates are expected to have a record of high-quality scholarship and should be able to demonstrate the potential for excellence in both research and teaching. The department values both interdisciplinary research and industry/government experience.
More information and application instructions are at https://www.utep.edu/cs/news/news-2023/assistantassociateprofessor.html
| ||||||
6-25 | (2023-09-25) Professor (W2 with Tenure Track to W3) for Speech Science (m|f|x) at Saarland University, Saarbrücken, Germany Saarland University is a campus-based university with a strong international focus and a research-oriented profile. Numerous research institutes on campus and the systematic promotion of collaborative projects make Saarland University an ideal environment for innovation and technology transfer. To further strengthen this excellence in research and teaching, the Department of Language Science and Technology seeks to hire a
Professor (W2 with Tenure Track to W3) for Speech Science (m|f|x) Reference n° W2283 Six-year tenure track position, starting April 2025, with the possibility of promotion to a permanent professorship (W3). We are looking for a highly motivated researcher in the field of phonetics, speech science, and speech technology, with extensive knowledge of speech production, perception and acoustics. The successful candidate is expected to have expertise in experimental and computational approaches to research on spoken language. A focus on spoken dialog and conversational speech and/or multimodal aspects of communication is particularly welcome. The Department of Language Science and Technology is internationally recognized for collaborative and interdisciplinary research, and the successful candidate is expected to contribute to relevant joint research initiatives. A demonstrated ability to attract external funding of research projects is therefore highly desired. Phonetics, speech science and speech technology are core elements of our study programs on the M.Sc. and B.Sc./B.A. level, and the successful candidate is expected to teach the associated courses within these programs. What we can offer you: Tenure track professors (W2) have faculty status at Saarland University, including the right to supervise Bachelor’s, Master’s and PhD students. The successful candidate will focus on carrying out world-class research, will lead their own research group, and will undertake teaching and supervision responsibilities. Tenure track professors (W2) with outstanding performance will receive tenure as a full professor (W3) provided a positive tenure evaluation is made. Decisions regarding tenure are made no later than six years after taking up the tenure track position. The position offers excellent working conditions in a lively and international scientific community. 
Saarland University is one of the leading centers for language science and computational linguistics in Europe, and offers a dynamic and stimulating research environment. The Department of Language Science and Technology organizes about 100 research staff in nine research groups in the fields of computational linguistics, psycholinguistics, phonetics and speech science, speech processing, and corpus linguistics (https://www.uni-saarland.de/en/department/lst.html). The department serves as the focal point of the Collaborative Research Center 1102 'Information Density and Linguistic Encoding' (http://www.sfb1102.uni-saarland.de). It is part of the Saarland Informatics Campus (https://saarland-informatics-campus.de/en), which brings together 800 researchers and 2,000 students from 81 countries and collaborates closely with world-class research institutions on campus, such as the Max Planck Institute for Informatics, the Max Planck Institute for Software Systems, and the German Research Center for Artificial Intelligence (DFKI). Qualifications: The appointment will be made in accordance with the general provisions of German public sector employment law. Applicants will have a PhD or doctorate in an appropriate subject and will have demonstrated a proven track record of independent academic research (e.g. as a junior or assistant professor, or by having completed an advanced, post-doctoral research degree (Habilitation) or equivalent academic activity at a university or research institution). They will typically have completed a period of postdoctoral research and have teaching experience at the university level. They must have demonstrated outstanding research capabilities and have the potential to successfully lead their own research group. The successful candidate will be expected to actively contribute to departmental research and teaching, including introductory lectures in phonetics and phonology, speech science, as well as more advanced lectures. 
The teaching language is English (in the MSc programs) and German (in the BSc/BA programs). We expect that the successful candidate has, or is willing to acquire within an appropriate period, sufficient proficiency to teach in both languages. Your Application: Applications should be submitted online at www.uni-saarland.de/berufungen. No additional paper copy is required. The application must contain: • a cover letter and curriculum vitae (including phone number and email address) • a full list of publications • a full list of third-party funding (own shares shown) • your proposed research plan (2-5 pages) • a teaching statement (1 page) • copies of your degree certificates • full-text copies of your 5 most important publications • a list of 3 academic references (including email addresses), at least one of whom must be a person who is outside the group of your current or former supervisors or colleagues. Applications must be received no later than 12 October 2023. The search committee will decide in its first meeting on late applications. Please include the job reference number W2283 when you apply. Please contact crocker@lst.uni-saarland.de if you have any questions. Saarland University regards internationalization as an institution-wide process spanning all aspects of university life and it therefore encourages applications that align with its internationalization strategy. Members of the university's professorial staff are therefore expected to engage in activities that promote and foster further internationalization. Special support will be provided for projects that continue with or expand on collaborative interactions within existing international cooperative networks, e.g. projects with partners in the European University Alliance Transform4Europe (www.transform4europe.eu) or the University of the Greater Region (www.uni-gr.eu). Saarland University is an equal opportunity employer. 
In accordance with its affirmative action policy, Saarland University is actively seeking to increase the proportion of women in this field. Qualified women candidates are therefore strongly encouraged to apply. Preferential consideration will be given to applications from disabled candidates of equal eligibility. When you submit a job application to Saarland University you will be transmitting personal data. Please refer to our privacy notice (https://www.uni-saarland.de/verwaltung/datenschutz/) for information on how we collect and process personal data in accordance with Art. 13 of the General Data Protection Regulation (GDPR). By submitting your application, you confirm that you have taken note of the information in the Saarland University privacy notice.
| ||||||
6-26 | (2023-10-02) PhD position at IMT Atlantique, Brest, France PhD Title: Summarization of activities of daily living using sound-based activity recognition
| ||||||
6-27 | (2023-10-04) Czech-language transcribers @ELDA, Paris, France As part of its language resource production activities, ELDA is looking for full-time or part-time transcribers (f/m) who are native Czech speakers to transcribe 1,500 hours of audio recordings and/or revise transcriptions. The total number of hours to be transcribed or revised will be adapted to the candidate's availability. The work will take place at ELDA's premises (Paris, 13th arrondissement) or remotely via a secure workspace. The work can start immediately. ELDA (Evaluations and Language resources Distribution Agency)
| ||||||
6-28 | (2023-10-04) Estonian-language transcribers @ELDA, Paris, France As part of its language resource production activities, ELDA is looking for full-time or part-time transcribers (f/m) who are native Estonian speakers to transcribe 1,500 hours of audio recordings and/or revise transcriptions. The total number of hours to be transcribed or revised will be adapted to the candidate's availability. The work will take place at ELDA's premises (Paris, 13th arrondissement) or remotely via a secure workspace. The work can start immediately. ELDA (Evaluations and Language resources Distribution Agency)
| ||||||
6-29 | (2023-10-14) Internship @ Orange, Lannion, France As part of its work on developing voice technologies for Sub-Saharan Africa, the DATA&AI entity of Orange Innovation is offering a 6-month research internship to second-year Master's students.
The internship aims to develop end-to-end spoken language understanding systems for African languages (End-to-End Spoken Language Understanding systems in Sub-Saharan African Languages) and can start as early as January 2024. It will take place at the Orange Innovation premises in Lannion.
For more information about the offer and to apply, visit this page: https://orange.jobs/jobs/v3/offers/129806?lang=fr
| ||||||
6-30 | (2023-10-16) Research engineer, DDL, Université de Lyon, France The DDL laboratory of the Université de Lyon is looking for a research engineer (ingénieur d'études) for a period of 4 months. The offer is available at the following link: https://emploi.cnrs.fr/Offres/CDD/UMR5596-MELCAN-001/Default.aspx
| ||||||
6-31 | (2023-10-16) Fully funded PhD positions @ University of Colorado Boulder, CO, USA The Human Bio-Behavioral Signals (HUBBS) Lab at University of Colorado Boulder is seeking outstanding candidates for fully funded positions within the Ph.D. program in Computer Science, specializing in affective computing, human-computer interaction, biomedical health analytics, and human-centered machine learning.
Ideal candidates should possess the following qualifications: - Proficiency in data analytics and/or machine learning - Prior research experience in machine learning or human-computer interaction, preferably prior publication(s) and/or a Masters (with thesis). - Prior experience in human-centered applications
Interested candidates should apply using the following link and list Dr. Theodora Chaspari as a potential Ph.D. advisor:
Join the Human Bio-Behavioral Signals (HUBBS) Lab at CU Boulder! The goal of HUBBS Lab is to make fundamental contributions to human-centered machine intelligence (ML) and to promote scientific advancement in trustworthy artificial intelligence (AI). This endeavor draws heavily on interdisciplinary collaborations in health and psychological sciences, social sciences, and learning sciences, and leads to interdisciplinary scientific contributions. Main research areas include trustworthy and responsible human-centered AI, including dimensions of privacy-preservation, explainability, and fairness; human-AI teaming and collaborative decision-making; intelligent assistive interfaces for personalized user feedback in education and health; assistive computational technologies for combatting racism and bias; multimodal data analytics.
Join Our PhD Program in Computer Science at CU Boulder! We are committed to providing students with the best possible resources and support throughout their PhD journey. Our Ph.D. offer includes:
The Computer Science Department and CU Boulder have consistently risen in rankings over the years. We are the fastest-growing and largest department within the College of Engineering and Applied Science. The department is home to 50 tenured and tenure-track faculty plus 25 instructional faculty and now hosts nearly 1,700 undergraduate and 500 graduate students.
Highlights
We are committed to fostering a diverse, inclusive, and academically excellent community. At the college level, our undergraduate student population included 30% who self-identified as women, 27% who self-identified as persons of color, 15% who were first-generation college students, and 14% who were international students. Our graduate student body included 32% who self-identified as women, 16% who self-identified as persons of color, and 29% who were international students. In 2022, 26% of our tenured/tenure-track faculty self-identified as women, and 22% self-identified as people of color.
Location, location, location! Boulder is located at the base of the Rocky Mountains in north-central Colorado. Boulder is consistently ranked as one of the happiest and healthiest places to live in the United States, known for its active, outdoorsy lifestyle and a strong sense of community. The city enjoys 300 days of sunshine annually and has over 150 miles of hiking and biking trails, along with easy access to skiing, rock climbing, and other outdoor activities. Major companies like Google, Microsoft, Ball Aerospace, and Lockheed Martin have offices in Boulder. The Denver metropolitan area, a 30-minute drive from Boulder, is a world-class city with an international airport with direct flights to Tokyo, Paris, London, and Mexico City.
| ||||||
6-32 | (2023-10-18) Postdoctoral researcher @ University of Glasgow, UK We are looking to recruit a postdoctoral researcher (up to 3 years) with experience in deep learning + multimodal information processing (vision + audio/text). The researcher will contribute to an exciting interdisciplinary project that will develop human-centric AI models for analysing complex, audiovisual data to understand diversity and inclusion in screen media.
Please find details and link to apply for the post here: https://www.jobs.ac.uk/job/DDG225/research-assistant-associate
Information queries can be sent to tanaya.guha@glasgow.ac.uk
| ||||||
6-33 | (2023-10-15) Fully funded PhD positions at CUBES, University of Memphis, TN, USA The CUBES (Computational Understanding of Behaviors, Experiences, and Subjective states) Lab at the University of Memphis is seeking outstanding candidates for fully funded positions within the PhD program in Computer Science. We are looking for students interested in research at the intersection of machine learning, psychology, and signal processing and specializing in affective computing, human-computer interaction, and ethical human-centered machine learning.
Ideal candidates should have experience in: - Data analytics, machine learning, and/or trustworthy and fair AI - Human-centered applications either in research or industry - Research communication (e.g., publications, presentations, master’s thesis)
Interested candidates should apply to the PhD program at UofM and list Dr. Brandon Booth as a potential PhD advisor. Priority screening of applications for the Spring 2024 term will begin on November 1st, 2023, and applications will be accepted through December 1st, 2023. We particularly welcome candidates from groups that are historically underrepresented in computer science.
Join the CUBES Lab at UofM! The goal of CUBES Lab is to use multimodal AI to understand, track, and promote beneficial mental states/experiences/behaviors in real-life settings without perpetuating group biases. We conduct basic research to realize informative and interpretable representations of experiences, and we aim to close the loop in interactive experience monitoring, scale up experience modeling systems, and find unique ways for human-AI teams to succeed where either alone may fail. Our work is inclusive of new ideas and directions, especially in interdisciplinary and translational research in education, health, procedural fairness, and other prosocial domains.
Join our PhD Program in Computer Science at UofM! The University of Memphis is a top-tier research university with a Carnegie R1 designation, and we are committed to engaging PhD students in cutting-edge research in their area of interest. Our PhD program builds both breadth (through core graduate courses) and depth (via a rich selection of advanced courses and requiring participation in research projects), and it provides opportunities to work with interdisciplinary teams on federally funded research collaborations. For example, CS faculty lead the NIH-funded mDOT Biomedical Technology Resource Center and the Center for Information Assurance (CfIA). In addition, CS faculty work closely with multidisciplinary centers at the university such as the Institute for Intelligent Systems (IIS).
Memphis Highlights Known as America’s Number 1 logistics hub, Memphis has been ranked as one of the “World’s Greatest Places” by TIME, as America’s 4th best city for jobs by Glassdoor, and 4th in “Best Cost of Living”. The Memphis metropolitan area has a population of 1.3 million. It boasts a vibrant culture and a pleasant climate, with an average temperature of 63 degrees Fahrenheit.
| ||||||
6-34 | (2023-13-23) PhD position @ Saarland University, Germany: Machine Learning for Natural Languages and other sequence data, Saarland University, Germany (Computer Science, Computational Linguistics, Physics or similar) The research group focuses on gaining a deeper understanding of how modern deep learning methods can be applied to natural languages or other sequence data. Our recent achievements include a best paper award at COLING 2022 and a best theme paper award at ACL 2023. We offer a PhD position that is topically open and should have a strong focus on applying machine learning techniques to natural language data or other sequence data (e.g., string representations of chemical compounds). The ideal candidate for the position would have: 1. Excellent knowledge of machine learning and deep learning 2. Excellent programming skills 3. Master's degree in Computer Science, Computational Linguistics, Physics, or similar Salary: The PhD position will be 75% of full time on the German E13 scale (TV-L), which is about 3144€ per month before tax and social security contributions. The appointment will be for three years with a possible extension at 50%. About the department: The department of Language Science and Technology is one of the leading departments in the speech and language area in Europe. The flagship project at the moment is the CRC on Information Density and Linguistic Encoding. It also runs a significant number of European and nationally funded projects. In total, it has seven faculty and around 50 postdoctoral researchers and PhD students. The department is part of the Saarland Informatics Campus. With 900 researchers, two Max Planck institutes and the German Research Center for Artificial Intelligence, it is one of the leading locations for Informatics in Germany and Europe.
How to apply: Please send us a letter of motivation, a research plan (max one page), your CV, your transcripts, a list of publications if available, and the names and contact information of at least two references, as a single PDF or a link to a PDF if the file size is more than 5 MB. Please apply by November 20th, 2023 at the latest. Earlier applications are welcome and will be processed as they come in. Contact: Applications and any further inquiries regarding the project should be directed to
--
| ||||||
6-35 | (2023-10-23) Post Doc position: Natural Language Processing, Saarland University, Germany (Computer Science, Computational Linguistics or similar) The research group focuses on gaining a deeper understanding of how modern deep learning methods can be applied to natural languages. Our recent achievements include a best paper award at COLING 2022 and a best theme paper award at ACL 2023. We offer a PostDoc position that is topically open and should have a strong focus on applying machine learning techniques to natural language data. The research should be connected to the ongoing research of PhD students while also pursuing a clear new direction. The ideal candidate for the position would have: 1. Solid experience in natural language processing 2. Excellent knowledge of machine learning and deep learning 3. A willingness to be involved, knowledgeable and generous in scientific discussions with all group members 4. Excellent programming skills 5. Doctoral degree in Computer Science, Computational Linguistics or similar Salary: The PostDoc position will be 100% of full time on the German E13 scale (TV-L), which is about 4188€ per month before tax and social security contributions. The appointment will be for two years with a possible extension. About the department: The department of Language Science and Technology is one of the leading departments in the speech and language area in Europe. The flagship project at the moment is the CRC on Information Density and Linguistic Encoding. It also runs a significant number of European and nationally funded projects. In total, it has seven faculty and around 50 postdoctoral researchers and PhD students. The department is part of the Saarland Informatics Campus.
With 900 researchers, two Max Planck institutes and the German Research Center for Artificial Intelligence, it is one of the leading locations for Informatics in Germany and Europe. How to apply: Please send us a letter of motivation, a research plan (max one page), your CV, your transcripts, a list of publications, and the names and contact information of at least two references, as a single PDF or a link to a PDF if the file size is more than 5 MB. Please apply by November 20th, 2023 at the latest. Earlier applications are welcome and will be processed as they come in. Contact: Applications and any further inquiries regarding the project should be directed to dietrich.klakow at lsv.uni-saarland.de
| ||||||
6-36 | (2023-11-15) Master 2 Internship, LPC Marseille, France Master 2 Internship Proposal Advisors: Jules Cauzinille, Benoît Favre, Arnaud Rey November 2023 Deep transfer knowledge from speech to primate vocalizations Keywords: Computational bioacoustics, deep learning, self-supervised learning, transfer knowledge, efficient fine-tuning, primate vocalizations 1 Context This internship is part of a multidisciplinary research project aimed at bridging the gap between state-of-the-art deep learning methods developed for speech processing and computational bioacoustics. Computational bioacoustics is a relatively new research field which proposes to tackle the study of animal acoustic communication with computational approaches (Stowell [2022]). Bioacousticians have recently shown increasing interest in the deep learning revolution embodied in transformer architectures and self-supervised pre-trained models, but much investigation still needs to be carried out. We propose to test the viability of self-supervision and knowledge transfer as a bioacoustic tool by pre-training models on speech and using them for primate vocalization analysis. 2 Problem Statement Speech-based models reach convincing performance on primate-related tasks, including segmentation, individual identification and call type classification (Sarkar and Doss [2023]), as they do on many other downstream tasks (such as vocal emotion recognition, Wang et al. [2021]). We have tested publicly available models such as HuBERT (Hsu et al. [2021]) and Wav2Vec2 [Schneider et al., 2019], two self-supervised speech-based architectures, on some of these tasks with gibbon vocalizations. Our method involves probing and traditional fine-tuning of these models. To ensure true knowledge transfer from pre-training speech datasets to the downstream classification tasks, the goal of this internship will be to implement efficient fine-tuning methods in a similar fashion.
These methods will make it possible to limit and control the amount of information lost during fine-tuning. Depending on the interests of the candidate, the methods can include prompt tuning (Lester et al. [2021]), attention prompting (Gao et al. [2023]), low-rank adaptation (Hu et al. [2021]) or adversarial reprogramming (Elsayed et al. [2018]). The candidate will also be free to explore other methods relevant to the question at hand, either on gibbon data or on datasets for other species currently being collected. 3 Profile The intern will propose and implement efficient fine-tuning solutions on an array of (preferably self-supervised) acoustic models pre-trained on speech or general sound, such as HuBERT, Wav2vec, WavLM, VGGish, etc. Adversarial reprogramming of models pre-trained on other modalities (images, videos, etc.) could also be explored. The work will be implemented using PyTorch. The candidate must have the following qualities: • Excellent knowledge of deep learning methods • Extensive experience with PyTorch models • An interest in processing bioacoustic data • An interest in reading and writing scientific papers as well as some curiosity for research challenges The internship will last 6 months at the LIS and LPC laboratories in Marseille during spring 2024. The candidate will work in close collaboration with Jules Cauzinille as part of his thesis on “Self-supervised learning for primate vocalization analysis”. The candidate will also be in contact with the research community of the ILCB. 4 Contact Please send a CV, transcripts and a letter of application to jules.cauzinille@lis-lab.fr, benoit.favre@lislab.fr, and arnaud.rey@cnrs.fr. Do not hesitate to contact us if you have any questions (or if you want to hear what our primates sound like). References Gamaleldin F. Elsayed, Ian Goodfellow, and Jascha Sohl-Dickstein. Adversarial reprogramming of neural networks, 2018.
Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, and Yu Qiao. LLaMA-Adapter V2: Parameter-efficient visual instruction model, 2023. Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. HuBERT: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Transactions on Audio, Speech, and Language Processing, PP:1–1, 2021. doi: 10.1109/TASLP.2021.3122291. Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models, 2021. Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning, 2021. Eklavya Sarkar and Mathew Magimai Doss. Can Self-Supervised Neural Networks Pre-Trained on Human Speech distinguish Animal Callers?, May 2023. arXiv:2305.14035 [cs, eess]. Steffen Schneider, Alexei Baevski, Ronan Collobert, and Michael Auli. wav2vec: Unsupervised Pre-Training for Speech Recognition. In Proc. Interspeech 2019, pages 3465–3469, 2019. doi: 10.21437/Interspeech.2019-1873. Dan Stowell. Computational bioacoustics with deep learning: a review and roadmap. PeerJ, 10:e13152, 2022. ISSN 2167-8359. doi: 10.7717/peerj.13152. URL https://peerj.com/articles/13152. Yingzhi Wang, Abdelmoumene Boumadane, and Abdelwahab Heba. A fine-tuned wav2vec 2.0/HuBERT benchmark for speech emotion recognition, speaker verification and spoken language understanding. CoRR, abs/2111.02735, 2021. doi: 10.48550/arXiv.2111.02735.
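For readers unfamiliar with one of the efficient fine-tuning methods named in this proposal, low-rank adaptation as published by Hu et al. (2021) can be summarized by the equation below. This is a sketch of the published formulation, not of the project's own implementation; the symbols $W_0$, $A$, $B$, $r$ and $\alpha$ follow that paper and do not appear in the announcement.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

The pre-trained weights $W_0$ stay frozen and only $A$ and $B$ are trained, which is one way to limit how much of the speech pre-training is overwritten during adaptation.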
| ||||||
6-37 | (2023-11-23) Post-doctoral research position - L3i - La Rochelle France Title: Emotion detection by semantic analysis of the text in comics speech balloons
The L3i laboratory has one open post-doc position in computer science, in the specific field of natural language processing in the context of digitised documents.
Duration: 12 months (an extension of 12 months will be possible) Position available from: as soon as possible Salary: approximately 2100 € / month (net) Place: L3i lab, University of La Rochelle, France Specialty: Computer Science/ Document Analysis/ Natural Language Processing Contact: Jean-Christophe BURIE (jcburie [at] univ-lr.fr) / Antoine Doucet (antoine.doucet [at] univ-lr.fr)
Position Description The L3i is a research lab of the University of La Rochelle. La Rochelle is a city in the south west of France on the Atlantic coast and is one of the most attractive and dynamic cities in France. The L3i has worked for several years on document analysis and has developed well-known expertise in the analysis, indexing and understanding of “bande dessinée”, manga and comics. The work done by the post-doc will take place within SAiL (Sequential Art Image Laboratory), a joint laboratory involving L3i and a private company. The objective is to create innovative tools to index and interact with digitised comics. The work will be done in a team of 10 researchers and engineers. The team has developed different methods to extract and recognise the text of the speech balloons. The specific task of the recruited researcher will be to use Natural Language Processing strategies to analyse the text in order to identify emotions expressed by a character (reacting to the utterance of another speaking character) or caused by it (talking to another character). The datasets will be collections of comics in French and English.
Qualifications Candidates must have a completed PhD and research experience in natural language processing. Some knowledge of and experience in deep learning is also recommended.
General Qualifications • Good programming skills, with mastery of at least one programming language such as Python, Java or C/C++ • Good teamwork skills • Good writing skills and proficiency in written and spoken English or French
Applications Candidates should send a CV and a motivation letter to jcburie [at] univ-lr.fr and antoine.doucet [at] univ-lr.fr.
| ||||||
6-38 | (2023-11-25) M2 Master Internship, Nancy, France M2 Master Internship Automatic Alsatian speech recognition 1 Supervisors Name: Emmanuel Vincent Team and lab: Multispeech team, Inria research center at Université de Lorraine, Nancy Email: emmanuel.vincent@inria.fr Name: Pascale Erhart Team and lab: Language/s and Society team, LiLPa, Strasbourg Email: pascale.erhart@unistra.fr 2 Motivation and context This internship is part of the Inria COLaF project (Corpora and tools for the languages of France), whose objective is to develop and disseminate inclusive language corpora and technologies for regional languages (Alsatian, Breton, Corsican, Occitan, Picard, etc.), overseas languages and non-territorial immigration languages of France. With few exceptions, these languages are largely ignored by language technology providers [1]. However, such technologies are key to the protection, promotion and teaching of these languages. Alsatian is the second regional language spoken in France in terms of number of speakers, with 46% of Alsace residents saying they speak Alsatian fairly well or very well [2]. However, it remains an under-resourced language in terms of data and language technologies. Attempts at machine translation have been made, as well as data collection [3]. 3 Objectives The objective of the internship is to design an automatic speech recognition system for Alsatian based on sound archives (radio, television, web, etc.). This raises two challenges: i) Alsatian is not a homogeneous language but a continuum of dialectal varieties which are not always written in a standardized way, ii) the textual transcription is often unavailable or differs from the pronounced words (transcription errors, subtitles, etc.).
Solutions will be based on i) finding a suitable methodology for choosing and preparing data, ii) designing an automatic speech recognition system using end-to-end neural networks which can rely on the adaptation of an existing multilingual system like Whisper [4] in a self-supervised manner from a number of untranscribed recordings [5] and in a supervised manner from a smaller number of transcribed recordings, or even from text-only data [6]. The work will be based on datasets collected by LiLPa and the COLaF project’s engineers, which include the television shows Sunndi's Kater [7] and Kùmme Mit [8] whose dialogues are scripted, some radio broadcasts from the 1950s–1970s with their typescripts [9], as well as untranscribed radio broadcasts of France Bleu Elsass. Dictionaries of Alsatian such as the Wörterbuch der elsässischen Mundarten which can be consulted via the Woerterbuchnetz portal [10] or phonetization initiatives [11] could be exploited, for example using Orthal spelling [12]. The internship opens the possibility of pursuing a PhD thesis funded by the COLaF project. 4 Bibliography [1] DGLFLF, Rapport au Parlement sur la langue française 2023, https://www.culture.gouv.fr/Media/Presse/Rapport-au-Parlement-surla-langue-francaise-2023 [2] https://www.alsace.eu/media/5491/cea-rapport-esl-francais.pdf [3] D. Bernhard, A-L Ligozat, M. Bras, F. Martin, M. Vergez-Couret, P. Erhart, J. Sibille, A. Todirascu, P. Boula de Mareüil, D. Huck, “Collecting and annotating corpora for three under-resourced languages of France: Methodological issues”, Language Documentation & Conservation, 2021, 15, pp.316-357. [4] A. Radford, J.W. Kim, T. Xu, G. Brockman, C. McLeavey, I. Sutskever, “Robust speech recognition via large-scale weak supervision”, in 40th International Conference on Machine Learning, 2023, pp. 28492-28518. [5] A. Bhatia, S. Sinha, S. Dingliwal, K. Gopalakrishnan, S. Bodapati, K. 
Kirchhoff, “Don't stop self-supervision: Accent adaptation of speech representations via residual adapters”, in Interspeech, 2023, pp. 3362-3366. [6] N. San, M. Bartelds, B. Billings, E. de Falco, H. Feriza, J. Safri, W. Sahrozi, B. Foley, B. McDonnell, D. Jurafsky, “Leveraging supplementary text data to kick-start automatic speech recognition system development with limited transcriptions”, in 6th Workshop on Computational Methods for Endangered Languages, 2023, pp. 1-6. [7] https://www.france.tv/france-3/grand-est/sunndi-s-kater/ [8] https://www.france.tv/france-3/grand-est/kumme-mit/toutes-les-videos/ [9] https://www.ouvroir.fr/cpe/index.php?id=1511 [10] https://woerterbuchnetz.de/?sigle=ElsWB#0 [11] 10.5281/zenodo.1174213 [12] https://orthal.fr/ 5 Profile MSc in speech processing, natural language processing, computational linguistics, or computer science. Strong programming skills in Python/PyTorch. Knowledge of Alsatian and/or German is a plus, but in no way a prerequisite.
| ||||||
6-39 | (2023-11-26) Internship, Université du Mans, Le Mans, France Evaluation of speech synthesis systems in a noisy environment Topic Perceptual evaluation is essential in many areas of speech technology, including speech synthesis. It assesses synthesis quality subjectively by asking a listening panel [5] to rate the quality of a synthesized speech stimulus [1, 2]. Recent work has produced artificial intelligence models [3, 4] that predict the subjective rating of a synthesized speech segment, which makes it possible to dispense with a listening test. The major problem with this evaluation is the interpretation of the word “quality”. Some listeners may base their judgment on intrinsic characteristics of the speech (such as timbre, speaking rate, punctuation, etc.), whereas others may base it on characteristics of the audio signal (such as the presence or absence of distortion). The subjective evaluation of speech can therefore be biased by how listeners interpret the instructions. Consequently, the artificial intelligence model mentioned above may itself rest on biased measurements. The project aims to carry out exploratory work to evaluate the quality of speech synthesis more robustly than has been proposed so far. To this end, we start from the hypothesis that the quality of speech synthesis can be estimated through its detection in a real environment. In other words, a signal synthesized so as to reproduce human speech perfectly should not be detected in an everyday environment. Based on this hypothesis, we propose to set up a speech perception experiment in a noisy environment.
Sound environment reproduction methods make it possible to simulate an existing environment over headphones. Their advantage is that a recording of a real environment can be played over headphones while adding signals as if they had been present in the recorded sound scene. This involves, first, an acoustic measurement campaign in noisy everyday environments (transport, open-plan office, canteen, etc.). Next, synthesized speech will need to be generated, taking the context of the recordings into account. It will also be relevant to vary the parameters of the synthesized speech while keeping the same semantic content. The everyday recordings will then be mixed with the synthesized speech signals to evaluate how often the latter is detected. We will use the percentage of times the synthesized speech is detected as a quality indicator. These detection percentages will then be compared with the predictions of the artificial intelligence model mentioned above. We will thus be able to conclude (1) whether the methods are equivalent or complementary and (2) which parameter(s) of the synthesized speech lead to its detection in a noisy environment. Additional information: • Supervision: The internship will be co-supervised by Aghilas Sini, associate professor (maître de conférences) at the Laboratoire d’Informatique de l’Université du Mans (aghilas.sini@univ-lemans.fr), and Thibault Vicente, associate professor (maître de conférences) at the Laboratoire d’Acoustique de l’Université du Mans (thibault.vicente@univ-lemans.fr) • Required level: research Master 2 (M2) internship • Planned period: 6 months (February to July 2024) • Location: Le Mans Université • Keywords: synthesized speech, binaural sound synthesis, listening test
References [1] Y.-Y. Chang. Evaluation of TTS systems in intelligibility and comprehension tasks. In Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing (ROCLING 2011), pages 64–78, 2011. [2] J. Chevelu, D. Lolive, S. Le Maguer, and D. Guennec. Se concentrer sur les différences: une méthode d’évaluation subjective efficace pour la comparaison de systèmes de synthèse (Focus on differences: a subjective evaluation method to efficiently compare TTS systems). In Actes de la conférence conjointe JEP-TALN-RECITAL 2016, volume 1: JEP, pages 137–145, 2016. [3] C.-C. Lo, S.-W. Fu, W.-C. Huang, X. Wang, J. Yamagishi, Y. Tsao, and H.-M. Wang. MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion. In Proc. Interspeech 2019, pages 1541–1545, 2019. [4] G. Mittag and S. Möller. Deep learning based assessment of synthetic speech naturalness. arXiv preprint arXiv:2104.11673, 2021. [5] M. Wester, C. Valentini-Botinhao, and G. E. Henter. Are we using enough listeners? no!—an empirically-supported critique of Interspeech 2014 TTS evaluations. In 16th Annu. Conf. Int. Speech Commun. Assoc., 2015.
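The quality indicator proposed in this internship, the percentage of trials in which listeners detect the synthesized speech, could be computed along the following lines. This is only a minimal sketch: the function name `detection_rates`, the `(condition, detected)` trial format and the example data are hypothetical and are not taken from the announcement.

```python
from collections import defaultdict


def detection_rates(trials):
    """Return, for each condition, the percentage of trials in which
    the synthesized speech was detected.

    `trials` is an iterable of (condition, detected) pairs, where
    `detected` is True when the listener reported hearing synthesis.
    (Hypothetical data layout, chosen for illustration.)
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [detections, total]
    for condition, detected in trials:
        counts[condition][0] += int(detected)
        counts[condition][1] += 1
    return {c: 100.0 * d / n for c, (d, n) in counts.items()}


# Made-up listening-test results for two hypothetical TTS systems.
trials = [
    ("system_A", True), ("system_A", False),
    ("system_A", False), ("system_A", False),
    ("system_B", True), ("system_B", True),
    ("system_B", True), ("system_B", False),
]
rates = detection_rates(trials)
print(rates)  # {'system_A': 25.0, 'system_B': 75.0}
```

Under the stated hypothesis, a lower detection percentage would indicate more natural synthesis, so in this toy example system_A would rank above system_B.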
| ||||||
6-40 | (2023-11-27) Internship @ Telecom Paris, Paris, France ANR Project «REVITALISE» Automatic speech analysis of public talks.
Description. Today, extremely important activities (such as information exchange) depend not only on so-called hard skills but also on soft skills. One such important skill is public speaking. Like many forms of interaction between people, the assessment of public speaking depends on many factors (often subjectively perceived). The goal of our project is to create an automatic system which can take these different factors into account and evaluate the quality of the performance. This requires understanding which elements can be assessed objectively and which vary depending on the listener [Hemamou, Wortwein, Chollet21]. Such an analysis requires examining public speaking at various levels: high-level (audio, video, text), intermediate (voice monotony, auto-gestures, speech structure, etc.) and low-level (fundamental frequency, action units, POS tags, etc.) [Barkar]. This internship offers an opportunity to analyze the audio component of a public speech. The student is asked to solve two main problems. The engineering task is to create an automatic speech transcription system that detects speech disfluency. To do this, the student will survey the literature on this topic and propose an engineering solution. The second task, a research task, is to use audio cues to automatically analyze the success of a talk. This internship will give you an opportunity to solve an engineering problem as well as learn more about research approaches. By the end you will have expertise in audio processing as well as machine learning methods for multimodal analysis. If the internship is successfully completed, an article may be published. PhD funding on Social Computing will be available in the team at the end of the internship (at INRIA). Registration & Organisation.
Name of organization: Institut Polytechnique de Paris, Telecom-Paris Website of organization: https://www.telecom-paris.fr Department: IDS/LTCI/ Address: Palaiseau, France Supervision. Supervision will include weekly meetings with the main supervisor and regular meetings (every 2-3 weeks) with co-supervisors. Name of supervisor: Alisa BARKAR Name of co-supervisors: Chloe Clavel, Mathieu Chollet, Béatrice BIANCARDI Contact details: alisa.barkar@telecom-paris.fr Duration & Planning. The internship is planned as a 5-6 month full-time internship for the spring semester 2024. Six months corresponds to 24 weeks, during which the following activities will be covered: ● ACTIVITY 1 (A1): Problem description and integration into the working environment ● ACTIVITY 2 (A2): Bibliography overview ● ACTIVITY 3 (A3): Implementation of the automatic transcription with detected discrepancies ● ACTIVITY 4 (A4): Evaluation of the automatic transcription ● ACTIVITY 5 (A5): Application of the developed methods to the existing data ● ACTIVITY 6 (A6): Analysis of the importance of para-verbal features for the performance perception ● ACTIVITY 7 (A7): Writing the report Selected references of the team. 1. [Hemamou] L. Hemamou, G. Felhi, V. Vandenbussche, J.-C. Martin, C. Clavel, HireNet: a Hierarchical Attention Model for the Automatic Analysis of Asynchronous Video Job Interviews. in AAAI 2019, to appear 2. [Ben-Youssef] Atef Ben-Youssef, Chloé Clavel, Slim Essid, Miriam Bilac, Marine Chamoux, and Angelica Lim. UE-HRI: a new dataset for the study of user engagement in spontaneous human-robot interactions. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 464–472. ACM, 2017. 3. [Wortwein] Torsten Wörtwein, Mathieu Chollet, Boris Schauerte, Louis-Philippe Morency, Rainer Stiefelhagen, and Stefan Scherer. 2015. Multimodal Public Speaking Performance Assessment.
In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (ICMI '15). Association for Computing Machinery, New York, NY, USA, 43–50. 4. [Chollet21] Chollet, M., Marsella, S., & Scherer, S. (2021). Training public speaking with virtual social interactions: effectiveness of real-time feedback and delayed feedback. Journal on Multimodal User Interfaces, 1-13. 5. [Barkar] Alisa Barkar, Mathieu Chollet, Beatrice Biancardi, and Chloe Clavel. 2023. Insights Into the Importance of Linguistic Textual Features on the Persuasiveness of Public Speaking. In Companion Publication of the 25th International Conference on Multimodal Interaction (ICMI '23 Companion). Association for Computing Machinery, New York, NY, USA, 51–55. https://doi.org/10.1145/3610661.3617161 Other references. 1. Dinkar, T., Vasilescu, I., Pelachaud, C. and Clavel, C., 2020, May. How confident are you? Exploring the role of fillers in the automatic prediction of a speaker’s confidence. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8104-8108). IEEE. 2. Whisper: Robust Speech Recognition via Large-Scale Weak Supervision, Radford A. et al., 2022, url: https://arxiv.org/abs/2212.04356 3. Romana, Amrit and Kazuhito Koishida. “Toward A Multimodal Approach for Disfluency Detection and Categorization.” ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023): 1-5. 4. Radhakrishnan, Srijith et al. “Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition.” ArXiv abs/2310.06434 (2023): n. pag. 5. Wu, Xiao-lan et al. “Explanations for Automatic Speech Recognition.” ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023): 1-5. 6. Min, Zeping and Jinbo Wang.
“Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study.” ArXiv abs/2307.06530 (2023): n. pag. 7. Ouhnini, Ahmed et al. “Towards an Automatic Speech-to-Text Transcription System: Amazigh Language.” International Journal of Advanced Computer Science and Applications (2023): n. pag. 8. Bigi, Brigitte. “SPPAS: a tool for the phonetic segmentations of Speech.” (2023). 9. Rekesh, Dima et al. “Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition.” ArXiv abs/2305.05084 (2023): n. pag. 10. Arisoy, Ebru et al. “Bidirectional recurrent neural network language models for automatic speech recognition.” 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015): 5421-5425. 11. Padmanabhan, Jayashree and Melvin Johnson. “Machine Learning in Automatic Speech Recognition: A Survey.” IETE Technical Review 32 (2015): 240 - 251. 12. Berard, Alexandre et al. “End-to-End Automatic Speech Translation of Audiobooks.” 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018): 6224-6228. 13. Kheir, Yassine El et al. “Automatic Pronunciation Assessment - A Review.” ArXiv abs/2310.13974 (2023): n. pag.
| ||||||
6-41 | (2023-11-30) Internship Université du Mans, Le Mans, France Title: Predictive Modeling of Subjective Disagreement in Speech Annotation/Evaluation Host laboratory : LIUM Location : Le Mans Supervisors : Meysam Shamsi, Anthony Larcher Beginning of internship : February 2024 Application deadline : 10/01/2024 Keywords: Subjective Disagreement Modeling, Synthetic Speech Quality Evaluation, Speech Emotion Recognition In the context of modeling subjective tasks, where diverse opinions, perceptions, and judgments exist among individuals, such as in speech quality or speech emotion recognition, addressing the challenge of defining ground truth and annotating a training set becomes crucial. The current practice of aggregating all annotations into a single label for modeling a subjective task is neither fair nor efficient [1]. The variability in annotations or evaluations can stem from various factors [2], broadly categorized into those associated with corpus quality and those intrinsic to the samples themselves. In the first case, the delicate definition of a subjective task introduces sensitivity into the annotation process, potentially leading to more errors, especially where the annotation tools and platform lack precision or annotators experience fatigue. In the second case, the inherent ambiguity in defining a subjective task and different perception may result in varying annotations and disagreements. Developing a predictive model to understand annotator/evaluator disagreement is crucial for engaging in discussions related to ambiguous samples and refining the definition of subjective concepts. Furthermore, this model can serve as a valuable tool for assessing the confidence of automatic evaluations [3,4]. 
This modeling approach will contribute to the automatic evaluation of corpus annotations, identification of ambiguous samples for reconsideration or re-annotation, automatic assessment of subjective models, and the detection of underrepresented samples and biases in the dataset. The proposed research involves using a speech dataset such as MSP-Podcast [5], SOMOS [6] or VoiceMOS [7] for a subjective task with multiple annotations per sample. The primary objective is to predict the variation in assigned labels, measured through disagreement scores, entropy, or distribution. References: [1]. Davani, A. M., Díaz, M., & Prabhakaran, V. (2022). Dealing with disagreements: Looking beyond the majority vote in subjective annotations. Transactions of the Association for Computational Linguistics, 10, 92-110. [2]. Kreiman, J., Gerratt, B. R., & Ito, M. (2007). When and why listeners disagree in voice quality assessment tasks. The Journal of the Acoustical Society of America, 122(4), 2354-2364. [3]. Wu, W., Chen, W., Zhang, C., & Woodland, P. C. (2023). It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation. arXiv preprint arXiv:2310.00486. [4]. Han, J., Zhang, Z., Schmitt, M., Pantic, M., & Schuller, B. (2017, October). From hard to soft: Towards more human-like emotion recognition by modelling the perception uncertainty. In Proceedings of the 25th ACM international conference on Multimedia (pp. 890-897). [5]. Lotfian, R., & Busso, C. (2017). Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings. IEEE Transactions on Affective Computing, 10(4), 471-483. [6]. Maniati, G., Vioni, A., Ellinas, N., Nikitaras, K., Klapsas, K., Sung, J.S., Jho, G., Chalamandaris, A., Tsiakoulis, P. (2022) SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis. Proc. Interspeech 2022, 2388-2392 [7]. Cooper, E., Huang, W. C., Tsao, Y., Wang, H.
M., Toda, T., & Yamagishi, J. (2023). The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains. arXiv preprint arXiv:2310.02640. Applicant profile : Candidate motivated by artificial intelligence, enrolled in a Master's degree in Computer Science or related fields For application: Send CV + cover letter to : meysam.shamsi@univ-lemans.fr or anthony.larcher@univ-lemans.fr before 10/01/2024
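The prediction targets named in the LIUM internship description above (disagreement scores, entropy, or the label distribution) can be computed from the raw annotations before any model is trained. The following is a minimal Python sketch under stated assumptions, not the supervisors' method: the function names, the toy labels, and the choice of pairwise disagreement and standard deviation as "disagreement scores" are all illustrative.

```python
import math
import statistics
from collections import Counter


def label_distribution(labels):
    """Empirical distribution of the labels several annotators gave one sample."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def normalized_entropy(labels, num_classes):
    """Shannon entropy of the label distribution, scaled to [0, 1]."""
    if num_classes < 2:
        return 0.0
    dist = label_distribution(labels)
    entropy = -sum(p * math.log(p) for p in dist.values())
    return entropy / math.log(num_classes)


def pairwise_disagreement(labels):
    """Fraction of annotator pairs that assigned different labels."""
    n = len(labels)
    if n < 2:
        return 0.0
    total_pairs = n * (n - 1) // 2
    agreeing_pairs = sum(c * (c - 1) // 2 for c in Counter(labels).values())
    return 1 - agreeing_pairs / total_pairs


def rating_std(scores):
    """Spread of numeric ratings (e.g. MOS-style 1-5 scores) for one sample."""
    return statistics.pstdev(scores)


# Hypothetical annotations for one emotional-speech sample (4 annotators).
emotion_labels = ["happy", "happy", "sad", "neutral"]
print(label_distribution(emotion_labels))
print(normalized_entropy(emotion_labels, num_classes=4))
print(pairwise_disagreement(emotion_labels))

# Hypothetical quality ratings for one synthetic-speech sample.
print(rating_std([3, 4, 4, 5]))
```

Any of these per-sample values could then serve as a regression target, or the full distribution as a soft label, for a model that takes the speech sample as input.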
6-42 | (2023-12-02) Senior Data Scientist at the University of Chicago, IL, USA
Please apply at https://uchicago.wd5.myworkdayjobs.com/External/job/Chicago-IL/Sr-Data-Scientist_JR24587
About the Department
The TMW Center for Early Learning + Public Health (TMW Center) develops science-based interventions, tools, and technologies to help parents and caregivers interact with young children in ways that maximize brain development. A rich language environment is critical to healthy brain development; however, few tools exist to measure the quality or quantity of these environments. Access to this type of data allows caregivers to enhance interactions in real time and gives policy-makers insight into how best to build policies that have a population-level impact.
The job works independently to perform a variety of activities relating to software support and/or development. Analyzes, designs, develops, debugs, and modifies computer code for end user applications, beta general releases, and production support. Guides development and implementation of applications, web pages, and user-interfaces using a variety of software applications, techniques, and tools. Solves complex problems in administration, maintenance, integration, and troubleshooting of code and application ecosystem currently in production.
Responsibilities
Education: Minimum requirements include a college or university degree in a related field.
Minimum requirements include knowledge and skills developed through 5-7 years of work experience in a related job discipline.
Preferred Qualifications
Education:
Experience:
Technical Skills or Knowledge:
Application Documents
6-43 | (2023-12-05) Postdoc and research engineer positions within the ANR-JCJC RESSAC project, LPNC, Grenoble, France
6-44 | (2023-12-05) Postdoctoral Scholar, Penn State University, PA, USA
Postdoctoral Scholar | Data Sciences and Artificial Intelligence at Penn State University
The Data Sciences and Artificial Intelligence (DS/AI) group at Penn State invites applications for a Postdoctoral Scholar position, set to commence in Fall 2024. This role is centered on cutting-edge research at the nexus of machine learning, deep learning, computer vision, psychology, and biology, with foci on psychology-inspired AI and addressing significant biological questions using AI.
Qualifications:
About the position: The successful candidate will be designated as a Postdoctoral Scholar at the College of Information Sciences and Technology (IST) of The Pennsylvania State University. The initial term of the position is one year, with the possibility of renewal contingent on performance and funding availability. The scholar will be engaged in two interdisciplinary projects funded by the National Science Foundation, receiving mentorship from Professors James Wang (IST), Brad Wyble (Psychology), and Charles Anderson (Biology). The scholar will collaborate with highly motivated and talented graduate students and benefit from strong career development support, which includes training in teaching, grant proposal writing, and other collaborative work. Qualified candidates will have the ability to teach in IST after successfully completing one semester, with approval from college leadership.
To apply:
COMMITMENT TO DIVERSITY: The College of IST is strongly committed to a diverse community and to providing a welcoming and inclusive environment for faculty, staff and students of all races, genders, and backgrounds. The College of IST is committed to making good faith efforts to recruit, hire, retain, and promote qualified individuals from underrepresented minority groups including women, persons of color, diverse gender identities, individuals with disabilities, and veterans. We invite applicants to address their engagement in or commitment to inclusion, equity, and diversity issues as they relate to broadening participation in the disciplines represented in the college as well as aligning with the mission of the College of IST in a separate statement.
CAMPUS SECURITY CRIME STATISTICS: Pursuant to the Jeanne Clery Disclosure of Campus Security Policy and Campus Crime Statistics Act and the Pennsylvania Act of 1988, Penn State publishes a combined Annual Security and Annual Fire Safety Report (ASR). The ASR includes crime statistics and institutional policies concerning campus security, such as those concerning alcohol and drug use, crime prevention, the reporting of crimes, sexual assault, and other matters. The ASR is available for review here.
Employment with the University will require successful completion of background check(s) in accordance with University policies.
EEO IS THE LAW Penn State is an equal opportunity, affirmative action employer, and is committed to providing employment opportunities to all qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability or protected veteran status. If you are unable to use our online application process due to an impairment or disability, please contact 814-865-1473.
6-45 | (2023-12-07) Post-doc @ Université du Mans, Le Mans, France
6-46 | (2023-12-07) Internships (M1, M2, engineering final-year project) @ IRIT, Toulouse, France
The SAMoVA team at IRIT in Toulouse is offering several internships (M1, M2, engineering final-year project) in 2024 on the following topics (non-exhaustive list):
- Automatic generation of musical scores in the choro style
- Speech understanding and AI for sensory analysis
- Characterizing eating behavior through video and multimodal analysis
- Adapting automatic speech recognition systems to pathological speech
- Signal processing and AI to reveal articulatory disorders in atypical speech production
- End-to-end speech recognition for assessing comprehension skills of children learning to read
- Active learning for speaker diarization
- Automatic modeling of speech rhythm
- Transcription of verbalizations for discourse analysis during virtual reality scenarios
- Implementation of a comparative speech recognition prototype applied to oral language learning
https://www.irit.fr/SAMOVA/site/jobs/
6-47 | (2023-12-07) Internship at INA, Bry-sur-Marne, France
We are offering a research internship (Master's level, Bac+5) in the research department of the Institut National de l'Audiovisuel (INA). The internship focuses on voice activity detection in audiovisual corpora using self-supervised representations. Other internships are also offered at INA; all topics are listed on the following page: https://www.ina.fr/institut-national-audiovisuel/equipe-recherche/stages.
Voice activity detection in audiovisual corpora using self-supervised representations
Final-year engineering or Master 2 internship, academic year 2023-2024
Keywords: deep learning, machine learning, self-supervised models, voice activity detection, speech activity detection, wav2vec 2.0
Context
The Institut National de l'Audiovisuel (INA) is a public industrial and commercial establishment (EPIC) whose main mission is to preserve and promote French audiovisual heritage through the sale of archives and the management of legal deposit. To this end, the Institute continuously captures 180 television and radio channels and stores more than 25 million hours of audiovisual content. INA also carries out training, production, and scientific research missions. For more than 20 years, INA's research department has conducted research on the automatic indexing and description of these collections across all modalities: text, sound, and images. The department takes part in many collaborative research projects at both the national and European levels, and hosts Master's interns as well as PhD students co-supervised with leading national laboratories. This internship is offered within the research team (https://recherche.ina.fr) and is part of a collaborative project funded by the ANR: Gender Equality Monitor (GEM). Other internship topics are also offered in the team: https://www.ina.fr/institut-national-audiovisuel/equipe-recherche/stages
Internship objectives
Voice activity detection (VAD) is an audio analysis task that aims to identify the portions of a recording that contain human speech, distinguishing them from the other parts of the signal, which contain silence, background noise, or music. Often regarded as a preprocessing step, it is used upstream of automatic speech, speaker, or emotion recognition tasks.
While existing VAD tools achieve excellent results on news programs and studio shows [Dou18a, Bre23], recent research at INA has shown that state-of-the-art systems perform worse on many types of material that are underrepresented in annotated speech corpora. This content, which was the subject of an internal annotation campaign, includes music programs, cartoons, sports, fiction, game shows, and documentaries. The objective of the internship is to develop voice activity detection (VAD) models using an approach based on the self-supervised learning paradigm and relying on transformer architectures such as wav2vec 2.0 [Bae20]. Models based on these architectures achieve state-of-the-art results on many speech processing tasks with limited amounts of annotated examples: transcription, understanding, translation, emotion detection, speaker recognition, language detection, etc. [Li22, Huh23, Par23]. Several recent studies have demonstrated the effectiveness of self-supervised approaches for VAD [Gim21, Kun23], but so far these models have been trained and evaluated on data that do not reflect the diversity of audiovisual content. The proposed internship aims to exploit the millions of hours of audiovisual content held at INA to train and improve the models. The resulting models will be integrated into the open-source software inaSpeechSegmenter, which is used, among other things, to measure women's and men's speaking time in programs for research purposes or for regulation of the audiovisual landscape [Dou18b, Arc23].
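To illustrate the VAD task described above, the sketch below turns per-frame speech probabilities, such as those a frame-level classifier placed on top of a self-supervised encoder might output, into speech segments and a total speaking time. It is a hedged illustration in Python, not the inaSpeechSegmenter implementation: the function names, the 0.5 threshold, the 20 ms frame duration, and the minimum segment length are all assumptions chosen for the example.

```python
def frames_to_segments(speech_probs, frame_dur=0.02, threshold=0.5,
                       min_speech_frames=5):
    """Convert per-frame speech probabilities into (start, end) segments in seconds.

    frame_dur, threshold and min_speech_frames are illustrative defaults,
    not values taken from any particular VAD system.
    """
    runs = []
    start = None  # index of the first frame of the current speech run
    for i, prob in enumerate(speech_probs):
        is_speech = prob >= threshold
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:  # the recording ends while speech is still active
        runs.append((start, len(speech_probs)))

    # Drop runs shorter than the minimum length, then convert frames to seconds.
    return [
        (round(s * frame_dur, 3), round(e * frame_dur, 3))
        for s, e in runs
        if e - s >= min_speech_frames
    ]


def total_speech_time(segments):
    """Sum of segment durations, in seconds."""
    return sum(end - start for start, end in segments)


# Hypothetical classifier output: 10 speech frames, then a 2-frame blip.
probs = [0.1] * 3 + [0.9] * 10 + [0.2] * 2 + [0.8] * 2 + [0.1] * 3
segments = frames_to_segments(probs)
print(segments)
print(total_speech_time(segments))
```

The minimum-length filter is one simple way to suppress isolated false positives; on music-heavy or noisy content the threshold and minimum length would likely need tuning on annotated data.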
Dissemination of results
Different strategies for disseminating the work will be considered, depending on its maturity and on the directions chosen for follow-up work:
● Release of the resulting models under an open-source license on HuggingFace and/or INA's GitHub repository: https://github.com/ina-foss
● Writing of scientific publications
Internship conditions
The internship will last 4 to 6 months, within INA's research department. It will take place at the Bry 2 site, located at 28 Avenue des frères Lumière, 94360 Bry-sur-Marne. The intern will be supervised by Valentin Pelloin and David Doukhan. A computer equipped with a GPU will be provided, along with access to the Institute's computing cluster.
Stipend: €760 gross / month + 50% of the Navigo pass
Remote work: possible one day per week
Contact
To apply for this internship or to request further information, please send your CV and cover letter by e-mail to the following addresses: vpelloin@ina.fr and ddoukhan@ina.fr.
Candidate profile
● Student in the final year of a five-year degree (Bac+5) in computer science and AI
● Strong interest in academic research
● Interest in automatic speech processing
● Proficiency in Python and experience using ML libraries
● Ability to carry out literature searches
● Rigor, ability to synthesize, autonomy, ability to work in a team
References
[Arc23] ARCOM (2023). “La représentation des femmes à la télévision et à la radio - Rapport sur l'exercice 2022” [online].
[Bae20] A. Baevski, H. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations,” Neural Information Processing Systems, Jun. 2020.
[Bre23] Bredin, H. (2023). pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe, in INTERSPEECH 2023, ISCA, pp. 1983–1987.
[Dou18a] Doukhan, D., Carrive, J., Vallet, F., Larcher, A., & Meignier, S. (2018, April). An open-source speaker gender detection framework for monitoring gender equality. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5214-5218). IEEE.
[Dou18b] Doukhan, D., Poels, G., Rezgui, Z., & Carrive, J. (2018). Describing gender equality in french audiovisual streams with a deep learning approach. VIEW Journal of European Television History and Culture, 7(14), 103-122.
[Gim21] P. Gimeno, A. Ortega, A. Miguel, and E. Lleida, “Unsupervised Representation Learning for Speech Activity Detection in the Fearless Steps Challenge 2021,” in Interspeech 2021, ISCA, Aug. 2021, pp. 4359–4363.
[Huh23] Huh, J., Brown, A., Jung, J. W., Chung, J. S., Nagrani, A., Garcia-Romero, D., & Zisserman, A. (2023). Voxsrc 2022: The fourth voxceleb speaker recognition challenge. arXiv preprint arXiv:2302.10248.
[Kun23] M. Kunešová and Z. Zajíc, “Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0,” in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2023, pp. 1–5.
[Li22] Li, M., Xia, Y., & Lin, F. (2022, December). Incorporating VAD into ASR System by Multi-task Learning. In 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) (pp. 160-164). IEEE.
[Par23] Parcollet, T., Nguyen, H., Evain, S., Boito, M. Z., Pupier, A., Mdhaffar, S., ... & Besacier, L. (2023). LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech. arXiv preprint arXiv:2309.05472.