ISCApad #276 |
Tuesday, June 15, 2021 by Chris Wellekens |
6-1 | (2021-01-03) Postdoc researcher, Seikei University, Japan We are seeking a highly motivated and ambitious post-doctoral researcher for the project of “socially and culturally aware human-agent interaction,” led by Prof. Yukiko Nakano at Seikei University in Tokyo, Japan. The project is part of a larger government-funded project for a human-avatar symbiotic society. The mission of our group (http://iui.ci.seikei.ac.jp/en/) is to research and develop technologies for human-agent/robot interaction, behavior generation, and behavior adaptation by focusing on social and cultural awareness.
| ||
6-2 | (2021-01-04) Open positions at IDIAP, Martigny, Suisse There is a fully funded PhD position open at Idiap Research Institute on 'Neural
| ||
6-3 | (2021-01-04) research scientist in spoken language processing at Naver Labs Europe, Grenoble, France We are seeking to recruit a research scientist in spoken language processing at Naver Labs Europe (Grenoble, France) - https://europe.naverlabs.com
More details below (you can apply online here as well)
DESCRIPTION
NAVER LABS Europe's mission is to create new ways to interact with digital and physical agents, while paving the way for these innovations into a number of NAVER flagship products and services. This includes research in models and algorithms to give humans faster and better access to data and to allow them to interact with technology in simpler and more natural ways. To fulfill our vision of intelligent devices communicating seamlessly with us, we need to considerably improve existing technology and methods that solve natural language processing problems. We are looking for applications from research scientists to make outstanding contributions to the invention, development and benchmarking of spoken language processing techniques. The research scientist will be part of the Natural Language Processing group of NAVER LABS Europe, and their mission will be to develop research on one or more of the following themes: spoken language translation, speech recognition, text-to-speech synthesis, voice-based conversational search (with potential collaborations with the Search & Recommendation group). At NAVER LABS we encourage participation in the academic community. Our researchers collaborate closely with universities and regularly publish in venues such as ACL, EMNLP, Interspeech, KDD, SIGIR, ICLR, ICML and NeurIPS.
REQUIRED SKILLS
- Ph.D. in spoken language processing, speech processing, NLP or machine learning.
- Knowledge of the latest developments in statistical and deep learning as applied to NLP and speech.
- Strong publication record in top-tier NLP, speech or machine learning conferences.
- Strong development skills, preferably in Python, and knowledge of relevant frameworks (TensorFlow, PyTorch, etc.).
APPLICATION INSTRUCTIONS
You can apply for this position online. Don't forget to upload your CV and cover letter before you submit. Incomplete applications will not be accepted.
ABOUT NAVER LABS
NAVER LABS Europe has full-time positions, PhD and PostDoc opportunities throughout the year, which are advertised here and on international conference sites that we sponsor such as CVPR, ICCV, ICML, NeurIPS, EMNLP, ACL, etc. NAVER LABS Europe is an equal opportunity employer, located in Grenoble in the French Alps. We take a multi- and interdisciplinary approach to research, with scientists in machine learning, computer vision, artificial intelligence, natural language processing, ethnography and UX working together to create next-generation technology and services that deeply understand users and their contexts.
| ||
6-4 | (2021-01-07) Speech-NLP Master 2 Internship Year 2020-2021 at LISN (ex LIMSI), University Paris-Saclay, France
Speech-NLP Master 2 Internship, Year 2020-2021: Speech Segmentation and Automatic Detection of Conflicts in Political Interviews
LISN – Université Paris-Saclay. Internship for last-year engineering or Master 2 students.
Keywords: Machine Learning, Diarization, Digital Humanities, Political Speech, Prosody, Expressive Speech
Context
This internship is part of Ontology and Tools for the Annotation of Political Speech (OOPAIP 2018), a transdisciplinary project funded under the DIM-STCN (Text Sciences and New Knowledge) by the Regional Council of Ile de France. The project is carried out by the European Center for Sociology and Political Science (CESSP) of the University of Paris 1 Panthéon-Sorbonne, the National Audiovisual Institute (INA), and the LISN. Its objective is to design new approaches to develop detailed, qualitative, and quantitative analyses of political speech in the French media. Part of the project concerns the study of the dynamics of conflictual interactions in interviews and political debates, which requires a detailed description and a large corpus to allow for the models' generalization. Some of the main challenges concern the performance of speaker and speech-style segmentation, e.g., improving segmentation accuracy, detecting overlapped speech, and measuring vocal effort and other expressive elements.
Objectives
The main objective of the internship is to improve the automatic segmentation of political interviews. In this context, we will be particularly interested in the detection of hubbub (strong and prolonged overlapped speech). More precisely, we would like to extract features from the speech signal (Eyben et al.
2015) correlated with the level of conflictual content in the exchanges, based, for example, on the arousal level in the speaker's voice (an intermediate level between speech-signal analysis and expressivity description; Rilliard, d'Alessandro, and Evrard 2018) or on vocal effort (Liénard 2019). The internship will initially be based on two corpora of 30 political interviews manually annotated in speech turns and speech acts within the framework of the OOPAIP project. It will begin with a state-of-the-art review of speech diarization and overlapped speech detection (Chowdhury et al. 2019). The aim will then be to propose solutions based on recent frameworks (Bredin et al. 2020) to improve the precise localization of speaking segments, in particular when the frequency of speaker changes is high. In the second part of the internship, we will look at a more detailed measurement and prediction of the conflictual level of the exchanges: we will search for the most relevant features to describe it, and adapt or develop a neural network architecture for its modeling. The programming language used for this internship will be Python. The candidate will have access to the LISN computing resources (servers and clusters with recent-generation GPUs).
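As a concrete illustration of the feature-extraction idea above (signal energy as a crude correlate of vocal effort/arousal), here is a minimal numpy sketch. It is not project code: the function names are invented, and real work would use richer feature sets such as GeMAPS (Eyben et al. 2015) rather than plain RMS energy.

```python
import numpy as np

def frame_rms(signal, sr, frame_ms=25, hop_ms=10):
    """Frame-wise RMS energy, a crude proxy for vocal effort/arousal."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(signal) - frame) // hop)
    return np.array([
        np.sqrt(np.mean(signal[i * hop:i * hop + frame] ** 2))
        for i in range(n)
    ])

def flag_high_effort(signal, sr, factor=2.0):
    """Flag frames whose energy exceeds `factor` times the median energy,
    a naive stand-in for detecting loud, conflictual passages."""
    rms = frame_rms(signal, sr)
    return rms > factor * np.median(rms)
```

A real hubbub detector would of course combine such low-level descriptors with overlapped-speech detection from a diarization framework such as pyannote.audio (Bredin et al. 2020).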
Publications
Depending on the degree of maturity of the work carried out, we expect the applicant to:
• Distribute the tools produced under an open-source license
• Write a scientific publication
Conditions
The internship will take place over a period of 4 to 6 months at the LISN (formerly LIMSI) in the TLP group (spoken language processing). The laboratory is located near the plateau de Saclay, university campus building 507, rue du Belvédère, 91400 Orsay. The candidate will be supervised by Marc Evrard (evrard@limsi.fr). Allowance under the official standards (service-public.fr).
Applicant profile
• Student in the last year of a 5-year diploma in the field of computer science (AI is a plus)
• Proficiency in Python and experience with ML libraries (Scikit-Learn, TensorFlow, PyTorch)
• Strong interest in digital humanities, and political science in particular
• Experience in automatic speech processing is preferred
• Ability to carry out a bibliographic study from scientific articles written in English
To apply: Send an email to evrard@limsi.fr including a résumé and a cover letter.
Bibliography
Bredin, Hervé, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, and Marie-Philippe Gill. 2020. “Pyannote.audio: Neural Building Blocks for Speaker Diarization.” In ICASSP. IEEE.
Chowdhury, Shammur Absar, Evgeny A. Stepanov, Morena Danieli, and Giuseppe Riccardi. 2019. “Automatic Classification of Speech Overlaps: Feature Representation and Algorithms.” Computer Speech & Language 55: 145–67.
Eyben, Florian, Klaus R. Scherer, Björn W. Schuller, Johan Sundberg, Elisabeth André, Carlos Busso, Laurence Y. Devillers, et al. 2015. “The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing.” IEEE Transactions on Affective Computing 7 (2): 190–202.
Liénard, Jean-Sylvain. 2019.
“Quantifying Vocal Effort from the Shape of the One-Third Octave Long-Term-Average Spectrum of Speech.” The Journal of the Acoustical Society of America 146 (4): EL369–75.
OOPAIP. 2018. “Ontologie et outils pour l'annotation des interventions politiques.” DIM STCN (Sciences du Texte et connaissances nouvelles), Conseil régional d'Ile de France. http://www.dim-humanites-numeriques.fr/projets/oopaip-ontologie-et-outilspour-lannotation-des-interventions-politiques/
Rilliard, Albert, Christophe d'Alessandro, and Marc Evrard. 2018. “Paradigmatic Variation of Vowels in Expressive Speech: Acoustic Description and Dimensional Analysis.” The Journal of the Acoustical Society of America 143 (1): 109–22.
| ||
6-5 | (2021-01-13) PhD position at CWI, Amsterdam, The Netherlands We have a PhD position available at CWI, on the topic of user-centered optimisation for immersive media.
The full ad, including the link to apply, can be found here: https://www.cwi.nl/jobs/vacancies/868111
Please disseminate the call within your network. Potential candidates may also contact me directly with any questions: irene@cwi.nl
The deadline for applications is February 1st.
| ||
6-6 | (2021-01-19) Associate professor, Telecom Paris, France Telecom Paris is hiring an associate professor in machine learning for distributed/multi-view machine listening and audio content analysis
| ||
6-7 | (2021-02-15) Ingenieur contractuel Police Technique et Scientifique France
A position for a contract engineer in the audio section of the French forensic police (police technique et scientifique) is open.
| ||
6-8 | (2021-03-08) Fully funded PhD at KTH, Stockholm, Sweden A fully funded PhD position in Deep Learning for Conversational AI KTH, Royal Institute of Technology, Stockholm, Sweden. Apply here (deadline 2/4) https://www.kth.se/en/om/work-at-kth/lediga-jobb/what:job/jobID:379667/where:4/
| ||
6-9 | (2021-03-08) PhD and RA positions at University of Trento, Italy PhD and RA Positions in Conversational AI in the Health Domain at the University of Trento, Italy
| ||
6-10 | (2021-03-08) Two PhD positions at NTNU, Trondheim, Norway. Two PhD positions are open at NTNU Trondheim, Norway
| ||
6-11 | (2021-03-09) Associate professor at Telecom Paris, France
Note that you do *not* need to speak French to apply.
| ||
6-12 | (2021-03-16) PhD position at INRIA, Nancy, France ********** PhD position *************
Title: Robust and Generalizable Deep Learning-based Audio-visual Speech Enhancement
The PhD thesis will be jointly supervised by Mostafa Sadeghi (Inria Starting Faculty Position) and Romain Serizel (Associate Professor, Université de Lorraine).
Contacts: Mostafa Sadeghi (mostafa.sadeghi@inria.fr) and Romain Serizel (romain.serizel@loria.fr)
Context: Audio-visual speech enhancement (AVSE) refers to the task of improving the intelligibility and quality of noisy speech by utilizing the complementary information of the visual modality (the speaker's lip movements) [1]. The visual modality can help distinguish target speech from background sounds, especially in highly noisy environments. Recently, owing to the great success and progress of deep neural network (DNN) architectures, AVSE has been extensively revisited. Existing DNN-based AVSE methods fall into supervised and unsupervised approaches. In the former category, a DNN is trained to map noisy speech and the associated video frames of the speaker to a clean estimate of the target speech. The unsupervised methods [2] follow a traditional maximum-likelihood approach combined with the expressive power of DNNs: the prior distribution of clean speech is learned using deep generative models such as variational autoencoders (VAEs) and combined with a likelihood function based on, e.g., non-negative matrix factorization (NMF), to estimate the clean speech in a probabilistic way. As there is no training on noisy speech, this approach is unsupervised. Supervised methods require deep networks with millions of parameters, as well as a large audio-visual dataset with sufficiently diverse noise instances, to be robust against acoustic noise. There is also no systematic way to achieve robustness to visual noise, e.g., head movements, face occlusions, or changing illumination conditions. Unsupervised methods, on the other hand, show better generalization performance and can achieve robustness to visual noise thanks to their probabilistic nature [3]. Nevertheless, their test phase involves a computationally demanding iterative process, hindering their practical use.
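For readers unfamiliar with the NMF component mentioned above: NMF approximates a nonnegative (e.g., noise) spectrogram V as the product W·H, where W holds a few spectral templates and H their time activations, fitted by multiplicative updates. The following numpy sketch is purely illustrative (the function name and settings are invented; the cited works [2, 3] use more elaborate probabilistic models):

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF with multiplicative updates minimizing the Frobenius norm.

    V: nonnegative magnitude spectrogram (freq x time)
    Returns W (freq x k spectral templates) and H (k x time activations),
    so that V is approximately W @ H.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, k)) + eps
    H = rng.random((k, T)) + eps
    for _ in range(n_iter):
        # Standard multiplicative update rules (Lee & Seung style):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In the unsupervised AVSE setting, a model of this kind plays the role of the noise part of the likelihood, while the VAE provides the clean-speech prior.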
Objectives: In this PhD project, we are going to bridge the gap between supervised and unsupervised approaches, benefiting from both worlds. The central task is to design and implement a unified AVSE framework with the following features: (1) robustness to visual noise, (2) good generalization to unseen noise environments, and (3) computational efficiency at test time. To achieve the first objective, various techniques will be investigated, including probabilistic switching (gating) mechanisms [3], face frontalization [4], and data augmentation [5]; the main idea is to adaptively lower-bound the performance by that of audio-only speech enhancement when the visual modality is not reliable. To accomplish the second objective, we will explore techniques such as acoustic scene classification combined with noise modeling inspired by unsupervised AVSE, in order to adaptively switch between noise models during speech enhancement. Finally, concerning the third objective, lightweight inference methods, as well as efficient generative models, will be developed. We will work with the AVSpeech [6] and TCD-TIMIT [7] audio-visual speech corpora.
References:
[1] D. Michelsanti, Z. H. Tan, S. X. Zhang, Y. Xu, M. Yu, D. Yu, and J. Jensen, “An overview of deep-learning based audio-visual speech enhancement and separation,” arXiv:2008.09586, 2020.
[2] M. Sadeghi, S. Leglaive, X. Alameda-Pineda, L. Girin, and R. Horaud, “Audio-visual speech enhancement using conditional variational auto-encoders,” IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 28, pp. 1788–1800, 2020.
[3] M. Sadeghi and X. Alameda-Pineda, “Switching variational autoencoders for noise-agnostic audio-visual speech enhancement,” in ICASSP, 2021.
[4] Z. Kang, M. Sadeghi, and R. Horaud, “Face Frontalization Based on Robustly Fitting a Deformable Shape Model to 3D Landmarks,” arXiv:2010.13676, 2020.
[5] S. Cheng, P. Ma, G. Tzimiropoulos, S. Petridis, A. Bulat, J. Shen, and M. Pantic, “Towards Pose-invariant Lip Reading,” in ICASSP, 2020.
[6] A. Ephrat, I. Mosseri, O. Lang, T. Dekel, K. Wilson, A. Hassidim, W. T. Freeman, and M. Rubinstein, “Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation,” SIGGRAPH, 2018.
[7] N. Harte and E. Gillen, “TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech,” IEEE Transactions on Multimedia, vol. 17, no. 5, pp. 603–615, May 2015.
Skills:
Benefits package:
Remuneration: Salary: €1982 gross/month for the 1st and 2nd years; €2085 gross/month for the 3rd year. Monthly salary after taxes: around €1596.05 for the 1st and 2nd years; €1678.99 for the 3rd year (medical insurance included).
| ||
6-13 | (2021-03-20) Post-doc at Nara Institute of Science and Technology, Japan [Postdoctoral researcher, Nara Institute of Science and Technology]
| ||
6-14 | (2021-04-05) Researchers in Speech, Text and Multimodal Machine Translation @ DFKI Saarbrücken, Germany
| ||
6-15 | (2021-04-02) PhD at Université d'Avignon, France **** If you don't read French and are interested in a PhD position in AI/NLP, please contact the supervisors below. Responses should preferably reach us **before May 10**.
PROPOSAL FOR PhD THESIS TOPICS, DOCTORAL CONTRACTS 2021-2024
Targeted call (please tick the corresponding box): [X] Ministerial doctoral contract ED 536 / [ ] Ministerial doctoral contract ED 537
Thesis supervisor: Fabrice LEFEVRE. Possible co-director: none. Co-advisor: Bassam JABAIAN.
Title in French: Transformer et renforcer pour le transfert et l'apprentissage en ligne des agents conversationnels vocaux
Title in English: Transformer and Reinforce for transfer and online learning of vocal conversational agents
Keywords: AI, natural language processing, human-machine vocal interactions, deep learning, deep reinforcement learning, transfer learning
Joint supervision (cotutelle): No. Country: n/a. International mobility opportunities for the doctoral student during the thesis: yes.
Candidate profile: The candidate must hold a Master's degree in computer science with a component in machine learning methods and/or language engineering. The doctoral grant will be awarded through a competition within Doctoral School 536 of the University of Avignon, including an interview of the selected candidate by the thesis supervisors. To apply, please send an email before May 10, 2021 to Fabrice Lefèvre (fabrice.lefevre@univ-avignon.fr) and Bassam Jabaian (bassam.jabaian@univ-avignon.fr) including: your CV, a cover letter stating your position with respect to the research directions below, any letters of recommendation, and your academic transcripts.
Detailed description of the topic
Field / Theme: AI/NLP
Objective: Enable transfer and online learning of vocal conversational agents with a combination of Transformers and reinforcement learning.
Context and challenges: Among research activities in artificial intelligence, improving vocal interaction with machines remains a major current challenge. LIA works on many aspects of vocal interaction, but through this thesis it aims in particular to deepen research on training techniques for vocal conversational agents based on supervised and reinforced deep neural networks. Such dialogue agents are a key asset for improving our societies' ability to manage controlled social distancing, notably by delegating certain risky tasks to efficient hardware artifacts that are well accepted by the general public. Recent developments in neural networks have produced high-quality text generation systems (language models). They are trained on gigantic quantities of documents but, in return, offer very broad coverage of human language. The most advanced representatives in this field are Transformers, which eliminate the need for recurrence in networks (computationally expensive) in favor of a multi-head self-attention mechanism. Many derivatives of these models exist and have yielded substantial performance gains on numerous tasks involving natural language text generation. BERT [1] and GPT form the main families (with their many descendants: distilBERT, alBERT, GPT-2, ...).
But while such models raise our language modeling capabilities to a higher level of performance, it remains to be seen how to deploy them for more specific or demanding tasks, such as spoken interaction systems. The problem of applying them to conversational agents remains open: on the one hand, direct interaction with humans amplifies the impact of model errors and imperfections; on the other, interaction management takes place in a goal-oriented context, where the objective is not the mere exchange of linguistic data but the achievement of a latent goal (obtaining specific information, performing or having an action performed, ...). The main challenge we wish to address in this thesis is therefore to adapt the capabilities of a pre-trained Transformer to a specific task, in particular for building a conversational agent. Transfer learning approaches have already been initiated, but their results are mixed and need to be strengthened [2]. We identify two major axes for the thesis: Axis 1: Transfer and online learning. First, transfer approaches still rely on new, pre-collected data to which the models are exposed [2]. Thus, continuing our previous work on online learning of dialogue systems, we would like to design and evaluate effective strategies that enable reinforcement learning [3, 4]. To make artificial systems capable of learning from data, two strong assumptions are usually made: (1) the stationarity of the system (the machine's environment does not change over time), and (2) the interdependence between data collection and the learning process (the user does not change their behavior over time).
Yet users naturally tend to adapt their behavior to the machine's reactions, which hinders the convergence of learning toward an equilibrium that continuously satisfies the user's expectations. Vocal interfaces must therefore evolve toward a new generation of interactive systems, capable of learning dynamically over the long term from interactions while anticipating variations in human behavior, humans themselves being seen as evolving systems. The challenge, in the context of deep reinforcement learning [5], is then to demonstrate the optimal convergence of the algorithms used to update the weights of certain layers of the model as interactions with users proceed, without risking degradation of the initial performance. The optimal determination of which parameters to modify should be automatable. This project also falls within the framework of continual learning [6] for a conversational agent. Axis 2: Modeling spoken language. Second, most of the aforementioned models exclusively model written language and include few mechanisms dedicated to the nature of spoken language. We would therefore like to increase the ability of such machines to handle: 1) more natural user inputs, which include many deviations from written language (ungrammaticality, confusions, restarts, corrections, hesitations, ...), and 2) errors in transcriptions due to the speech recognition component. It is therefore necessary to interface the speech analysis component with the downstream language modeling chain (semantic analysis, dialogue state tracking, dialogue management, generation, and speech synthesis) so as to take into account multiple realistic hypotheses (no longer only the best one).
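The idea of exploiting multiple realistic ASR hypotheses rather than only the 1-best transcription can be illustrated with a toy n-best rescoring scheme: each hypothesis's acoustic score is combined with a downstream semantic-plausibility score. All names, scores, and weights here are invented for the example; real systems propagate full lattices or n-best lists through the understanding chain.

```python
def rescore(nbest, semantic_score, alpha=0.7):
    """Pick the best hypothesis by combining ASR and semantic evidence.

    nbest: list of (text, acoustic_log_prob) pairs from the recognizer.
    semantic_score: callable mapping a text to a plausibility in [0, 1].
    Returns the text of the highest-scoring hypothesis.
    """
    return max(
        nbest,
        key=lambda h: alpha * h[1] + (1 - alpha) * semantic_score(h[0]),
    )[0]

# Toy downstream scorer: prefers hypotheses containing a known slot value.
def toy_semantic(text):
    return 1.0 if "museum" in text else 0.0
```

With this scheme, a slightly lower-ranked acoustic hypothesis that is semantically coherent can override the raw 1-best, mirroring the human ability to revise acoustic hypotheses that conflict with semantic inference.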
And finally, to allow an arbitration between these hypotheses that takes the subsequent processing stages into account, in line with the equivalent human cognitive process (which can re-process its most probable acoustic hypotheses in case of conflict with its semantic inferences). This study may be carried out in several application settings, to be specified at the start of the thesis: for example, a dialoguing Pepper robot assigned to reception duties in a public place (e.g., a hospital or a museum). It would then be possible to delegate first-contact and orientation tasks to artifacts insensitive to biological transmission, a highly strategic asset for improving the management of a crisis such as the ongoing global coronavirus pandemic.
[1] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” arXiv.org, Oct. 2018. [2] T. Wolf, V. Sanh, J. Chaumond, and C. Delangue, “TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents,” arXiv.org, Jan. 2019. [3] E. Ferreira, B. Jabaian, and F. Lefèvre, “Online adaptative zero-shot learning spoken language understanding using word-embedding,” in Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, pp. 5321–5325. [4] M. Riou, B. Jabaian, S. Huet, and F. Lefèvre, “Joint On-line Learning of a Zero-shot Spoken Semantic Parser and a Reinforcement Learning Dialogue Manager,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019, pp. 3072–3076. [5] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “A Brief Survey of Deep Reinforcement Learning,” IEEE Signal Processing Magazine, Special Issue on Deep Learning for Image Understanding, Aug. 2017. [6] Z. Chen and B. Liu, Lifelong Machine Learning, Second Edition, vol. 12, no. 3. Morgan & Claypool Publishers, 2018.
Topic proposals should be sent to secretariat-ed@univ-avignon.fr
| ||
6-16 | (2021-04-15) Director, Center for Language and Speech Processing, Baltimore, MD, USA
POSITION: Director, Center for Language and Speech Processing
REPORTS TO: Ed Schlesinger, Benjamin T. Rome Dean, Whiting School of Engineering, Johns Hopkins University
INSTITUTION: Johns Hopkins University, Baltimore, MD. https://engineering.jhu.edu/ (2.23.21)
The Whiting School of Engineering at Johns Hopkins University invites nominations and applications for the position of Director of the Center for Language and Speech Processing (CLSP). The Director will be appointed as a full-time tenured faculty member in the Whiting School of Engineering and will be encouraged to remain active in research, with strategic leadership of the Center as their top priority. This is an outstanding opportunity for an accomplished scholar with leadership experience to further strengthen an exceptional interdisciplinary research center at the nation's first research university. The best candidates will embody the intellectual distinction, entrepreneurial capacity, collaborative spirit, transparency, inclusiveness, and creativity that characterize the School's culture and will bring a scholarly record deserving appointment as tenured professor at The Johns Hopkins University.
The Center for Language and Speech Processing
CLSP is one of the Whiting School's 25 interdisciplinary centers and institutes. The Center currently comprises over 25 tenure-line and research faculty whose primary appointments are in the Whiting School of Engineering or in other closely related schools, along with over 70 PhD students. CLSP was established in 1992 and grew to prominence under the directorship of the late Frederick Jelinek. It aims to understand how human language is used to communicate ideas, and to develop technology for machine analysis, translation, and transformation of multilingual speech and text.
In 2007 CLSP gained a sibling, the national Human Language Technology Center of Excellence (https://hltcoe.jhu.edu), a government-funded research center at Johns Hopkins that develops critical speech and language technology for government use; several HLTCOE researchers are tightly integrated into CLSP. Recently, CLSP has further expanded its research portfolio by adding several prominent researchers in computer vision and related fields. As part of its educational mission, CLSP coordinates a full complement of courses dealing with a diverse array of topics in language and speech. It offers a weekly seminar featuring prominent visiting speakers in speech and language processing. It also runs the Fred Jelinek Memorial Workshop in Speech and Language Technology (JSALT), a widely known residential research workshop that annually assembles teams of researchers from around the world to spend six summer weeks conducting intensive research on fundamental problems. Held annually since 1995, the workshop has produced many important advances in speech and language technology.
Opportunities for the Center Director
The CLSP Director will work with colleagues in and beyond CLSP to increase its impact by both enhancing its historic strengths and positioning it as a central element of a set of AI-related initiatives across the Whiting School and the University more broadly.
To these ends, the Director will identify ways in which the Center will continue to grow and evolve, and through which the Center, the Whiting School, and Hopkins can recruit, sustain, and deploy the human and financial resources needed to further distinguish itself. The Director will work to maintain the Center's position as the disciplinary and intellectual hub of language and speech processing research within the University, enabling CLSP to contribute to and benefit from the success of significant institutional investment in artificial intelligence and machine learning more broadly, including potential applications to key societal problems such as healthcare and scientific endeavors such as linguistics and neuroscience. Collaborations with the Applied Physics Lab (www.jhuapl.edu) present opportunities to bring additional resources, expertise, and scale to advance CLSP research, potentially including classified research. Beyond Hopkins, CLSP's Director will foster connections with industry as part of the Center's efforts to expand its base of resources and relationships, to disseminate knowledge and discoveries, and to develop and transfer technologies that may have an impact in the world. In these various external activities, the Director will work with the University's technology ventures office (https://ventures.jhu.edu), with faculty and students, and with alumni and donors. Specific strategies for enhancing CLSP's strengths, broadening its impact, and positioning it relative to Hopkins-wide initiatives, along with measures of success and the prioritization of activities designed to achieve success, will be developed by the Director in collaboration with CLSP's faculty and the Dean.
Diversity, equity, and inclusion at the Whiting School
WSE has a stated commitment to diversity, equity, and inclusion: “Diversity and inclusion enrich our entire community and are critical to both educational excellence and to the advancement of knowledge.
Discovery, creativity, and innovation flourish in an environment where the broadest range of experiences are shared, where all voices are heard and are valued, and where individuals from different cultures and backgrounds can collaborate freely to understand and solve problems in entirely new ways.” As the leader of the Center and within the School, CLSP's Director will work to enhance and expand diversity and inclusion at all levels and will ensure that the Center is a welcoming and supportive environment for all.
Position Qualifications
The new Director will be a proven, entrepreneurial leader who can bring faculty, staff, and students together to pursue a compelling vision of CLSP as an international hub for language and speech processing research and as a site of innovation, teaching, and translation. They will have strong skills for mentoring junior faculty and will promote the interests of the Center. Intellectual curiosity and fundraising experience are valued. They will have a dossier that represents a distinguished track record of scholarship and teaching; a passionate commitment to research, discovery, and application; and an interest in and success at academic administration.
Expected educational background and qualifications include:
• An earned doctorate in an area such as electrical and computer engineering, computer science, or a closely related field and a scholarly record deserving appointment as tenured professor at The Johns Hopkins University;
• Recognized leadership in their respective field with a distinguished national and international reputation for research and education;
• Excellent communication skills in both internal and external interactions;
• Strong commitment to diversity and inclusion at all levels among faculty, students, and staff, along with measurable and sustained impact on the diversity and inclusiveness of organizations they have led or been part of; and
• Leadership and administrative experience within a complex research environment or in national/international organizations connected to their respective field.
The Whiting School of Engineering has engaged Opus Partners (www.opuspartners.net) to support the recruitment of the CLSP Director. Craig Smith, Partner, and Jeff Stafford, Senior Associate, are leading the search. Applicants should submit their CV and a letter of interest outlining their research and leadership experience to Jeffrey.stafford@opuspartners.net. Nominations, expressions of interest, and inquiries should go to the same address. Review of credentials will begin promptly and will continue until the appointment is finalized. Every effort will be made to ensure candidate confidentiality. The Whiting School of Engineering and CLSP are committed to building a diverse educational environment, and women and minorities are strongly encouraged to apply. Johns Hopkins University is an equal opportunity employer and does not discriminate on the basis of gender, marital status, pregnancy, race, color, ethnicity, national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, other legally protected characteristics or any other occupationally irrelevant criteria. The University promotes Affirmative Action for minorities, women, individuals who are disabled, and veterans. Johns Hopkins University is a drug-free, smoke-free workplace.
| ||
6-17 | (2021-04-11) CIFRE PhD thesis: Controlled question-answering dialogue system, applied to forums on women's health, LIG, Univ. Grenoble, France CIFRE PhD offer: Controlled question-answering dialogue system, applied to forums on women's health. Laboratoire d'Informatique de Grenoble / Université Grenoble Alpes (http://lig-getalp.imag.fr/), Grenoble; Shesmet (https://www.shesmet.com), Paris. The goal of this PhD is to design methods that enable a dialogue system to answer precisely a question about women's intimate health. Women's reproductive and sexual health is still too rarely addressed as a whole, and too often reduced to reproductive health alone. Yet women go through several physiological life stages that affect their mental and physical well-being to varying degrees: puberty, maternity, menopause and post-menopause. Women's sexual health is also a public-policy issue that has evolved over the years and remains central to today's societal concerns: period poverty, contraception, access to abortion, sexual violence. Access to high-quality, personalised and fully anonymous information is a strong lever of empowerment and equal care for the entire female population. Yet women seeking information on these topics today often face a flood of information that can be contradictory, incomplete and of unverifiable origin (e.g., user-generated health forums). This is why Shesmet and the Laboratoire d'Informatique de Grenoble (LIG) are joining forces to propose a dialogue-based question-answering method that adapts an expert, verified answer to the specific context of a health question asked by a user. 
This approach is original in that it draws on the best of human capabilities (relevant, error-free answers) and of computational ones (the ability of deep models to process data at scale). Objective of the thesis Over the past decade, natural language processing systems have made great progress thanks to the emergence of deep learning. The technology is now mature enough to be integrated into personal assistants [Chen and Gao, 2017] and question-answering systems. Current neural architectures include RNNs (LSTM/GRU) [Hochreiter and Schmidhuber, 1997; Cho et al., 2014] and transformers [Vaswani et al., 2017], combined with attention mechanisms [Bahdanau et al., 2014] to exploit contextual information beyond one or a few dialogue turns [Bothe et al., 2018]. However, these models are trained on datasets so large and so loosely controlled that they tend to reproduce the behaviours found in the data. For example, large general-news corpora typically over-represent the masculine gender. Similarly, question-answering systems are generally limited to finding a passage in a large corpus or to generating an answer from a deep model. Unlike these classical question-answering systems, the objective here will be to use the expertise of health specialists to adapt an answer to the context of the question [Wu2019]. Human experts thus write high-quality, verified answers, while the deep systems adapt them to as many users as possible, avoiding the usual errors of deep models. The task is therefore to design a system able: 1. to classify dialogue utterances and map them to a set of pre-written answers; 2. 
to edit the pre-written answers in order to adapt them to the question and the dialogue context; 3. to estimate the degree of reassurance to be included in the answer; 4. to explain the answers given. Within this indicative work programme, the PhD will address the following challenges.
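As a toy illustration of task 1 (mapping a user utterance to a set of pre-written expert answers), a minimal retrieval baseline can score each stored question against the incoming one. Everything below (the `route_question` helper, the example FAQ entries) is hypothetical and deliberately far simpler than the deep-learning approaches cited in the references:

```python
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words vector as a Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def route_question(question, faq):
    """Return the expert-written answer whose stored question is closest."""
    scored = [(cosine(bow(question), bow(q)), ans) for q, ans in faq]
    return max(scored)[1]

faq = [
    ("what contraception methods exist", "Expert answer on contraception."),
    ("when does menopause usually start", "Expert answer on menopause."),
]
print(route_question("at what age does menopause start", faq))
# -> Expert answer on menopause.
```

A real system would replace the bag-of-words cosine with contextual embeddings (e.g., FlauBERT, cited in the references) and add the prototype-editing step of [Wu2019] to adapt the retrieved answer.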
Scientific environment The thesis will be carried out in the Getalp team of the LIG laboratory (https://lig-getalp.imag.fr/). The recruited person will join a team offering a stimulating, multinational and pleasant working environment, and will also spend a significant amount of time at Shesmet. Shesmet is an e-health startup working both on research-and-development projects and on consulting missions around health innovation for public and private health institutions. In 2020 the company launched My S Life, an information platform on women's intimate and sexual health (www.myslife.co). The resources needed to carry out the PhD will be provided, both for travel in France and abroad and for equipment (personal computer, access to the LIG GPU servers, the CNRS Jean Zay computing grid). How to apply Candidates must hold a Master's degree in computer science or natural language processing (or be about to obtain one). They must have a good knowledge of machine-learning methods and, ideally, experience in corpus collection and management. They must also have a good command of French. Experience in dialogue, question-answering systems or natural language generation would be a plus. Applications are expected by 3 May 2021. They must include: CV + cover letter + Master's transcripts + recommendation letter(s), and be sent to François Portet (Francois.Portet@imag.fr), Didier Schwab (Didier.Schwab@imag.fr) and Juliette Mauro (juliette.mauro@shesmet.com). References [Atanasova2020] P. Atanasova, J.G. Simonsen, C. Lioma, I. Augenstein. A Diagnostic Study of Explainability Techniques for Text Classification. Proceedings of EMNLP 2020. [Bahdanau2014] D. Bahdanau, K. Cho, Y. Bengio. 
'Neural machine translation by jointly learning to align and translate', arXiv preprint arXiv:1409.0473, 2014. [Bothe2018] C. Bothe, C. Weber, S. Magg, S. Wermter. 'A Context-based Approach for Dialogue Act Recognition using Simple Recurrent Neural Networks', LREC 2018. [Chen2017] Y.-N. Chen, J. Gao. Open-Domain Neural Dialogue Systems, IJCNLP 2017. [Cho2014] K. Cho, B. van Merrienboer, Ç. Gülçehre, F. Bougares, H. Schwenk, Y. Bengio. 'Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation', CoRR, 2014. [Garnerin2020] M. Garnerin, S. Rossato, L. Besacier. Gender Representation in Open Source Speech Resources. LREC 2020: 6599-6605. [Hochreiter1997] S. Hochreiter, J. Schmidhuber. 'Long Short-Term Memory', Neural Computation, vol. 9, no. 8, pp. 1735-1780, November 1997. [Le2020] H. Le, L. Vial, J. Frej, V. Segonne, M. Coavoux, B. Lecouteux, A. Allauzen, B. Crabbé, L. Besacier, D. Schwab (2020). FlauBERT: Unsupervised Language Model Pre-training for French, Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 2479-2490. https://github.com/getalp/Flaubert [ParlAI] https://parl.ai/docs/tutorial_basic.html [Vaswani2017] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, et al. 'Attention is all you need', 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA. [Wolf2019] T. Wolf, V. Sanh, J. Chaumond, C. Delangue (2019). TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents, arXiv, 2019. https://github.com/huggingface/transfer-learning-conv-ai [Wu2019] Y. Wu, F. Wei, S. Huang, Y. Wang, Z. Li, M. Zhou (2019, July). Response generation by context-aware prototype editing. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, No. 01, pp. 7281-7288).
| ||
6-18 | (2021-04-09) Development engineer, Inria Bordeaux, France Development engineer - Inria Bordeaux Sud-Ouest Topic: Design of a software architecture for a machine-learning application (analysis and classification of pathological voices) Contract type: fixed-term Start: from 1 June 2021 until 31 July 2021 (possibility of extension) Application deadline: 15 May 2021 Location: Inria Bordeaux Sud-Ouest Required degree: Master's level (Bac+5) or equivalent Other appreciated degree: PhD Position: contract research engineer Desired experience: 3 to 12 years Gross monthly salary: €2632 to €3543, depending on qualifications and professional experience in a similar position Supervisor: Khalid Daoudi Context and advantages of the position Inria, the French national research institute for digital science and technology, promotes scientific excellence in the service of technology transfer and society. Inria employs 2700 staff from the world's best universities, who tackle the challenges of computer science and mathematics. Its agile model allows it to explore original paths with its industrial and academic partners, and to address the multidisciplinary and applied challenges of the digital transition. Committed to innovation stakeholders, Inria creates the conditions for fruitful exchanges between public research, private R&D and companies. Inria transfers its results and skills to startups, SMEs and large groups in fields such as health, transport, energy, communication, security and privacy, smart cities and the factory of the future. Inria has also developed an entrepreneurial culture that has led to the creation of 120 startups. The Inria Bordeaux Sud-Ouest centre is one of Inria's nine centres and hosts 20 research teams. 
The Inria centre is a major, recognised player in digital science. It sits at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, and research and higher-education institutions. GEOSTAT is an Inria research team whose research topic is the processing of complex natural signals, notably in biophysics (geostat.bordeaux.inria.fr/). Mission Several diseases and pathologies can cause dysfunctions or alterations in speech production. The best known are neurodegenerative diseases (such as Parkinson's and Alzheimer's) and respiratory diseases (such as asthma, COPD or Covid-19); one then speaks of speech disorders or pathological speech. It is now established that some of these diseases are characterised by an early manifestation of speech disorders. Developing objective vocal biomarkers has thus become a major challenge for diagnostic support and disease monitoring. The mission of the recruited engineer falls within this framework. Its goal is to design a software architecture, in Python, to: 1. develop a generic signal-processing toolbox dedicated to the analysis of pathological speech; 2. implement a vocal biomarker of respiratory function using machine-learning techniques, including deep learning. The latter task is part of a clinical research project in partnership with AP-HP (Assistance Publique - Hôpitaux de Paris), in particular the pneumology and intensive-care department of La Pitié-Salpêtrière hospital. The aim of this project is to develop a vocal biomarker of respiratory state and its evolution, to support the remote monitoring of patients with a respiratory condition, including Covid-19. 
Main activities For security and confidentiality reasons, the patients' voice and clinical data are hosted on the AP-HP EDS (health data warehouse) servers. The first task will therefore be to develop an API for communicating with this hosting infrastructure. The second task will be to implement proven pathological-speech analysis techniques, then others drawn from recent research. Where appropriate, this task will build on Parselmouth (parselmouth.readthedocs.io/en/stable/), a Python library for Praat (www.fon.hum.uva.nl/praat/). The third step will be to implement and experiment with machine-learning techniques using patient data, relying on the usual machine-learning frameworks (TensorFlow, PyTorch, scikit-learn). Supervision The engineer will benefit from scientific supervision by Khalid Daoudi of the GEOSTAT team, and technical supervision by Dan Dutartre and François Rué of the Inria-Bordeaux Experimentation and Development Service (SED). Skills Engineering degree and/or PhD in digital sciences; significant experience in developing or leading a software project in Python; solid training in machine learning and notable experience in this field; solid expertise in software development, with the ability to adapt to various standard languages (Python, C, C++); strong Python skills are required; knowledge of signal processing would be highly appreciated; mastery of software-quality concepts, methodology and tools; mastery of collaborative software-project management methodologies; mastery of modular software-architecture methodologies; excellent interpersonal skills; 
ability to work in multidisciplinary teams; ability to adapt to the project context; autonomy in personal organisation and reporting; good written and oral communication in French; command of technical and scientific English. Application Candidates are invited to send their application to khalid.daoudi@inria.fr; francois.rue@inria.fr; dan.dutartre@inria.fr
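As an illustration of what the generic speech-analysis toolbox of task 1 might contain, here is a minimal, dependency-free sketch of two classic short-time features (RMS energy and zero-crossing rate). The function name and frame sizes are illustrative; a real implementation would more likely build on Parselmouth/Praat as mentioned above:

```python
import math

def frame_features(samples, frame_len=400, hop=200):
    """Per-frame RMS energy and zero-crossing rate for a mono signal
    (400/200 samples = 50 ms frames with 25 ms hop at 8 kHz)."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Fraction of consecutive sample pairs whose sign differs.
        zcr = sum(1 for a, b in zip(frame, frame[1:])
                  if (a >= 0) != (b >= 0)) / (frame_len - 1)
        feats.append((rms, zcr))
    return feats

# Synthetic check signal: one second of a 100 Hz sine at 8 kHz.
sr = 8000
signal = [math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]
feats = frame_features(signal)
```

For a sine, the RMS is close to 1/sqrt(2) of the amplitude and the zero-crossing rate close to twice the frequency divided by the sample rate, which makes the sketch easy to sanity-check.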
| ||
6-19 | (2021-04-11) Proposal for a postdoctoral position at INRIA, Bordeaux, France Proposal for a postdoctoral position at INRIA, Bordeaux, France Title: Sparse predictive models for the analysis and classification of pathological speech Keywords: Pathological speech processing, Sparse modeling, Optimization algorithms, Machine learning, Parkinsonian disorders, Respiratory diseases Contact and Supervisor: Khalid Daoudi (khalid.daoudi@inria.fr) INRIA team: GEOSTAT (geostat.bordeaux.inria.fr) Duration: from 01/11/2021 to 31/12/2022 (could be extended to an advanced or a permanent position) Salary: 2653€ / month Profile: PhD degree obtained after August 2019 or to be defended by the end of 2021. High-quality applications with a PhD obtained before August 2019 could be considered for an advanced research position. Required knowledge and background: a solid knowledge of speech/signal processing; a good mathematical background; basics of machine learning; programming in Matlab and Python. Scientific research context During this century, there has been an ever-increasing interest in the development of objective vocal biomarkers to assist in the diagnosis and monitoring of neurodegenerative diseases and, recently, respiratory diseases because of the Covid-19 pandemic. The literature is now relatively rich in methods for the objective analysis of dysarthria, a class of motor speech disorders [1], where most of the effort has been devoted to speech impaired by Parkinson’s disease. However, relatively few studies have addressed the challenging problem of discriminating between subgroups of Parkinsonian disorders which share similar clinical symptoms, particularly in early disease stages [2]. As for the analysis of speech impaired by respiratory diseases, the field is relatively new (with existing developments in very specialized areas) but has attracted great attention since the beginning of the pandemic. 
On the other hand, the large majority of existing methods for processing pathological speech still heavily rely on a core of feature estimators designed and optimized for healthy speech. There is thus a strong need for a framework to infer/design speech features and cues which remain robust to the perturbations caused by (classes of) disordered speech. The first and main objective of this proposal is to explore the framework of sparse modeling of speech, which allows a certain flexibility in the design and parameter estimation of the source-filter model of speech production. This exploration will be essentially based on theoretical advances developed by the GEOSTAT team, which have had a significant impact in the field of image processing, both at the scientific level [3] and at the technological level (www.inria.fr/fr/i2s-geostat-un-innovation-lab-en-imagerie-numerique). The second objective of this proposal is to use the resulting representations as inputs to basic machine learning algorithms in order to conceive a vocal biomarker to assist in the discrimination between subgroups of Parkinsonian disorders (Parkinson’s disease, Multiple-System Atrophy, Progressive Supranuclear Palsy) and in the monitoring of respiratory diseases (Covid-19, Asthma, COPD). Both objectives benefit from a rich dataset of speech and other biosignals recently collected in the framework of two clinical studies in partnership with university hospitals in Bordeaux and Toulouse (for Parkinsonian disorders) and in Paris (for respiratory diseases). Work description As stated above, the work to be carried out is divided into two parts. The main part consists in developing new algorithms, based on sparse modeling, for the analysis of a class of disordered speech. The second part consists in exploring machine learning tools to develop vocal biomarkers for the purpose of (differential) diagnosis and monitoring of the diseases under study. 1. 
Sparse modeling for disordered speech analysis The first task will be to investigate sparsity in the framework of linear prediction modeling of speech. The latter is indeed one of the building blocks for the estimation of core glottal, phonation and articulatory features. Sparse linear prediction (SLP) has recently been investigated in a convex setting using the L1-norm and applied, essentially, to speech coding [4]. We will start by investigating the potential of this convex setting in disordered speech analysis. We will then explore the use of non-convex penalties that allow sparsity control and a better decoupling of the vocal tract filter from the excitation source. We will study the spectral properties of the different models and revisit a set of acoustic features which are not robust to the perturbations arising in dysarthric speech. We will then explore the potential of SLP in designing new features which could be informative about dysarthria. The algorithmic developments will be evaluated using a rich set of biosignals obtained from patients with Parkinsonian disorders and from healthy controls. The biosignals are electroglottography and aerodynamic measurements of oral and nasal airflow, as well as intra-oral and sub-glottic pressure. After dysarthria analysis, we will study speech impairments caused by respiratory deficits. The main goal here will be to automatically identify respiratory patterns and to design features to quantify the impairments. The developments will be evaluated using manual annotations, by an expert phonetician, of speech signals obtained from patients with respiratory deficits and from healthy controls. Depending on the work progress and time constraints, we may also explore sparsity beyond the linear prediction model through existing nonlinear representations of speech. 
It is well known indeed that the linear source-filter model of speech cannot capture several nonlinearities which exist in the speech production process, particularly in disordered speech. 2. Machine learning for disease diagnosis and monitoring Using the outcomes of the first part, the (experimental) objective of the second part is to apply basic machine learning algorithms (LDA, logistic regression, decision trees, SVM…) using standard tools (such as scikit-learn) to conceive robust algorithms that could help, first, in the discrimination between Parkinsonian disorders and, second, in the monitoring of respiratory deficits. 3. Work synergy - The postdoc will interact closely with an engineer who is developing an open-source software architecture dedicated to pathological speech processing. The validated algorithms will be implemented in this architecture by the engineer, under the co-supervision of the postdoc. - Given the multidisciplinary nature of the proposal, the postdoc will interact with the clinicians participating in the two clinical studies. References: [1] J. Duffy. Motor Speech Disorders: Substrates, Differential Diagnosis, and Management. Elsevier, 2013. [2] J. Rusz et al. Speech disorders reflect differing pathophysiology in Parkinson's disease, progressive supranuclear palsy and multiple system atrophy. Journal of Neurology, 262(4), 2015. [3] H. Badri. Sparse and Scale-Invariant Methods in Image Processing. PhD thesis, University of Bordeaux, France, 2015. [4] D. Giacobello et al. Sparse Linear Prediction and Its Applications to Speech Processing. IEEE Transactions on Audio, Speech and Language Processing, 20(5), 2012.
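As a rough illustration of the convex SLP setting mentioned in part 1 (L1-penalised linear prediction, solvable by a proximal-gradient method such as ISTA), the following self-contained sketch estimates sparse prediction coefficients on a synthetic signal. All names, the toy signal and the hyper-parameters are illustrative assumptions, not the project's actual algorithms:

```python
import math

def sparse_lp(x, order=4, lam=0.01, iters=5000):
    """Sparse linear prediction: minimise (1/2)*||y - X a||^2 + lam*||a||_1
    over prediction coefficients a, using ISTA (proximal gradient descent)."""
    X = [[x[n - k - 1] for k in range(order)] for n in range(order, len(x))]
    y = [x[n] for n in range(order, len(x))]
    # Conservative step size from the Frobenius bound ||X||_F^2 >= ||X||_2^2.
    eta = 1.0 / sum(v * v for row in X for v in row)
    a = [0.0] * order
    for _ in range(iters):
        # Gradient of the quadratic term: X^T (X a - y).
        r = [sum(Xi[k] * a[k] for k in range(order)) - yi
             for Xi, yi in zip(X, y)]
        g = [sum(row[k] * ri for row, ri in zip(X, r)) for k in range(order)]
        # Gradient step, then soft-thresholding (proximal map of lam*||.||_1).
        a = [math.copysign(max(abs(v) - eta * lam, 0.0), v)
             for v in (a[k] - eta * g[k] for k in range(order))]
    return a

# Synthetic resonant AR(2) signal (true lags 1 and 2 only, poles at r=0.9)
# with a small deterministic excitation near the resonance frequency.
x = [1.0, 0.5]
for n in range(2, 150):
    x.append(1.5797 * x[n - 1] - 0.81 * x[n - 2] + 0.1 * math.sin(0.5 * n))

coeffs = sparse_lp(x)
```

The L1 penalty is what distinguishes this from ordinary least-squares linear prediction: coefficients on uninformative lags are shrunk toward zero, which is the property [4] exploits for speech coding.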
| ||
6-20 | (2021-04-19) Technical engineer at ELDA, Paris The European Language resources Distribution Agency (ELDA), a company specialized in Human Language Technologies within an international context, acting as the distribution agency of the European Language Resources Association (ELRA), is currently seeking to fill an immediate vacancy for a Technical Engineer position.
Job description
Required profile
About ELDA
| ||
6-21 | (2021-04-19) Web Developer at ELDA, Paris, France The European Language resources Distribution Agency (ELDA), a company specialized in Human Language Technologies within an international context, is currently seeking to fill an immediate vacancy for a permanent Web Developer position.
Job description
Required profile
About ELDA
| ||
6-22 | (2021-04-22) Post-doc at GIPSA-Lab, Grenoble, France General information Reference: UMR5216-ALLBEL-024 Missions This post-doc is part of the ANR project GEPETO (GEstures and PEdagogy of InTOnation), whose aim is to study the use of manual gestures, through human-machine interfaces, for the design of tools and methods for learning to control intonation (melody) in speech. In particular, this position concerns voice rehabilitation in cases of degraded or absent vocal-fold vibration in patients with laryngeal disorders. Current medical solutions for replacing this vibration consist in injecting an artificial sound source into the vocal tract, either directly through the mouth or transmitted through the neck tissue by an electrolarynx. This vibrator generates a substitute voice source over which the user can articulate speech normally. An alternative is to capture with a microphone the unvoiced speech produced in the absence of vocal-fold vibration (e.g., whispering), and to re-introduce voicing in real time by speech synthesis. The reconstructed voice is then played back in real time through a loudspeaker. Today, all these systems generate a relatively constant intonation (melody), leading to very robotic-sounding voices. The aim of the GEPETO project at GIPSA-lab is to add to these two solutions real-time control of intonation by hand gestures, captured by various interfaces (tablet, accelerometer, etc.), and to study the use of such systems in oral-interaction situations. The post-doc will focus on the whisper-to-speech conversion solution already available in the laboratory. The work will be divided into two tasks. 
Activities - Take over the various modules for whisper-to-speech conversion available in the laboratory (whisper analysis, synthesis engine, interface management) in the Max/MSP environment. Skills - C/C++ (in-depth knowledge). Work context Gipsa-lab is a joint research unit of CNRS, Grenoble INP and Université Grenoble Alpes; it has partnership agreements with Inria and the Observatoire des Sciences de l'Univers de Grenoble.
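The gesture-based intonation control described above ultimately amounts to mapping an interface coordinate to a fundamental frequency in real time. A minimal sketch, assuming a normalised vertical tablet position and an illustrative 80-400 Hz range (both assumptions, not the project's design), maps linearly on a semitone (log-frequency) scale, which is closer to pitch perception than a linear Hz mapping:

```python
import math

def gesture_to_f0(y_norm, f0_min=80.0, f0_max=400.0):
    """Map a normalised vertical gesture position (0..1, e.g. from a
    tablet) to a fundamental frequency in Hz, linearly on a semitone
    (log-frequency) scale. Out-of-range positions are clamped."""
    y = min(max(y_norm, 0.0), 1.0)
    semitone_range = 12 * math.log2(f0_max / f0_min)
    return f0_min * 2 ** (y * semitone_range / 12)
```

In a real-time system such a function would be evaluated at each control tick and the resulting F0 fed to the synthesis engine; here it merely illustrates the log-scale design choice.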
| ||
6-23 | (2021-05-14) Ph D position at Prosody/Language Acquisition, Sign language: University of Lisbon Prosody/Language Acquisition, Sign language: PhD, University of Lisbon
Applications are invited for one funded PhD position at the Phonetics and Phonology Lab and the Lisbon Baby Lab of the Center of Linguistics of the University of Lisbon (CLUL). The candidate will develop a project on the Prosody of Portuguese Sign Language/Língua Gestual Portuguesa (LGP). Research on this minority language is remarkably scarce. The work will contribute to the knowledge of the unexplored issues of production, perception and/or acquisition of prosody in LGP.
General scientific area: Linguistics, Psychology
Specific scientific area: Phonology (Prosody), Psycholinguistics, Sign language, Language processing, Language acquisition
Applications are invited from candidates holding a Master degree (MA) in Linguistics, Psychology or related areas
The work will be conducted at the Phonetics and Phonology Lab and Lisbon Baby Lab (PhonLab/LBL), under the supervision and/or co-supervision of Marina Vigário, Sónia Frota and/or Marisa Cruz. PhonLab/LBL is a leading group for research on prosody and the acquisition of prosody, with a strong interest in multimodal prosody and sign language, working with a network of partners on visual prosody, gestures and sign language. The research will take advantage of the resources, facilities and human assets available at the Lab. One of two possible PhD programs from the University of Lisbon can be chosen: PhD in Linguistics (School of Arts and Humanities, University of Lisbon) or PhD in Cognitive Sciences (University of Lisbon).
The successful candidate is expected to start in the beginning of July 2021.
Application deadline: 11th June 2021
For more information: http://www.eracareers.pt/opportunities/index.aspx?task=showAnuncioOportunities&jobId=134592&idc=1
Contact: labfon@letras.ulisboa.pt or sfrota@edu.ulisboa.pt
Sónia Frota
Professora catedrática | Professor
Coordenadora Científica - CLUL | Scientific Coordinator - CLUL
Centro de Linguística da Universidade de Lisboa | Center of Linguistics of the University of Lisbon (CLUL)
Faculdade de Letras da Universidade de Lisboa | School of Arts and Humanities Alameda da Universidade 1600-214 Lisboa PORTUGAL
Telefone: 217 920 000 | www.letras.ulisboa.pt Lisbon Baby Lab | http://labfon.letras.ulisboa.pt/babylab/
Editor-in-chief, Journal of Portuguese Linguistics
| ||
6-24 | (2021-05-16) Postdocs at LUDO-VIC, Paris, France LUDO-VIC is looking for young PhDs, for a first permanent contract (CDI), in linguistics and language didactics AND in natural language processing. The company's motto is: 'Whatever your mother tongue and level of schooling, learn the basics of any concept: a new language, health/safety practices, soft skills, etc.' This goal is achieved by contextualising the elements of the concepts to be taught through short 3D animations featuring the avatars Ludo and Vic, who were specifically designed not to stigmatise any population on Earth and to promote gender equality. These short scenes explain, orally and in the learner's mother tongue, the elements to be taught, thus removing both the literacy barrier and the vernacular-language barrier. With European co-funding, we have developed an application called BasicFrançais, which allows non-French-speaking populations to acquire the basics of French, initially at level A1.1, and we aim to reach level A2. Our search for young PhDs on a first permanent contract concerns a derived application, named BasicX, where X is a language spoken on French territory: the creoles of Mayotte, Réunion and the Antilles, the Amerindian languages of French Guiana, the Kanak languages of New Caledonia, Polynesian, and all the dialects of metropolitan France (Alsatian, Basque, Picard, Occitan, etc.), but also the languages spoken by migrants. The Direction Générale de la Langue Française et des Langues de France counts about 75 of these regional languages, and about 230 languages are spoken in Europe. 
The R&D project consists in creating interaction scenarios in a language to be learned, collecting and analysing data, and contributing to the development of artificial-intelligence technologies for the language in question (speech recognition, speech synthesis, dialogue management). While ambitious, this project is feasible, since the A1-level lexical range contains only about 1000 words and about a hundred very simple dialogues. The 'ideal' candidate is therefore competent in automatic speech processing and artificial intelligence, but also masters a dialect spoken on French territory or a language of immigrant communities. We are aware that such a rare profile is hard to find, and will therefore consider applications from either language didactics or NLP. The company is based in the Paris region, but candidates may work from their usual place of residence. Send your CV to jack@ludo-vic.com LUDO-VIC SAS – 103 Boulevard Macdonald 75019 PARIS RCS 824194492 Paris – http://www.ludo-vic.com
| ||
6-25 | (2021-05-20) PhD position, LIA, Avignon, France Main laboratory: "Laboratoire Informatique d'Avignon" (LIA)
Start time: September 2021
Project context
This Ph.D. position is part of the French research project DIETS (Automatic diagnosis of errors of end-to-end speech transcription systems from the users' perspective), funded by the ANR (French National Research Agency), which aims at finely analyzing recognition errors by taking into account their human reception, and at understanding and visualizing how these errors manifest themselves in an end-to-end ASR framework. The main objectives are to propose original automatic approaches and tools to visualize, detect and measure transcription errors from the end-users' perspective.
Candidate profile
The applicant must hold a Master's degree in Computer Science. Mastery of at least one common object-oriented programming language (Java, C++, ...) and one scripting language (Python, Perl, ...) is mandatory; experience in automatic language and speech processing, machine learning, or data mining is appreciated. He or she should also show interest in linguistics and the study of human behavior.
Objectives
The main objective of the thesis is to finely analyze transcription errors from the point of view of their reception by the user. The thesis will have three complementary parts:
1. Approaches for error detection in transcripts of end-to-end ASR systems. This should lead to original confidence measures.
2. Detailed analysis of transcription errors in French, whether human or automatic, produced with a traditional or an end-to-end system, in order to understand how errors are viewed from a human perspective. This will shed light on new classes of errors, guided by how difficult, or easy, they are for end users to understand.
3. Realization of a new corpus of automatic transcriptions in which errors are annotated with precise linguistic information, as well as with information collected during perceptual tests, to reflect how users perceive (and possibly correct) these errors. Different perceptual tests will be carried out by confronting humans with these transcription errors.
The aim is to lay the first foundations of a new, transversal line of research, at the crossroads of linguistics, computer science and cognitive science, for the evaluation of automatic systems and the understanding of NLP systems based on deep architectures. The Ph.D. student will thus have the opportunity to learn and propose innovative approaches in automatic speech processing for understanding deep neural network architectures, while also gaining exposure to linguistics and to the design of perceptual tests.
Interests for the candidate:
- Very favorable and collaborative work environment in an internationally recognized research laboratory in language processing and machine learning.
- Implementation, analysis and proposals for innovative approaches to different ASR systems (classical and end-to-end frameworks).
- Development of user-oriented metrics complementary to the WER.
- Transdisciplinary scientific work allowing openness to other disciplines (e.g. linguistics and cognitive sciences).
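For context on the metric the project aims to complement: the conventional WER is the word-level Levenshtein (edit) distance between the reference and the hypothesis transcript, normalized by the reference length. A minimal sketch in Python (function name and example sentences are illustrative, not part of the project):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words: WER = 1/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

The WER counts all edits equally, regardless of how much each error disturbs a human reader; the user-oriented metrics envisaged in the thesis would weight errors by their perceptual impact instead.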
Applications should be sent to:
- Richard Dufour (richard.dufour@univ-avignon.fr) - LIA, Avignon University
- Jane Wottawa (jane.wottawa@univ-lemans.fr) - LIUM, Le Mans University
and should include:
- a detailed CV (education and research experiences),
- a cover letter specifying the candidate's research interests in this proposed Ph.D. thesis,
- Bachelor (Licence) and Master grades in detail,
- at least one reference that could be contacted for recommendation.
Further information can be found here: https://anr-diets.univ-avignon.fr/2021/02/12/open-ph-d-position/
| ||
6-26 | (2021-05-25) Two fully-funded PhD positions, INRIA and Vivoka, Metz, France Inria and Vivoka are offering two fully-funded PhD positions in the context of an
| ||
6-27 | (2021-05-28) Position of Assistant Professor, Univ. Groningen, The Netherlands Job description
We invite applications for an Assistant Professor in Speech Technology. Generally, for this position, you will teach and develop courses, perform research, supervise graduate research, and have an active role in shaping the emerging educational and research programme.
We recognize research as a critical part of the profile of an Assistant Professor, and therefore allocate 40% of your position to do research (provided you teach at least 2 courses/year). That research may dovetail with the courses you teach, to ensure that your expertise is integrated into the programme. Ideally, your research would overlap with that of PhD students, and, where relevant, graduate students could contribute to your research through their thesis projects. As a team, we are keen on applying for grants in the years ahead to build consortia and further solidify our expertise.
We see teaching as an interactive and engaging process. Consequently, the courses include many individual and group activities and encourage creative, out-of-the-box, hands-on approaches to learning that balance theory and practice. Specifically, given the start-up phase of the programme and potential for growth, this position is open to a range of profiles and contributions. In addition to supervising theses within your area of expertise, you will support the teaching and/or curriculum development of courses in speech synthesis, speech recognition, Python, and machine learning for voice tech (all courses already have detailed week-by-week descriptions but lack student-ready syllabi, giving you some creative freedom -- more information about the courses, including learning outcomes, is available upon request):
• Speech Synthesis I and II
• Speech Recognition I and II
• Python for Voice Technology (and Intro to Python at the undergraduate level)
• Machine Learning for Voice Technology
If you are interested in increasing your appointment to a full-time one, you may also teach Statistics (undergraduate level) under a separate contract.
Qualifications
We are looking for an enthusiastic colleague with demonstrated teaching and research skills and an affinity for interdisciplinary approaches to teaching. Research expertise that involves speech recognition, voice synthesis, and machine learning with audio data is crucial.
The ideal candidate has:
• a PhD in Linguistics, Computer Science, AI or a comparable domain (ideally on topics related to ASR or speech synthesis)
• an ability to develop course content for the courses you will teach
• a capacity to teach master's students and supervise master's projects
• the willingness to apply an inter- and transdisciplinary perspective to research and education
• relevant publications
• a speech tech network in academia and/or industry
• a University Teaching Qualification, or the willingness to acquire one within two years after the starting date.
Organisation
The University of Groningen, established in 1614, is one of the oldest and most prestigious European universities. You will work at the university's newest faculty, Campus Fryslân, located in the picturesque capital of Fryslân, Leeuwarden (the European Capital of Culture in 2018). The faculty is dedicated to interdisciplinary and transdisciplinary education and research and provides a stimulating working environment in which mutual support is combined with room for individual initiative. You will become a member of our high-standing academic and international community. We challenge our staff and students to approach issues from multiple disciplines and encourage them to take a different view. We are curious about yours!
Within Campus Fryslân, you will primarily be working in the new Voice Technology Master's programme. The MSc. Voice Technology is a one-year English language master's programme with a highly interdisciplinary scope. It was developed in close cooperation with other universities and partners from the private sector (critical input continues to be provided by Dutch SMEs alongside international tech companies like Apple, Mozilla, and Google). This means that scientific scholarship is balanced with applied know-how in the programme. The MSc. Voice Technology is launching for the first time in September 2021 with a small cohort of students from an array of backgrounds, ranging from AI and Computer Science to Linguistics and Humanities.
Conditions of employment
We offer you in accordance with the Collective Labour Agreement for Dutch Universities:
• a salary, depending on qualifications and work experience, with a minimum of € 3,746 to a maximum of € 5,127 (salary scale 11) gross per month for a full-time position
• a holiday allowance of 8% gross annual income
• an 8.3% end-of-the-year allowance
• a minimum of 29 holiday days plus an additional 12 holiday days in the case of full-time employment.
The position has a 60-40 percent distribution with regard to teaching-research. The post will be established for a fixed term period of two years. Towards the end of that period there will be a result- and development interview in order to decide whether the appointment will be made permanent.
Application
Do you want to become a member of our team? Please send your application to us, by submitting the following documents:
1. letter of application
2. curriculum vitae
3. a statement on teaching, detailing courses taught or developed
4. email and telephone contact information of at least two referees.
You can submit your application until 13 June 11:59pm / before 14 June 2021 Dutch local time (CET) by means of the application form (click on 'Apply' below on the advertisement on the university website).
Only complete applications submitted by the deadline will be taken into consideration. The starting date for this position is 1 August 2021.
The selection procedure will consist of two parts: an interview (30 minutes) and a mock lecture (15 minutes), during which you will demonstrate your knowledge of the research domain and showcase your teaching capabilities.
We are an equal opportunity employer and value diversity at our University. We are committed to building a diverse faculty so you are encouraged to apply. Our selection procedure follows the guidelines of the Recruitment code (NVP), https://www.nvp-hrnetwerk.nl/sollicitatiecode/ and European Commission's European Code of Conduct for recruitment of researchers, https://euraxess.ec.europa.eu/jobs/charter/code
Unsolicited marketing is not appreciated.
Information
For information you can contact:
Matt Coler, Program Director - MSc. Voice Technology, m.coler@rug.nl
Please do not use the e-mail address(es) above for applications.
Additional information
Campus Fryslân https://www.rug.nl/cf/
MSc. Voice Technology https://www.rug.nl/masters/voice-technology/
| ||
6-28 | (2021-06-02) R&D engineer at Telepathy Labs, Zurich, Switzerland ASR Research and Development Engineer, Speech To strengthen our Research and Development (R&D) organization and to innovate and improve our Automatic Speech Recognition (ASR) products, we need experienced software engineers with specific skills focused on ASR. You will be working with the ASR research and development team, and the position will be based in Zurich, Switzerland. Principal responsibilities * Work within the ASR R&D team to strengthen and extend the quality and the functionality of the existing core engine algorithms and framework. * Document and communicate effectively the design and implementation proposals, and the intermediate and final development results, in team-internal meetings and, when requested, in wider R&D or divisional meetings. * Define and implement test cases and metrics processes aimed at qualifying the new developments within the team's adopted software development and testing processes. * Follow adopted industry standards and the agile development models in place, and be ready to accommodate rapid customer-driven specification changes. Knowledge, Skills and Qualifications: Years of Work Experience: 3 years of professional experience are required Required Skills: The successful candidate is a team player and a fast learner with an analytical mindset and a pragmatic approach to problem solving. Knowledge of the main ASR software packages, DSP theory, feature extraction, etc. Hands-on experience within ASR research and development teams. Experience with open-source ASR toolsets such as Kaldi, Sphinx, HTK, Fairseq, NeMo and other PyTorch/TensorFlow based libraries. Experience with high-level programming languages such as C, C++, Java. Experience with distributed version control systems (e.g. Git). Working knowledge of the Linux operating system. Excellent oral and written communication skills in English.
Preferred Skills: Experience with LSTM and/or attention-based neural networks and other deep learning approaches as applied to the ASR domain. Knowledge of embedded software programming in C/C++. Experience with continuous integration and delivery processes. Experience with scripting languages such as Python, Perl, etc. Experience in software development, preferably in the design and development of embedded/resource-constrained software systems. Education: Minimum: MSc in Computer Science, or equivalent. Desirable: PhD degree in Computer Science, Artificial Intelligence, Machine Learning, or Speech Science. Work Permit: Permit to work in Switzerland (EU-28 or equivalent) required. Contact: Pierre-Edouard Honnet pe.honnet@telepathy.ai Vijeta Avijeet vijeta.avijeet@telepathy.ai
| ||
6-29 | (2021-06-03) Full professor at Radboud University, Nijmegen, The Netherlands At Radboud University we have a position for a full professor Artificial Intelligence & Language, Speech and Communication: https://www.ru.nl/werken-bij/vacature/details-vacature/?recid=1152936&doel=embed&taal=nl
The website mentions an application deadline of 11 June, but we will be flexible about applications arriving before 16 June if sent to: Prof. José Sanders, Head of Department Language & Communication Tel.: +31 24 361 28 02 Email: jose.sanders@ru.nl
| ||
6-30 | (2021-06-04) PhD and Postdoc positions at University of Bielefeld, Germany PhD position in Phonetics (full time) at Bielefeld University, Germany
Within the newly funded Transregional Collaborative Research Center "Constructing Explainability", we are offering a position within the subproject on "Technically enabled explaining of speaker traits" for a period of 4 years:
https://uni-bielefeld.hr4you.org/job/view/565/research-position-for-the-sfb-trr-318-subproject-c06-pw?page_lang=en ******************************************************************
PostDoc position in Phonetics (full time) at Bielefeld University, Germany
Within the newly funded Transregional Collaborative Research Center "Constructing Explainability", we are offering a position within the subproject on "Monitoring the understanding of explanations" for a period of 4 years:
| ||
6-31 | (2021-06-06) Ph D position at University of Paderborn, Germany https://ei.uni-paderborn.de/fileadmin/elektrotechnik/fg/nth/Stellenangebote/Kennziffer4707.pdf
| ||
6-32 | (2021-06-08) PhD position at University of Bielefeld, Germany The Digital Linguistics Lab (head: JProf. Dr.-Ing. Hendrik Buschmeier) at Bielefeld University is seeking to fill a researcher position (PhD student, E13 TV-L, 100%, fixed-term until 6/2025) in the newly established collaborative research center TRR 318 "Constructing Explainability"[^1], sub-project A02 "Monitoring the understanding of explanations"[^2].
|