ISCApad #274 |
Sunday, April 11, 2021 by Chris Wellekens |
6-1 | (2020-11-02) Fully-funded PhD studentships at the University of Sheffield, Great Britain
Fully-funded PhD studentships in Speech and NLP at the University of Sheffield
UKRI Centre for Doctoral Training (CDT) in Speech and Language Technologies (SLT) and their Applications
Department of Computer Science Faculty of Engineering University of Sheffield, UK
Fully-funded 4-year PhD studentships for research in speech technologies and NLP
** Applications now open for September 2021 intake **
Deadline for applications: 31 January 2021.
Speech and Language Technologies (SLTs) are a range of Artificial Intelligence (AI) approaches which allow computer programs or electronic devices to analyse, produce, modify or respond to human texts and speech. SLTs are underpinned by a number of fundamental research fields including natural language processing (NLP / NLProc), speech processing, computational linguistics, mathematics, machine learning, physics, psychology, computer science, and acoustics. SLTs are now established as core scientific/engineering disciplines within AI and have grown into a world-wide multi-billion dollar industry.
Located in the Department of Computer Science at the University of Sheffield, a world-leading research institution in the SLT field, the UKRI Centre for Doctoral Training (CDT) in Speech and Language Technologies and their Applications is a vibrant research centre that also provides training in engineering skills, leadership, ethics, innovation, entrepreneurship, and responsibility to society.
Apply now: https://slt-cdt.ac.uk/apply/
The benefits:
About you: We are looking for students from a wide range of backgrounds interested in speech and NLP.
Applying: Applications are now sought for the September 2021 intake. The deadline is 31 January 2021.
Applications will be reviewed within 6 weeks of the deadline and short-listed applicants will be invited to interview. Interviews will be held in Sheffield or via videoconference.
See our website for full details and guidance on how to apply: https://slt-cdt.ac.uk
For an informal discussion about your application please contact us by email at: sltcdt-enquiries@sheffield.ac.uk
6-2 | (2020-11-14) Master 2 Internship, LORIA-INRIA, Nancy, France
Master 2 Internship: Semantic information from the past in a speech recognition system: does the past help the present?
Supervisor: Irina Illina, MdC, HDR Team and Laboratory: Multispeech, LORIA-INRIA Contact: illina@loria.fr
Co-Supervisor: Dominique Fohr, CR CNRS Team and Laboratory: Multispeech, LORIA-INRIA Contact: dominique.fohr@loria.fr
Motivation and context
Semantic and thematic spaces are vector spaces used for the representation of words, sentences or textual documents. The corresponding models and methods have a long history in the fields of computational linguistics and natural language processing. Almost all models rely on the hypothesis of statistical semantics, which states that the statistical patterns of word occurrence (the contexts in which a word appears) can be used to describe its underlying semantics. The most common way to learn these representations is to predict a word from the context in which it appears [Mikolov et al., 2013, Pennington et al., 2014], and this can be realized with neural networks. These representations have proved their effectiveness for a range of natural language processing tasks. In particular, Mikolov's Skip-gram and CBOW models [Mikolov et al., 2013] and the BERT model [Devlin et al., 2019] have become very popular because of their ability to process large amounts of unstructured text data at reduced computing cost. The efficiency and the semantic properties of these representations motivate us to explore them for our speech recognition system.
Robust automatic speech recognition (ASR) remains a very ambitious goal. Despite constant efforts and some dramatic advances, a machine's ability to recognize speech is still far from equaling that of a human. Current ASR systems see their performance decrease significantly when the conditions under which they were trained differ from those in which they are used. The causes of variability may be related to the acoustic environment, sound capture equipment, microphone change, etc.
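The distributional idea sketched above can be made concrete with a toy example: each word is represented as a dense vector, and semantic relatedness is measured by cosine similarity. The three-dimensional vectors below are invented for illustration only; real Skip-gram or BERT embeddings have hundreds of dimensions learned from data.

```python
def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: sum(a * a for a in x) ** 0.5
    return dot / (norm(u) * norm(v))

# Invented toy embeddings: words used in similar contexts get similar vectors.
emb = {
    "prince": [0.9, 0.1, 0.3],
    "king":   [0.8, 0.2, 0.4],
    "noise":  [0.1, 0.9, 0.2],
}
# 'prince' is closer to 'king' than to 'noise' in this toy space.
print(cosine(emb["prince"], emb["king"]) > cosine(emb["prince"], emb["noise"]))  # -> True
```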
Objectives
Our speech recognition (ASR) system [Povey et al., 2011] is supplemented by a semantic analysis that detects words of the processed sentence that could have been misrecognized, and finds words with a similar pronunciation that better match the context [Level et al., 2020]. For example, the sentence « Silvio Berlusconi, prince de Milan » can be recognized by the ASR system as « Silvio Berlusconi, prince de mille ans ». A good semantic representation of the sentence context could help to find and correct this error. This semantic analysis re-evaluates (rescores) the N-best transcription hypotheses and can be seen as a form of dynamic adaptation in the case of noisy speech data. The semantic analysis is performed by combining predictive representations based on continuous vectors.
The semantic module significantly improves the performance of the speech recognition system, but we would like to go beyond the semantic information of the current sentence: sometimes the previous sentences could help to understand and to recognize the current one. This Master internship will be devoted to the innovative study of taking previously recognized sentences into account to improve the recognition of the current sentence. Research will be conducted on combining semantic information from one or several past sentences with semantic information from the current sentence. As deep neural networks (DNNs) can model complex functions and achieve outstanding performance, they will be used in all our modeling. The performance of the different modules will be evaluated on artificially noised speech data.
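The N-best rescoring step described above can be sketched as follows: each hypothesis receives a combined score mixing its ASR score with a semantic-similarity score against the context, and the list is re-ranked. The interpolation weight, the scoring function and all numbers below are illustrative placeholders, not the actual system.

```python
def rescore(nbest, semantic_score, lam=0.5):
    """Re-rank N-best hypotheses with combined = (1-lam)*ASR score + lam*semantic score.
    nbest: list of (hypothesis, asr_score); semantic_score: hypothesis -> float."""
    rescored = [(hyp, (1 - lam) * asr + lam * semantic_score(hyp)) for hyp, asr in nbest]
    return sorted(rescored, key=lambda x: x[1], reverse=True)

# Illustrative: the acoustically best hypothesis loses after semantic rescoring.
nbest = [("prince de mille ans", 0.9), ("prince de milan", 0.8)]
toy_semantic = {"prince de mille ans": 0.2, "prince de milan": 0.9}.get
print(rescore(nbest, toy_semantic)[0][0])  # -> prince de milan
```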
Required skills: background in statistics and natural language processing, and programming skills (Perl, Python). Candidates should email a detailed CV with diploma.
Bibliography
[Devlin et al., 2019] Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers).
[Level et al., 2020] Level, S., Illina, I., Fohr, D. (2020). Introduction of semantic model to help speech recognition. International Conference on Text, Speech and Dialogue.
[Mikolov et al., 2013] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111-3119.
[Pennington et al., 2014] Pennington, J., Socher, R., and Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543.
[Povey et al., 2011] Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., Vesely, K. (2011). The Kaldi Speech Recognition Toolkit. Proc. ASRU.
6-3 | (2020-11-15) PhD at LORIA-INRIA, Nancy, France
Title: Multi-task learning for hate speech classification
Research Lab: MULTISPEECH team, LORIA-INRIA, Nancy, France
Supervisors: Irina Illina, Associate Professor, HDR (illina@loria.fr); Ashwin Geet D'Sa, PhD student (ashwin-geet.dsa@loria.fr); Dominique Fohr, CNRS Research Scientist (dominique.fohr@loria.fr)
Motivation and context: During the last years, online communication through social media has skyrocketed. Although most people use social media for constructive purposes, a few misuse these platforms to spread hate speech. Hate speech is anti-social communicative behavior that targets a section of society based on religion, gender, race, etc. (Delgado and Stefancic, 2014). It often leads to threats, fear, and violence towards an individual or a group. Online hate speech is an offense punishable by law, and the owners of a platform are held accountable for the hate speech posted by its users. Manual moderation of hate speech by humans is expensive and time-consuming, so automatic classification techniques have been employed for its detection.
Recently, deep learning techniques have outperformed classical machine learning techniques and have become the state-of-the-art methodology for hate speech classification (Badjatiya et al., 2017). These methods need a large quantity of annotated data for training, and annotation is very expensive. To train a powerful deep neural network based classification system, several corpora can be used.
Multi-task learning (MT-DNN) is a deep learning methodology which has proven to outperform single-task deep learning models, especially in low-resource settings (Worsham and Jugal, 2020; Liu et al., 2019). It jointly learns a model on multiple tasks such as classification, question answering, etc. Thus, the information learned in one task can benefit the other tasks and improve the performance of all of them.
Existing hate speech corpora are collected from various sources such as Wikipedia, Twitter, etc., and their labeling can vary greatly.
Some corpus creators combine various forms of hate, such as 'abuse', 'threat', etc., and collectively annotate the samples as 'hate speech' or 'toxic speech'. Other authors create more challenging corpora using fine-grained annotations such as 'hate speech', 'abusive speech', 'racism', etc. (Davidson et al., 2017). Furthermore, the definition of hate speech remains unclear, and corpus creators may use different definitions. Thus, a model trained on one corpus cannot easily be used to classify comments from another corpus. To take advantage of the different available hate speech corpora and to improve the performance of hate classification, we would like to explore multi-corpus learning using the methodology of multi-task learning.
Objectives: The objective of this work is to improve an existing deep learning hate speech classifier by developing a multi-task learning system trained on several hate speech corpora. In the MT-DNN model of (Liu et al., 2019), the multi-task learning model consists of a set of task-specific layers on top of shared layers. The shared layers are the bottom layers of the model and are jointly trained on several corpora. The task-specific layers are built on top of the shared layers, and each of them is trained on a single task. We want to explore this setup for hate speech detection. In this case, the shared layers will be jointly trained on several hate speech corpora, and each task-specific layer will learn a specific hate speech task from a specific corpus. For example, one task-specific layer can use the very toxic / toxic / neither / healthy / very healthy classification task and the Wikipedia Detox corpus (Wulczyn et al., 2017). Another task-specific layer can use the hateful / abusive / normal classification task and the Founta corpus (Founta et al., 2018).
Since the shared layers are jointly trained on multiple corpora, this should improve the performance of the task-specific layers, especially for tasks with a small amount of data.
The work plan for the internship is as follows. At the beginning, the intern will conduct a literature survey on hate speech classification using deep neural networks. Using the hate speech classification baseline system (CNN-based or Bi-LSTM-based) existing in our team, the student will evaluate its performance on several available hate speech corpora. After this, the student will develop a new methodology based on the MT-DNN model for efficient learning; a pre-trained BERT model (Devlin et al., 2019) can be used to initialize the shared layers of the MT-DNN. The performance of the proposed MT-DNN model will be evaluated and compared to single-corpus learning and multi-corpora learning (grouping all corpora together).
Required skills: Background in statistics and natural language processing, and programming skills (Python). Candidates should email a detailed CV with diploma.
Bibliography:
Badjatiya P., Gupta S., Gupta M., and Varma V. 'Deep learning for hate speech detection in tweets.' In Proceedings of the 26th International Conference on World Wide Web Companion, pp. 759-760, 2017.
Davidson T., Warmsley D., Macy M., and Weber I. 'Automated hate speech detection and the problem of offensive language.' arXiv preprint arXiv:1703.04009, 2017.
Delgado R., and Stefancic J. 'Hate speech in cyberspace.' Wake Forest L. Rev. 49: 319, 2014.
Devlin J., Chang M., Lee K., and Toutanova K. 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.' In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171-4186, 2019.
Founta A.M., Djouvas C., Chatzakou D., Leontiadis I., Blackburn J., Stringhini G., Vakali A., Sirivianos M., and Kourtellis N. 'Large scale crowdsourcing and characterization of Twitter abusive behavior.' ICWSM, 2018.
Liu X., He P., Chen W., and Gao J. 'Multi-Task Deep Neural Networks for Natural Language Understanding.' In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4487-4496, 2019.
Worsham J., and Jugal K. 'Multi-task learning for natural language processing in the 2020s: where are we going?' Pattern Recognition Letters, 2020.
Wulczyn E., Thain N., and Dixon L. 'Ex machina: Personal attacks seen at scale.' In Proceedings of the 26th International Conference on World Wide Web, 2017.
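The shared-encoder/task-specific-head structure described in this posting can be sketched structurally as below. The 'layers' here are pure-Python placeholders: in the actual MT-DNN setup the shared part would be a (possibly BERT-initialized) neural encoder and the heads trained classifiers, one per corpus; the label sets are simplified from the ones named above.

```python
class MultiTaskModel:
    """Shared bottom layers plus one task-specific head per hate speech corpus."""
    def __init__(self, shared, heads):
        self.shared = shared   # function: text -> shared representation
        self.heads = heads     # task name -> function: representation -> label

    def predict(self, task, text):
        # All tasks go through the same shared layers; only the head differs.
        return self.heads[task](self.shared(text))

# Placeholder components (a real system would use a trained encoder + classifiers).
shared_encoder = lambda text: text.lower().split()
heads = {
    "wikipedia_detox": lambda rep: "toxic" if "idiot" in rep else "healthy",
    "founta":          lambda rep: "abusive" if "idiot" in rep else "normal",
}
model = MultiTaskModel(shared_encoder, heads)
print(model.predict("wikipedia_detox", "you idiot"))  # -> toxic
print(model.predict("founta", "have a nice day"))     # -> normal
```

The point of the structure is that gradient updates to `shared_encoder` during training on one corpus benefit every head, which is what helps the low-resource tasks.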
6-4 | (2020-11-17) 2-year full-time postdoctoral researcher at the University of Bordeaux, France
The University of Bordeaux invites applications for a 2-year full-time postdoctoral researcher in Automatic Speech Recognition. The position is part of the FVLLMONTI project on efficient speech-to-speech translation on embedded autonomous devices, funded by the European Community.
To apply, please send by email a single PDF file containing a full CV (including publication list), cover letter (describing your personal qualifications, research interests and motivation for applying), evidence for software development experience (active Github/Gitlab profile or similar), two of your key publications, contact information of two referees and academic certificates (PhD, Diploma/Master, Bachelor certificates).
Details on the position are given below:
Job description: Post-doctoral position in Automatic Speech Recognition
Duration: 24 months
Starting date: as early as possible (from March 1st 2021)
Project: European FETPROACT project FVLLMONTI (starts January 2021)
Location: Bordeaux Computer Science Lab. (LaBRI CNRS UMR 5800), Bordeaux, France (Image and Sound team)
Salary: from €2,086.45 to €2,304.88/month (estimated net salary after taxes, according to experience)
Contact: jean-luc.rouas@labri.fr
Short description:
The applicant will be in charge of developing state-of-the-art Automatic Speech Recognition systems for English and French, as well as related Machine Translation systems using deep neural networks. The objective is to provide the exact specifications of the designed systems to the other partners of the project, who specialize in hardware. Adjustments will have to be made to take into account the hardware constraints (i.e. memory and energy consumption, which impact the number of parameters, computation time, etc.) while keeping an eye on performance (WER and BLEU scores). When a satisfactory trade-off is reached, more exploratory work will be carried out on using emotion/attitude/affect recognition on the speech samples to supply additional information to the translation system.
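Since WER is the figure of merit mentioned above, here is a minimal sketch of how it is computed: a Levenshtein alignment over words, with the edit distance divided by the reference length (the example sentences are made up).

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(round(wer("the cat sat on the mat", "the cat sat on mat"), 3))  # -> 0.167
```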
Context of the project:
The aim of the FVLLMONTI project is to build a lightweight autonomous in-ear device allowing speech-to-speech translation. Today's pocket-talk devices are IoT products requiring internet connectivity, which is in general energy-inefficient. While machine translation (MT) and Natural Language Processing (NLP) performance has greatly improved, embedded lightweight energy-efficient hardware remains elusive. Existing solutions based on artificial neural networks (NNs) are computation-intensive and energy-hungry, requiring server-based implementations, which also raises data protection and privacy concerns. Today, 2D electronic architectures suffer from 'unscalable' interconnect and are thus still far from being able to compete with biological neural systems in terms of real-time information-processing capabilities at comparable energy consumption. Recent advances in materials science, device technology and synaptic architectures have the potential to fill this gap with novel disruptive technologies that go beyond conventional CMOS technology. A promising solution comes from vertical nanowire field-effect transistors (VNWFETs), which can unlock the full potential of truly unconventional 3D circuit density and performance.
Role:
The tasks assigned to the Computer Science lab are the design of the Automatic Speech Recognition systems (for French and English) and of the Machine Translation systems (English to French and French to English). Speech synthesis will not be explored in the project, but an open-source implementation will be used for demonstration purposes. Both ASR and MT tasks benefit from the use of Transformer architectures over Convolutional (CNN) or Recurrent (RNN) neural network architectures. Thus, the role of the applicant will be to design and implement state-of-the-art systems for ASR using Transformer networks (e.g. with the ESPnet toolkit) and to assist another postdoctoral researcher with the MT systems. Once the performance reached by these baseline systems is satisfactory, details on the network will be given to our hardware-design partners (e.g. number of layers, values of the parameters, etc.). With the feedback of these partners, adjustments will be made to the network considering the hardware constraints, while trying not to degrade performance too much.
The second part of the project will focus on keeping up with the latest innovations and translating them into hardware specifications. For example, recent research suggests that adding convolutional layers to the Transformer architecture (i.e. the 'Conformer' network) can help reduce the number of parameters of the model, which is critical regarding the memory usage of the hardware system.
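As an illustration of the kind of specification handed to hardware partners, a rough weight count for a Transformer encoder can be derived from its dimensions. The formula below counts only the four attention projections and the two feed-forward matrices per layer, ignoring biases, layer norms and embeddings; the d_model = 256, d_ffn = 2048, 12-layer figures are arbitrary example values, not the project's actual configuration.

```python
def transformer_encoder_params(d_model: int, d_ffn: int, n_layers: int) -> int:
    """Approximate weight count of a Transformer encoder (weights only)."""
    attention = 4 * d_model * d_model   # W_Q, W_K, W_V and the output projection
    feed_forward = 2 * d_model * d_ffn  # two linear maps between d_model and d_ffn
    return n_layers * (attention + feed_forward)

n = transformer_encoder_params(d_model=256, d_ffn=2048, n_layers=12)
print(f"{n:,} weights -> {4 * n / 1e6:.1f} MB in float32")  # -> 15,728,640 weights -> 62.9 MB in float32
```

Numbers like these are exactly what determines whether a model fits the memory and energy budget of an embedded device.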
Finally, more exploratory work on the detection of social affects (i.e. the vocal expression of the intent of the speaker: 'politeness', 'irony', etc) will be carried out. The additional information gathered using this detection will be added to the translation system for potential usage in the future speech synthesis system.
Required skills:
- PhD in Automatic Speech Recognition (preferred) or Machine Translation using deep neural networks
- Knowledge of most widely used toolboxes/frameworks (tensorflow, pytorch, espnet for example)
- Good programming skills (python)
- Good communication skills (frequent interactions with hardware specialists)
- Interest in hardware design will be a plus
Selected references:
S. Karita et al., 'A Comparative Study on Transformer vs RNN in Speech Applications,' 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), SG, Singapore, 2019, pp. 449-456, doi: 10.1109/ASRU46091.2019.9003750.
Gulati, Anmol, et al. 'Conformer: Convolution-augmented Transformer for Speech Recognition.' arXiv preprint arXiv:2005.08100 (2020).
Rouas, Jean-Luc, et al. 'Categorisation of spoken social affects in Japanese: human vs. machine.' ICPhS. 2019.
---
Jean-Luc Rouas
6-6 | (2020-11-19) Research engineer (fixed-term) - ALAIA Laboratory, Toulouse, France
The joint laboratory ALAIA, dedicated to Artificial Intelligence Assisted Language Learning, is recruiting a research engineer on a fixed-term contract (12 months, with possible extension). The work will be carried out in coordination with the two partners involved in the LabCom: IRIT (Institut de Recherche en Informatique de Toulouse) and the company Archean Technologies, more specifically its R&D unit Archean Labs (Montauban, 82). ALAIA focuses on oral expression and comprehension in a foreign language (L2). The missions will consist of designing, developing and integrating innovative services based on the analysis of L2 learners' productions, and on the detection and characterization of errors ranging from the phonetic to the linguistic level. The expected skills cover automatic speech and language processing, machine learning, and web application development.
Applications should be sent to Isabelle Ferrané (isabelle.ferrane@irit.fr) and Lionel Fontan (lfontan@archean.tech). Do not hesitate to contact us for further information.
6-6 | (2020-11-26) Full Professor at Dublin City University, Ireland Dublin City University is seeking to make a strategic appointment to support existing academic and research leadership in the area of Multimodal Information Systems within the DCU School of Computing, home of the ADAPT Centre for Digital Content Technology.
6-7 | (2020-11-28) Master 2 Internship, INA, Bry-sur-Marne, France
Active speaker detection in television streams
Internship topic - M2 in computer science or engineering school
Keywords: Active speaker detection, multimodal analysis, speech processing, face detection, machine learning, deep learning, audiovisual, digital humanities, women in media, automatic indexing, gender equality monitor
Context
The Institut national de l'audiovisuel (INA) is in charge of the legal deposit of television, radio and web media. As such, INA continuously captures 170 television channels and stores more than 20 million hours of audiovisual content.
An indexing process, usually carried out by archivists, is necessary to describe audiovisual content and to retrieve documents within these large collections. This work consists, among other things, of referencing the people appearing in the programs and the topics discussed, or of producing summaries of documents. The activities of INA's research department aim to automate certain indexing processes: either by automating tasks without human added value (segmentation, spotting proper names in images, etc.), or by carrying out tasks that archivists do not perform (exhaustive counting of speaking time).
The proposed topic is part of the Gender Equality Monitor (GEM) project, funded by the Agence nationale de la recherche, which aims to describe the differences in representation between women and men in the media. Within this framework, massive automatic indexing campaigns over INA's collections have made it possible to create new knowledge in the humanities based on speaking time, visual exposure time, and the content of on-screen text overlays [Dou19a, Dou19b, Dou20].
Improving automatic indexing systems requires building example databases, used for training and evaluation, that are representative of the diversity of the processed material. The constitution of such databases is a strategic challenge for the design of systems based on machine learning, and strategies for automating their constitution can be considered [Sal14].
Objectives
Active speaker detection (ASD) is a multimodal analysis task which consists of analysing a video and determining whether the movements of one of the faces appearing on screen correspond to the speech signal contained in the audio track. ASD systems can be designed using unsupervised [Chun16] or supervised [Rot20] approaches. ASD addresses several practical problems encountered at INA.
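A toy illustration of the audio-visual matching idea: assuming we already have a per-frame mouth-opening trajectory for each face and an audio energy envelope (all sequences below are synthetic placeholders, and real systems like [Chun16] learn this correspondence with neural networks), the face whose motion correlates best with the audio is taken as the active speaker.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def active_speaker(mouth_motion, audio_energy):
    """Index of the face whose mouth-opening trajectory best matches the audio."""
    scores = [pearson(m, audio_energy) for m in mouth_motion]
    return scores.index(max(scores))

energy       = [0.1, 0.9, 0.8, 0.1, 0.7, 0.2, 0.9, 0.1]  # audio energy envelope
still_face   = [0.5, 0.4, 0.5, 0.6, 0.5, 0.4, 0.5, 0.6]  # barely moving mouth
talking_face = [0.2, 0.8, 0.9, 0.2, 0.6, 0.3, 0.8, 0.2]  # moves with the audio
print(active_speaker([still_face, talking_face], energy))  # -> 1
```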
The general objective of the internship is to build an ASD system, evaluate it against existing open-source implementations, and deploy it on INA's collections to build example databases that will help improve the inaSpeechSegmenter and inaFaceGender software. Depending on the results obtained, the internship may lead to an open-source release of the system and/or a scientific publication.
Required skills
Internship conditions
The internship will take place over a period of 4 to 6 months within INA's research department and may start from January 2021. It will be located on the Bry2 site, 18 Avenue des frères Lumière, 94366 Bry-sur-Marne. The intern will be supervised by David Doukhan, R&D engineer in the research department and coordinator of the GEM project.
Contact
Interested candidates can contact David Doukhan (ddoukhan@ina.fr) for more information, or directly send a cover letter including a Curriculum Vitae by email.
Bibliography
[Chun16] Chung, J. S., & Zisserman, A. (2016). Out of time: automated lip sync in the wild. In Asian conference on computer vision (pp. 251-263). Springer, Cham.
[Dou18] Doukhan, D., Carrive, J., Vallet, F., Larcher, A., & Meignier, S. (2018). An open-source speaker gender detection framework for monitoring gender equality. In ICASSP (pp. 5214-5218).
[Dou19a] Doukhan, D. (2019) À la radio et à la télé, les femmes parlent deux fois moins que les hommes. La revue des médias
[Dou19b] Doukhan, D., Rezgui, Z., Poels, G., & Carrive, J. (2019). Estimer automatiquement les différences de représentation existant entre les femmes et les hommes dans les médias.
[Dou20] Doukhan, D., Méadel, C., Coulomb-Gully, M. (2020) En période de coronavirus, la parole d'autorité dans l'info télé reste largement masculine. La revue des médias
[Nag20] Nagrani, A., Chung, J. S., Xie, W., & Zisserman, A. (2020). Voxceleb: Large-scale speaker verification in the wild. Computer Speech & Language, 60, 101027.
[Rot20] Roth, J., Chaudhuri, S., Klejch, O., Marvin, R., Gallagher, A., Kaver, L., & Pantofaru, C. (2020). Ava Active Speaker: An Audio-Visual Dataset for Active Speaker Detection. In ICASSP IEEE.
[Sal14] Salmon, F., & Vallet, F. (2014). An Effortless Way To Create Large-Scale Datasets For Famous Speakers. In LREC (pp. 348-352).
6-8 | (2020-12-01) Funded PhD Position at University of Edinburgh, Scotland, UK
PhD Position: Automatic Affective Behaviour Monitoring through speech and/or multimodal means while preserving user’s privacy
For details please visit: …
About the Project
The Advanced Care Research Centre (ACRC) at the University of Edinburgh is a new £20m interdisciplinary research collaboration aiming to transform later life with person-centred integrated care. The vision of the ACRC is to play a vital role in addressing the grand challenge of ageing through transformational research that will support the functional ability of people in later life so they can contribute to their own welfare for longer. With fresh and diverse thinking across interdisciplinary perspectives, our academy students will work to creatively embed deep understanding, data science, artificial intelligence, assistive technologies and robotics into systems of health and social care, supporting the independence, dignity and quality of life of people living in their own homes and in supported care environments. The ACRC Academy will equip future leaders to drive society's response to the challenges of later-life care provision, a problem which is growing in scale, complexity and urgency. Our alumni will become leaders across a diverse range of pioneering and influential roles in the public, private and third sectors.
Automatic affect recognition technologies can monitor a person's mood and mental health by processing verbal and non-verbal cues extracted from the person's speech. However, the speech signal contains biometric and other personal information which can, if improperly handled, threaten the speaker's privacy. Hence there is a need for automatic inference and monitoring methods that preserve privacy for speech data in terms of collection, training of machine learning models, and use of such models in prediction. This project will focus on research, implementation and assessment of solutions for handling speech data in the user's own environment while protecting their privacy.
We are currently studying the use of speech in healthy ageing and care in combination with IoT/Ambient Intelligence technologies in a large research project. This project will build on our research in this area.
The goals of this PhD project are:
Training outcomes include machine learning methods for inference of mental health status, privacy-preserving machine learning and signal processing, and applications of such methods in elderly care.
6-9 | (2020-12-03) 6-month internship at GIPSA-Lab, Grenoble, France
Deep learning-based speech coding and synthesis in adverse conditions
Project: Vokkero 2023
Type: Internship, 6 months, start of 2021
Offer: vogo-bernin-pfe-2
Contact: r.vincent@vogo.fr
Keywords: Neural vocoding, deep learning, speech synthesis, training dataset, normalisation
Summary: The project consists in evaluating the performance of the LPCNet neural vocoder for speech coding and decoding under adverse conditions (noisy environment, varied speech styles, etc.) and in proposing learning techniques to improve the quality of the synthesis.
1 The company VOGO, and Gipsa-lab
Vogo is an SME based in Montpellier, in the south of France: www.vogo-group.com. Vogo is the first Sportech listed on Euronext Growth and develops solutions that enrich the experience of fans and professionals during sporting events. Its brand Vokkero is specialized in the design and production of radio communication systems: www.vokkero.com. It offers solutions for teams working in very noisy environments and is notably a world reference in the professional sports refereeing market.
Gipsa-lab is a CNRS research unit joint with Grenoble-INP (Grenoble Institute of Technology) and Université Grenoble Alpes. With 350 people, including about 150 doctoral students, Gipsa-lab is a multidisciplinary research unit developing both basic and applied research on complex signals and systems. Gipsa-lab is internationally recognised for its research in automatic control, signal and image processing, speech and cognition, and develops projects in the strategic areas of energy, environment, communication, intelligent systems, life and health, and language engineering.
2 The Vokkero 2023 project
Every 3 years, Vokkero renews its hardware (radio, CPU) and software (RTE, audio processing) platforms in order to design new generations of products.
The project extends over several years of study, and it is within this framework that the internship is proposed. In partnership with Gipsa-lab, the project consists in the study of speech coding using neural network approaches, in order to obtain performance not yet reached by classical approaches. The student will work at Gipsa-lab in the CRISSP team of the Speech and Cognition cluster under the supervision of Olivier PERROTIN, research fellow at CNRS, and at the R&D of Vogo Bernin with Rémy VINCENT, project leader on the Vogo side.
3 Context & Objectives
The project consists in evaluating the performance of the LPCNet neural vocoder for speech coding and decoding under adverse conditions (noisy environment, varied speech style, etc.) and in proposing learning techniques to improve the quality of the synthesis.
3.1 Context
Vocoders (voice coders) are models that allow a speech signal to be first reduced to a small set of parameters (speech analysis, or coding) and then reconstructed from these parameters (speech synthesis, or decoding). This coding/decoding process is essential in telecommunication applications, where speech is coded, transmitted and then decoded at the receiver. The challenge is to minimise the quantity of information transmitted while keeping the quality of the reconstructed speech signal as high as possible. Current techniques use high-quality speech signal models, with a constraint on algorithmic complexity to ensure real-time processing in embedded systems. Examples of widely used codecs are Speex (Skype) and its little brother Opus (Zoom). A few orders of magnitude: Opus converts a stream sampled at 16 kHz into a 16 kbit/s bitstream (i.e. a compression ratio of 1:16); the reconstructed signal is also at 16 kHz and has 20 ms of latency. Since 2016, a new type of vocoder has emerged, called the neural vocoder.
Based on deep neural network architectures, neural vocoders are able to generate a speech signal from the classical input parameters of a vocoder, without a priori knowledge of an explicit speech model, using machine learning instead. The first such system, Google's WaveNet [1], is capable of reconstructing a signal almost identical to natural speech, but at a very high computational cost (20 seconds to generate one sample, at 16,000 samples per second). Since then, models have been simplified and are now capable of generating speech in real time (WaveRNN [2], WaveGlow [3]). In particular, the LPCNet neural vocoder [4, 5], also developed by Mozilla, is able to convert a 16 kHz sampled stream into a 4 kbit/s bitstream and reconstruct a 16 kHz audio signal. This super-compression combined with bandwidth extension leads to equivalent compression ratios much higher than 1:16. However, the ability of these systems to generate high-quality speech has only been evaluated after training on large and homogeneous databases, i.e. 24 hours of speech read by a single speaker and recorded in a quiet environment [6]. In the Vokkero application, by contrast, speech is recorded in adverse conditions (very noisy environments) and presents significant variability (spoken voice, shouted voice, multiple referees, etc.). Is a neural vocoder trained on a read-speech database capable of decoding speech of this type? If not, is it possible to train the model on such data, even though they are only available in small quantities? The aim of this internship is to explore the limits of the LPCNet vocoder applied to the decoding of referee speech. Various learning strategies (curriculum training, transfer learning, learning on augmented data, etc.) will then be explored to try to adapt pre-trained models to our data.
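The orders of magnitude quoted above can be checked with a few lines of arithmetic. A minimal sketch, assuming 16-bit PCM as the raw reference (the offer does not state the bit depth):

```python
def compression_ratio(sample_rate_hz: int, bit_depth: int, bitrate_bps: int) -> float:
    """Ratio between the raw PCM bitrate and the coded bitrate."""
    raw_bps = sample_rate_hz * bit_depth
    return raw_bps / bitrate_bps

# Opus, as cited above: a 16 kHz stream coded at 16 kbit/s -> ratio 1:16
print(compression_ratio(16000, 16, 16000))  # 16.0
# LPCNet: the same 16 kHz stream coded at 4 kbit/s -> ratio 1:64
print(compression_ratio(16000, 16, 4000))   # 64.0
```

This makes concrete why the ad speaks of "much higher equivalent compression ratios than 1:16" for LPCNet.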
3.2 Tasks. The student will evaluate the performance of a pre-trained LPCNet vocoder on referee speech data and will propose learning strategies to adapt the model to this new data, in a coding/re-synthesis scenario: 1. Get familiar with the system; performance evaluation on an audio-book database (baseline); 2. Evaluation of LPCNet on the Vokkero database and identification of its limits (ambient noise, pre-processing, voice styles, etc.); 3. Study of strategies to improve system performance by data augmentation: creation of synthetic and specific databases (noisy atmospheres, shouted voices); recording campaigns on Vokkero systems, in anechoic rooms and/or real conditions if the sanitary situation allows it; comparison of the two approaches under various learning strategies to learn a new model from this data. 3.3 Required skills. The student is expected to have a solid background in speech signal processing and an interest in Python development. Experience in programming deep learning models in Python is a plus. The student is expected to show curiosity for research, scientific rigour in methodology and experimentation, and autonomy in technical and organisational matters. Depending on the candidate's motivation, and subject to obtaining funding, it is possible to pursue this topic as a PhD thesis. The student will be able to subscribe to the company's insurance scheme, will receive luncheon vouchers, and will receive a monthly gratuity of 800€. References: [1] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. W. Senior and K. Kavukcuoglu, “WaveNet: A Generative Model for Raw Audio”, CoRR, vol. abs/1609.03499, 2016. arXiv: 1609.03499. [2] N. Kalchbrenner, E. Elsen, K. Simonyan, S. Noury, N. Casagrande, E. Lockhart, F. Stimberg, A. van den Oord, S. Dieleman and K. Kavukcuoglu, “Efficient Neural Audio Synthesis”, CoRR, vol. abs/1802.08435, 2018. arXiv: 1802.08435. [3] R. Prenger, R. Valle and B. Catanzaro, “Waveglow: A Flow-based Generative Network for Speech Synthesis”, in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK: IEEE, May 2019, pp. 3617-3621. [4] J.-M. Valin and J. Skoglund, “LPCNet: Improving Neural Speech Synthesis through Linear Prediction”, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), ICASSP '19, Brighton, UK: IEEE, May 2019, pp. 5891-5895. [5] J.-M. Valin and J. Skoglund, “A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet”, in Proceedings of Interspeech, Graz, Austria: ISCA, Sept. 2019, pp. 3406-3410. [6] P. Govalkar, J. Fischer, F. Zalkow and C. Dittmar, “A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction”,
| |||||
6-10 | (2020-12-04) 6-month post-doc, IRIT, Toulouse, France As part of the interdisciplinary INGPRO project, a Région Occitanie project on the impact of gestures on pronunciation, IRIT (SAMoVA team, https://www.irit.fr/SAMOVA/site/) offers a 6-month post-doctoral position to work on the processing of speech data (manual and automatic evaluation) collected during an experiment to take place in spring 2021. This experiment involves the collection of oral data under different experimental conditions, as well as the analysis of the collected data. The work will be carried out in collaboration with the company Archean Technologies (http://www.archean.tech/archean-labs-en.html) and the Octogone laboratory of UT2J (https://octogone.univ-tlse2.fr/), both partners of the project. If you are interested, the details of the offer are below. Offer: https://www.irit.fr/SAMOVA/site/wp-content/uploads/2020/12/Ficheposte_PostDoc_INGPRO_2021.pdf Further information on the INGPRO project is available here: https://www.irit.fr/SAMOVA/site/projects/current/ingpro/
Position to be filled: fixed-term contract (post-doc, category A)
| |||||
6-11 | (2020-12-05) 5-6 month internship, LIS-Lab, Université Aix-Marseille, France Deep learning for speech perception
| |||||
6-12 | (2020-12-06) Part-time researcher, Université de Mons, Belgium The phonetics laboratory of UMONS (https://sharepoint1.umons.ac.be/FR/universite/facultes/fpse/serviceseetr/sc_langage/Pages/default.aspx), attached to the Institut de Recherches en Sciences du Langage (https://langage.be/), is a humanities and social sciences lab partnering in an R&D project led by the company https://roomfourzero.com/ and funded by DG06 Wallonia (https://recherche-technologie.wallonie.be/): the 'SALV' project, for 'sens à la voix' (meaning to the voice).
To carry out this project, we are recruiting a researcher, either half-time for an initial period of 6 months or full-time for an initial period of 3 months, starting in January 2021 (with a possible extension to 1 or 2 years).
The mission is twofold:
The practical terms of the engagement are very flexible at this stage, to be discussed according to the possibilities and wishes of the selected candidate. The salary is set according to the degree (ideally a PhD in computer science; other backgrounds: to be discussed) and experience.
If you are interested, or simply want to know more: veronique.delvaux@umons.ac.be
**************** SALV
Room 40 (https://roomfourzero.com/) is a young and dynamic company providing a range of products and services, notably including real-time analysis and anomaly detection in audio and video streams, as well as fine-grained contextual analysis, based on the concept of ontology, of the explicit and implicit meanings of texts and text fragments such as those exchanged on social networks or via SMS.
The SALV project aims to develop a technology for transcribing and analysing voice content that includes a set of contextual paralinguistic information (emotions, stress, attitudes). It is partly based on the use of real-time text analysis technologies and specific ontologies that Room 40 already licenses and commercialises. For this project, new speech-to-text transcription tools will have to be developed in order to integrate this paralinguistic information as metadata attached to the transcribed text. Combining these two types of information should greatly improve the quality of the analysis results.
At the end of the project, the target is a speech analysis solution comprising: (i) an approach for transcribing speech into text; (ii) a method for annotating the text with metadata capturing paralinguistic elements; (iii) a system for relevant analysis of audio content combining the transcription and the paralinguistic elements; and (iv) an application layer integrating the above elements and comprising content analysis algorithms, graphical interfaces specific to certain market segments, and a number of APIs guaranteeing the interoperability of the system with partners' existing infrastructures.
| |||||
6-13 | (2020-12-07) Internships at IRIT, Toulouse, France The SAMoVA team at IRIT in Toulouse offers several final-year internships (M2, engineering school) in 2021:
| |||||
6-14 | (2020-11-20) 6-month internship, Ludo-Vic SAS, Paris, France 6-month Master 2 internship: detection of declining engagement. OVERVIEW The main objective of this internship is the detection of declining engagement during an interaction with our conversational agents. The solution may rely on rule-based models or on machine/deep learning techniques [1, 2, 3, 4, 5]. OBJECTIVES 1. Analyse behaviour (head movements, emotion, etc.) to find features indicative of declining engagement. 2. Model and detect the decline in engagement. 3. Evaluation. 4. Real-time application. Internship conditions: The internship will take place over a period of 6 months in the R&D department of Ludo-Vic SAS. Remote-working tools are available within the company. Desired profile: Master's level (Bac+5) in computer science and AI. Ability to build 3D interactions and animations. Experience with Unity3D and skills in C# are a real plus. Remuneration: standard internship remuneration conditions. CONTACTS AND APPLICATION Please send your CV (plus transcripts, project/internship reports, etc.) to: - Jack Amberg: jack@ludo-vic.com - Atef Ben-Youssef: atef@ludo-vic.com
| |||||
6-15 | (2020-12-16) Master 2 / PFE internship at GIPSA-lab (Grenoble) MASTER / PFE internship 2020-2021 REAL-TIME SILENT SPEECH SYNTHESIS BASED ON END-TO-END DEEP LEARNING MODELS Context Various pathologies affect the voice sound source, i.e. the vibration of the vocal folds, thus preventing any sound production despite the normal functioning of the articulators (movements of the jaw, tongue, lips, etc.): this is known as silent speech. Silent speech interfaces [Denby et al., 2010] consist in converting inaudible cues such as articulator movements into an audible speech signal to rehabilitate the speaker's voice. At GIPSA-lab, we have a system for measuring articulators using ultrasound imaging and video, and for converting this data into acoustic parameters that describe a speech signal, using machine learning [Hueber and Bailly, 2016, Tatulli and Hueber, 2017]. The speech signal is then reconstructed from the predicted acoustic parameters using a vocoder [Imai et al., 1983]. Current silent speech interfaces have two main limitations: 1) the intonation (or speech melody), normally produced by the vibration of the vocal folds, is absent in the considered type of pathologies and is difficult to reconstruct from articulatory information only; 2) the generated speech quality is often limited by the type of vocoder used. While the recent emergence of neural vocoders has allowed a leap in the quality of speech synthesis [van den Oord et al., 2016], they have not yet been integrated into silent speech interfaces, where the constraint of real-time generation is crucial. Objectives: Mapping. We propose in this internship to address these two problems by implementing an end-to-end silent speech synthesis system with deep learning models. In particular, it will consist in interfacing our system for articulation measurement and acoustic parameter generation with the LPCNet neural vocoder [Valin and Skoglund, 2019].
The latter takes as input acoustic parameters coming from articulation on the one hand, and the intonation on the other hand. This distinction offers the possibility of decorrelating the two controls, for example by proposing a gestural control of intonation [Perrotin, 2015]. Regarding the acoustic parameters, the first step will be to adapt the acoustic output of our system to match the input of LPCNet. Moreover, LPCNet is trained by default on acoustic parameters extracted from natural speech, for which large databases are available. However, the acoustic parameters predicted from silent speech are degraded and produced in small quantities. We will thus study the robustness of LPCNet to a degraded input, and several re-training strategies (adaptation of LPCNet to new data, end-to-end learning, etc.) will be explored. Once the system is functional, the second part of the internship will consist in implementing the system in real time, so that the speech synthesis is generated synchronously with the user's articulation. All stages of implementation (learning strategies, real-time system) will be evaluated in terms of intelligibility, sound quality, and intonation reconstruction. Tasks The tasks expected during this internship are: implement the full silent speech synthesis pipeline by interfacing the lab's ultrasound system with LPCNet, and explore training strategies; evaluate the performance of the system regarding speech quality and reconstruction errors; implement and evaluate a real-time version of the system. Required skills Signal processing and machine learning. Knowledge of Python and C is required for implementation. Knowledge of the Max/MSP environment would be a plus for the real-time implementation. Strong motivation for methodology and experimentation. Allowance The internship allowance is fixed by ministerial decree (about 570 euros/month).
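The first step mentioned above, matching the acoustic output of the articulatory system to the input expected by LPCNet, typically involves resampling feature streams to a common frame rate. A minimal numpy sketch of frame-rate adaptation by linear interpolation; the frame rates and feature dimension used here are hypothetical, not those of the actual systems:

```python
import numpy as np

def adapt_frame_rate(features: np.ndarray, src_hz: float, dst_hz: float) -> np.ndarray:
    """Resample a (n_frames, n_dims) feature matrix from a source frame
    rate to a target frame rate by linear interpolation along time."""
    n_src = features.shape[0]
    t_src = np.arange(n_src) / src_hz
    n_dst = int(round(n_src * dst_hz / src_hz))
    t_dst = np.arange(n_dst) / dst_hz
    return np.stack(
        [np.interp(t_dst, t_src, features[:, d]) for d in range(features.shape[1])],
        axis=1,
    )

# e.g. 2 s of hypothetical 60 Hz ultrasound-rate features -> 100 Hz vocoder rate
feats = np.random.rand(120, 20)
adapted = adapt_frame_rate(feats, 60, 100)
print(adapted.shape)  # (200, 20)
```

In practice the adaptation would also need to map the feature dimensions themselves (e.g. via a learned layer), but time alignment of this kind is usually the first requirement.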
Grenoble Images Parole Signal Automatique UMR CNRS 5216 – Grenoble Campus 38400 Saint Martin d'Hères - FRANCE Contact Olivier PERROTIN +33 4 76 57 45 36 olivier.perrotin@grenoble-inp.fr Thomas HUEBER +33 4 76 57 49 40 thomas.hueber@grenoble-inp.fr References [Denby et al., 2010] Denby, B., Schultz, T., Honda, K., Hueber, T., Gilbert, J. M., and Brumberg, J. S. (2010). Silent speech interfaces. Speech Communication, 52(4):270–287. [Hueber and Bailly, 2016] Hueber, T. and Bailly, G. (2016). Statistical conversion of silent articulation into audible speech using full-covariance HMM. Computer Speech & Language, 36(Supplement C):274–293. [Hueber et al., 2010] Hueber, T., Benaroya, E.-L., Chollet, G., Denby, B., Dreyfus, G., and Stone, M. (2010). Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips. Speech Communication, 52(4):288–300. [Imai et al., 1983] Imai, S., Sumita, K., and Furuichi, C. (1983). Mel log spectrum approximation (MLSA) filter for speech synthesis. Electronics and Communications in Japan (Part I: Communications), 66(2):10–18. [Perrotin, 2015] Perrotin, O. (2015). Chanter avec les mains: Interfaces chironomiques pour les instruments de musique numériques. PhD thesis, Université Paris-Sud, Orsay, France. [Tatulli and Hueber, 2017] Tatulli, E. and Hueber, T. (2017). Feature extraction using multimodal convolutional neural networks for visual speech recognition. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), ICASSP '17, pages 2971–2975, New Orleans, LA, USA. [Valin and Skoglund, 2019] Valin, J.-M. and Skoglund, J. (2019). LPCNet: Improving neural speech synthesis through linear prediction. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), ICASSP '19, pages 5891–5895, Brighton, UK. IEEE.
[van den Oord et al., 2016] van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A. W., and Kavukcuoglu, K. (2016). Wavenet: A generative model for raw audio. CoRR, abs/1609.03499.
| |||||
6-16 | (2020-12-18) Research engineer (fixed-term), ALAIA Lab, France The joint laboratory ALAIA, dedicated to language learning assisted by artificial intelligence, is recruiting a research engineer on a fixed-term contract (12 months, extension possible). The work will be carried out in coordination with the two partners involved in the LabCom: IRIT (Institut de Recherche en Informatique de Toulouse) and the company Archean Technologies, in particular its R&D unit Archean Labs (Montauban, 82). ALAIA focuses on oral expression and comprehension in a foreign language (L2). The missions will consist in designing, developing and integrating innovative services based on the analysis of L2 learners' productions, and on the detection and characterisation of errors ranging from the phonetic to the linguistic level. The expected skills cover automatic speech and language processing and machine learning, and are essential to be operational from day one. Good knowledge of web application development would be a plus.
Applications should be sent to Isabelle Ferrané (isabelle.ferrane@irit.fr) and Lionel Fontan (lfontan@archean.tech). Do not hesitate to contact us for further information.
| |||||
6-17 | (2021-01-03) Postdoc researcher, Seikei University, Japan We are seeking a highly motivated and ambitious post-doctoral researcher for the project of 'socially and culturally aware human-agent interaction', led by Prof. Yukiko Nakano at Seikei University in Tokyo, Japan. The project is part of a larger government-funded project for a human-avatar symbiotic society. The mission of our group (http://iui.ci.seikei.ac.jp/en/) is to research and develop technologies for human-agent/robot interaction, behavior generation, and behavior adaptation by focusing on social and cultural awareness.
| |||||
6-18 | (2021-01-04) Open positions at IDIAP, Martigny, Suisse There is a fully funded PhD position open at Idiap Research Institute on 'Neural
| |||||
6-19 | (2021-01-04) research scientist in spoken language processing at Naver Labs Europe, Grenoble, France We are seeking to recruit a research scientist in spoken language processing at Naver Labs Europe (Grenoble, France) - https://europe.naverlabs.com
More details below (you can apply online here as well)
DESCRIPTION NAVER LABS Europe's mission is to create new ways to interact with digital and physical agents, while paving the way for these innovations into a number of NAVER flagship products and services. This includes research in models and algorithms to give humans faster and better access to data and to allow them to interact with technology in simpler and more natural ways. To fulfill our vision of intelligent devices communicating seamlessly with us, we need to considerably improve existing technology and methods that solve natural language processing problems. We are looking for applications from research scientists to make outstanding contributions to the invention, development and benchmarking of spoken language processing techniques. The research scientist would be part of the Natural Language Processing group of NAVER LABS Europe, and their mission would be to develop research on one or more of the following themes: spoken language translation, speech recognition, text-to-speech synthesis, voice-based conversational search (with potential collaborations with the Search & Recommendation group). At NAVER LABS we encourage participation in the academic community. Our researchers collaborate closely with universities and regularly publish in venues such as ACL, EMNLP, Interspeech, KDD, SIGIR, ICLR, ICML and NeurIPS. REQUIRED SKILLS - Ph.D. in spoken language processing, speech processing, NLP or machine learning.
- Knowledge of the latest developments in statistical and deep learning as applied to NLP and speech. - Strong publication record in top-tier NLP, speech or machine learning conferences. - Strong development skills, preferably in Python, and knowledge of relevant frameworks (TensorFlow, PyTorch, etc.). APPLICATION INSTRUCTIONS You can apply for this position online. Don't forget to upload your CV and cover letter before you submit. Incomplete applications will not be accepted.
ABOUT NAVER LABS NAVER LABS Europe has full-time positions, PhD and PostDoc opportunities throughout the year, which are advertised here and on international conference sites that we sponsor, such as CVPR, ICCV, ICML, NeurIPS, EMNLP, ACL etc. NAVER LABS Europe is an equal opportunity employer. NAVER LABS are in Grenoble in the French Alps. We have a multi- and interdisciplinary approach to research, with scientists in machine learning, computer vision, artificial intelligence, natural language processing, ethnography and UX working together to create next-generation technology and services that deeply understand users and their contexts.
| |||||
6-20 | (2021-01-07) Speech-NLP Master 2 Internship, Year 2020-2021, at LISN (ex-LIMSI), University Paris-Saclay, France Speech Segmentation and Automatic Detection of Conflicts in Political Interviews LISN – Université Paris-Saclay Internship for Last-Year Engineer or Master 2 Students Keywords: Machine Learning, Diarization, Digital Humanities, Political Speech, Prosody, Expressive Speech Context This internship is part of the Ontology and Tools for the Annotation of Political Speech project (OOPAIP 2018), a transdisciplinary project funded under the DIM-STCN (Text Sciences and New Knowledge) by the Regional Council of Ile de France. The project is carried out by the European Center for Sociology and Political Science (CESSP) of the University of Paris 1 Panthéon-Sorbonne, the National Audiovisual Institute (INA), and the LISN. Its objective is to design new approaches to develop detailed, qualitative, and quantitative analyses of political speech in the French media. Part of the project concerns the study of the dynamics of conflicting interactions in interviews and political debates, which requires a detailed description and a large corpus to allow for the models' generalization. Some of the main challenges concern the performance of speaker and speech-style segmentation, e.g., improving the segmentation accuracy, detecting superimposed speech, and measuring vocal effort and other expressive elements. Objectives The main objective of the internship is to improve the automatic segmentation of political interviews. In this context, we will be particularly interested in the detection of hubbub (strong and prolonged overlapped speech). More precisely, we would like to extract features from the speech signal (Eyben et al.
2015) correlated with the level of conflictual content in the exchanges, based, for example, on the arousal level in the speaker's voice, an intermediate level between speech signal analysis and expressivity description (Rilliard, d'Alessandro, and Evrard 2018), or on vocal effort (Liénard 2019). The internship will initially be based on two corpora of 30 political interviews manually annotated in speech turns and speech acts (within the framework of the OOPAIP project). It will begin with a state-of-the-art review of speech diarization and overlapped speech detection (Chowdhury et al. 2019). The aim will then be to propose solutions based on recent frameworks (Bredin et al. 2020) to improve the precise localization of speaking segments, in particular when the frequency of speaker changes is high. In the second part of the internship, we will look at a more detailed measurement and prediction of the conflicting level of exchanges. We will search for the most relevant features to describe the conflicting level, adapting or developing a neural network architecture for its modeling. The programming language used for this internship will be Python. The candidate will have access to the LISN computing resources (servers and clusters with recent-generation GPUs).
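As a starting point for the vocal-effort features mentioned above (Liénard 2019 works from the shape of the long-term average spectrum of speech), here is a minimal numpy sketch of a long-term average magnitude spectrum; the frame size and hop are illustrative choices, not the project's settings:

```python
import numpy as np

def long_term_average_spectrum(signal: np.ndarray, n_fft: int = 1024, hop: int = 512) -> np.ndarray:
    """Average magnitude spectrum over Hann-windowed frames of the signal."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

# A 1 s, 440 Hz tone at 16 kHz should peak near bin 440 * 1024 / 16000 ≈ 28
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
ltas = long_term_average_spectrum(tone)
print(ltas.argmax())
```

Features such as spectral slope or band-energy ratios derived from this average spectrum could then feed the conflict-level models described above.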
Publications Depending on the degree of maturity of the work carried out, we expect the applicant to: • Distribute the tools produced under an open-source license • Write a scientific publication Conditions The internship will take place over a period of 4 to 6 months at the LISN (formerly LIMSI) in the TLP group (spoken language processing). The laboratory is located near the plateau de Saclay, university campus building 507, rue du Belvédère, 91400 Orsay. The candidate will be supervised by Marc Evrard (evrard@limsi.fr). Allowance under the official standards (service-public.fr). Applicant profile • Student in the last year of a 5-year diploma in the field of computer science (AI is a plus) • Proficiency in Python and experience with ML libraries (Scikit-Learn, TensorFlow, PyTorch) • Strong interest in digital humanities, political science in particular • Experience in automatic speech processing is preferred • Ability to carry out a bibliographic study from scientific articles written in English To apply: Send an email to evrard@limsi.fr including a résumé and a cover letter. Bibliography Bredin, Hervé, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, and Marie-Philippe Gill. 2020. “Pyannote.audio: Neural Building Blocks for Speaker Diarization.” In ICASSP. IEEE. Chowdhury, Shammur Absar, Evgeny A Stepanov, Morena Danieli, and Giuseppe Riccardi. 2019. “Automatic Classification of Speech Overlaps: Feature Representation and Algorithms.” Computer Speech & Language 55: 145–67. Eyben, Florian, Klaus R Scherer, Björn W Schuller, Johan Sundberg, Elisabeth André, Carlos Busso, Laurence Y Devillers, et al. 2015. “The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing.” IEEE Transactions on Affective Computing 7 (2): 190–202. Liénard, Jean-Sylvain. 2019.
“Quantifying Vocal Effort from the Shape of the One-Third Octave Long-Term-Average Spectrum of Speech.” The Journal of the Acoustical Society of America 146 (4): EL369–75. OOPAIP. 2018. “(Ontologie Et Outil Pour l’annotation Des Interventions Politiques).” DIM STCN (Sciences du Texte et connaissances nouvelles) Conseil régional d’Ile de France. http://www.dim-humanites-numeriques.fr/projets/oopaip-ontologie-et-outilspour- lannotation-des-interventions-politiques/. Rilliard, Albert, Christophe d’Alessandro, and Marc Evrard. 2018. “Paradigmatic Variation of Vowels in Expressive Speech: Acoustic Description and Dimensional Analysis.” The Journal of the Acoustical Society of America 143 (1): 109–22.
| |||||
6-21 | (2021-01-13) PhD position at CWI, Amsterdam, The Netherlands We have a PhD position available here in CWI, on the topic of user-centered optimisation for immersive media.
The full ad, including the link to apply, can be found here: https://www.cwi.nl/jobs/vacancies/868111
I would be grateful if you could disseminate the call within your network. You can also redirect any potential candidates to me if they have any questions: irene@cwi.nl
The deadline for applications is February 1st.
| |||||
6-22 | (2021-01-19) Associate professor, Telecom Paris, France Telecom Paris is hiring an associate professor in machine learning for distributed/multi-view machine listening and audio content analysis
| |||||
6-23 | (2021-02-15) Contract engineer, forensic police (Police Technique et Scientifique), France
A position for a contract engineer in the audio section of the forensic police (police technique et scientifique) is open.
| |||||
6-24 | (2021-03-08) Fully funded PhD at KTH, Stockholm, Sweden A fully funded PhD position in Deep Learning for Conversational AI KTH, Royal Institute of Technology, Stockholm, Sweden. Apply here (deadline 2/4) https://www.kth.se/en/om/work-at-kth/lediga-jobb/what:job/jobID:379667/where:4/
| |||||
6-25 | (2021-03-08) PhD and RA positions at University of Trento, Italy PhD and RA Positions in 'Conversational AI in the Health Domain' at University of Trento, Italy
and add this link :
| |||||
6-26 | (2021-03-08) Two PhD positions at NTNU, Trondheim, Norway Two PhD positions are open at NTNU Trondheim, Norway
| |||||
6-27 | (2021-03-09) Associate professor at Telecom Paris, France
Note that you do *not* need to speak French to apply.
| |||||
6-28 | (2021-03-16) PhD position at INRIA, Nancy, France ********** PhD position *************
Title: Robust and Generalizable Deep Learning-based Audio-visual Speech Enhancement. The PhD thesis will be jointly supervised by Mostafa Sadeghi (Inria Starting Faculty Position) and Romain Serizel (Associate Professor, Université de Lorraine).
Contacts: Mostafa Sadeghi (mostafa.sadeghi@inria.fr) and Romain Serizel (romain.serizel@loria.fr)
Context: Audio-visual speech enhancement (AVSE) refers to the task of improving the intelligibility and quality of noisy speech by utilizing the complementary information of the visual modality (the speaker's lip movements) [1]. The visual modality can help distinguish target speech from background sounds, especially in highly noisy environments. Recently, owing to the great success and progress of deep neural network (DNN) architectures, AVSE has been extensively revisited. Existing DNN-based AVSE methods are categorized into supervised and unsupervised approaches. In the former category, a DNN is trained to map noisy speech and the associated video frames of the speaker into a clean estimate of the target speech. The unsupervised methods [2] follow a traditional maximum-likelihood-based approach combined with the expressive power of DNNs. Specifically, the prior distribution of clean speech is learned using deep generative models such as variational autoencoders (VAEs) and combined with a likelihood function based on, e.g., non-negative matrix factorization (NMF), to estimate the clean speech in a probabilistic way. As there is no training on noisy speech, this approach is unsupervised. Supervised methods require deep networks, with millions of parameters, as well as a large audio-visual dataset with diverse enough noise instances to be robust against acoustic noise. There is also no systematic way to achieve robustness to visual noise, e.g., head movements, face occlusions, changing illumination conditions, etc. Unsupervised methods, on the other hand, show better generalization performance and can achieve robustness to visual noise thanks to their probabilistic nature [3]. Nevertheless, their test phase involves a computationally demanding iterative process, hindering their practical use.
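To make the NMF ingredient of the unsupervised recipe above concrete, here is a minimal multiplicative-update NMF in numpy, the kind of low-rank model typically used for the noise part; this is a generic textbook sketch (Lee-Seung Euclidean updates), not code from the cited papers:

```python
import numpy as np

def nmf(V: np.ndarray, rank: int, n_iter: int = 500, eps: float = 1e-10):
    """Factor a non-negative matrix V (freq x time) as W @ H using
    multiplicative updates that minimise the Euclidean distance."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update spectral templates
    return W, H

# Sanity check on an exactly rank-2 non-negative, spectrogram-like matrix
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # small relative error
```

In the unsupervised AVSE methods cited above, such a factorization of the noise power spectrogram is combined with a VAE prior on clean speech and fitted iteratively at test time, which is precisely the computational bottleneck the project aims to remove.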
Objectives: In this PhD project, we are going to bridge the gap between supervised and unsupervised approaches, benefiting from both worlds. The central task of this project is to design and implement a unified AVSE framework with the following features: 1) robustness to visual noise, 2) good generalization to unseen noise environments, and 3) computational efficiency at test time. To achieve the first objective, various techniques will be investigated, including probabilistic switching (gating) mechanisms [3], face frontalization [4], and data augmentation [5]. The main idea is to adaptively lower-bound the performance by that of audio-only speech enhancement when the visual modality is not reliable. To accomplish the second objective, we will explore techniques such as acoustic scene classification combined with noise modeling inspired by unsupervised AVSE, in order to adaptively switch to different noise models during speech enhancement. Finally, concerning the third objective, lightweight inference methods, as well as efficient generative models, will be developed. We will work with the AVSpeech [6] and TCD-TIMIT [7] audio-visual speech corpora.
References: [1] D. Michelsanti, Z. H. Tan, S. X. Zhang, Y. Xu, M. Yu, D. Yu, and J. Jensen, "An overview of deep-learning based audio-visual speech enhancement and separation," arXiv:2008.09586, 2020. [2] M. Sadeghi, S. Leglaive, X. Alameda-Pineda, L. Girin, and R. Horaud, "Audio-visual speech enhancement using conditional variational auto-encoders," IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 28, pp. 1788-1800, 2020. [3] M. Sadeghi and X. Alameda-Pineda, "Switching variational autoencoders for noise-agnostic audio-visual speech enhancement," in ICASSP, 2021. [4] Z. Kang, M. Sadeghi, and R. Horaud, "Face frontalization based on robustly fitting a deformable shape model to 3D landmarks," arXiv:2010.13676, 2020. [5] S. Cheng, P. Ma, G. Tzimiropoulos, S. Petridis, A. Bulat, J. Shen, and M. Pantic, "Towards pose-invariant lip reading," in ICASSP, 2020. [6] A. Ephrat, I. Mosseri, O. Lang, T. Dekel, K. Wilson, A. Hassidim, W. T. Freeman, and M. Rubinstein, "Looking to listen at the cocktail party: A speaker-independent audio-visual model for speech separation," SIGGRAPH, 2018. [7] N. Harte and E. Gillen, "TCD-TIMIT: An audio-visual corpus of continuous speech," IEEE Transactions on Multimedia, vol. 17, no. 5, pp. 603-615, May 2015. Skills:
Benefits package:
Remuneration: Salary: 1982€ gross/month for the 1st and 2nd year; 2085€ gross/month for the 3rd year. Monthly salary after taxes: around 1596.05€ for the 1st and 2nd year; 1678.99€ for the 3rd year (medical insurance included).
| |||||
6-29 | (2021-03-20) Post-doc at Nara Institute of Science and Technology, Japan [Postdoctoral researcher, Nara Institute of Science and Technology]
| |||||
6-30 | (2021-04-05) Researchers in Speech, Text and Multimodal Machine Translation @ DFKI Saarbrücken, Germany
| |||||
6-31 | (2021-04-02) PhD at Université d'Avignon, France **** If you don't read French and are interested in a PhD position in AI/NLP, please contact the supervisors listed below. Responses should preferably reach us **before 10 May**. PhD THESIS PROPOSALS, DOCTORAL CONTRACTS 2021-2024. Targeted call (tick the relevant box): X Ministerial doctoral contract ED 536 □ Ministerial doctoral contract ED 537 ------------------------------------------------------------------------------------------------------------------------ Thesis supervisor: Fabrice LEFEVRE. Co-supervisor: Bassam JABAIAN. Title (French): Transformer et renforcer pour le transfert et l'apprentissage en ligne des agents conversationnels vocaux. Title (English): Transformers and Reinforcement for transfer and online learning of vocal conversational agents. Keywords: AI, natural language processing, human-machine vocal interactions, deep learning, deep reinforcement learning, transfer learning. Joint supervision (cotutelle): No. International mobility opportunities for the PhD student during the thesis: yes. Candidate profile: The candidate must hold a Master's degree in computer science with a component in machine learning methods and/or language engineering. The PhD grant will be awarded through a competition within Doctoral School ED 536 of the University of Avignon, with an interview of the selected candidate by the thesis supervisors. To apply, please send an email before 10 May 2021 to Fabrice Lefèvre (fabrice.lefevre@univ-avignon.fr) and Bassam Jabaian (bassam.jabaian@univ-avignon.fr) including: your CV, a cover letter stating your position with respect to the study proposals below, any recommendation letters, and your academic transcripts.
Detailed description of the topic: Field/Theme: AI/NLP. Objective: Enable transfer and online learning of vocal conversational agents through a combination of Transformers and reinforcement learning. Context and stakes: Among research activities in artificial intelligence, improving vocal interaction with machines remains a major current challenge. LIA works on many aspects of vocal interaction; through this thesis it aims in particular to deepen research on training techniques for vocal conversational agents based on supervised and reinforced deep neural networks. Such dialogue agents are a key asset for improving our societies' ability to manage controlled social distancing, notably by delegating certain risky tasks to efficient hardware artefacts that are well accepted by the general public. Recent advances in neural networks have produced text-generation systems (language models) of very high quality. They are trained on enormous quantities of documents, but in return offer very broad coverage of human language. The most advanced representatives in this area are Transformers, which eliminate the need for (computationally costly) recurrence in the networks in favour of a multi-head self-attention mechanism. Many derivatives of these models exist and have yielded substantial performance gains on numerous natural language generation tasks. BERT [1] and GPT form the main families (with their many descendants: distilBERT, alBERT, GPT-2, ...).
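The multi-head self-attention mechanism at the heart of these models can be sketched in a few lines of NumPy; the shapes, weights, and data below are illustrative toys, not taken from any pretrained model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (T, d) sequence of token embeddings; W*: (d, d) projections."""
    T, d = X.shape
    dh = d // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    # Split each projection into heads: (n_heads, T, dh)
    def split(M):
        return M.reshape(T, n_heads, dh).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, T, T)
    attn = softmax(scores, axis=-1)                     # rows sum to 1
    out = (attn @ Vh).transpose(1, 0, 2).reshape(T, d)  # concat heads
    return out @ Wo, attn

rng = np.random.default_rng(0)
T, d, h = 5, 8, 2
X = rng.standard_normal((T, d))
W = [rng.standard_normal((d, d)) * 0.1 for _ in range(4)]
Y, attn = multi_head_self_attention(X, *W, n_heads=h)
assert Y.shape == (T, d)
assert np.allclose(attn.sum(-1), 1.0)
```

Each head attends to the whole sequence in parallel, which is what lets Transformers dispense with recurrence.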
While such models raise our language modelling capabilities to a higher level of performance, it remains to be seen how to deploy them for more specific or demanding tasks, such as spoken interaction systems. Their application to conversational agents is still an open problem: on the one hand, direct interaction with humans amplifies the impact of model errors and imperfections; on the other hand, interactions are managed in a goal-oriented setting, where the objective is not the mere exchange of language data but the achievement of a latent goal (obtaining a specific piece of information, performing an action or having one performed, ...). The main challenge we wish to address in this thesis is therefore to adapt the capabilities of a pre-trained Transformer to a particular task, notably for building a conversational agent. Transfer learning approaches have already been initiated, but their results are mixed and need strengthening [2]. We identify two major axes for the thesis. Axis 1: Transfer and online learning. Transfer approaches still rely on new pre-collected data against which the models are confronted [2]. Following on from our previous work on online learning of dialogue systems, we would like to devise and evaluate effective strategies for applying reinforcement learning [3, 4]. To make artificial systems capable of learning from data, two strong assumptions are generally made: (1) stationarity of the system (the machine's environment does not change over time), and (2) independence between data collection and the learning process (the user does not change their behaviour over time).
Yet users naturally tend to adapt their behaviour to the machine's reactions, which hinders convergence of learning towards an equilibrium that permanently satisfies the user's expectations. Vocal interfaces must therefore evolve towards a new generation of interactive systems, capable of learning dynamically over the long term from interactions while anticipating variations in human behaviour, humans themselves being seen as evolving systems. The challenge, in the context of deep reinforcement learning [5], is to demonstrate the optimal convergence of the algorithms used to update the weights of certain layers of the model over the course of interactions with users, without risking a degradation of the initial performance. The optimal choice of which parameters to modify should be automatable. This project also falls within the framework of continual learning [6] for a conversational agent. Axis 2: Modelling spoken language. Most of the aforementioned models deal exclusively with written language and include few mechanisms dedicated to the nature of spoken language. We would therefore like to increase the ability of such machines to cope with: 1) more natural user inputs, containing many deviations from written language (ungrammaticality, confusions, restarts, corrections, hesitations, ...), and 2) transcription errors due to the speech recognition component. It is therefore necessary to interface the speech analysis component with the downstream language modelling chain (semantic analysis, dialogue state tracking, dialogue management, generation, and speech synthesis) so as to take multiple realistic hypotheses into account (and no longer only the single best one).
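As a toy illustration of keeping several ASR hypotheses alive and letting downstream semantics arbitrate, rather than committing to the 1-best transcription: the hypotheses, scores, and weighting below are entirely made up, and the semantic scorer is a dummy stand-in for a real semantic parser.

```python
# (hypothesis, acoustic log-score) N-best list from a hypothetical recognizer
n_best = [
    ("book a table for two", -4.2),
    ("look a table for two", -3.9),   # acoustically best but semantically odd
    ("book a cable for two", -5.0),
]

def semantic_log_score(text):
    # Dummy stand-in for a semantic parser's confidence (assumption)
    return 0.0 if "book a table" in text else -2.0

alpha = 1.0  # weight balancing acoustic vs. semantic evidence

# Arbitrate: combine acoustic and semantic evidence over the whole list
best = max(n_best, key=lambda h: h[1] + alpha * semantic_log_score(h[0]))
print(best[0])
```

Here the semantically coherent hypothesis overtakes the acoustically top-ranked one, mirroring the human ability to revise acoustic hypotheses that conflict with semantic inference.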
Finally, arbitration between these hypotheses should take the downstream processing into account, in line with the equivalent human cognitive process (which can re-process its most probable acoustic hypotheses when they conflict with its semantic inferences). This study may be conducted in several application settings, to be specified at the start of the thesis: for example, a dialoguing Pepper robot assigned to reception duties in a public place (e.g., a hospital or a museum). First-contact and orientation tasks could then be delegated to artefacts immune to biological transmission, a highly strategic asset for better managing a crisis such as the ongoing coronavirus pandemic. [1] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv.org, Oct. 2018. [2] T. Wolf, V. Sanh, J. Chaumond, and C. Delangue, "TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents," arXiv.org, Jan. 2019. [3] E. Ferreira, B. Jabaian, and F. Lefèvre, "Online adaptative zero-shot learning spoken language understanding using word-embedding," in Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, 2015, pp. 5321-5325. [4] M. Riou, B. Jabaian, S. Huet, and F. Lefèvre, "Joint On-line Learning of a Zero-shot Spoken Semantic Parser and a Reinforcement Learning Dialogue Manager," in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019, 2019, pp. 3072-3076. [5] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, "A Brief Survey of Deep Reinforcement Learning," IEEE Signal Processing Magazine, Special Issue on Deep Learning for Image Understanding, Aug. 2017. [6] Z. Chen and B. Liu, Lifelong Machine Learning, Second Edition, vol. 12, no. 3, Morgan & Claypool Publishers, 2018. Thesis proposals must be sent to secretariat-ed@univ-avignon.fr
| |||||
6-32 | (2021-04-15) Director, Center for Language and Speech Processing, Baltimore, MD, USA POSITION: Director, Center for Language and Speech Processing REPORTS TO: Ed Schlesinger, Benjamin T. Rome Dean Johns Hopkins University, Whiting School of Engineering INSTITUTION: Johns Hopkins University, Baltimore, MD https://engineering.jhu.edu/ 2.23.21 The Whiting School of Engineering at Johns Hopkins University invites nominations and applications for the position of Director of the Center for Language and Speech Processing (CLSP). The Director will be appointed as a full-time tenured faculty member in the Whiting School of Engineering and will be encouraged to remain active in research, with strategic leadership of the Center as their top priority. This is an outstanding opportunity for an accomplished scholar with leadership experience to further strengthen an exceptional interdisciplinary research center at the nation’s first research university. The best candidates will embody the intellectual distinction, entrepreneurial capacity, collaborative spirit, transparency, inclusiveness, and creativity that characterize the School’s culture and will bring a scholarly record deserving appointment as tenured professor at The Johns Hopkins University. The Center for Language and Speech Processing CLSP is one of the Whiting School’s 25 Interdisciplinary Centers and Institutes. The Center currently comprises over 25 tenure-line and research faculty whose primary appointments are in the Whiting School of Engineering or in other closely related schools, along with over 70 PhD students. CLSP was established in 1992 and grew to prominence under the directorship of the late Frederick Jelinek. It aims to understand how human language is used to communicate ideas, and to develop technology for machine analysis, translation, and transformation of multilingual speech and text.
In 2007 CLSP gained a sibling, the national Human Language Technology Center of Excellence (https://hltcoe.jhu.edu), a government-funded research center at Johns Hopkins that develops critical speech and language technology for government use; several HLTCOE researchers are tightly integrated into CLSP. Recently, CLSP has further expanded its research portfolio by adding several prominent researchers in computer vision and related fields. As part of its educational mission, CLSP coordinates a full complement of courses dealing with a diverse array of topics in language and speech. It offers a weekly seminar featuring prominent visiting speakers in speech and language processing. It also runs the Fred Jelinek Memorial Workshop in Speech and Language Technology (JSALT), a widely known residential research workshop that annually assembles teams of researchers from around the world to spend 6 summer weeks conducting intensive research on fundamental problems. Held annually since 1995, the workshop has produced many important advances in speech and language technology. Opportunities for the Center Director The CLSP Director will work with colleagues in and beyond CLSP to increase its impact by both enhancing its historic strengths and positioning it as a central element of a set of AI-related initiatives across the Whiting School and the University more broadly.
To these ends, the Director will identify ways in which the Center will continue to grow and evolve and through which the Center, the Whiting School, and Hopkins can recruit, sustain, and deploy the human and financial resources needed to further distinguish itself. The Director will work to maintain the Center’s position as the disciplinary and intellectual hub of language and speech processing research within the University, enabling CLSP to contribute to and benefit from the success of significant institutional investment in artificial intelligence and machine learning more broadly, including potential applications to key societal problems such as healthcare and scientific endeavors such as linguistics and neuroscience. Collaborations with the Applied Physics Lab (www.jhuapl.edu) present opportunities to bring additional resources, expertise, and scale to advance CLSP research, including potentially in classified research. Beyond Hopkins, CLSP’s Director will foster connections with industry as part of the Center’s efforts to expand its base of resources and relationships, to disseminate knowledge and discoveries, and to develop and transfer technologies that may have an impact in the world. In these various external activities, the Director will work with the University’s technology ventures office (https://ventures.jhu.edu), with faculty and students, and with alumni and donors. Specific strategies for enhancing CLSP’s strengths, broadening its impact, and positioning it relative to Hopkins-wide initiatives, along with measures of success and the prioritization of activities designed to achieve success, will be developed by the Director in collaboration with CLSP’s faculty and the Dean. Diversity, equity, and inclusion at the Whiting School WSE has a stated commitment to diversity, equity, and inclusion: “Diversity and inclusion enrich our entire community and are critical to both educational excellence and to the advancement of knowledge.
Discovery, creativity, and innovation flourish in an environment where the broadest range of experiences are shared, where all voices are heard and are valued, and where individuals from different cultures and backgrounds can collaborate freely to understand and solve problems in entirely new ways.” As the leader of the Center and within the School, CLSP’s Director will work to enhance and expand diversity and inclusion at all levels and will ensure that the Center is a welcoming and supportive environment for all. Position Qualifications The new Director will be a proven, entrepreneurial leader who can bring faculty, staff, and students together to pursue a compelling vision of CLSP as an international hub for Language and Speech Processing research and as a site of innovation, teaching, and translation. They will have strong skills for mentoring junior faculty and will promote the interests of the Center. Intellectual curiosity and fundraising experience are valued. They will have a dossier that represents a distinguished track record of scholarship and teaching; a passionate commitment to research, discovery, and application; and an interest in and success at academic administration.
Expected educational background and qualifications include:
• An earned doctorate in an area such as electrical and computer engineering, computer science, or a closely related field and a scholarly record deserving appointment as tenured professor at The Johns Hopkins University;
• Recognized leadership in their respective field with a distinguished national and international reputation for research and education;
• Excellent communication skills in both internal and external interactions;
• Strong commitment to diversity and inclusion at all levels among faculty, students, and staff, along with measurable and sustained impact on the diversity and inclusiveness of organizations they have led or been part of; and
• Leadership and administrative experience within a complex research environment or in national/international organizations connected to their respective field.
* The Whiting School of Engineering has engaged Opus Partners (www.opuspartners.net) to support the recruitment of the CLSP Director. Craig Smith, Partner, and Jeff Stafford, Senior Associate, are leading the search. Applicants should submit their CV and a letter of interest outlining their research and leadership experience to Jeffrey.stafford@opuspartners.net. Nominations, expressions of interest, and inquiries should go to the same address. Review of credentials will begin promptly and will continue until the appointment is finalized. Every effort will be made to ensure candidate confidentiality. The Whiting School of Engineering and CLSP are committed to building a diverse educational environment, and women and minorities are strongly encouraged to apply. Johns Hopkins University is an equal opportunity employer and does not discriminate on the basis of gender, marital status, pregnancy, race, color, ethnicity, national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, other legally protected characteristics or any other occupationally irrelevant criteria. The University promotes Affirmative Action for minorities, women, individuals who are disabled, and veterans. Johns Hopkins University is a drug-free, smoke-free workplace.
|