(2019-01-02) ASSOCIATE OR ASSISTANT PROFESSOR, Aalto University, Finland
Aalto University
Aalto University is a community of bold thinkers where science and art meet technology and business. We are committed to identifying and solving grand societal challenges and building an innovative future. Aalto University has been ranked the 9th best young university in the world (Top 50 under 50, QS 2018) and one of the world’s top technology challenger universities (THE 2017), thinking outside the box on research collaboration, funding and innovation. Aalto has six schools with nearly 11 000 students and 4000 employees, of whom close to 400 are professors. Our campuses are located in the Capital Area of Finland. With 37% of our academic faculty coming from outside Finland, we are a highly international community with strong academic standing.
At Aalto, high-quality research, art, education and entrepreneurship are promoted hand in hand. Disciplinary excellence is combined with multidisciplinary activities, engaging both students and the local innovation ecosystem. Our main campus is quickly transforming into an open collaboration hub that encourages encounters between students, researchers, industry, startups and other partners.
Aalto University was founded in 2010 when three leading Finnish universities, Helsinki University of Technology, the Helsinki School of Economics and the University of Art and Design Helsinki, were merged to strengthen Finland’s innovative capability.
Aalto University School of Electrical Engineering invites applications for an
ASSOCIATE OR ASSISTANT PROFESSOR IN SPEECH AND LANGUAGE TECHNOLOGY
We are looking for an associate or assistant professor to establish and lead a group of researchers and students within the speech and language technology research area at Aalto University. In this position, you will have a chance to make an impact in academic research and teaching as well as in society, for example through industrial research collaboration. At Aalto University, you will have excellent research facilities and opportunities for interdisciplinary research with top-level researchers in machine learning, signal processing, acoustics, neuroscience and human-computer interfaces.
YOUR ROLE AND GOALS
Your tasks and responsibilities include conducting outstanding research and teaching as well as preparing research projects with funding from international and national sources. You supervise and recruit postdocs and PhD students and participate in teaching within the Computer, Communication and Information Science Master's Programme, which is one of the most competitive at Aalto University.
SCIENTIFIC ENVIRONMENT
The professorship is situated at the Department of Signal Processing and Acoustics (in the School of Electrical Engineering), where currently two out of ten tenured/tenure track professors work in the area of speech and language technology. Academy Professor Paavo Alku leads a group in speech communication technology and text-to-speech synthesis. The group is known particularly for its voice source research and its technical and interdisciplinary applications. Professor Mikko Kurimo leads a group in speech recognition and language modeling. His group is best known for developing successful language-independent models of morphologically rich languages and winning the MGB 2017 speech recognition challenge.
These groups are also very well connected to the recently founded Finnish Centre of Artificial Intelligence (FCAI), which is a large collaboration effort for professors in machine learning and speech and language technology at both Aalto University and the University of Helsinki.
YOUR EXPERIENCE AND AMBITIONS
We expect a strong track record of publications and achievements in speech and language technology, excellent teaching skills to help students learn difficult topics, and the motivation and competence to start and lead new and highly ambitious research projects aiming at significant scientific results and impact. The professorship is open to qualified applicants from all areas of speech and language technology, but we prioritize fields that enable research collaboration with the Department’s current groups in speech and language technology. All applicants must have a doctorate in speech and language technology (or in a related area of engineering) and a fluent command of English.
If you wish to hear more about the position, you can contact Academy Professor Paavo Alku or Professor Mikko Kurimo (firstname.lastname@aalto.fi). For questions related to the recruitment process, please contact HR Coordinator Saara Haggrén (firstname.lastname@aalto.fi).
READY TO APPLY?
If you want to join our community, please submit your application through our eRecruitment system no later than 31 March 2019. To apply, please share the following application materials with us:
1. Cover letter
2. Curriculum vitae (with contact information and ResearcherID number)
3. List of publications in which the 7 most significant publications are highlighted
4. A research statement describing past research and plans for future research
5. A teaching portfolio describing teaching experience and plans for teaching
6. Contact information of possible references or at most 2 reference statements
All application materials should be submitted in English, in pdf format.
The applications for the tenure track positions are to be addressed to the President of Aalto University. From amongst the applicants in the first phase, Aalto University will select those who will be asked to visit Aalto University in May/June 2019. Short-listed candidates’ applications will be submitted for review by external experts (the second phase of the application process). General instructions for applicants, including evaluation criteria, language requirements and guidelines for compiling a teaching portfolio and CV, are given at https://www.aalto.fi/tenure-track. Aalto University reserves the right, for justified reasons, to leave the position open, to extend the application period and to consider candidates who have not submitted applications during the application period. As a living and working environment, Finland consistently ranks high in quality of life. For more information about living in Finland: https://www.aalto.fi/services/about-finland.
(2019-01-05) Two postdoctoral researcher/project researcher positions in speech processing, University of Eastern Finland, Joensuu, Finland
Two Postdoctoral Researcher/Project Researcher positions in speech processing
The University of Eastern Finland, UEF, is one of the largest multidisciplinary universities in Finland. We offer education in nearly one hundred major subjects, and are home to approximately 15,500 students and 2,500 members of staff. We operate in Joensuu and Kuopio. In international rankings, we are ranked among the leading universities in the world.
The Faculty of Science and Forestry operates on the Kuopio and Joensuu campuses of the University of Eastern Finland. The mission of the faculty is to carry out internationally recognised scientific research and to offer research-education in the fields of natural sciences and forest sciences. The faculty invests in all of the strategic research areas of the university. The faculty’s environments for research and learning are international, modern and multidisciplinary. The faculty has approximately 3,800 Bachelor’s and Master’s degree students and some 490 postgraduate students. The number of staff amounts to 560. http://www.uef.fi/en/lumet/etusivu
We are now inviting applications for
two Postdoctoral Researcher/Project Researcher positions in speech processing funded by the Academy of Finland at the School of Computing, Joensuu Campus.
One position in machine learning for speaker modelling (e.g. speaker verification, voice anti-spoofing, voice conversion, text-to-speech, or similar)
One position in perceptual and/or acoustic speaker characterization (e.g. phonetics/linguistics, speech modelling, statistical methods)
Both positions are filled in the Academy of Finland funded NOTCH research project (NOn-cooperaTive speaker CHaracterization), led by Associate Professor Tomi H. Kinnunen. The project aims at advancing the state-of-the-art in automatic speaker verification (defence) and voice conversion (attack) under a generic umbrella of non-cooperative speech, whether being induced by spoofing attacks, disguise, or other less expected intentional voice modifications. The NOTCH project applies multi-disciplinary research methods. The ideal candidate for the first position will have a background in machine learning or signal processing for speaker modelling and characterization. You may have a background in recognition, conversion or synthesis methods, as long as you are seasoned in state-of-the-art machine learning theory and practice (especially deep learning). The ideal candidate for the second position will have a background in acoustic-phonetic or perceptual methods for speaker characterization and will be fluent in devising novel statistical analysis methods such as linear mixed effect models. For both positions, multi-disciplinary thinking and willingness to contribute to both themes is considered a plus.
The Computational Speech Group of the School of Computing (https://www.uef.fi/web/speech/), formed officially in 2018, works in the facilities of Joensuu Science Park, provides access to a modern research infrastructure and is a strongly international working environment. We are a group of dedicated individuals who do not want to follow a linear research path: we keep our minds open to high-risk new directions and collaborations. We hosted the Odyssey 2014 conference, were a partner in the H2020-funded OCTAVE project, and are known as a co-founder of the Automatic Speaker Verification and Countermeasures (ASVspoof) challenge series (http://www.asvspoof.org/). Joensuu, a friendly city “in the middle of KNOWhere” (as one of UEF’s slogans says) of about 115,000 inhabitants, is compact and contains all the necessary services within walking distance, with low living expenses and excellent opportunities for many outdoor activities. Despite its remote location, Joensuu is very international thanks to many of UEF’s international collaboration programmes and a vibrant student community.
A person to be appointed as a postdoctoral researcher shall hold a suitable doctoral degree that has been awarded less than five years ago. If the doctoral degree has been awarded more than five years ago, the post will be one of a project researcher. The doctoral degree should be in spoken language technology, electrical engineering, computer science, machine learning or a closely related field. Researchers finishing their PhD in the near future are also encouraged to apply for the positions. However, they are expected to hold a PhD degree by the starting date of the position. We expect strong hands-on experience and a creative, out-of-the-box problem solving attitude. A successful applicant needs to have an internationally proven track record in topics relevant to the project he or she applies to.
English may be used as the language of instruction and supervision in these positions.
The positions will be filled from April 1, 2019 at the earliest, for a minimum period of 12 months. The continuation of the positions will be agreed separately. The positions will be filled for a fixed term due to them pertaining to a specific project (positions of postdoctoral researcher shall always be filled for a fixed term, UEF University Regulations, Section 31).
The salary of the positions is determined in accordance with the salary system of Finnish universities and is based on level 5 of the job requirement level chart for teaching and research staff (€2,903.61/month). In addition to the job requirement component, the salary includes a personal performance component, which may be a maximum of 50.0% of the job requirement component. The starting salary of a postdoctoral researcher is around 3,300.00-3,500.00 euros.
For further information on the position, please contact (NOTCH): Associate Professor Tomi Kinnunen, email: tkinnu(a)cs.uef.fi, tel. +358 50 442 2647. For further information on the application procedure, please contact: Executive Head of Administration Arja Hirvonen, email: arja.hirvonen(a)uef.fi, tel. +358 29 445 3002.
A probationary period is applied to all new members of the staff.
You can use the same electronic form to apply for both research projects. The electronic application should contain the following appendices:
copies of the applicant's academic degree certificates/diplomas, and copies of certificates/diplomas relating to the applicant's language proficiency, if not indicated in the academic degree certificates/diplomas
motivation letter
The application needs to be submitted no later than February 28, 2019 (by 24:00 EET) by using the electronic application form.
(2019-01-06) Postdoc at the University of Colorado, Boulder, CO, USA
The Department of Computer Science at the University of Colorado Boulder anticipates hiring a full time postdoctoral fellow starting in Summer/Fall 2019 for one year and renewable for a second year. This position will work with Dr. Sidney D’Mello https://www.colorado.edu/ics/sidney-dmello and will play a collaborative and co-leadership role in a vibrant research team encompassing researchers in Computer Science, Cognitive Science, Psychology/Neuroscience, and Education.
Who we are:
=========
The mission of the Institute of Cognitive Science (ICS) at CU-Boulder is to understand and enhance human cognition, learning, and development through the creation of interdisciplinary partnerships. ICS fosters rich scientific interchange across researchers from a broad range of disciplines including Artificial Intelligence, Linguistics, Psychology, Neuroscience, Computer Science, Philosophy, and Education.
What your key responsibilities will be:
============================
Develop computational modeling and machine learning techniques to model behavioral and mental states (e.g., affect, attention, workload) from multimodal data (e.g., video, audio, physiology, eye gaze) across a range of interaction contexts (e.g., learning from educational games, collaborative problem solving, everyday activities in the wild).
This position offers a unique postdoctoral training experience and unsurpassed publishing opportunities within multi-department and multi-institution grant-funded projects. The candidate will be encouraged to develop advanced technical skills, strengthen their research portfolios via peer-reviewed publications, gain interdisciplinary experience by working with a diverse team, develop leadership skills by mentoring students, and gain expertise in co-authoring grant proposals.
Requirements:
===========
- Ph.D. in Computer Science/Machine Learning or a related field (at the time of hire)
- Research experience in computational modeling and advanced machine learning (e.g., graphical models, deep recurrent neural networks).
- Strong writing skills and ability to conduct independent research
as evidenced by first author publications.
Desired:
- Research experience in one or more of the following areas (computer vision, computational psychophysiology, natural language processing, speech processing).
- Background in interdisciplinary research.
- Experience mentoring graduate and undergraduate students.
Job details:
========
- 1-2 year position. Initial contract is for one year (providing renewal after 6-month probationary period). Second year contract is based on performance and extension to a third year is possible.
- Start date is negotiable, but anticipated for Summer/Fall 2019.
- Competitive salary commensurate with experience and full benefits.
Application:
=========
To apply, please submit the following materials through CU Boulder Jobs:
- Resume/CV
- Cover Letter
- PDF Sample of Work: Two representative publications.
During the application process you will need to enter contact information for three references and we will request letters of recommendation and additional materials, if needed, as the search progresses.
Review of applications will begin immediately and will continue until the position is filled.
(2019-01-10) Postdoc in speech production (M/F), CNRS-Sorbonne, Paris, France
Postdoc in Speech Production (M/F)
Reference: UMR7018-CECFOU-004
Workplace: PARIS 05
Date of publication: Monday, January 07, 2019
Type of Contract: FTC Scientist
Section CN: Sciences du langage (language sciences)
Contract Period: 12 months
Expected date of employment: February/March 2019
Proportion of work: Full time
Remuneration: between 2600 and 3600 € (gross) per month according to experience
Desired level of education: PhD
Experience required: No specific requirement
Missions
The post-doctoral fellow will conduct experiments on adaptation to speech disturbances and production conditions, aimed at testing the integrity/flexibility of speech units and their variability according to their structural/motor complexity or frequency. Speech will be compared with other non-verbal movements and data from healthy speakers will be compared with data from speakers with different motor speech disorders.
Activities
As part of the MoSpeeDi project, the Laboratoire de Phonétique et Phonologie (LPP - CNRS/Sorbonne Nouvelle) in Paris is offering a full-time postdoctoral position for 12 months (with a possible extension of a few months). The post-doctoral fellow will be in charge of designing, carrying out and processing experiments in collaboration with the other members of the project. The overall objective of the project is to better understand the processes and representations at play during speech production, focusing on the final stages of the process where the encoded linguistic message is transformed into articulated speech. At the interface between linguistic and motor processes, these steps are also associated with various Motor Speech Disorders (MSD, dysarthria and speech apraxia). Articulation and acoustic data will be collected and analyzed experimentally for healthy and MSD speakers to (a) better understand the phonetic and motor speech planning and programming stages, (b) identify markers of these processes, and (c) better isolate and categorize speech disorders in MSDs.
Skills
Required skills
- PhD in phonetics or on a subject related to speech production or speech motor control.
- Good knowledge of speech production and motor control models, particularly on adaptation phenomena and/or speech motor disorders.
- Experience in signal processing (acoustic and/or articulatory)
- Programming skills (e.g. with Praat, Matlab, Python or R)
- Strong statistical analysis skills and good writing skills
- Basic knowledge of French and excellent proficiency in English
Work Context
The candidate will work closely with Cécile Fougeron and Leonardo Lancia. The Phonetics and Phonology Laboratory (CNRS/Université Sorbonne-Nouvelle), located in the 5th arrondissement in Paris, is a research, research training and teaching unit in experimental phonetics and phonology.
Required skills: background in statistics, natural language processing and computer programming skills (Perl, Python). Candidates should email a detailed CV with diploma
Under noisy conditions, audio acquisition is one of the toughest challenges for successful automatic speech recognition (ASR). Much of the success relies on the ability to attenuate ambient noise in the signal and to take it into account in the acoustic model used by the ASR. Our DNN (Deep Neural Network) denoising system and our approach to exploiting uncertainties have shown their combined effectiveness against noisy speech.
The ASR stage will be supplemented by a semantic analysis. Predictive representations using continuous vectors have been shown to capture the semantic characteristics of words and their context, and to outperform representations based on counting words. Semantic analysis will be performed by combining predictive representations using continuous vectors and uncertainty on denoising. This combination will be done by the rescoring component. All our models will be based on DNN technologies.
The performances of the various modules will be evaluated on artificially noisy speech signals and on real noisy data. At the end, a demonstrator, integrating all the modules, will be set up.
The recruited person will work in collaboration with an industrial partner.
Main activities
study and implementation of a noisy speech enhancement module and a propagation of uncertainty module;
design a semantic analysis module;
design a module taking into account the semantic and uncertainty information.
Skills
Strong background in mathematics, machine learning (DNN), statistics
The following profiles are welcome, either:
Strong background in signal processing
or
Strong experience with natural language processing
Excellent English writing and speaking skills are required in any case.
References
[Nathwani et al., 2018] Nathwani, K., Vincent, E., and Illina, I. DNN uncertainty propagation using GMM-derived uncertainty features for noise robust ASR, IEEE Signal Processing Letters, 2018.
[Nathwani et al., 2017] Nathwani, K., Vincent, E., and Illina, I. Consistent DNN uncertainty training and decoding for robust ASR, in Proc. IEEE Automatic Speech Recognition and Understanding Workshop, 2017.
[Nugraha et al., 2016] Nugraha, A., Liutkus, A., and Vincent, E. Multichannel audio source separation with deep neural networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016.
[Sheikh, 2016] Sheikh, I. Exploitation du contexte sémantique pour améliorer la reconnaissance des noms propres dans les documents audio diachroniques, Thèse de doctorat en Informatique, Université de Lorraine, 2016.
Required skills: background in statistics, natural language processing and computer programming skills (Perl, Python), neural network tools. Candidates should email a detailed CV with diploma
Motivations and context
According to the 2017 International Migration Report, the number of international migrants worldwide has grown rapidly in recent years, reaching 258 million in 2017, of whom 78 million are in Europe. A key reason for the difficulty EU leaders have had in taking a decisive and coherent approach to the refugee crisis has been the high level of public anxiety about immigration and asylum across Europe. There are at least three social factors underlying this attitude (Berri et al., 2015): the increase in the number and visibility of migrants; the economic crisis that has fed feelings of insecurity; the role of mass media. The last factor has a major influence on the political attitudes of the general public and the elite. Refugees and migrants tend to be framed negatively as a problem. This translates into a significant increase in hate speech towards migrants and minorities. The Internet seems to be a fertile ground for hate speech (Knobel, 2012).
The goal of this PhD Thesis is to develop a methodology to automatically detect hate speech in social network data (Twitter, YouTube, Facebook).
Our methodology for hate speech classification will build on recent approaches to text classification with Neural Networks and word embeddings. In this context, fully connected feed forward networks (Iyyer et al., 2015; Nam et al., 2014), Convolutional Neural Networks (CNN) (Kim, 2014; Johnson and Zhang, 2015) and also Recurrent/Recursive Neural Networks (RNN) (Dong et al., 2014) have been applied. On the one hand, the approaches based on CNN and RNN capture rich compositional information, and have outperformed the state-of-the-art results in text classification; on the other hand they are computationally intensive and require careful hyperparameter selection and/or regularization (Dai and Le, 2015).
Objectives
The goal of this PhD Thesis is to develop a new methodology to automatically detect hate speech, based on machine learning and Neural Networks. Manual detection of this material is infeasible given the sheer volume of content to be analyzed. In recent years, research has been conducted to develop automatic methods for hate speech detection in the social media domain. These typically employ semantic content analysis techniques built on Natural Language Processing (NLP) and Machine Learning (ML) methods (Schmidt et al. 2017). Although current methods have reported promising results, their evaluations are largely biased towards detecting content that is non-hate, as opposed to detecting and classifying real hateful content (Zhang et al., 2018). Current machine learning methods use only certain task-specific features to model hate speech. We propose to develop an innovative approach to combine these pieces of information into a multi-feature approach so that the weaknesses of the individual features are compensated by the strengths of other features (explicit hate speech, implicit hate speech, contextual conditions affecting the prevalence of hate speech, etc.).
The student will work within the framework of a French-German project (ANR project).
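The evaluation bias mentioned above can be illustrated with a minimal sketch. The data below are invented toy labels (95 non-hate posts, 5 hate posts) and the function name `per_class_report` is hypothetical; this is not the project's evaluation protocol, only an illustration of why overall accuracy can look good while the hate class is poorly detected.

```python
from collections import Counter


def per_class_report(y_true, y_pred, labels):
    """Return overall accuracy and per-class precision, recall and F1."""
    # Count (true label, predicted label) pairs; missing pairs count as 0.
    pairs = Counter(zip(y_true, y_pred))
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(pairs[(o, c)] for o in labels if o != c)
        fn = sum(pairs[(c, o)] for o in labels if o != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


# Toy, made-up data: a heavily imbalanced test set.
labels = ["non-hate", "hate"]
y_true = ["non-hate"] * 95 + ["hate"] * 5
# Hypothetical classifier output: one false alarm, four missed hate posts.
y_pred = ["non-hate"] * 94 + ["hate"] + ["hate"] + ["non-hate"] * 4

accuracy, report = per_class_report(y_true, y_pred, labels)
print(f"accuracy: {accuracy:.2f}")
for label in labels:
    scores = report[label]
    print(f"{label}: precision={scores['precision']:.2f} "
          f"recall={scores['recall']:.2f} f1={scores['f1']:.2f}")
```

On this toy set the overall accuracy is dominated by the majority non-hate class, while recall on the hate class stays low, which is why reporting per-class scores is a common way to make such a bias visible.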
References
Berri M, Garcia-Blanco I, Moore K (2015), Press coverage of the Refugee and Migrant Crisis in the EU: A Content Analysis of five European Countries, Report prepared for the United Nations High Commission for Refugees, Cardiff School of Journalism, Media and Cultural Studies.
Dai, A. M. and Le, Q. V. (2015). “Semi-supervised sequence learning”. In Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., and Garnett, R., editors, Advances in Neural Information Processing Systems 28, pages 3061-3069. Curran Associates, Inc.
Dong, L., Wei, F., Tan, C., Tang, D., Zhou, M., and Xu, K. (2014). “Adaptive recursive neural network for target-dependent twitter sentiment classification”. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL, Baltimore, MD, USA, Volume 2: pages 49-54.
Iyyer, M., Manjunatha, V., Boyd-Graber, J., and Daumé, H. (2015). “Deep unordered composition rivals syntactic methods for text classification”. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, volume 1, pages 1681-1691.
Johnson, R. and Zhang, T. (2015). “Effective use of word order for text categorization with convolutional neural networks”. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 103-112.
Knobel, M. (2012). L’Internet de la haine. Racistes, antisémites, néonazis, intégristes, islamistes, terroristes et homophobes à l’assaut du web. Paris: Berg International.
Schmidt, A. and Wiegand, M. (2017). A Survey on Hate Speech Detection using Natural Language Processing. Workshop on Natural Language Processing for Social Media.
Zhang, Z. and Luo, L. (2018). Hate speech detection: a solved problem? The Challenging Case of Long Tail on Twitter. arxiv.org/pdf/1803.03662
(2019-01-15) Several postdoc openings at IDIAP, Martigny, Switzerland
IDIAP has openings for at least one post-doc in the general area of speech and natural language processing. The work will involve investigating how to interface current end-to-end speech recognition technology with its counterparts in natural language processing; it would suit someone from either discipline. The posts are research oriented, but funded by industrial collaborations. More information along with application instructions are at the URL: http://www.idiap.ch/education-and-jobs/job-10251
Idiap is located in Martigny in French speaking Switzerland, although the lab hosts many nationalities, and functions in English. All positions offer quite generous salaries.
(2019-01-19) Several positions at the University of Naples Federico II, Italy
Job: Post-Doc Position
Start: Spring 2019
Duration: The grant will be for 18 months, renewable up to 6 (by mutual consent), with a yearly salary of € 25.000,00
Topics: Adaptive Multimodal Human-Robot and Machine Interaction
Language requirement: English
Description:
The goal is to study and design adaptive multimodal interaction mechanisms. In multimodal interaction, research focuses on relying on different modalities and investigating how to apply fusion techniques to them in order to generate the correct interpretation of the user's intention. Here, we will also investigate how to select the proper communication channels and optimize their features for each user. In this view, the majority of the robotic applications are based on static user models: this prevents systems from adapting independently and proactively to changes in the needs and preferences of users. The aim of the present proposal is to investigate how to merge human-robot and, in general, human-machine multimodal interaction research issues with online adaptive learning ones.
Duration: The grant will be for 12 months, with a yearly salary from € 15.000,00 up to 25.000,00 (depending on the experience and seniority of the candidate)
Topics: Social Signal Processing in Rehabilitation Domains
Language requirement: English
Description:
Social Signal Processing is the discipline concerned with the automatic analysis of human social behaviour and with the generation of coherent social signals in artificial embodied agents. The AVATEA project (Advanced Virtual Adaptive Technologies e-hEAlth) aims at providing an adaptive support to therapists and children during motor rehabilitation sessions so, in this domain, the proper recognition of social signals, such as attention, engagement and distress could provide a valuable index of therapeutic effectiveness. Formalising such measures will be the main topic of interest for candidates to this position, together with the appropriate research about methods to elicit social behaviour in children suffering from different, and possibly limiting, diseases.
Duration: The grant will be for 12 months, with a yearly salary from € 15.000,00 up to 25.000,00 (depending on the experience and seniority of the candidate)
Topics: Machine Learning for Profiling of Physical Capabilities
Language requirement: English
Description:
The AVATEA project (Advanced Virtual Adaptive Technologies e-hEAlth) aims at providing an adaptive support to therapists and children during motor rehabilitation sessions. To provide such support, it is necessary that the system is able to profile the physical capabilities of the patients and monitor his/her performance during the rehabilitation sessions by analyzing temporal data produced by different wearable sensors and sensors to be positioned on the instruments used for the exercises. Machine learning approaches supporting the creation of a user model will be investigated to address this problem.
Duration: The grant will be for 12 months, with a yearly salary from € 15.000,00 up to 25.000,00 (depending on the experience and seniority of the candidate)
Topics: Personalization in Human-Robot Interaction
Language requirement: English
Description:
The goal of the PRIN UPA4SAR project (User Profiling and Adaptation for Socially Assistive Robotics) is to design an adaptive behavior for a robotic system that is in charge of monitoring the user's Activities of Daily Living (ADL) in the case of people with dementia. In our opinion, the robot's presence, in order to be effective and well accepted by users, must be as unobtrusive as possible. In fact, an interactive robotic device whose behavior is unrelated to the specific needs of a person, his/her abilities and preferences can cause discomfort. The majority of the robotic applications are based on static user models and on the specification of all the possible contexts of interaction. This makes such systems incapable of adapting independently and proactively to changes in the needs and preferences of users. In this direction, our goal is to design an adaptive behavior of the robotic system that is able to regulate its social interaction parameters (e.g., the interaction distances, proxemics, the speed of movements, and the modality of interaction itself) on the basis of personality factors as well as of the cognitive state of the user.
Applications are invited from candidates with a Master's degree or a PhD in Cognitive Science, Robotics, Computer Science, Artificial Intelligence, Electronic Engineering or other relevant disciplines.
The selected candidate will join the PRISCA Laboratory (Projects of Intelligent Robotics and Advanced Cognitive Systems) in Naples. The PRISCA Lab is a dynamic, international, and multidisciplinary team that offers exciting scientific projects, as well as an excellent and stimulating research environment (http://prisca.unina.it/).
Naples (Italian: Napoli) is the third largest city in Italy, and is the capital of the Campania region. World-known for its rich history, art, culture, architecture, music, and gastronomy, Naples is a lively, exciting and bustling city situated on the southwest coast in a gorgeous gulf, and is surrounded by attractive tourist and archaeological sites such as Capri, Ischia, Amalfi Coast, Pompei, Ercolano, Mount Vesuvius. See https://www.visitnaples.eu/en for further information.
- Pr. Pierre Philippe – PU-PH, SANPSY : pr.philip@free.fr
Subject: Detecting sleepiness is useful for many reasons: for instance, it can help prevent road traffic accidents, and it can be used to monitor workers in critical environments (air traffic control, nuclear plants, etc.). While these applications are very important, sleepiness detection can also be used clinically in the follow-up of sleep-deprived patients. Obstructive Sleep Apnea is nowadays recognised as a major public health problem with many consequences: road traffic accidents, increased heart failure rates, behavioural and cognitive troubles, etc. To address these problems, we devised an experiment with the SANPSY research unit (Sommeil - Addiction - Neuropsychiatrie, Université Bordeaux Ségalen, CNRS USR 3413) to assess whether the sleepiness level of a patient can be evaluated using only a simple speech recording. Previous research has shown that this task is possible; however, most studies on sleepiness detection from speech rely on corpora with self-reported labels according to the KSS scale [1]. For instance, the Interspeech 2011 speaker state challenge [2] uses data from 99 speakers and contains mixed data from different tasks (isolated vowels, read speech, command request, spontaneous speech) in German. The annotations are self-reported using the KSS scale and are divided into two classes: sleepy (S) and not sleepy (NS). The best system [3] in the challenge won with a reported accuracy slightly above the baseline, around 72% of correctly identified samples. Other efforts on sleepiness detection from speech often use the same kind of data. For example, in [4] 77 participants are recorded speaking isolated vowels, and the annotation is also made using self-reported scores from the KSS scale. Reported performance on two classes (S and NS) is around 78% correct identification. In a more recent paper [5], the number of participants is larger (402), and the recordings are read passages from 7 texts.
However, the classification task is not the same, since the classifier tries to predict the value of the KSS score. In our project, in close partnership with the SANPSY unit, we have started to record patients (78 patients recorded so far) while assessing their sleepiness states by various measurements including EEG as well as clinical expertise. Recording the patients follows a strict clinical methodology resulting in sets of 4 recordings per patient, always at the same time of the day. Three categories of sleepiness level have been defined by the health professionals (instead of the two usually used in previous research on sleepiness detection in speech): very sleepy, intermediate and normal. Using these recordings and the provided categories, we have begun to test different features and classification methods. Using a relatively small set of features and simple classification techniques, we obtained a global classification rate of 70% correct in a cross-validation procedure. The task of the intern is to further explore the different possibilities in terms of features and machine learning methods as the data collection continues, and to carry out a thorough analysis of the results so as to understand the influence of several factors such as gender, age, or pathology.
References:
[1] Shahid, A., Wilkinson, K., Marcu, S., & Shapiro, C. M. (2011). Karolinska sleepiness scale (KSS). In STOP, THAT and One Hundred Other Sleep Scales (pp. 209-210). Springer New York.
[3] Dong-Yan Huang, Zhengchen Zhang, Shuzhi Sam Ge, Speaker state classification based on fusion of asymmetric simple partial least squares (SIMPLS) and support vector machines, In Computer Speech & Language, Volume 28, Issue 2, 2014, Pages 392-419, ISSN 0885-2308, https://doi.org/10.1016/j.csl.2013.06.002.
[4] Krajewski, J., Schnieder, S., Sommer, D., Batliner, A., & Schuller, B. (2012). Applying multiple classifiers and non-linear dynamics features for detecting sleepiness from speech. Neurocomputing, 84, 65-75.
[5] Krajewski, J., Schnieder, S., Monschau, C., Titt, R., Sommer, D., & Golz, M. (2016, October). Large Sleepy Reading Corpus (LSRC): Applying Read Speech for Detecting Sleepiness. In Speech Communication; 12. ITG Symposium; Proceedings of (pp. 1-4). VDE.
Requested skills:
- speech processing and/or signal processing techniques
- machine learning
- programming languages: MATLAB, Python, C/C++
- interest in clinical research and/or cognitive sciences
(2019-01-24) Post-doctoral positions at Telecom-ParisTech, Paris, France
Post-doctoral positions at Telecom-ParisTech on Deep learning approaches for social computing in human-agent interactions
*Place of work* Telecom ParisTech [TPT], 46 rue Barrault, 75013 Paris, France. Paris until September 2019, and then Palaiseau (Paris outskirts)
*Starting date* From now to September 2019
*Salary* according to background, from 2300 €/month
*Duration* 12 months renewable
*Context* The post-doctoral fellowship will be part of Telecom-ParisTech's chair on Data Science & Artificial Intelligence for Digitalized Industry & Services [DSIAI]. The chair is established for a five-year period, and one of its main goals is to allow sustainable funding of research activities in AI and Data Science, on methodological topics crucial for applications. The research activity of the postdoctoral fellowship will also contribute to the Social Computing topic [SocComp.] of the S2a team [SSA] at Telecom-ParisTech, in close collaboration with other researchers and PhD students of the team.
*Candidate profile* As a minimum requirement, the successful candidate should have:
- A PhD in one or more of the following areas: human-agent interaction, deep learning, computational linguistics, affective computing, reinforcement learning, natural language processing, speech processing
- Excellent programming skills (preferably in Python)
- Excellent command of English
*How to apply* The application should be formatted as **a single pdf file** and should include:
- A complete and detailed curriculum vitae
- A letter of motivation
- The defense and PhD reports
- The contact details of two referees
The pdf file should be sent to the two supervisors, Chloé Clavel [Clavel] and Giovanna Varni [Varni]: chloe.clavel@telecom-paristech.fr, giovanna.varni@telecom-paristech.fr
1/ First position: Multimodal attention models for predicting the user's socio-emotional behavior in human-machine interactions
*Keywords* human-machine interaction, attention models, recurrent neural networks, Social Computing, natural language processing, speech processing, multimodality
*Supervision* Chloé Clavel, Giovanna Varni
*Description* Social robotics, and more broadly human-agent interaction, is a field of human-machine interaction for which the integration of socio-emotional behaviors (emotions, social attitudes, personality) is expected to have great potential. For example, companion robots are designed to provide their users with both help (especially in the assistance and dependency market) and entertainment (in the entertainment market). For intelligent cars, the analysis of the driver's emotions through multimodal sensors can provide a better understanding of their driving [CARS]. This post-doctoral fellowship will focus on multimodal modeling for the prediction of the user's socio-emotional behaviors during interactions with a virtual agent. In particular, the post-doctoral fellow will address the following points:
- the encoding of multimodal representations relevant for the modelling of socio-emotional behavior;
- the development and evaluation of models that take advantage of the complementarity of modalities in order to monitor the evolution of the user's socio-emotional behaviors during the interaction (e.g. taking into account the inherent sequentiality of the interaction structure).
The models will be based on sequential neural approaches (recurrent networks) that integrate attention models, as a continuation of the work done in [Hemamou] and [BenYoussef].
2/ Second Position: Reinforcement learning for the development of socially competent agents
*Keywords* human-machine dialogue, reinforcement learning, language generation model, Social Computing
*Supervision* Chloé Clavel
*Description* Conversational agents (e.g. Orange's Djingo, Amazon's Alexa, Apple's Siri, Microsoft's Cortana, etc.), chatbots and more broadly human-agent interaction and social robotics (see for example [CIMON]) are applications for which the integration of socio-emotional behaviour analysis in the interaction between humans and virtual agents has great potential. Recent developments in artificial intelligence for natural language processing have made it possible to set up functional chatbots: extraction of keywords, understanding of natural language, question and answer systems, dialogue trees. While virtual assistants are already on the market, taking into account the social component of interaction remains a crucial issue for the fluidity and naturalness of interaction. For example, the development of socio-emotional interaction strategies can compensate for the chatbot's lack of understanding of user requests, which results in expressions of frustration and irritation on the part of the user [Maslowski] and can lead to the user abandoning the conversation (also called an engagement breakdown [BenYoussef]), thus hindering the completion of the chatbot's intended task. This post-doctoral fellowship will address this issue, the development of socially competent agents, by proposing reinforcement learning and deep learning methods [Qureshi, Ritschel] for the selection and generation of natural language utterances based on their socio-emotional relevance.
Selected references of the team:
[Hemamou] L. Hemamou, G. Felhi, V. Vandenbussche, J.-C. Martin, C. Clavel, HireNet: a Hierarchical Attention Model for the Automatic Analysis of Asynchronous Video Job Interviews, in AAAI 2019, to appear.
[Garcia] Alexandre Garcia, Chloé Clavel, Slim Essid, Florence d'Alché-Buc, Structured Output Learning with Abstention: Application to Accurate Opinion Prediction, ICML 2018.
[Clavel&Callejas] Clavel, C.; Callejas, Z., Sentiment analysis: from opinion mining to human-agent interaction, Affective Computing, IEEE Transactions on, 7.1 (2016) 74-93.
[Langlet] C. Langlet and C. Clavel, Improving social relationships in face-to-face human-agent interactions: when the agent wants to know user's likes and dislikes, in ACL 2015.
[Maslowski] Irina Maslowski, Delphine Lagarde, and Chloé Clavel. In-the-wild chatbot corpus: from opinion analysis to interaction problem detection, ICNLSSP 2017.
[BenYoussef] Atef Ben-Youssef, Chloé Clavel, Slim Essid, Miriam Bilac, Marine Chamoux, and Angelica Lim. UE-HRI: a new dataset for the study of user engagement in spontaneous human-robot interactions. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 464-472. ACM, 2017.
The SAMoVA team at IRIT (Institut de Recherche en Informatique de Toulouse) is recruiting a post-doctoral researcher for the collaborative project LinTo (PIA - Programme d'Investissements d'Avenir), a conversational assistant project designed to operate in a professional context and to offer services related to the conduct of meetings.
This post-doctoral work concerns the analysis of the audio stream to extract a set of non-verbal indicators intended to complement the automatic transcription generated by other project partners. This enrichment will provide valuable cues to help understand how meetings unfold, whether at the level of interactions, between participants or with the voice assistant, or in more detail at the level of the content of the exchanges.
Several research directions may be explored, depending on the profile of the person recruited and on the situations studied in the project:
- Acoustic analysis to search for relevant prosodic markers;
- Exploration of Speech2Vect-type approaches to extract more semantically marked indicators;
- Application of semi-supervised learning methods in a weakly annotated context.
Practical information:
Position: post-doc
Duration: 12-20 months, starting February/March 2019
Field: acoustic analysis - automatic speech processing - machine learning - conversational interaction
Location: Institut de Recherche en Informatique de Toulouse (Université Paul Sabatier) - SAMoVA team
Profile: PhD in computer science, machine learning, or audio processing
Contact: Isabelle Ferrané (isabelle.ferrane@irit.fr)
Application: to be sent before 15 February 2019
Details of the offer: https://www.irit.fr/recherches/SAMOVA/pagejobs.html
Salary: according to experience
(2019-02-01) Lecturer at LIMSI, Orsay, Paris, France
The computer science department of the UFR Sciences (Faculty of Sciences) of Université Paris-Sud is recruiting a Maître de Conférences (lecturer) to strengthen its teaching team and to further develop research at LIMSI on natural language and speech processing.
The research of the person recruited will focus primarily on the development of new methods in automatic speech processing, with for example the following topics: speaker characterization in a multimedia context; the study of the affective dimensions of social interactions; the study of machine translation systems and machine learning; the study of speech recognition systems. The laboratory is also open to applications that put forward other topics related to speech processing, or more broadly to the whole field of natural language processing.
The person recruited may teach in all programmes of the computer science department of the UFR Sciences in Orsay, at Bachelor's (Licence) and Master's level (standard and work-study tracks). They may teach in their areas of interest and in one or more areas of computer science that need to be strengthened. They may also deliver part of their teaching in English, in particular within international Master's programmes.
Candidates can obtain more detailed information from the following page:
(2019-02-03) PhD and Postdoc positions at University of Genova, Italy
Fully funded PhD and PostDoc positions are available at the Casa Paganini - InfoMus Research Centre (www.casapaganini.org), DIBRIS-Dept. of Informatics, Bioengineering, Robotics, and Systems Engineering, Polytechnic School, University of Genoa, Italy. Each research position will have a specific focus on the development of computational models, multimodal systems and interfaces, research experiment and prototypes in one of the following areas: (i) automated measurement, analysis, and prediction of full-body non-verbal individual movement qualities and emotions; (ii) automated measurement, analysis, and prediction of full-body non-verbal social signals (synchronization, entrainment, leadership). Accepted candidates will develop a research plan in the framework of the 4-year (2019-2022) Horizon 2020 European Project FET Proactive EnTimeMent (https://entimement.dibris.unige.it/ ), and may be asked to participate in joint activities with research partners in EnTimeMent, including possible short residencies at EnTimeMent partners' sites.
Requirements
Candidates should ideally have the following profile:
+ Master's degree in Computer Science, Computer Engineering or related disciplines;
+ Excellent technical and programming skills (Python, Java, C/C++);
+ Prior experience in at least one of the following fields: human computer interaction, affective computing, motion capture and motion analysis, multimodal interfaces, sound analysis and interactive sonification, computer vision, machine learning;
+ Ability to work independently, self-motivation, and ability to actively contribute as a member of a multidisciplinary research team including experts in computer science and engineering, movement science, cognitive neuroscience, cognitive and motoric rehabilitation, performing arts;
+ Strong commitment to advancing the state-of-the-art research and publishing in top research venues;
+ Excellent communication skills in English.
Applying
To apply, please email your application to: antonio.camurri@unige.it and gualtiero.volpe@unige.it
The application should consist of a single pdf file including:
+ cover letter expressing your interest in the position and your profile relevance;
+ curriculum vitae showing academic records with tracks related to the themes of the thesis;
+ list of publications (post-doc applications only);
+ transcript of marks according to M1-M2 profile or last 3 years of engineering or related school (PhD applications only);
+ contact and recommendation letter of at least two university referents.
As a preliminary step, candidates will be invited for a Skype interview. Candidates may also be invited to a fully funded short research internship in our research team during summer 2019. To be finally enrolled, candidates will need to pass a formal evaluation performed by a selection committee University of Genova will appoint according to the Italian laws. The envisioned starting date for the first selected PhD candidates is November 2019. PostDoc starting date is negotiable.
Conditions of employment
Hired on a fixed-term contract at University of Genoa, working full-time at the Casa Paganini-InfoMus Research Centre of DIBRIS, University of Genoa, with possible short internships at a research centre of an EnTimeMent project partner. Duration: three years for PhD students; 2-year contract for post-docs (possible extensions available).
The Casa Paganini-InfoMus Research Centre at DIBRIS, Polytechnic School, University of Genoa, Italy
As art influences science and technology, science and technology can in turn inspire art. Recognizing this mutually beneficial relationship, researchers at the Casa Paganini-InfoMus Research Centre work to combine scientific research in information and communications technology (ICT) with artistic and humanistic research. The mission of Casa Paganini - InfoMus consists of carrying out scientific and technological research on human-centered computing where art and humanistic culture are a fundamental source of inspiration. The research team includes computer engineers and experts from the human sciences and the arts. Scientific and technological research includes: investigation and development of computational models and of multimodal systems focusing on non-verbal, full-body, expressive, emotional, and social behavior (entrainment, leadership); sound and music computing; interactive sonification; multimodal interactive systems and serious games for rehabilitation, entertainment, sport, edutainment, museums and cultural institutions; multimedia systems and services for the creative industry: ICT for active music listening, interactive dance, theatre, cultural heritage, user-centric media and mobile systems. The Casa Paganini - InfoMus Research Centre coordinates and participates as partner in many international projects on scientific and technological research, education, and develops multimedia systems, platforms, and applications for the creative industry and cultural institutions.
www.casapaganini.org youtube.com/InfoMusLab
The EnTimeMent EU Horizon 2020 FET PROACTIVE project
EnTimeMent aims at a radical change in scientific research and enabling technologies for human movement qualitative analysis, entrainment and prediction, based on a novel neuro-cognitive approach of the multiple, mutually interactive time scales characterizing human behaviour. Our approach will afford the development of computational models for the automated detection, measurement, and prediction of movement qualities from behavioural signals, based on multi-layer parallel processes at non-linearly stratified temporal dimensions, and will radically transform technology for human movement analysis. EnTimeMent's scientifically grounded and time-adaptive technologies operate at multiple time scales in a multi-layered approach: motion capture and movement analysis systems will be endowed with a completely novel functionality, achieving a new generation of time-aware multisensory motion perception and prediction systems. The proposed model and technologies will be iteratively tested and refined, by designing and performing controlled and ecological experiments, ranging from action prediction in a controlled laboratory setting, to prediction in dyadic and small group interaction. EnTimeMent scenarios include health (healing and support of everyday life of persons with chronic pain and disability), performing arts (e.g. dance), sports, and entertainment group activities, with and without living architectures. EnTimeMent will create and support community-building and exploitation with concrete initiatives, including a community of users and stakeholders, innovation hubs and SME incubators, as premises for the consolidation beyond the end of the project in a broader range of market areas.
http://entimement.dibris.unige.it
(2019-02-08) Fully funded PhD position at Graz University of Technology, Austria
Graz University of Technology (TU Graz) is the organizer of the INTERSPEECH 2019 conference in September 2019 and offers a PhD position in its Signal Processing and Speech Communication Laboratory. Be part of this exciting opportunity and join our team!
The position is for up to four years and involves both research and teaching commitments. Teaching will be focussed on problem classes and lab courses for fundamental subjects such as signal processing. Research will address interdisciplinary topics at the interface between automatic speech recognition and speech science. You will work on top-level publications and your PhD thesis under the joint supervision of Prof. Gernot Kubin and Dr. Barbara Schuppler. Graz University of Technology offers systematic guidance to their doctoral students in specific doctoral schools with structured programs, international cooperation opportunities, and more. All doctoral programs and more than half of our Masters' programs are taught in English. The gross salary (before taxes) for this full-time position is according to scale B1 at Austrian Universities, approximately 40.000,- EUR per year. The expected starting date is September-October 2019.
Mandatory skills of the candidates are a relevant master's degree in electrical or information engineering, computer science, or speech science; excellent programming skills; English language competence (IELTS 7.0 or higher). Expertise in signal processing and machine learning as well as knowledge of the German language are considered additional assets.
Interested candidates should send immediately the following information in PDF format to Prof. Gernot Kubin (g.kubin@ieee.org) and to the dean’s office dekanat.etit@tugraz.at: curriculum vitae, transcript of records of both Bachelor's and Master's degree courses, master's thesis and all publications, proof of English language competence, and contact information for 2 referees. The official application deadline is July 26, 2019. Additional application documents may be required in due course. Female students are particularly encouraged to apply. For more information consult
(2019-02-15) Faculty position (Associate professor) at Telecom ParisTech, Paris, France
Faculty position (Associate professor) at Telecom ParisTech in Machine Learning.
Important Dates
- May 3, 2019: closing date
- June 3, 2019: hearings of preselected candidates
Telecom ParisTech's [1] machine learning, statistics and signal processing group (a.k.a. S²A group) [2], within the laboratoire de traitement et communication de l'information (LTCI) [5], is inviting applications for a permanent (indefinite tenure) faculty position at the *Associate Professor* level (Maitre de Conferences) in *Machine learning*.
Main missions
The recruit will be expected to:
Research activities
- Develop groundbreaking research in the field of theoretical or applied machine learning, targeting applications that are well aligned with the topics of the S²A group [3] and the Images, Data & Signals department [4], which include (but are not restricted to) sequential/reinforcement learning, multitask learning, learning for structured data (e.g. time series analysis, audio signals), natural language processing, social signal processing, predictive maintenance, biomedical or physiological signal analysis, recommendation, finance, health, ...
- Develop both academic and industrial collaborations on the same topic, including collaborative activities with other Telecom ParisTech research departments and teams, and research contracts with industrial players
- Set up research grants and take part in national and international collaborative research projects
Teaching activities
- Participate in teaching activities at Telecom ParisTech and its partner academic institutions (as part of joint Master programs), especially in machine learning and Data science, including life-long training programs (e.g. the local Data Scientist certificate)
Impact
- Publish high quality research work in leading journals and conferences
- Be an active member of the research community (serving in scientific committees and boards, organizing seminars, workshops, special sessions...)
Candidate profile
As a minimum requirement, the successful candidate will have:
- A PhD degree
- A track record of research and publication in one or more of the following areas: machine learning, applied mathematics, signal processing,
- Experience in teaching
- Good command of English
The ideal candidate will also (optionally) have:
- Experience in temporal data analysis problems (sequence prediction, multivariate time series, probabilistic graphical models, recurrent neural networks...)
NOTE:
The candidate does *not* need to speak French to apply, just to be willing to learn the language (teaching will be mostly given in English)
Other skills expected include:
- Capacity to work in a team and develop good relationships with colleagues and peers
- Good writing and pedagogical skills
More about the position
- Place of work: Paris until 2019, then Saclay (Paris outskirts)
- For more information about being an Associate Professor at Telecom ParisTech, check [6] (in French)
- A document detailing past activities of the candidate in teaching and research: the two types of activities will be described with the same level of detail and rigor.
- The texts of the main publications
- The names and addresses of two referees
- A short teaching project and a research project (maximum 3 pages)
(2019-02-14) PhD student, Radboud University, Nijmegen, The Netherlands
PhD student “Morphology in spoken word recognition models”
Location: Radboud University, Nijmegen
Supervision: Louis ten Bosch, Mirjam Ernestus, and Ingo Plag
Starting date: September / October 2019
Duration: 4 Years (3 years, with possibility of extension of 1 year)
Salary: Around 1600 euros a month
The position is part of the project DMC (Dutch morphologically complex words: The role of morphology in speech production and comprehension) within the DFG research unit FOR 2373, Spoken Morphology: Phonetics and phonology of complex words. It is funded by the Deutsche Forschungsgemeinschaft.
The PhD student will study what properties a computational model of auditory word recognition needs in order to simulate human listeners' processing of morphologically complex words well. The computational models that will be considered are DIANA (e.g., ten Bosch et al., 2013, 2014, 2015) and Naïve Discriminative Learning (NDL, e.g., Arnold et al. 2017), since these two models represent very different types of processing and both can receive the speech signal, with morpho-acoustic cues, as their input. The human data to be simulated will include the BALDEY database (Ernestus & Cutler, 2015). The PhD student will produce a dissertation consisting of several publishable articles, preceded by a general introduction and followed by a general discussion.
Required skills: Strong background in mathematics, machine learning (DNN), statistics, natural language processing and programming skills (Perl, Python).
The following profiles are welcome:
- Strong background in signal processing, or
- Strong experience with natural language processing
Excellent English writing and speaking skills are required in any case.
Candidates should email a detailed CV with diploma
LORIA is the French acronym for the "Lorraine Research Laboratory in Computer Science and its Applications" and is a research unit (UMR 7503), common to CNRS, the University of Lorraine and INRIA. This unit was officially created in 1997. Loria's missions mainly deal with fundamental and applied research in computer science.
MULTISPEECH is a joint research team between the University of Lorraine, Inria, and CNRS. Its research focuses on speech processing, with particular emphasis on multisource (source separation, robust speech recognition), multilingual (computer assisted language learning), and multimodal aspects (audiovisual synthesis).
Context and objectives
Under noisy conditions, audio acquisition is one of the toughest challenges for successful automatic speech recognition (ASR). Much of the success relies on the ability to attenuate ambient noise in the signal and to take it into account in the acoustic model used by the ASR system. Our DNN (Deep Neural Network) denoising system and our approach to exploiting uncertainties have shown their combined effectiveness on noisy speech.
The ASR stage will be supplemented by a semantic analysis. Predictive representations using continuous vectors have been shown to capture the semantic characteristics of words and their context, and to outperform representations based on word counts. Semantic analysis will be performed by combining predictive representations using continuous vectors and uncertainty on denoising. This combination will be done by the rescoring component. All our models will be based on DNN technologies.
The performances of the various modules will be evaluated on artificially noisy speech signals and on real noisy data. At the end, a demonstrator, integrating all the modules, will be set up.
Main activities
- study and implement a noisy speech enhancement module and an uncertainty propagation module;
- design a semantic analysis module;
- design a module taking into account the semantic and uncertainty information.
References
[Nathwani et al., 2018] Nathwani, K., Vincent, E., and Illina, I. DNN uncertainty propagation using GMM-derived uncertainty features for noise robust ASR, IEEE Signal Processing Letters, 2018.
[Nathwani et al., 2017] Nathwani, K., Vincent, E., and Illina, I. Consistent DNN uncertainty training and decoding for robust ASR, in Proc. IEEE Automatic Speech Recognition and Understanding Workshop, 2017.
[Nugraha et al., 2016] Nugraha, A., Liutkus, A., and Vincent, E. Multichannel audio source separation with deep neural networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016.
[Sheikh, 2016] Sheikh, I. Exploitation du contexte sémantique pour améliorer la reconnaissance des noms propres dans les documents audio diachroniques, PhD thesis in Computer Science, Université de Lorraine, 2016.
[Sheikh et al., 2016] Sheikh, I., Illina, I., Fohr, D., and Linares, G. Learning word importance with the neural bag-of-words model, in Proc. ACL Representation Learning for NLP (Repl4NLP) Workshop, Aug 2016.
[Mikolov et al., 2013a] Mikolov, T., Chen, K., Corrado, G., and Dean, J. Efficient estimation of word representations in vector space, CoRR, vol. abs/1301.3781, 2013.
(2019-02-16) PhD grants at SHEFFIELD CENTRE FOR DOCTORAL TRAINING IN SPEECH AND LANGUAGE, UK
SHEFFIELD CENTRE FOR DOCTORAL TRAINING IN SPEECH AND LANGUAGE
The University of Sheffield has won an £8M grant to fund some 60 PhDs over a period of 8 years in Speech and Language Technology. This 'Centre for Doctoral Training' is one of 16 newly announced by UKRI (UK Research and Innovation) to strengthen British AI research.
The Sheffield CDT will cover a wide range of SLT subjects, researching and developing robust methods for natural language and speech processing in challenging real world scenarios. Its research will stimulate novel applications of SLTs in sectors such as health, sport, manufacturing, aerospace, robotics, finance, political science and digital humanities.
A major feature of the CDT is close collaboration with industry, involving multinationals such as Google, Amazon, Voicebase, Nuance, NHS Digital, Solvay and TechNation, as well as substantial UK SME support.
In addition to a Ph.D., students will complete a Postgraduate Diploma (PGDip) in SLT Leadership. This bespoke programme will provide them with the necessary skills for academic and industrial leadership in SLT, and will cover software engineering, entrepreneurship, management, and societal responsibility.
The CDT is headed by Professor Thomas Hain and Professor Rob Gaizauskas.
(2019-02-20) Post-doc position at GIPSA-Lab, Grenoble, France
The CRISSP team of GIPSA-lab (Grenoble, France) is opening a 24-month postdoctoral position on 'Text-to-Speech alignment for assessing reading fluency of young children' within the framework of the e-FRAN Fluence project, where we assess and train 700 primary-school pupils and 300 middle-school students.
(2019-02-20) Fixed-term position: research engineer / business developer, Aix-Marseille, France
In collaboration with the management of the three technology platforms CRVM, CEP and H2C2, and under the responsibility of the Aix-Marseille Technology Platforms project officer, the business developer designs and implements the strategy for developing the service offerings of the technology platforms in order to foster their growth.
CRIM (COMPUTER RESEARCH INSTITUTE OF MONTREAL) IS HIRING!
POSTDOCTORAL RESEARCHER POSITION - SPEAKER RECOGNITION Speech and Text group
CRIM (http://www.crim.ca/en) is an applied research and expertise centre in information technology, dedicated to making organizations more effective and competitive through the development of innovative technology and the transfer of leading edge know-how, while contributing to scientific advancement.
JOB DESCRIPTION
CRIM is looking for a postdoctoral researcher with a background in speaker recognition and, ideally, in other related fields such as speaker diarization, speech recognition and machine learning. The successful candidate will work on speaker recognition R&D activities within the Speech and Text group. The ideal candidate must be able to work on team research themes and supervise graduate students in an open environment where collaboration with experts in other fields at CRIM is valued. The position is offered on a one-year basis with the possibility of renewal for up to 3 or 4 years depending on performance and funding.
Responsibilities:
● Perform high quality research on speaker recognition and anti-spoofing
● Assist in supervising graduate students
● Publish in refereed journals and conferences
● Write/contribute to grant applications for new research projects
SKILLS AND EXPERIENCE
Required qualifications:
● Doctoral degree (Ph.D.) in a relevant field
● Exceptional academic record and a clear aptitude for research
● Experience in student supervision
● Good publication record
● Excellent verbal and written communication skills in English
Preferred qualifications:
● Familiarity with modern distributed programming environments and with languages such as C++, Python and Perl
● Programming experience with relevant tools such as Theano, TensorFlow, Torch or Kaldi
● Proficiency in written and spoken French
ABOUT THE ENVIRONMENT
CRIM is located in Montreal, a historic, vibrant and culturally diverse city with 6 universities, recognized for its safety and low cost of living. Already a favorite of high tech and creative industries, Montreal has recently received large public and private investments and has become a leading center in machine learning and artificial intelligence research.
GOOD REASONS FOR JOINING CRIM’S TEAM
● Benefit from various attractive employment conditions (Drug and health insurance plan, Pension plan, Competitive salary, French training programs)
● Reap the benefits of an outstanding work atmosphere, characterized by mutual support and good humour
● Work alongside passionate people in a collaborative setting
● Maintain work/family balance and quality of life
HOW TO APPLY
Apply directly to emploi@crim.ca, or use our online form. CRIM is an equal opportunity employer and values diversity. We encourage the development of ideas as a team and cultivate an open work environment that respects differences. We encourage all candidates to apply for this position; however, only selected individuals will be contacted. Thank you for your interest in CRIM! Join CRIM’s team and work with dynamic and passionate people!
(2019-03-03) Professor (W2) Speech Technology and Hearing Devices at University of Oldenburg, Germany
Professor (W2) Speech Technology and Hearing Devices at University of Oldenburg, Cluster of Excellence Hearing4all Oldenburg, Germany (website: http://hearing4all.eu/EN/)
UKRI CENTRE FOR DOCTORAL TRAINING IN NATURAL LANGUAGE PROCESSING
School of Informatics
School of Philosophy, Psychology and Language Sciences
University of Edinburgh
UK Research and Innovation has recently announced funding for a Centre for Doctoral Training in Natural Language Processing (CDT in NLP) at the University of Edinburgh. This CDT offers unique, tailored doctoral training consisting of both taught courses and a doctoral dissertation. Both components run concurrently over four years. Each student will take a set of courses designed to complement their existing expertise and give them an interdisciplinary perspective on NLP. They will receive full funding for four years, plus generous funding for travel, equipment, and research costs.
The CDT brings together researchers in NLP, speech, linguistics, cognitive science, and design informatics from across the University of Edinburgh. Students will be supervised by a team of over 40 world-class faculty and will benefit from cutting edge computing and experimental facilities, including a large GPU cluster and eye-tracking, speech, virtual reality, and visualization labs. The CDT involves over 20 industrial partners, including Amazon, Facebook, Huawei, Microsoft, Mozilla, Reuters, Toshiba, and the BBC. Close links also exist with the Alan Turing Institute and the Bayes Centre.
The first cohort of CDT students will start in September 2019, and we are now seeking applications. A wide range of research topics fall within the remit of the CDT:
Natural language processing and computational linguistics
Speech technology
Dialogue, multimodal interaction, language and vision
Information retrieval and visualisation, computational social science
Computational models of human cognition and behaviour, including language and speech processing
Human-Computer interaction, design informatics, assistive and educational technology
Psycholinguistics, language acquisition, language evolution, language variation and change
Linguistic foundations of language and speech processing
Approximately 8 studentships are available, covering both maintenance at the research council rate of GBP 15,009 per year and tuition fees. Studentships are available for UK, EU, and non-EU nationals.
Applicants should have an undergraduate or master's degree in computer science, linguistics, cognitive science, AI, or a related discipline. We particularly encourage applications from women, minorities, and members of other groups that are underrepresented in technology.
Further details including the application procedure can be found at:
In order to ensure full consideration for funding, applications (including all supporting documents) need to be received by 29 March 2019. Please direct inquiries to the PhD admissions team at cdt-nlp-admissions@inf.ed.ac.uk.
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
(2019-02-27) Postdoctoral position at IRISA Rennes France
Postdoctoral researcher
Job description
CONTEXT
IRISA (https://www.irisa.fr/) is the largest research laboratory dedicated to computer science in France, hosting more than 800 people and 40 research teams. Its activities span all fields of computer science. It is located in Rennes, Lannion, and Vannes. The Expression team (https://www-expression.irisa.fr/) focuses on natural language processing (NLP), be it through texts, speech or gestures. In particular, it has internationally recognized expertise in text-to-speech (TTS). The open position is part of a project aimed at speech synthesis for the Breton language. The recruited person will collaborate with other researchers and engineers involved in NLP and TTS.
TASKS
Development of NLP modules for Breton and integration in a TTS pipeline, i.e.:
1. Phonetization, grapheme-to-phoneme conversion.
2. Text normalization.
3. POS tagging and chunking.
4. Adaptation of the team’s TTS engine.
The position also includes data management and project monitoring tasks.
REQUIRED QUALIFICATION & SKILLS
• PhD in computer science
• Recent skills in natural language processing
• Recent skills in machine learning
• Top academic and publication records
• Good communication skills
• Team work experience
CONTRACT
• 18 months, full time.
• Campus of Lannion (22).
• Competitive salary, depending on the experience.
(2019-03-07) Two PhD positions in the area of Information Extraction, Data Mining and Machine Learning at Saarland University
Two PhD positions in the area of Information Extraction, Data Mining and Machine Learning at Saarland University
We anticipate the availability of funds for two PhD positions in the area of information extraction, data mining and machine learning.
The project aims at designing a framework for extracting evidence and actionable intelligence from large amounts of noisy multilingual multimodal data based on advanced speech and language technologies (SLTs), visual analysis (VA) and network analysis (NA). The overall project goal is to achieve a significant improvement in the identification of events, entities and relations, and to design a new generation of probabilistic and neural-network-based tools interfacing SLT, VA and NA technologies. The research will be carried out together with a European consortium of high-profile research institutes and companies.
The successful candidate should have a degree in computer science, computational linguistics, mathematics, physics or a discipline with a related background. Excellent programming skills in modern object-oriented languages are required, as well as strong analytical and problem-solving skills. Relevant expertise in the area of the project is desired. Very good oral and written communication skills in English are required. This work will be conducted at the Spoken Language Systems group (http://www.lsv.uni-saarland.de/) at Saarland University. Saarland University (http://www.uni-saarland.de/en/) is a European leader in Computer Science research and teaching, and is particularly well-known for its research in Computational Linguistics and Natural Language Processing. In addition, the Max Planck Institute for Computer Science, the Max Planck Institute for Software Systems and the German Research Center for Artificial Intelligence (DFKI) are located on campus. Students and researchers come from many countries and the research language is English.
The department of Language Science and Technology at Saarland University is one of the leading departments in the speech and language area in Europe. The flagship project at the moment is the CRC on Information Density and Linguistic Encoding. It also runs a significant number of European and nationally funded projects. Both positions are 3-year positions with a salary on the German TV-L E13 scale (75%). The starting salary is about 34,000 euros per year and increases with experience.
Each application should include:
* Curriculum Vitae including a list of publications (if applicable)
* Transcript of records
* Short statement of interest (not more than half a page)
* Names of two references
* Any other supporting information or documents
Applications (documents in PDF format in a single file) should be sent to: Dietrich.Klakow@lsv.uni-saarland.de
Priority will be given to applications received by Monday, April 15th 2019.
Further inquiries regarding the project should be directed to: Michael A. Hedderich or Olga Petukhova
(2019-03-23) Researcher position at IRCAM, Paris, France
The Analysis-Synthesis team at IRCAM is looking for a researcher to develop technologies for the automatic analysis of music recordings (chord progression, instrument identification, audio quality, auto-tagging).
REQUIRED EXPERIENCE AND SKILLS:
- The candidate holds a PhD and has very good knowledge of signal processing (spectral analysis, audio descriptor extraction, parameter estimation) as well as of machine learning algorithms (SVM, ConvNet) and distributed computing;
- Proficiency in Matlab, Python and C/C++ programming;
- Good knowledge of the UNIX environment (GNU/Linux or macOS);
- High productivity, methodical work, excellent programming style, good communication skills, rigor.
Please send a cover letter quoting the reference 201903UMGRES and a CV detailing your level of experience/expertise in the areas mentioned above (as well as any other relevant information) to mignot at ircam dot fr, with a copy to vinet at ircam dot fr and roebel at ircam dot fr.
We are recruiting a Lecturer/Senior Lecturer in Speech and Hearing Technologies to join the Speech and Hearing Group at the University of Sheffield. The group currently has interests that span speech recognition, speech enhancement and source separation, diarisation, speaker and language identification, language learning, assistive robotics and media application of speech technology, spoken dialogue systems and downstream integration such as machine translation, summarisation, and speech analytics. SPandH is host to the Voicebase Centre for Speech and Language Technology and the UKRI Centre for Doctoral Training in Speech and Language Technology and Their Applications.
As Project Manager, you will take part in developing and improving natural language processing. This includes:
Depending on the type of project, training, coaching and supervising a team to carry out various natural language processing (NLP) tasks
Developing, maintaining and improving the quality of natural language processing systems
Training and/or evaluating dialogue or natural language processing systems, identifying errors/regressions and proposing solutions
Linguistic consulting on automatic processing applied to a given language and/or on dialogue analysis/modeling
Ensuring the quality of the linguistic output for users in human-machine dialogues
Skills/Experience:
Native-level Canadian French and fluent English.
Bachelor's-level degree in linguistics, computational linguistics and/or related disciplines (bac+4 level).
Ability to quickly understand technical concepts and to learn to use a company's internal tools.
Genuine interest in (new) technologies.
Genuine interest in natural language processing, theoretical and descriptive linguistics, and all forms of linguistic resources.
Attention to detail, organizational and communication skills.
Experience with project management.
Ability to work effectively, independently and flexibly in a collaborative, constantly and rapidly changing environment.
Job title:
Project manager
Languages:
English, Canadian French (required)
Location:
Paris, France
Salary: depending on experience
CV + cover letter in English: celine.couillaud@adeccooutsourcing.fr
The Expression team of the IRISA lab (France) is opening an 18-month postdoctoral position on speech synthesis and natural language processing for the Breton language, starting as soon as possible.
Detailed missions:
Development of NLP and TTS modules for Breton
Phonetization, grapheme-to-phoneme conversion
Text normalization
POS tagging and chunking
Adaptation of the team's TTS engine.
This includes the use of machine learning techniques.
Profile / skills :
PhD in natural language processing, speech processing or machine learning
Within the software department of Parrot Faurecia Automotive in Paris, you will be in charge of developing the speech recognition architecture on our new platform, managing the expertise topics related to Virtual Personal Assistants and managing specific developments for demonstrations and prototypes.
MISSIONS
Reporting to the manager of the company's transversal activities (voice recognition, navigation, translations, App Market), your missions will be as follows:
Define, specify and implement the speech recognition architecture, taking into account the limits and benefits of both modes (embedded/off-board)
Prototype and model the different dialogue platforms
Write the technical specifications of these HMIs
Study, analyze and respond to calls for tenders related to the subject
Participate in customer and supplier workshops
Develop and implement the speech recognition and Natural Language Understanding (NLU) part for modeling and prototyping
Identify and document speech recognition technological innovations to extract added value for the company
Study, analyze and characterize speech and NLU recognition solutions from suppliers and competitors
Collaborate within a dynamic and motivated team
Ensure that speech recognition works properly on the new platform
Create dialogue platforms that combine flexibility of use, robustness and performance
Provide expertise and support on the theme 'Speech Processing': advice, methods, tools
PROFILE
You hold an engineering degree or a PhD, ideally in computer science with a specialization in speech processing.
You have at least 5 years of experience in the field of speech recognition.
A good command of C++ and Python languages and a solid knowledge of speech processing are essential.
Ideally, you are familiar with:
Natural language understanding
Signal processing
Nuance Technologies
Google Assistant
Alexa
Java development under Android
Occasional travel is expected.
Fluent French is required.
Your level of English allows you to communicate in writing and orally, and you have a good command of the technical vocabulary in particular.
Job posting: Voice Assistant Integration Engineer (M/F)
Would you like to be part of a French company known for its innovation and agility?
Come and join Parrot Faurecia Automotive, a leading automotive equipment manufacturer with a start-up mentality, working to develop the cockpit of the future!
Located in central Paris, our teams engineer innovative technologies by developing man-machine interfaces based on Android Auto to provide an intuitive connected experience to end users. Our embedded electronic solutions are equipped with powerful multi-core graphics processors to provide quick response times for multiple simultaneous instructions.
Join us in taking up one of the greatest challenges of the automotive industry!
We are looking for a Voice Assistant Integration Engineer (M/F)
CONTEXT
Within the software department of Parrot Faurecia Automotive in Paris, you will be in charge of managing the expertise subjects related to Virtual Personal Assistants and managing specific developments for demonstrations and prototypes.
MISSIONS
Reporting to the manager of the company's transversal activities (voice recognition, navigation, translations, App Market), your missions will be as follows:
Prototype and model the different dialogue platforms
Write the technical specifications of these HMIs
Study, analyze and respond to calls for tenders related to the subject
Participate in customer and supplier workshops
Develop and implement the speech recognition and Natural Language Understanding (NLU) part for modeling and prototyping
Identify and document speech recognition technological innovations to extract added value for the company
Study, analyze and characterize speech and NLU recognition solutions from suppliers and competitors
Collaborate within a dynamic and motivated team
PROFILE
You hold an engineering degree or a PhD, ideally in computer science with a specialization in speech processing.
You have at least 2 years of experience in the field of speech recognition.
You are proficient in C++ and in Java development on Android, and you have solid knowledge of the Google and Alexa assistants.
Ideally, you are familiar with:
Natural language understanding
Speech processing
Signal processing
Nuance Technologies
The Python language
Occasional travel is expected.
Fluent French is required.
Your level of English allows you to communicate in writing and orally, and you have a good command of the technical vocabulary in particular.
(2019-04-11) Project Manager Intellectual Property and Language Data , ELDA, Paris, France
The European Language resources Distribution Agency (ELDA), a company specialised in Human Language Technologies within an international context, is currently seeking to fill an immediate vacancy for a permanent Project Manager Intellectual Property and Language Data position.
Under the CEO's supervision, the Project Manager Intellectual Property and Language Data will handle legal issues related to the compilation, use and distribution of language datasets on a European and international scale. This yields excellent opportunities for young, creative, and motivated candidates wishing to participate actively in the Language Engineering field.
The main tasks will consist of:
drafting and negotiating distribution contracts for language datasets to be added to an online catalogue;
analysing the legal status of language datasets;
implementing GDPR requirements in the processing of language data;
supervising data collection, particularly in the context of public Open Data;
implementing evaluation procedures for IPR clearance of digital data.
A successful candidate:
holds a Master's degree (or equivalent) in IT Law, with good understanding of intellectual property and data protection;
holds a Bachelor's degree (or can demonstrate equivalent experience) in Information Science, Knowledge Management or a similar domain;
speaks fluent English, with advanced writing and analytical skills;
is familiar with public licensing schemes (CC, GPL, etc.);
can demonstrate experience in project management and/or participation in European or international projects;
is dynamic, communicative, flexible and willing to work on various tasks;
is capable of working independently as well as in a team;
is an EU citizen, or has a residence permit allowing them to work in France.
All applications will be carefully examined until the position is filled. The position is based in Paris.
Gross annual salary: 30.000-36.000 EUR depending on experience.
Applicants should email a cover letter addressing the points listed above together with a curriculum vitae to: job@elda.org.
ELDA is a human-sized company (15 people) acting as the distribution agency of the European Language Resources Association (ELRA). ELRA was established in February 1995, with the support of the European Commission, to promote the development and exploitation of Language Resources (LRs). Language Resources include all data necessary for language engineering, such as monolingual and multilingual lexica, text corpora, speech databases and terminology. The role of this non-profit membership Association is to promote the production of LRs, to collect and to validate them and, foremost, make them available to users. The association also gathers information on market needs and trends.
Keywords: discriminative pattern mining, neural networks analysis, explainability of black box models, speech recognition.
Context:
Understanding the inner workings of deep neural networks (DNN) has attracted a lot of attention in the past years [1, 2] and most problems were detected and analyzed using visualization techniques [3, 4]. Those techniques help to understand what an individual neuron or a layer of neurons is computing. We would like to go beyond this by focusing on groups of neurons which are commonly highly activated when a network is making wrong predictions on a set of examples. In the same line as [1], where the authors theoretically link how a training example affects the predictions for a test example using the so-called "influence functions", we would like to design a tool to "debug" neural networks by identifying, using symbolic data mining methods, (connected) parts of the neural network architecture associated with erroneous or uncertain outputs.
In the context of speech recognition, this is especially important. A speech recognition system contains two main parts: an acoustic model and a language model. Nowadays, models are trained with deep neural network (DNN) based algorithms, use very large training corpora and involve a large number of DNN hyperparameters. There are many works on automatically tuning these hyperparameters. However, this induces a huge computational cost, and does not empower the human designers. It would be much more efficient to provide human designers with understandable clues about the reasons for the bad performance of the system, in order to benefit from their creativity to quickly reach more promising regions of the hyperparameter search space.
Description of the position:
This position is funded in the context of the HyAIAI "Hybrid Approaches for Interpretable AI" INRIA project lab (https://www.inria.fr/en/research/researchteams/inria-project-labs). With this position, we would like to go beyond the current common visualization techniques that help to understand what an individual neuron or a layer of neurons is computing, by focusing on groups of neurons that are commonly highly activated when a network is making wrong predictions on a set of examples. Tools such as activation maximization [8] can be used to identify such neurons. We propose to use discriminative pattern mining, and, to begin with, the DiffNorm algorithm [6] in conjunction with the LCM algorithm [7] to identify the discriminative activation patterns among the identified neurons.
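To make the idea of a discriminative activation pattern concrete, here is a minimal sketch. It is a brute-force stand-in, not the DiffNorm or LCM algorithm: activations are binarized with an assumed threshold, each example becomes the set of its active neurons, and small neuron groups are ranked by how much more often they are jointly active on wrongly predicted examples than on correct ones. All names, the threshold and the `min_gap` cut-off are hypothetical choices for illustration.

```python
from itertools import combinations


def to_transactions(activations, threshold=0.5):
    """Turn each activation vector into the set of neuron indices above threshold."""
    return [frozenset(i for i, a in enumerate(row) if a > threshold)
            for row in activations]


def support(pattern, transactions):
    """Fraction of transactions in which every neuron of the pattern is active."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if pattern <= t) / len(transactions)


def discriminative_patterns(activations, is_wrong, threshold=0.5,
                            max_size=2, min_gap=0.3):
    """Rank neuron groups by support(wrong examples) - support(correct examples)."""
    transactions = to_transactions(activations, threshold)
    wrong = [t for t, w in zip(transactions, is_wrong) if w]
    right = [t for t, w in zip(transactions, is_wrong) if not w]
    # Only neurons active in at least one wrong example can form a pattern.
    neurons = sorted(set().union(*wrong)) if wrong else []
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(neurons, size):
            pattern = frozenset(combo)
            gap = support(pattern, wrong) - support(pattern, right)
            if gap >= min_gap:
                results.append((gap, combo))
    # Largest gap first; ties broken by neuron indices for determinism.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results


# Toy data: 6 examples, 4 neurons; the first three examples are mispredicted.
ACTIVATIONS = [
    [0.9, 0.8, 0.1, 0.2], [0.7, 0.9, 0.0, 0.6], [0.8, 0.6, 0.2, 0.1],
    [0.9, 0.1, 0.7, 0.2], [0.1, 0.2, 0.8, 0.9], [0.6, 0.0, 0.9, 0.1],
]
IS_WRONG = [True, True, True, False, False, False]
patterns = discriminative_patterns(ACTIVATIONS, IS_WRONG)
```

On this toy data the group of neurons (0, 1) is active in every wrong example and in no correct one, so it comes out on top. The exhaustive enumeration is exponential in `max_size`, which is exactly why dedicated pattern mining algorithms are needed on real networks.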
The data will be provided by the MULTISPEECH team and will consist of two deep architectures as representatives of acoustic and language models [9, 10]. Furthermore, the training data from which the model parameters ultimately derive will be provided. We will also extend our results by performing experiments with supervised and unsupervised learning to compare the features learned by these networks and to perform qualitative comparisons of the solutions learned by various deep architectures. Identifying "faulty" groups of neurons could lead to the decomposition of the DL network into "blocks" encompassing several layers. "Faulty" blocks may be the first to be modified in the search for a better design.
The recruited person will benefit from the expertise of the LACODAM team in pattern mining and deep learning (https://team.inria.fr/lacodam/) and from the expertise of the MULTISPEECH team (https://team.inria.fr/multispeech/) in speech analysis, language processing and deep learning. We would ideally like to recruit a 1-year post-doc (with possibly one additional year) with the following preferred skills:
- Some knowledge of (or interest in) speech recognition
- Knowledgeable in pattern mining (discriminative pattern mining is a plus)
- Knowledgeable in machine learning in general and deep learning in particular
- Good programming skills in Python (for Keras and/or TensorFlow)
- Very good English (understanding and writing)
However, good PhD applications will also be considered and, in this case, the position will last 3 years. The position will be funded by INRIA (https://www.inria.fr/en/). See the INRIA web site for the post-doc and PhD wages.
The candidates should send a CV, 2 names of referees and a cover letter to the four researchers (firstname.lastname@inria.fr) mentioned above. Please indicate if you are applying for the post-doc or the PhD position. The selected candidates will be interviewed in June for an expected start in September 2019.
Bibliography:
[1] Pang Wei Koh, Percy Liang: Understanding Black-box Predictions via Influence Functions. ICML 2017: pp 1885-1894 (best paper).
[2] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals: Understanding deep learning requires rethinking generalization. ICLR 2017.
[3] Anh Mai Nguyen, Jason Yosinski, Jeff Clune: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. CVPR 2015: pp 427-436.
[4] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus: Intriguing properties of neural networks. ICLR 2014.
[5] Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi: Deep Text Classification Can be Fooled. IJCAI 2018: pp 4208-4215.
[6] Kailash Budhathoki and Jilles Vreeken. The difference and the norm - characterising similarities and differences between databases. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 206-223. Springer, 2015.
[7] Takeaki Uno, Tatsuya Asai, Yuzo Uchida, and Hiroki Arimura. Lcm: An efficient algorithm for enumerating frequent closed item sets. In Fimi, volume 90. Citeseer, 2003.
[8] Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. Visualizing higher-layer features of a deep network. University of Montreal, 1341(3):1, 2009.
[9] G. Saon, H.-K. J. Kuo, S. Rennie, M. Picheny: The IBM 2015 English conversational telephone speech recognition system, Proc. Interspeech, pp. 3140-3144, 2015.
[10] W. Xiong, L. Wu, F. Alleva, J. Droppo, X. Huang, A. Stolcke: The Microsoft 2017 Conversational Speech Recognition System, IEEE ICASSP, 2018.
(2019-04-13) PhD Thesis position or research engineer or post-doc position in Natural Language Processing: Introduction of semantic information in a speech recognition system, LORIA, Nancy, France
PhD Thesis position or research engineer or post-doc position in Natural Language Processing: Introduction of semantic information in a speech recognition system
Duration of post-doc or research engineer: 12-18 months
Duration of PhD Thesis : 3 years
Deadline to apply : May 15th, 2019
Required skills: background in statistics and natural language processing, and programming skills (Perl, Python). Candidates should email a detailed CV with diploma
Under noisy conditions, audio acquisition is one of the toughest challenges for successful automatic speech recognition (ASR). Much of the success relies on the ability to attenuate ambient noise in the signal and to take it into account in the acoustic model used by the ASR. Our DNN (Deep Neural Network) denoising system and our approach to exploiting uncertainties have shown their combined effectiveness against noisy speech.
The ASR stage will be supplemented by a semantic analysis. Predictive representations using continuous vectors have been shown to capture the semantic characteristics of words and their context, and to outperform representations based on word counts. Semantic analysis will be performed by combining predictive representations using continuous vectors with the uncertainty on denoising. This combination will be done by the rescoring component. All our models will be based on DNN technologies.
Main activities
study and implement a noisy speech enhancement module and an uncertainty propagation module;
design a semantic analysis module;
design a module that takes into account the semantic and uncertainty information.
Skills
Strong background in mathematics, machine learning (DNN), statistics
Candidates with either of the following profiles are welcome:
Strong background in signal processing
or
Strong experience with natural language processing
Excellent English writing and speaking skills are required in any case.
References
[Nathwani et al., 2018] Nathwani, K., Vincent, E., and Illina, I. DNN uncertainty propagation using GMM-derived uncertainty features for noise robust ASR, IEEE Signal Processing Letters, 2018.
[Nathwani et al., 2017] Nathwani, K., Vincent, E., and Illina, I. Consistent DNN uncertainty training and decoding for robust ASR, in Proc. IEEE Automatic Speech Recognition and Understanding Workshop, 2017.
[Nugraha et al., 2016] Nugraha, A., Liutkus, A., Vincent, E. Multichannel audio source separation with deep neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016.
[Sheikh, 2016] Sheikh, I. Exploitation du contexte sémantique pour améliorer la reconnaissance des noms propres dans les documents audio diachroniques, Thèse de doctorat en Informatique, Université de Lorraine, 2016.
[Peters et al., 2017] Matthew Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. 2017. "Semi-supervised sequence tagging with bidirectional language models." In ACL.
[Peters et al., 2018] Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. "Deep contextualized word representations." In NAACL.
PhD Title: Remote and Automatic Monitoring of Bird Populations
Studentship: Full Scholarship, including fees (EU/Non EU) plus annual stipend of €18,000.
Start Date: Sept 2019
PhD Supervisor: Dr. Naomi Harte, Sigmedia Group, Electronic & Electrical Engineering, Trinity College Dublin, Ireland
Background:
The analysis of birdsong has increased in the speech processing community in the past 5 years. Much of the reported research has concentrated on the identification of bird species from their songs or calls. Birdlife International has identified over 12,000 sites globally that are home to bird species of conservation concern and other forms of biodiversity. Out of these, 422 are in danger due to a number of threats including human encroachment and climate change. One of the main challenges in properly managing these sites is adequately monitoring them to determine their state, threats to the ecosystem and responses to these threats. Remote monitoring is the best potential option to achieve the level of coverage required.
The objective of this PhD project is to define the next-generation approaches to the use of remote monitoring for populations of birds of conservation concern. This PhD programme will develop acoustic techniques for the monitoring of bird species of conservation concern by leveraging recent developments in speech and language processing technologies. The PhD will develop appropriate approaches to acoustic data collection in the wild to ensure that acoustic surveys yield accurate bird population data and investigate audio signal analysis steps necessary to extract useful information from these long recordings. In particular the student will focus on signal enhancement to mitigate noise, and the idea of diarisation, i.e. the concept of 'who sang when'. This ambitious approach will take concepts from speaker diarisation in the speech processing domain and attempt to make sense of recordings overall. Birdsong presents significant challenges beyond speech, with more rapid pitch fluctuations coupled with noisier recordings in the wild. Thus the research is very far from a re-application of knowledge from one domain to another. Also, rather than trying to identify specific species in a recording from a closed set of possible birds, this approach will consider an unconstrained set to add to the technical challenges and make the results even more impactful. The desire is to exploit online archives of birdsong recordings from experts such as those available on xeno-canto.org and The Macaulay Library at Cornell. Based on the known geographical location of a recording, spontaneous models of bird vocalisations from populations in that area could be constructed using machine learning performed on available archived recordings. Techniques developed originally for speaker identification will be further developed for this application. This work will also leverage deep learning to quickly build accurate models from these large datasets.
Envisaged Outputs of the Research:
• Signal processing algorithms to address noise issues specific to remote recordings in bird habitats.
• Exploitation of advanced machine learning approaches, including deep learning, to identify portions of recordings that contain bird activity.
• Disruptive approaches to automatic bird species identification that leverage open-source repositories to identify birds present in the recordings.
Requirements:
The ideal candidate for this position will:
• Have a primary degree (first class honours) in Electrical Engineering, Computer Engineering or a closely related discipline.
• Possess strong written and oral communication skills in English.
• Have a strong background and interest in digital signal processing (DSP).
• Have strong coding skills.
• Be mathematically minded, and be curious about nature.
Interested candidates should send an email to Prof. Naomi Harte at nharte@tcd.ie. The email MUST include the following:
• Candidate CV (max 2 pages)
• A short statement of motivation (half page)
• Scanned academic transcripts
• Proof of English language competency (where applicable, see 1)
• Name and contact details for TWO academic referees
(2019-04-20) Two PhD Positions in Computational Linguistics or Phonetics or Speech Science, Saarland University, Germany
Two PhD Positions in Computational Linguistics or Phonetics or Speech Science
We are pleased to announce the availability of two PhD positions in the Language Science and Technology department at Saarland University in a project funded by the German Research Foundation (DFG). The three-year project is entitled 'Pause-internal phonetic particles' and is directed by Jürgen Trouvain and Bernd Möbius. Its focus is on the production and perception of vocalisations such as breath noises and tongue clicks typically found in speech pauses. Research in the project will be based on corpus analyses and production and perception experiments and develop pause models for speech synthesis.
The successful candidates should have a Master's degree in Computational Linguistics or Phonetics or Speech Science, or a related discipline. A good command of English is mandatory. Working knowledge of German is desirable but not a prerequisite. Candidates must have completed their Master's studies by the time of the appointment. We are happy to consider applicants who have not yet finished their MA/MSc by the time of application but will have submitted their thesis by the starting date. Both contracts are funded for three years at a 65% salary on the German TV-L 13 scale.
The doctoral researchers will join a vibrant community of speech and language researchers at Saarland University whose expertise spans areas such as computational linguistics, psycho-linguistics, language and speech technology, speech science, theoretical and corpus linguistics, computer science, and psychology. Saarland University offers a lively academic environment for phonetics research. The department of Language Science and Technology is one of the leading departments in speech and language in Europe, with approximately 50 postdoctoral researchers and PhD students. The flagship project at the moment is the Collaborative Research Centre on Information Density and Linguistic Encoding. It also runs a significant number of European and nationally funded projects.
Applicants with a degree in Phonetics, Computational Linguistics, Spoken Language Processing, Speech Technology, or related fields are encouraged to submit a full CV (including transcripts and copies of certificates, as well as two references) and a sample of written academic work, accompanied by a cover letter, to trouvain[at]coli.uni-saarland.de and moebius[at]coli.uni-saarland.de. Review of applications will begin on June 3, 2019, and will continue until suitable candidates are found.
Dr. Jürgen Trouvain and Prof. Bernd Möbius
-- Jürgen Trouvain Saarland University Language Science and Technology Campus C7.2 D-66123 Saarbrücken Tel.: +49 - (0)681 - 302 - 46 94
(2019-04-21) Technical Engineer/Scientist (Project Manager) position, ELDA, Paris
The European Language resources Distribution Agency (ELDA), a company specialised in Human Language Technologies within an international context is currently seeking to fill an immediate vacancy for a permanent Technical Engineer/Scientist (Project Manager) position, specialized in Speech and Multimodal technologies.
Technical Engineer / Scientist (Project Manager) in Speech and Multimodal Technologies
Under the supervision of the CEO, the responsibilities of the Technical Engineer/Scientist include designing/specifying language resources, setting up production frameworks and platforms, and carrying out quality control and assessment. He/she will be in charge of renovating the current language resources production workflows. This offers excellent opportunities for young, creative, and motivated candidates wishing to participate actively in the Language Engineering field. He/she will be in charge of conducting the activities related to language resources and Speech and Multimodal technologies. The task will mostly consist of managing language resources production projects and co-ordinating ELDA's participation in R&D projects, while also being hands-on whenever required by the development team.
Profile :
PhD in computer science, speech, audiovisual/multimodal technologies
Experience and/or good knowledge in speech data collection, expertise in phonetics, transcription tools
Experience in speech recognition, synthesis, speaker ID and the well-used packages (e.g. KALDI) and the tools to produce, collect and assess quality of resources and datasets
Experience and/or good knowledge of the Language Technology area
Experience with technology transfer projects, industrial projects, collaborative projects within the European Commission or other international frameworks
Good knowledge of Linux and open source software
Proficiency in Python
Hands-on experience in Django is a plus
Ability to work independently and as part of a team, in particular the ability to supervise members of a multidisciplinary team
Dynamic and communicative, flexible to combine and work on different tasks
Proficiency in French and English
Citizenship of (or residency papers for) a European Union country
All applications will be carefully examined until the position is filled. The position is based in Paris.
Salary: Commensurate with qualifications and experience.
Applicants should email a cover letter addressing the points listed above together with a curriculum vitae to: job@elda.org.
ELDA is a human-sized company (15 people) acting as the distribution agency of the European Language Resources Association (ELRA). ELRA was established in February 1995, with the support of the European Commission, to promote the development and exploitation of Language Resources (LRs). Language Resources include all data necessary for language engineering, such as monolingual and multilingual lexica, text corpora, speech databases and terminology. The role of this non-profit membership Association is to promote the production of LRs, to collect and to validate them and, foremost, make them available to users. The association also gathers information on market needs and trends.
(2019-04-30) Home Automation / Real-Time Multimedia Processing Software Engineer, LIG, Grenoble, France
As part of the national ANR project VocADom, the Laboratoire d'Informatique de Grenoble (LIG) is recruiting a software engineer in home automation and real-time multimedia processing. The overall objective of the VocADom industrial research project is to define, together with end users, the features of a voice-controlled home automation system that adapts to the user and can be used at home under real conditions (noise, presence of several people). More details can be found on the VocADom project website (https://vocadom.imag.fr).
*Mission:* The mission is to adapt the OpenHAB middleware, which manages the sensors and actuators of the LIG smart apartment and of a portable home automation mini-kit, to the project specifications, and to interface it with the automatic processing tools developed by the project partners. In addition, the engineer will be responsible for the technical setup of the experiments and their scenarios, including the production and synchronisation of multimedia data (video, audio, home automation logs). Finally, the engineer will support the integration of the research teams' real-time algorithms into the real-time architecture developed by the company THEORIS, a project partner. The proposed work comprises the following tasks: * getting to grips with and adapting the OpenHAB home automation infrastructure of the LIG smart apartment; * designing the portable home automation mini-network (based on an existing architecture); * supporting the project experiments; * supporting the integration of the research teams' real-time algorithms; * documentation.
*Desired profile and skills:*
* engineering degree or second-year Master's (M2) level in computer science, * working skills in software engineering (version control, testing, code quality), * knowledge of Java, OSGi, C, C++, Linux, * autonomy and initiative, project management ability, * previous experience with home automation middleware (OpenHAB, UPnP, KNX...) would be a plus.
*Salary:* 1650 to 1950 euros net per month, depending on experience
*Work environment:* The position will be attached to the Laboratoire d'Informatique de Grenoble, a CNRS joint research unit (UMR), within the GETALP team. The GETALP team (http://getalp.imag.fr/) brings together more than 40 researchers, engineers and students in the field of multilingual natural language and speech processing. The candidate will also work closely with the IIHM team of LIG and with staff of the Maison de l'Innovation et de la Création (Maci).
*Application:* Send a CV and a cover letter, optionally accompanied by 1 to 3 letters of recommendation, to Michel.Vacher@imag.fr and Francois.Portet@imag.fr. Applications will be reviewed on a rolling basis, starting now, until the start date. Please apply as soon as possible before that date.
(2019-05-17) 2 PhDs in Trinity College Dublin, Ireland
2 PhDs in Trinity College Dublin, Ireland, to start in Sept 2019. Both come with a stipend of 18,000 euros per year, along with full student fees for a 4-year period. Please contact me at nharte@tcd.ie if interested in either post.
Human Speech? How do I know it’s Real?
20 years ago, the major focus in developing speech synthesis systems was testing the intelligibility of the output speech. More recently, attention has switched to assessing not only intelligibility, but also naturalness, pleasantness, pauses, stress, intonation, emotion and listening effort. The intelligibility of systems is now so high that synthetic voices are becoming more human-like. This is good news for generating realistic synthetic speech for applications such as voice reconstruction or gaming. In tandem, research in the area of speaker verification, or voice-based biometrics, has started to pay closer attention to the issue of spoofing, where systems are attacked with reconstructed speech. Now, with improvements in speech synthesis, another realistic form of spoofing is the use of synthetic speech generated by modelling the target user. So how can you tell when speech is real, or when it is fake? This is the focus of this PhD project and it goes to the very core of the nature of human speech.
Remote and Automatic Monitoring of Bird Populations
The objective of this PhD project is to define the next-generation approaches to the use of remote monitoring for populations of birds of conservation concern. This PhD programme will develop acoustic techniques for the monitoring of bird species of conservation concern by leveraging recent developments in speech and language processing technologies. The PhD will develop appropriate approaches to acoustic data collection in the wild to ensure that acoustic surveys yield accurate bird population data and investigate audio signal analysis steps necessary to extract useful information from these long recordings. Approaches will involve the use of signal processing and deep learning. The research will be conducted in collaboration with the Dept of Zoology at TCD.
(2019-05-18) Tenure track assistant professor at the Faculty of Medicine at the University of Toronto, Canada
The Department of Speech-Language Pathology, in the Faculty of Medicine at the University of Toronto, invites applications for a full-time tenure-stream appointment in the field of pediatric language disorders. The appointment will be at the rank of Assistant Professor and will commence on January 1, 2020 or shortly thereafter.
The successful candidate must have a Ph.D. in Speech-Language Pathology, Communication Disorders or a related field of research and expertise in child language and child language disorders, with a focus on semantics, syntax, pragmatics or literacy with a demonstrated record of excellence in research and teaching. Completion of a post-doctoral fellowship will be considered an asset, as will experience supervising student research and/or post-doctoral fellows. We require that candidates have a clinical background in speech-language pathology/communication disorders and are eligible for registration with the College of Audiologists and Speech-Language Pathologists of Ontario (CASLPO). We seek candidates whose research and teaching interests complement and strengthen our existing departmental strengths. The successful candidate will be expected to pursue innovative and independent research at the highest international level and to establish an outstanding, competitive and externally funded research program.
Research excellence should be demonstrated by high-quality peer-reviewed publications, peer-reviewed funding, the submitted research statement, presentations at significant conferences, awards and accolades, and strong endorsements from referees of high standing.
Evidence of excellence in teaching will be provided through teaching accomplishments and awards, the teaching dossier, including a detailed teaching statement, sample course syllabi, and teaching evaluations submitted as part of the application, as well as strong letters of reference.
Salary will be commensurate with qualifications and experience.
This is an exceptional opportunity to join the Department of Speech-Language Pathology at the University of Toronto, one of the most highly ranked research universities in North America. The department is housed in the Rehabilitation Sciences Building, which provides excellent teaching and research facilities. The University of Toronto offers unique opportunities for collaborative and interdisciplinary research, encourages innovative scholarship, and provides the prospect of teaching a diverse student population.
All qualified candidates are invited to apply by clicking the link below. Applicants must submit a cover letter, curriculum vitae, research statement (up to 3 pages), copies of up to three representative publications and a teaching dossier to include a detailed teaching statement (up to 3 pages), sample syllabi, and teaching evaluations.
Applicants must also arrange to have three letters of reference sent directly by the referee via email (on letterhead and signed) to search.slp@utoronto.ca by the closing date. If you have questions about this position, please contact search.slp@utoronto.ca. All application materials must be submitted online.
Submission guidelines can be found at http://uoft.me/how-to-apply. We recommend combining attached documents into one or two files in PDF format.
All application materials, including reference letters, must be received by July 18, 2019.
For more information about the Department of Speech-Language Pathology, please visit our home page at https://slp.utoronto.ca/.
The University of Toronto is strongly committed to diversity within its community and especially welcomes applications from racialized persons / persons of colour, women, Indigenous / Aboriginal People of North America, persons with disabilities, LGBTQ persons and others who may contribute to the further diversification of ideas.
As part of your application, you will be asked to complete a brief Diversity Survey. This survey is voluntary. Any information directly related to you is confidential and cannot be accessed by search committees or human resources staff. Results will be aggregated for institutional planning purposes. For more information, please see http://uoft.me/UP.
All qualified candidates are encouraged to apply; however, Canadians and permanent residents will be given priority.
Understanding the inner workings of deep neural networks (DNN) has attracted a lot of attention in the past years [1, 2], and most problems were detected and analyzed using visualization techniques [3, 4]. Those techniques help to understand what an individual neuron or a layer of neurons is computing. We would like to go beyond this by focusing on groups of neurons which are commonly highly activated when a network is making wrong predictions on a set of examples. In the same line as [1], where the authors theoretically link how a training example affects the predictions for a test example using the so-called "influence functions", we would like to design a tool to "debug" neural networks by identifying, using symbolic data mining methods, (connected) parts of the neural network architecture associated with erroneous or uncertain outputs.
In the context of speech recognition, this is especially important. A speech recognition system contains two main parts: an acoustic model and a language model. Nowadays, these models are trained with algorithms based on deep neural networks (DNN) and use very large training corpora to train a large number of DNN hyperparameters. Many methods exist to tune these hyperparameters automatically. However, this induces a huge computational cost and does not empower the human designers. It would be much more efficient to provide human designers with understandable clues about the reasons for the bad performance of the system, in order to benefit from their creativity to quickly reach more promising regions of the hyperparameter search space.
Description of the position:
This position is funded in the context of the HyAIAI "Hybrid Approaches for Interpretable AI" INRIA project lab (https://www.inria.fr/en/research/researchteams/inria-project-labs). With this position, we would like to go beyond the current common visualization techniques that help to understand what an individual neuron or a layer of neurons is computing, by focusing on groups of neurons that are commonly highly activated when a network is making wrong predictions on a set of examples. Tools such as activation maximization [8] can be used to identify such neurons. We propose to use discriminative pattern mining, and, to begin with, the DiffNorm algorithm [6] in conjunction with the LCM algorithm [7], to identify the discriminative activation patterns among the identified neurons.
The data will be provided by the MULTISPEECH team and will consist of two deep architectures as representatives of acoustic and language models [9, 10]. Furthermore, the training data, from which the model parameters ultimately derive, will be provided. We will also extend our results by performing experiments with supervised and unsupervised learning to compare the features learned by these networks and to perform qualitative comparisons of the solutions learned by various deep architectures. Identifying "faulty" groups of neurons could lead to the decomposition of the DL network into "blocks" encompassing several layers. "Faulty" blocks may be the first to be modified in the search for a better design.
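As a rough illustration of the idea of discriminative activation patterns, the sketch below counts small groups of neurons that are jointly active more often on misclassified examples than on correctly classified ones. It is a naive brute-force stand-in, not the DiffNorm or LCM algorithms; the function name, the activation threshold, the group size and the support-gap criterion are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def discriminative_patterns(activations, is_error, threshold=0.5, size=2, min_gap=0.3):
    """Return neuron groups that are jointly active more often on errors.

    activations: one list of neuron activations per example (hypothetical data).
    is_error:    one boolean per example, True if the network's prediction was wrong.
    A neuron counts as "active" when its activation exceeds `threshold`.
    """
    err_counts, ok_counts = Counter(), Counter()
    n_err = sum(is_error)
    n_ok = len(is_error) - n_err

    for acts, err in zip(activations, is_error):
        active = [i for i, a in enumerate(acts) if a > threshold]
        target = err_counts if err else ok_counts
        # Count every group of `size` neurons that are active together.
        for group in combinations(active, size):
            target[group] += 1

    results = []
    for group, count in err_counts.items():
        err_support = count / n_err
        ok_support = ok_counts[group] / n_ok if n_ok else 0.0
        # Keep groups much more frequent on errors than on correct examples.
        if err_support - ok_support >= min_gap:
            results.append((group, err_support, ok_support))
    return sorted(results, key=lambda r: r[1] - r[2], reverse=True)


# Toy data: 4 examples, 4 neurons; the first two examples are misclassified.
activations = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.9, 0.0, 0.6],
    [0.8, 0.1, 0.9, 0.1],
    [0.1, 0.2, 0.7, 0.9],
]
is_error = [True, True, False, False]
patterns = discriminative_patterns(activations, is_error, min_gap=0.6)
```

On this toy data, neurons 0 and 1 are active together in both misclassified examples and in neither correct one, so they form the only group returned. Enumerating all groups is exponential in the group size, which is one reason dedicated pattern mining algorithms are used in practice.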
The recruited person will benefit from the expertise of the LACODAM team in pattern mining and deep learning (https://team.inria.fr/lacodam/) and from the expertise of the MULTISPEECH team (https://team.inria.fr/multispeech/) in speech analysis, language processing and deep learning. We would ideally like to recruit a 1-year (with possibly one additional year) post-doc with the following preferred skills:
• Some knowledge of (or interest in) speech recognition
• Knowledgeable in pattern mining (discriminative pattern mining is a plus)
• Knowledgeable in machine learning in general and deep learning in particular
• Good programming skills in Python (for Keras and/or TensorFlow)
• Very good English (understanding and writing)
However, good PhD applications will also be considered and, in this case, the position will last 3 years. The position will be funded by INRIA (https://www.inria.fr/en/). See the INRIA web site for the post-doc and PhD wages.
The candidates should send a CV, 2 names of referees and a cover letter to the four researchers (firstname.lastname@inria.fr) mentioned above. Please indicate if you are applying for the post-doc or the PhD position. The selected candidates will be interviewed in June for an expected start in September 2019.
Bibliography:
[1] Pang Wei Koh, Percy Liang: Understanding Black-box Predictions via Influence Functions. ICML 2017: pp 1885-1894 (best paper).
[2] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals: Understanding deep learning requires rethinking generalization. ICLR 2017.
[3] Anh Mai Nguyen, Jason Yosinski, Jeff Clune: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. CVPR 2015: pp 427-436.
[4] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus: Intriguing properties of neural networks. ICLR 2014.
[5] Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi: Deep Text Classification Can be Fooled. IJCAI 2018: pp 4208-4215.
[6] Kailash Budhathoki and Jilles Vreeken. The difference and the norm - characterising similarities and differences between databases. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 206-223. Springer, 2015.
[7] Takeaki Uno, Tatsuya Asai, Yuzo Uchida, and Hiroki Arimura. Lcm: An efficient algorithm for enumerating frequent closed item sets. In Fimi, volume 90. Citeseer, 2003.
[8] Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. Visualizing higher-layer features of a deep network. University of Montreal, 1341(3):1, 2009.
[9] G. Saon, H.-K. J. Kuo, S. Rennie, M. Picheny: The IBM 2015 English conversational telephone speech recognition system, Proc. Interspeech, pp. 3140-3144, 2015.
[10] W. Xiong, L. Wu, F. Alleva, J. Droppo, X. Huang, A. Stolcke : The Microsoft 2017 Conversational Speech Recognition System, IEEE ICASSP, 2018.
(2019-05-19) PhD position at LPNC, Grenoble, France
SUBJECT TITLE: Bio-Bayes Predictions -- Coupling Biological and Bayesian Predictive Models in Neurocognitive Speech Processing
RESEARCH FIELD: Computer science, Cognitive Science, Psychological sciences, Neurosciences
SCIENTIFIC DEPARTMENT (LABORATORY'S NAME): LPNC -- Laboratoire de Psychologie et NeuroCognition
DOCTORAL SCHOOL: MSTII -- Mathématiques, Sciences et technologies de l'information, Informatique
SUPERVISOR'S NAME: Julien Diard, PhD
SUBJECT DESCRIPTION: The issue of the relationship between perception and production mechanisms is central to many domains in cognitive science. This is the case in speech communication, for instance, where predictions from speech production simulation interact in various ways with perceptual processes. In this context, we have developed COSMO (Communicating about Objects using SensoriMotor Operations), a family of Bayesian algorithmic models of communicating agents. We have previously used such models to study the evolution of phonological systems (Moulin-Frier et al., 2015), speech perception and learning (Laurent et al., 2017; Barnaud et al., 2017; 2018), and speech production and adaptation (Patri et al., 2015; 2018).
However, so far, these models consider greatly simplified temporal dimensions. For instance, syllable perception was restricted to consonant-vowel syllables, assuming that the key points of speech trajectories, respectively at the center of the consonant and the vowel, were previously identified. This, of course, contrasts with natural speech processing, where sensory inputs and motor controls continuously unfold over time. Indeed, the neuronal substrates, in the brain, that deal with auditory input are well described in terms of their oscillatory nature, since they intrinsically have to deal with temporal properties of speech, and their predictive nature, since they aim at anticipating events.
In this PhD project, we aim at extending the COSMO framework to define the first Bayesian perceptuo-motor model of continuous speech communication. In previous work, in the domain of Bayesian visual word recognition modeling (Phénix et al., 2018; Ginestet et al., 2019), we have developed mathematical tools to describe the temporal dynamics of perceptual evidence accumulation across layers of hierarchical representations. Probability distributions at each layer (letters and words) evolve continuously over time, as a function of bottom-up sensory evidence and top-down lexical constraints, to predict upcoming events. Crucially, we have developed mathematical tools to model, on the one hand, attentional control of these information flows, and, on the other hand, asynchronous and asymmetric information transfer. Applying these mathematical constructs to speech communication modeling would yield a novel class of Bayesian hierarchical and predictive models, able to account for observations of neuronal oscillatory systems in the brain.
This PhD project is part of an international collaboration with Anne-Lise Giraud and Itsaso Olasagasti of the ?Auditory, Speech and Language Neuroscience? group at UNIGE (Université de Genève, Switzerland), with regular meetings and visits to Geneva planned. This international collaboration with Geneva will provide a unique framework for mixing the Bayesian approach with neuroscience constraints and data, providing a valuable multidisciplinary environment for the PhD project. The PhD candidate will integrate the ?Language? research team at LPNC (Laboratoire de Psychologie et NeuroCognition, Grenoble), will be supervised by Julien Diard (CR CNRS, LPNC) and Jean-Luc Schwartz (DR CNRS, Gipsa- Lab), and will be registered in the MSTII (Mathématiques, Sciences et technologies de l'information, Informatique) doctoral school of Univ. Grenoble Alpes.
ELIGIBILITY CRITERIA
Applicants must hold a Master's degree (or be about to earn one) or have a university degree equivalent to a European Master's (5-year duration). Applicants must send an application letter in English and attach:
- their last diploma;
- their CV;
- letters of recommendation (welcome).
Applications should be sent to: Julien.Diard@univ-grenoble-alpes.fr
SELECTION PROCESS
Application deadline: June 21, 2019 at 17:00 (CET)
Applications will be evaluated through a three-step process:
1. Eligibility check of applications before June 24, 2019.
2. First round of selection: the applications will be evaluated by a Review Board between June 24 and June 28, 2019. Results will be given on June 28, 2019.
3. Second round of selection: shortlisted candidates will be invited to an interview session on July 5, 2019 (in Grenoble or by Skype).
TYPE OF CONTRACT: temporary, 3-year doctoral contract
JOB STATUS: Full time
HOURS PER WEEK: 35
OFFER STARTING DATE: October 1, 2019
APPLICATION DEADLINE: June 21, 2019
SALARY: between €1,768.55 and €2,100 gross per month (depending on whether there is a complementary activity)
PROFILE SUMMARY: The candidate must have a background in computer science, applied mathematics or signal processing, with a strong affinity for cognitive sciences, or alternatively a background in cognitive science or computational neuroscience, with a strong affinity for mathematical modelling. This training must be validated, or be about to be validated (before Summer 2019), by a Master's-level diploma (or equivalent).
The candidate must have previous experience in scientific research in a laboratory. Mastery of a scientific computing language or of a general-purpose programming language is required (R, Python, Matlab, etc.). The candidate must also have a good command of scientific English, both spoken and written.
Previous experience in probabilistic modeling, dynamic systems modeling, or connectionist modeling is a plus but is not required. Previous knowledge of the field of perception modeling or speech production is also a plus.
(2019-05-20) PhD student in cognitive robotics, Université de Grenoble, France
Workplace: Grenoble
Publication date: May 1, 2019
Scientific supervisors: Gérard Bailly (DR CNRS, GIPSA-Lab, U. Grenoble Alpes) and Pascal Huguet (DR CNRS, LAPSCO, U. Clermont Auvergne)
Contract type: fixed-term PhD contract (CDD Doctorant/Contrat doctoral)
Contract duration: 36 months
Thesis start date: October 1, 2019
Working time: Full time
Salary: €2,135.00 gross per month
Thesis topic:
Impact of the presence of attentive and benevolent robots on human behavior and cognitive functioning
Context
Humanoid social robotics aims to create robots capable of interacting with human partners for cooperative purposes in sectors as varied as assistance to the elderly, skill learning by children, or cobotics in Industry 4.0. The goal is often to substitute the robot for a human or animal partner in tasks where the endurance, speed, or flexibility of social attributions of these social avatars is beneficial. While many studies of face-to-face interaction show the impact of the verbal [1] and co-verbal [2] behaviors of social robots on those of their human partners, few studies have compared the effects of robotic vs. human presence in the monitoring of tasks in which the robot or its human model is not directly engaged. Two recent studies conducted at LAPSCO [3] [4] notably succeeded in replicating, under robotic presence, the generally beneficial influence of human presence on attentional control in tasks that require suppressing a cognitive automatism detrimental to the target activity (e.g., [5]).
Topic and work plan
Here we propose to explore the impact of the behavior of a humanoid robot with extended verbal and co-verbal communication capabilities [6] on the performance (in terms of cognitive control) of subjects engaged in tasks in which the robot is directly involved or simply present in the interaction environment. We will therefore vary the degree to which the robot intervenes in the main task, in order to determine which signals are likely to optimize the expression of the benefits attached to humanoid social robots (and, ultimately, their acceptability) and which more harmful signals should be eliminated.
In this thesis, we will explore the impact of the robot's behavior (first teleoperated by a human pilot [7], then driven by models learned from these behavioral data [8]) on the subjects' observable behavior (e.g., cognitive performance, verbal and co-verbal signals, notably gaze), on their physiological signals (i.e., skin conductance, respiratory and heart rate), and on their brain activity (i.e., study of the error-related negativity by EEG).
Scientific and technological impact
The expected outcomes of this thesis are manifold: first, models predicting the links between observable behaviors, physiological variables, and underlying brain activity, as a function of the observed performance and of the robot's active and reactive behaviors; second, a set of behavioral strategies for attentive and benevolent robots, adapted to the subjects' psychological profiles and to task-specific performance indicators; and finally, protocols for the automatic assessment of these psychological profiles by interactive social robots, which could be deployed in applications in health, education, or human resources assessment.
References
[1] K.-Y. Chin, Z.-W. Hong, and Y.-L. Chen, “Impact of using an educational robot-based learning system on students’ motivation in elementary education,” IEEE Transactions on learning technologies, vol. 7, no. 4, pp. 333–345, 2014.
[2] S. Andrist, X. Z. Tan, M. Gleicher, and B. Mutlu, “Conversational gaze aversion for humanlike robots,” presented at the Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction, 2014, pp. 25–32.
[3] N. Spatola et al., “Improved Cognitive Control in Presence of Anthropomorphized Robots,” International Journal of Social Robotics, pp. 1–14, 2019.
[4] N. Spatola et al., “Not as bad as it seems: When the presence of a threatening humanoid robot improves human performance,” Science Robotics, vol. 3, no. 21, eaat5843, 2018.
[5] D. Sharma, R. Booth, R. Brown, and P. Huguet, “Exploring the temporal dynamics of social facilitation in the Stroop task,” Psychonomic bulletin & review, vol. 17, no. 1, pp. 52–58, 2010.
[6] Parmiggiani, Alberto, Elisei, Frédéric, Maggiali, Marco, Randazzo, Marco, Bailly, Gérard, and Metta, Giorgio, “Design and validation of a talking face for the iCub,” International Journal of Humanoid Robotics, vol. 12, no. 3, 20 pages, 2015.
[7] R. Cambuzat, F. Elisei, G. Bailly, O. Simonin, and A. Spalanzani, “Immersive teleoperation of the eye gaze of social robots,” in Int. Symposium on Robotics (ISR), Munich, Germany, 2018, pp. 232–239.
[8] Nguyen, Duc-Canh, Bailly, Gérard, and Elisei, Frédéric, “Learning off-line vs. on-line models of interactive multimodal behaviors with Recurrent Neural Networks,” Pattern Recognition Letters (PRL), pp. 29–36, 2017.
Within the Speech & Cognition department of GIPSA-Lab (Grenoble, France), the Cognitive Robotics, Interactive Systems & Speech Processing (CRISSP) team develops models of multimodal behavior (speech, gestures, gaze, etc.) for humanoid robots interacting with human partners. It relies on the technical resources of GIPSA-Lab, in particular the MICAL robotic platform, which manages the iCub robot NINA and the development tools.
This thesis is part of the team's project to develop robots engaged in goal-directed tasks requiring fine control of engagement. It is funded by the CNRS 80th-anniversary project «Robotique Sociale Humanoïde et Cognition» (RSHC), involving researchers from three laboratories: LAPSCO and LIMOS in Clermont and GIPSA-Lab in Grenoble.
The thesis will be attached to the EEATS doctoral school in Grenoble. Travel and three short stays in Clermont-Ferrand are to be expected, to collaborate with researchers from LAPSCO and LIMOS at UCA.
The candidate must hold an engineering degree and/or a Master's degree in Cognitive Science or Interaction Robotics. The position requires solid knowledge of experimentation, programming, and statistical analysis. Initial training in neuroscience is a plus. The candidate must have good oral and written communication skills (French and English required) to present at conferences and write articles for scientific journals. We are looking for a young researcher who will commit to the project, who is curious, reasonably autonomous, and strongly motivated to develop skills in the synthesis and evaluation of behaviors in the field of human-robot interaction. In addition, the candidate must be able to work in a team on multidisciplinary projects.
Applications must include a detailed CV; at least two references (people who may be contacted); a one-page cover letter; a one-page summary of the Master's thesis; and transcripts from Master 1 or 2 or from engineering school.
The deadline for submitting applications is July 15, 2019.
Required skills: background in statistics and natural language processing, and programming skills (Perl, Python). Candidates should email a detailed CV with diploma
Keywords: hate speech, social media, natural language processing.
The rapid development of the Internet and social networks has brought great benefits to women and men in their daily lives. Unfortunately, the dark side of these benefits has led to an increase in hate speech and terrorism as the most common and powerful threats on a global scale. Hate speech is a type of offensive communication mechanism that expresses an ideology of hatred often using stereotypes. Hate speech can target different societal characteristics such as gender, religion, race, disability, etc. Hate speech is the subject of different national and international legal frameworks. Hate speech is a type of terrorism and often follows a terrorist incident or event.
Social networks are incredibly popular today. Nowadays, Twitter, LinkedIn, Facebook and YouTube are used as a standard tool for communicating ideas, beliefs and feelings. Only a small percentage of people use part of the network for unhealthy activities such as hate speech and terrorism. But the impact of this low percentage of users is extremely damaging. For years, social media companies such as Twitter, Facebook and YouTube have invested hundreds of millions of dollars each year in the task of detecting, classifying and moderating hate. But these efforts are mainly based on manually revising the content to identify and remove offensive content, which is extremely expensive.
This thesis aims at designing automatic and evolving methods for the classification of hate speech in social media. Despite the studies already published on this subject, the results show that the task remains very difficult. We will use semantic content analysis methodologies from natural language processing (NLP) and methodologies based on deep neural networks (DNN), which have revolutionized the field of artificial intelligence. During this thesis, we will develop a research protocol to classify text as hateful, aggressive, insulting, ironic, neutral, etc. This type of problem falls within the framework of multi-label classification.
In addition, the problem of obfuscation of words in hate messages will need to be addressed. People who want to write hate speech on the Internet know that they risk being censored by rudimentary automatic moderation systems. So, users try to obscure their words by changing their spelling.
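To make the obfuscation problem concrete, here is a minimal sketch of a rule-based normalizer that undoes a few common spelling tricks before a message reaches a classifier. It is an illustration under stated assumptions, not the method the thesis will develop: the substitution table `LEET` and the function `normalize_token` are hypothetical, and some mappings are ambiguous in practice (for example, "1" may stand for "i" or "l").

```python
import re

# Hypothetical character substitution table (an assumption for illustration);
# a real system would curate or learn such mappings from data.
LEET = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def normalize_token(token):
    """Map an obfuscated token to a plain lowercase form."""
    token = token.lower().translate(LEET)       # undo digit/symbol substitutions
    token = re.sub(r"[^a-z]", "", token)        # drop inserted separators (h.a.t.e)
    token = re.sub(r"(.)\1{2,}", r"\1", token)  # collapse runs of 3+ repeated letters
    return token


examples = ["H4t3", "h.a.t.e", "haaaate"]
normalized = [normalize_token(t) for t in examples]
```

Such hand-written rules are easy to evade, which is one reason character-level or subword representations are often considered for this task.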
Among the crucial points of this thesis are the choice of the DNN architecture and the relevant representation of the data, i.e., the text of the internet message. The system designed will be validated on real social network streams.
Skills
Strong background in mathematics, machine learning (DNN), statistics
Following profiles are welcome, either:
Strong experience with natural language processing
Excellent English writing and speaking skills are required in any case.
References :
T Gröndahl, L Pajola, M Juuti, M Conti, N Asokan (2018) All You Need is "Love": Evading Hate-speech Detection, arXiv preprint arXiv:1808.09115
Wiegand, M., Klakow, D. (2008). Optimizing Language Models for Polarity Classification. In Proceedings of ECIR, pp. 612-616.
Wiegand, M., Ruppenhofer, J. (2015). Opinion Holder and Target Extraction based on the Induction of Verbal Categories. In Proceedings of CoNLL, pp. 215-225.
Wiegand, M., Ruppenhofer J., Schmidt A., C. Greenberg (2018) Inducing a Lexicon of Abusive Words – A Feature-Based Approach. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Wiegand, M., Wolf, M., Ruppenhofer, J. (2017) Negation Modeling for German Polarity Classification. In Proceedings of GSCL.
Zhang Z., Luo L. (2018). Hate speech detection: a solved problem? The Challenging Case of Long Tail on Twitter. arxiv.org/pdf/1803.03662
(2019-06-07) PhD grant at ISIR and STMS, Paris, France
Multimodal modeling of expressivity and alignment for human-machine interaction
Thesis supervisor: Catherine Pelachaud (ISIR)
Co-supervisor: Nicolas Obin (STMS)
Context
This thesis takes place in a context particularly rich in the development of communication interfaces between humans and machines. For example, the emergence and democratization of personal assistants (smartphones, home assistants, chatbots) make interaction with machines a daily reality for more and more people. This practice tends to grow and to spread to a large number of human uses and practices: reception agents (today, a few Pepper robots, more for demonstration than for real use), remote consultation, or agents embedded in autonomous vehicles. The expansion of uses calls for an extension of interaction modalities and for an improvement in the quality of interaction with the machine: today, voice is the preferred modality of interaction, and interaction scenarios remain very limited (information requests, question answering, no real interaction over time). The main limitations are, on the one hand, low expressivity: the agent's behavior is still often monomodal (voice, as with the Alexa or Google Home assistants) and remains very monotonous, which greatly limits the acceptability, duration, and quality of the interaction; and, on the other hand, the agent's behavior is little or not at all adapted to the interlocutor, which reduces the human's engagement in the interaction. In human-human interaction, alignment phenomena (e.g., tone of voice, speed of body movement) are cues of common understanding and engagement in the interaction (Pickering and Garrod, 1999; Castellano et al., 2012).
Engagement is marked by nonverbal social behaviors at specific moments of the interaction: these may be feedback signals (to indicate being in phase with the interactant), a form of imitation (for example, a smile calls for another smile, or the tone of voice picks up elements of the interactant's), or signals synchronized with those of the interactant (turn-taking management). This thesis aims to model the agent's behavior as a function of the user's, so that the agent can show its attentional engagement in order to sustain the interaction and make its messages more comprehensible. The adaptation of the agent's behavior will occur at different behavioral levels (prosodic, lexical, behavioral, imitation, turn-taking…). Human-machine interaction, with strong application potential in many fields, is an example of the necessary interdisciplinarity between digital humanities, robotics, and artificial intelligence.
Objective
The objective of the thesis is to better understand and model the mechanisms that govern multimodal interaction (voice and gesture) between a human and a machine, in order to remove technological barriers and make it possible to build a conversational agent capable of adapting naturally and coherently to a human interactant.
1) Expressive (Léon, 1993): able to produce varied and coherent expression to maintain the interlocutor's attention, emphasize important points, improve the quality of the interaction, and extend its duration (beyond one or two speaking turns)
2) Aligned with the interlocutor's multimodal behavior (Pickering and Garrod, 1999; Castellano et al., 2012; Clavel et al., 2016): that is, able to adapt its behavior according to the interlocutor's behavior, to reinforce the latter's engagement in the interaction.
First, the thesis will propose a unified neural architecture for generative modeling of the agent's multimodal behavior. The agent's expressivity, prosodic (Obin, 2011; Obin, 2015) and gestural (Pelachaud, 2009), will be modeled with recurrent neural architectures now commonly used for voice and gesture (Bahdanau et al., 2014; Wang, 2017; Robinson & Obin, 2019). The thesis will focus on two essential aspects of modeling the agent's behavior: the development of architectures structured over several time scales to improve the modeling of prosodic and gestural variability at the sentence and discourse levels (Le Moine & Obin, 2019), and the learning of coherent multimodal behavior by further developing shared multimodal attention mechanisms applied to the synchronicity of the generated prosodic and gestural profiles (He, 2018).
Second, the thesis will address the alignment of the agent's behavior with the human's. In particular, it will investigate interactive and imitation learning to coherently adapt the agent's multimodal behavior to the human (Weber, 2018; Mancini, 2019), using available dialogue databases (such as NoXi (collected at ISIR and annotated in terms of engagement), IEMOCAP (USC, Carlos Busso), and Gest-IS (Edinburgh University, Katya Saint-Amard)) to learn the relationship between the interlocutors' prosodic and behavioral profiles, as well as their adaptation over the course of the interaction.
The thesis will be co-supervised by Catherine Pelachaud, of the PIRoS team at ISIR, specialized in human-machine interaction and conversational agents, and by Nicolas Obin, of the Sound Analysis and Synthesis (Analyse et Synthèse des Sons, AS) team at STMS, specialized in generative modeling of speech signals. The PhD student will also benefit from the knowledge, know-how, and tools available at STMS and ISIR (for example, the ircamTTS speech synthesizer developed at STMS and the GRETA platform developed at ISIR) and from the computing infrastructure of STMS (compute servers, GPUs).
Bibliography
(Bevacqua et al., 2012) Elisabetta Bevacqua, Etienne de Sevin, Sylwia Julia Hyniewska, Catherine Pelachaud, A listener model: Introducing personality traits, Journal on Multimodal User Interfaces, special issue Interacting ECAs, Elisabeth André, Marc Cavazza and Catherine Pelachaud (Guest Editors), July 2012, 6(1-2), pp 27-38.
(Castellano et al., 2012) G. Castellano, M. Mancini, C. Peters, P. W. McOwan. Expressive copying behavior for social agents: a perceptual analysis. IEEE Trans Syst, Man Cybern, Part A: Syst Hum 42(3), 2012.
(Clavel et al., 2016) Chloé Clavel, Angelo Cafaro, Sabrina Campano, and Catherine Pelachaud, Fostering user engagement in face-to-face human-agent interactions, in A. Esposito and L. Jain (Eds), Toward Robotic Socially Believable Behaving Systems - Volume I: Modeling Social Signals, Springer Series on Intelligent Systems Reference Library (ISRL), 2016
(Glas and Pelachaud, 2015) N. Glas, C. Pelachaud, Definitions of Engagement in Human-Agent Interaction, workshop ENHANCE, in International Conference on Affective Computing and Intelligent Interaction (ACII), 2015.
(Hall et al., 2005) L. Hall, S. Woods, R. Aylett, L. Newall, A. Paiva. Achieving empathic engagement through affective interaction with synthetic characters. Affective computing and intelligent interaction, 2005.
(He, 2018) Xiaodong He, Deep Attention Mechanism for Multimodal Intelligence: Perception, Reasoning, & Expression across Language & Vision, Microsoft Research, AI NEXTCon, 2018.
(Le Moine & Obin, 2019) Clément Lemoine, Modélisation neuronale de l’expressivité pour la transformation de la voix, stage de Master, 2019.
(Léon, 1993) P. Léon. Précis de phonostylistique : Parole et expressivité. Paris:Nathan, 1993.
(Obin, 2011) N. Obin. MeLos: Analysis and Modelling of Speech Prosody and Speaking Style, PhD. Thesis, Ircam-Upmc, 2011.
(Obin, 2015) N. Obin, C. Veaux, P. Lanchantin. Exploiting Alternatives for Text-To-Speech Synthesis: From Machine to Human, in Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis. Chapter 3: Control of Prosody in Speech Synthesis, p.189-202, Springer Verlag, February, 2015.
(Ochs et al., 2008) M. Ochs, C. Pelachaud, D. Sadek, An Empathic Virtual Dialog Agent to Improve Human-Machine Interaction, Seventh International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Estoril Portugal, May 2008.
(Paiva et al., 2017) A. Paiva, I. Leite, H. Boukricha, Hana I. Wachsmuth 'Empathy in Virtual Agents and Robots: A Survey.', ACM Trans. Interact. Intell. Syst. (2017), 7 (3):11:1-11:40.
(Pelachaud, 2009) C. Pelachaud, Studies on Gesture Expressivity for a Virtual Agent, Speech Communication, special issue in honor of Björn Granstrom and Rolf Carlson, 51 (2009) 630-639.
(Poggi, 2007) I. Poggi. Mind, hands, face and body: a goal and belief view of multimodal communication. Weidler, Berlin, 2007.
(Robinson & Obin, 2019) C. Robinson, N. Obin, A. Roebel. Sequence-to-sequence modelling of F0 for speech emotion conversion, in IEEE International Conference on Audio, Signal, and Speech Processing (ICASSP), 2019.
(Sadoughi et al., 2017) Najmeh Sadoughi, Yang Liu, and Carlos Busso, 'Meaningful head movements driven by emotional synthetic speech,' Speech Communication, vol. 95, pp. 87-99, December 2017.
(Sidner and Dzikovska, 2002) C. L. Sidner, M. Dzikovska. Human-robot interaction: engagement between humans and robots for hosting activities. In: IEEE int conf on multimodal interfaces, 2002.
(Wang, 2017) Xin Wang, Shinji Takaki, Junichi Yamagishi. An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis, Interspeech, 2017
(Wang, 2018) Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Fei Re, Ye Jia, Rif A. Saurous. « Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis », 2018.
(Weber, 2018) K. Weber, H. Ritschel, I. Aslan, F. Lingenfelser, E. André, How to Shape the Humor of a Robot - Social Behavior Adaptation Based on Reinforcement Learning, ACM International Conference on Multimodal Interaction, 2018.
(Mancini, 2019) M. Mancini, B. Biancardi, S. Dermouche, P. Lerner, C. Pelachaud, Managing Agent’s Impression Based on User’s Engagement Detection, Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents, 2019.
(2019-06-14) PhD position: Privacy preserving and personalized transformations for speech recognition, INRIA Nancy and Univ. Le Mans, France
Thesis title
Privacy preserving and personalized transformations for speech recognition
This PhD thesis fits within the scope of a collaborative project (funded by the French National Research Agency) involving several French teams, among which, the MULTISPEECH team of Inria Nancy - Grand-Est and the LIUM (Laboratoire d'Informatique de l'Université du Mans).
This PhD position is in collaboration between the Multispeech team of the LORIA laboratory (Nancy) and Le Mans University. The thesis will be co-supervised by Denis Jouvet (https://members.loria.fr/DJouvet/) and Anthony Larcher (https://lium.univlemans.fr/team/anthony-larcher/). The selected candidate is expected to spend time in both teams over the course of the PhD.
Scientific Context
Over the last decade, great progress has been made in automatic speech recognition [Saon et al., 2017; Xiong et al., 2017]. This is due to the maturity of machine learning techniques (e.g., advanced forms of deep learning), to the availability of very large datasets, and to the increase in computational power. Consequently, the use of speech recognition is now spreading in many applications, such as virtual assistants (for instance Apple’s Siri, Google Now, Microsoft’s Cortana, or Amazon’s Alexa), which collect, process and store personal speech data on centralized servers, raising serious concerns regarding the privacy of their users' data. Embedded speech recognition frameworks have recently been introduced to address privacy issues during the recognition phase: in this case, a (pretrained) speech recognition model is shipped to the user's device so that the processing can be done locally without users sharing their data. However, speech recognition technology still has limited performance in adverse conditions (e.g., noisy environments, reverberated speech, strong accents, etc.) and thus there is a need for performance improvement. This can only be achieved by using large speech corpora that are representative of the actual users and of the various usage conditions. There is therefore a strong need to share speech data for improved training that is beneficial to all users, while preserving the privacy of the users, which means at least keeping the speaker identity and voice characteristics private (see Note 1).
Note 1: When sharing data, users may not want to share data conveying private information at the linguistic level (e.g., phone number, person name, …). Such privacy aspects also need to be taken into account, but they are outside the scope of this thesis.
Missions: (objectives, approach, etc.)
Within this context, the objective of the proposed thesis is twofold. First, it aims at finding a privacy preserving transform of the speech data, and, second, it will investigate the use of additional personalized transforms, that can be applied on the user’s terminal, to increase speech recognition performance.
In the proposed approach, the device of each user will not share its raw speech data, but a privacy preserving transformation of the user's speech data. In such an approach, some private computations will be handled locally, while some cross-user computations may be carried out on a server using the transformed speech data, which protects the speaker identity and some of his/her features (gender, sentiment, emotions...). More specifically, this will rely on representation learning to separate the features of the user data that can expose private information from generic ones useful for the task of interest, i.e., here, the recognition of the linguistic content. We will build upon ideas of Generative Adversarial Networks (GANs) for proposing such a privacy preserving transform. In recent years, GANs have been increasingly used in deep learning. They typically rely on both a generative network and a discriminative network, where the generator aims to output samples that the discriminator cannot distinguish from the true samples [Goodfellow et al., 2014; Creswell et al., 2018]. They have also been used as autoencoders [Makhzani et al., 2015], which are made of three main blocks: encoder, generator and discriminator. In our case, the discriminators shall focus on discriminating between speakers and/or between voice-related classes (defined according to gender, emotions, etc.). The training objective will be to maximize the speech recognition performance (using the privacy preserving transformed signal) while minimizing the available speaker or voice-related information measured by the discriminator.
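One possible way to write down the training objective described above is the following adversarial formulation. It is a sketch under stated assumptions rather than the project's actual formulation: the symbols are introduced here for illustration, with $\theta_E$ the parameters of the privacy preserving transform (encoder), $\theta_R$ those of the speech recognizer, $\theta_D$ those of the speaker (or voice-class) discriminator, $\mathcal{L}_{\mathrm{asr}}$ the recognition loss, $\mathcal{L}_{\mathrm{spk}}$ the discriminator's classification loss, and $\lambda > 0$ a hypothetical trade-off weight.

```latex
\begin{aligned}
\hat{\theta}_D &= \arg\min_{\theta_D} \; \mathcal{L}_{\mathrm{spk}}(\theta_E, \theta_D), \\
(\hat{\theta}_E, \hat{\theta}_R) &= \arg\min_{\theta_E, \theta_R} \;
  \mathcal{L}_{\mathrm{asr}}(\theta_E, \theta_R)
  - \lambda \, \mathcal{L}_{\mathrm{spk}}(\theta_E, \hat{\theta}_D).
\end{aligned}
```

The discriminator is trained to recognize the speaker from the transformed signal, while the encoder and recognizer are trained to keep recognition accurate and to make the discriminator's task as hard as possible; $\lambda$ sets the balance between utility and privacy.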
As devices become more and more personal, opportunities arise to make speech recognition more personalized. This includes two aspects: adapting the model parameters to the speaker (and to the device) and introducing personalized transforms to help hide the speaker's voice identity. Both aspects will be investigated. Voice conversion approaches provide examples of transforms that modify the voice of a speaker so that it sounds like the voice of another target speaker [e.g., Chen et al., 2014; Mohammadi & Kain, 2014]. Similar approaches can thus be applied to map speaker-specific features to those of a standard (or average) speaker, which would help conceal the speaker's identity. To benefit from the increased personal usage of terminals, speaker- and environment-specific adaptation will be investigated to improve speech recognition performance. Collaborative learning mixing speech and speaker recognition has been shown to benefit both tasks [Liu et al. 2018; Garimella et al. 2015] and provides a way to combine both kinds of information in a single framework. This approach will be compared to adaptation of deep neural network-based models [e.g., Abdel-Hamid & Jiang, 2013] to best handle different amounts of adaptation data.
Skills and profile:
Master in machine learning or in computer science
Background in statistics, and in deep learning
Experience with deep learning tools is a plus
Good computer skills (preferably in Python)
Experience in speech and/or speaker recognition is a plus
Bibliography:
[Abdel-Hamid & Jiang, 2013] Abdel-Hamid, O., & Jiang, H. Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code. In ICASSP-2013, IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 7942-7946, 2013.
[Chen et al., 2014] Chen, L. H., Ling, Z. H., Liu, L. J., & Dai, L. R. Voice conversion using deep neural networks with layer-wise generative training. TASLP-2014, IEEE/ACM Transactions on Audio, Speech and Language Processing, 22(12), pp. 1859-1872, 2014.
[Creswell et al., 2018] Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., and Bharath, A. A. Generative adversarial networks: An overview. IEEE Signal Processing Magazine 35, 1, 53-65, 2018.
[Garimella et al. 2015] Garimella, S., Mandal, A., Strom, N., Hoffmeister, B., Matsoukas, S., & Parthasarathi, S. H. K., Robust i-vector based adaptation of DNN acoustic model for speech recognition. In INTERSPEECH, 2015.
[Goodfellow et al., 2014] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672-2680, 2014.
[Liu et al. 2018] Y. Liu, L. He, J. Liu, and M. Johnson, Speaker Embedding Extraction with Phonetic Information, in INTERSPEECH, pp. 2247-2251, 2018.
[Makhzani et al., 2015] Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
[Mohammadi & Kain, 2014] Mohammadi, S. H., & Kain, A. Voice conversion using deep neural networks with speaker-independent pre-training. In SLT-2014, Spoken Language Technology Workshop , pp. 19-23, 2014.
[Saon et al., 2017] G. Saon, G. Kurata, T. Sercu, K. Audhkhasi, S. Thomas, D. Dimitriadis, X. Cui, B. Ramabhadran, M. Picheny, L.-L. Lim, B. Roomi, and P. Hall. English conversational telephone speech recognition by humans and machines. Technical report, arXiv:1703.02136, 2017.
[Xiong et al., 2017] W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, and G. Zweig. Achieving human parity in conversational speech recognition. Technical report, arXiv:1610.05256, 2017.
Candidates are required to provide the following documents in a single PDF or ZIP file:
CV
A cover/motivation letter describing their interest in the topic
Degree certificates and transcripts for the Bachelor's and Master's degrees (or the last 5 years)
Master's thesis (or equivalent) if it is already completed; otherwise, a description of the work in progress
The candidate's publications (or web links), if any (candidates are not expected to have any)
In addition, one recommendation letter from the person who supervises (or supervised) the Master's thesis (or research project or internship) should be sent directly by its author to the prospective PhD advisor.
(2019-06-16) PhD position: Hybrid Bayesian and deep neural modeling for weakly supervised learning of sensory-motor speech representations, University of Grenoble-Alpes, France
Open fully-funded PhD position: “Hybrid Bayesian and deep neural modeling for weakly supervised learning of sensory-motor speech representations”
The Deep-COSMO project, part of the new AI institute in Grenoble, is welcoming applications for a 3-year, fully funded PhD scholarship starting October 1st, 2019 at GIPSA-lab (Grenoble, France)
TOPIC: Representation learning, speech production and perception, Bayesian cognitive models, generative neural
(2019-06-20) Post-doc position, CNRS and Univ. Aix-Marseille, Aix-en-Provence, France
POST-DOC POSITION (18 months) - Forensic Voice Comparison (VoxCrim project): ability, limitations and specificities of listeners in speaker identification tasks
Laboratoire Parole et Langage (CNRS and Aix-Marseille Université), Aix-en-Provence, France
CONTEXT The post-doc will be carried out within the framework of the VoxCrim project, funded by an ANR (Agence nationale de la recherche) grant (2017-2021, https://anr.fr/Project-ANR-17-CE39-0016). VoxCrim focuses on national security and legal/justice applications and aims to provide a validated scientific objective framework for all types of forensic voice comparison methods (automatic and phonetic). The goal is to develop certified standards to determine the specific areas in which voice comparison methods are applicable. The project includes two complementary subject areas: 1. the proposal of methodological standards to homogenize the expertise of voice comparison in a judicial environment, 2. the development of basic research in the fields of automatic speech processing and phonetics (speaker characteristics in the production and perception of speech). The post-doc will participate in the second subject area (speaker characteristics in the production and perception of speech). Two questions need to be answered: What are the abilities and limits of listeners in speaker identification tasks? Which cues do listeners use to identify speakers?
TASKS The main objective will be to conduct perception experiments aimed at assessing the ability of listeners in several speaker identification tasks. The post-doctoral fellow will:
- design experimental protocols
- create and manipulate acoustic stimuli
- run experiments and collect data
- process data and perform statistical analysis
Finally, results will be presented at conferences and published in international journals.
WORK ENVIRONMENT The postdoctoral fellow will work at the Laboratoire Parole et Langage (http://www.lpl-aix.fr/), a laboratory whose research interests are extremely varied (including linguistics, phonetics, neuroscience, psycholinguistics, sociolinguistics, and computer science). He or she will benefit from this stimulating environment and interact with all the members of the laboratory (faculty members, other post-docs, engineers, doctoral students, etc.). He or she will have the opportunity to discover all the projects of the laboratory.
EXPECTED PROFILE The postdoctoral fellow will have a PhD in the speech sciences and/or in psychoacoustics (auditory measurements, audio signal processing). A strong background in data processing and statistics is also required. A good command of French and English will also be appreciated.
18 months, beginning in autumn 2019
Monthly salary: ~€1900 net (depending on experience)
Location: Laboratoire Parole et Langage (http://www.lpl-aix.fr/), Aix Marseille Université, CNRS UMR 7309, Aix-en-Provence, France
For additional information: christine.meunier@univ-amu.fr
(2019-06-21) Research engineer (ingénieur d'études), LIG, Univ. Grenoble-Alpes, France
RECRUITMENT OF A RESEARCH ENGINEER IN NATURAL LANGUAGE PROCESSING AND WEB HCI INTERFACE DEVELOPMENT
Contract start: October 2019
Duration: 7 months
Salary: €2000 gross/month
Profile:
- Master's degree or PhD in natural language processing (NLP)
- Training in language sciences will be appreciated.
- Working skills in software engineering (version control, testing, code quality) and Python
- Skills in C/C++ would be a plus
- Experience in automatic speech processing is required, as is a good level of French.
- Experience with Meteor, Firepad, Node.js, MongoDB or Firebase would be a plus.
This position requires the ability to work both in a team and independently. Knowledge of the linguistic context of deafness would be an additional asset.
*Project description and tasks*
As part of the MANES project (Médiation et Accessibilité Numérique pour les Étudiants Sourds; digital mediation and accessibility for deaf students), led by François Portet (LIG), Isabelle Estève (LIDILEM) and Marion Fabre (ECP) and partly funded by PULSALYS, IDEX Lyon-Saint Etienne, we are recruiting a research engineer on a 7-month fixed-term contract. The overall goal of the project is to develop a real-time captioning system that makes the teacher's spoken discourse accessible to deaf students, so as to support individual appropriation of knowledge through note-taking. The technological implementation and deaf users' ability to process written text will be the two main axes of the project.
The research engineer's mission is, on the one hand, to develop, evaluate and improve prototypes based on the latest scientific advances and to merge them into a prototype for automatic real-time captioning built on the Kaldi platform; and, on the other hand, to design an HCI interface for automatic real-time transcription of the teacher's speech and projection of the live captions, integrating the prototype mentioned above. The candidate will be in charge of building the prototype for classroom experiments and of adjusting the interface in light of those experiments.
Natural language processing tasks:
- Review the state of the art in automatic captioning systems
- Test semi-automatic transcription and verify oral excerpts from lectures
- Adapt the real-time neural-network automatic transcription system (Kaldi)
- Process transcriptions in real time for captioning adapted to deaf audiences
The functional requirements envisaged for the Kaldi platform implementation are: real-time spotting of keywords and synonyms (already available in Kaldi) and the development of new features: segmentation and simplification. Prospects for offline adaptations will also be considered.
Development tasks:
- Build a web application for real-time transcription of the teacher's speech and projection of the resulting transcription (video projector, with possible extension to a mobile interface)
- Design the student interface: storage and retrieval of the written record for reworking and editing
- Design the teacher interface: configuration of the key elements of the course
- Documentation: description and user guide for the HCI interface
*Work environment*
The project is led by the Education, Cultures et Politiques laboratory (ECP, EA 4571), Université Lumière Lyon 2, in collaboration with the Laboratoire de Linguistique et Didactique des Langues Étrangères et Maternelles (LIDILEM), Université Grenoble-Alpes, and the Laboratoire d'Informatique de Grenoble (LIG). The position will be physically hosted at the Laboratoire d'Informatique de Grenoble, UMR CNRS, within the GETALP team. The GETALP team (https://lig-getalp.imag.fr) brings together more than 40 researchers, engineers and students working on multilingual natural language and speech processing.
(2019-06-22) Head of AI (M/F), R&D team manager, Zaion, Paris, France
ZAION is a fast-growing, innovative company specializing in conversational robot technology: callbots and chatbots incorporating artificial intelligence.
ZAION has developed a solution built on more than 20 years of experience in customer relations. This technologically disruptive solution has been very well received internationally, and we already have 18 active customers (GENERALI, MNH, APRIL, CROUS, EUROP ASSISTANCE, PRO BTP …).
We are currently among the only companies in the world to offer a solution of this type that is entirely focused on performance. Joining us means taking part in an exciting adventure within an ambitious team, with the aim of becoming the reference in the conversational robot market.
As part of its growth, ZAION is looking for its Head of AI (M/F). As manager of the R&D team, you will play a strategic role in the company's development and expansion. You will develop a solution that detects emotions in conversations. We want to extend the cognitive capabilities of our callbots so that they can detect the emotions of the people they talk to (joy, stress, anger, sadness…) and adapt their responses accordingly.
Your main responsibilities:
- Take part in creating ZAION's R&D unit and, from your arrival, lead your first project on emotion recognition in the voice
- Build, adapt and evolve our voice emotion detection services
- Analyze large conversation databases to extract the emotionally relevant conversations
- Build a database of conversations labeled with emotion tags
- Train and evaluate machine learning models for emotion classification
- Deploy your models to production
- Continuously improve the voice emotion detection system
Required qualifications and prior experience:
- At least 5 years of experience as a data scientist/machine learning practitioner applied to audio, and an interest in team management
- Graduate of an engineering school, or a Master's degree in computer science, or a PhD in computer science or mathematics, with solid skills in signal processing (preferably audio)
- Solid theoretical background in machine learning and the relevant mathematical areas (clustering, classification, matrix factorization, Bayesian inference, deep learning...)
- Experience delivering machine learning models in a production environment would be a plus
- Proficiency in one or more of the following: Python, machine learning/deep learning frameworks (PyTorch, TensorFlow, scikit-learn, Keras) and JavaScript
- Mastery of audio signal processing techniques
- Proven experience in labeling large databases (preferably audio) is essential
- Your personality: a leader, autonomous, passionate about your work, and able to run a team in project mode
- Fluent English
Please send your application to: alegentil@zaion.ai
(2019-06-21) Post doc at LIUM, Univ. du Mans, Le Mans, France
Post-doc position open
The Speech and Language Technology Group at Le Mans University is looking for a post-doc scientist to develop autonomous systems.
Keywords: Deep Learning, lifelong, autonomous systems, unsupervised learning, active-learning, interactive-learning
Context
The LST team from LIUM (Le Mans University) focuses on the behavior of autonomous systems for the tasks of speaker diarization and machine translation. The ALLIES project (European Chist-ERA collaborative project) aims at developing evaluation protocols, metrics and scenarios for lifelong learning autonomous systems. The goal is to enable auto-adaptable systems that can also auto-evaluate in order to sustain their performance across time. Autonomous systems can rely on human domain experts via active and interactive learning processes to be defined within the ALLIES project.
Missions
Develop an autonomous system for speaker diarization by integrating lifelong learning, active and interactive learning components. The research work will be related to some of the following topics:
- unsupervised adaptation
- unsupervised evaluation
- active learning (based on the unsupervised evaluation process, the autonomous system is free to request additional knowledge from the human domain expert)
- interactive learning (a human domain expert provides specific knowledge to the autonomous system; this information must be taken into account by the system)
Performance will be analyzed using protocols, metrics and scenarios developed for the ALLIES project.
Participation in the ALLIES benchmarking evaluation for speaker diarization. During the ALLIES project, LIUM is organizing two international evaluation campaigns (one for Speaker Diarization, jointly organized with Albayzin, and the second one for Machine Translation, jointly with WMT). The benchmarking evaluation will serve to validate approaches developed during the post-doc.
Dissemination: the research will be published in the major conferences and journals.
Supervisors: Anthony Larcher (anthony.larcher@univ-lemans.fr) and Loïc Barrault (loic.barrault@univ-lemans.fr)
Expected competences:
- PhD in Machine Learning and Deep Learning
- Experience in speech processing is a plus
- Fluent in Python
- Familiar with a deep learning toolkit (PyTorch, TensorFlow)
“As an intern at Speechmatics I have worked on projects that use real machine learning to deliver real value to people across the world. There are few places where the machine learning being used is at the bleeding edge of the field, but Speechmatics is one of them. The company has an amazing culture that allows you to grow as a programmer and as a person. If you want to be a part of a fast-growing machine learning company where you, personally, will make a difference then Speechmatics could well be the place for you!”
Sam Ringer, Machine Learning Engineer (previously R&D Intern), Speechmatics
Background
Speech technology is one of the most popular discussion items at the moment, yet speech interaction is limited to “Alexa, turn on the light”, or “Siri, where is the nearest coffee shop?” We are taking speech technology to the next level using our expertise in machine learning and speech-to-text technology to enable our customers to use conversational speech recognition. Our solutions power subtitling on TV, content discovery for videos and compliance solutions in banks, improve the efficiency of meetings, and serve many other use-cases. Our mission is to improve human communication with a global speech engine that works, and to put speech back at the heart of communication.
At Speechmatics you’ll be working with some of the smartest minds in the industry, working on cutting-edge projects and deploying the latest machine learning techniques to disrupt the market, providing customers with the best speech technology available, all whilst immersed in a progressive and great company culture. You can enjoy benefits including share options, healthcare, life assurance, Bike Doctor, massages, regular BBQs, Brew Dogs in the fridge, no red tape, a top-end laptop and much more. We’re building a company that truly strives to be world-leading and we’re looking for people who wholeheartedly believe they can be additive to our culture, bring new ideas to the table and get stuff done. If that’s you, carry on reading.
The Opportunity
The Speechmatics Engineering team develops and maintains speech-oriented products and services that will be used by businesses worldwide and is responsible for the complete product development cycle for these products. In this internship, you’ll help to support fundamental speech and language processing research to improve our performance and language coverage as well as helping to build products and features to delight our users.
Because you will be joining a rapidly expanding team, you will need to be a team player who thrives in a fast-paced environment, with a focus on investigating ideas and rapidly moving research developments into products. We strongly encourage versatility and knowledge transfer within and across teams. You will be expected to learn fast and feel emboldened to ask for support as you need it.
Prior experience of speech recognition is desirable, although Speechmatics has a team of speech recognition engineers who will collaborate and share any specialised knowledge required. If you are enthusiastic about speech recognition and machine learning in general, with the drive to deliver the best possible technology solutions, then we want to hear from you!
Our internships are not time constrained to specific dates – we can work out mutually agreeable start and end dates as part of the application process.
Key Responsibilities
Exploring and evaluating research ideas
Increasing and improving our language coverage
Prototyping new and improved features
Helping the company to take your R&D through to production
Communicating your work internally
Requirements
Essential
Team player
Enthusiasm for speech recognition and machine learning
Technical understanding of speech recognition or related discipline
Ability to rapidly deliver on ideas
Competent in Python and/or C/C++
Have or be studying towards a degree involving speech recognition, machine learning / computer science or related field
Desirable
Practical experience of ASR and ML packages such as Kaldi, HTK or TensorFlow
Commercial experience of speech recognition
Software development experience
Salary
Competitive salary (dependent on experience), flexible working and some awesome benefits & perks.
'As a Speech Recognition Engineer at Speechmatics, I work on solving a multitude of problems related to improving the accuracy and delivering new features for a global automatic speech recognition engine. As a member of the speech team, I work across every aspect of speech and implement the latest research in acoustic and language modelling. The team is supportive and also rich in terms of skills and backgrounds. Speechmatics offer progressive and rewarding opportunities in one of the best speech technology companies in the world.'
André Mansikkaniemi, Speech Recognition Engineer at Speechmatics
The Opportunity
We are looking for a talented speech engineer to help us build the best speech technology for anybody, anywhere, in any language. You will be part of a team that is working on our core ASR capabilities to improve our speed and accuracy and develop novel features that we can support in all languages. Your work will feed into our ground-breaking framework to support the building of ASR models in every language pack published by the company. You will be responsible for keeping our system the most accurate and useful commercial speech recognition system available.
As you will be joining a small team, you will need to be a team player who thrives in a fast-paced environment, with a focus on rapidly moving research developments into products. Bringing skills into the team is as important as a can-do attitude. We strongly encourage versatility and knowledge transfer within the team, so that we can share efficiently what needs to be done to meet our commitments to the rest of the company.
Key Responsibilities
Research and development of improved speed and accuracy across our range of world-leading ASR products and related features
Delivering the software that provides an easy-to-use, feature-rich ASR product for our customers
Enhancing our machine learning framework that robustly builds any language with the best possible performance
Taking data all the way from its raw form through to a finished model
Working within a team in an agile environment
Working closely with other technical teams and the product team to deliver on the company’s technical vision
Requirements
Essential
Graduate degree in Statistics, Engineering, Mathematics, or Computer Science
Knowledge of key natural language processing or related technologies, such as speech recognition, text-to-speech or natural language understanding
Experience working with standard speech and/or ML toolkits, e.g. Kaldi, KenLM, TensorFlow, etc.
Solid Python programming skills
Experience using Unix/Linux
Quick and enthusiastic learner
Excellent teamwork and communications skills
Analytical mind-set with a data-driven approach to making decisions and attention to detail
Desirable
Postgraduate degree in related discipline
Commercial work experience in ASR or a related field
Experience of working in an Agile framework
Expertise in modern speech recognition, including WFSTs, lattice processing, neural net (RNN / DNN / LSTM), acoustic and language models, decoding
Comprehensive knowledge of machine learning and statistical modelling
Experience in deep machine learning and related toolkits, e.g. Theano, Torch, etc.
Deep expertise in Python and/or C++ software development
Experience working effectively with software engineering teams or as a Software Engineer
Salary
Competitive salary (dependent on experience), flexible working and some awesome benefits & perks.
'As a Speech Recognition Engineer at Speechmatics, I work on solving a multitude of problems related to improving the accuracy and delivering new features for a global Automatic Speech Recognition engine. As a member of the speech team, I work across every aspect of speech and implement the latest research in acoustic and language modelling. The team is supportive and also rich in terms of skills and backgrounds. Speechmatics offer progressive and rewarding opportunities in one of the best speech technology companies in the world.'
André Mansikkaniemi, Speech Recognition Engineer, Speechmatics
The Opportunity
We are looking for a talented speech engineer to help us build the best speech technology for anybody, anywhere, in any language. You will be part of a team that is working on our core ASR capabilities to improve our speed and accuracy and develop novel features that we can support in all languages. Your work will feed into our ground-breaking framework to support the building of ASR models in every language pack published by the company. You will be responsible for keeping our system the most accurate and useful commercial speech recognition system available.
As you will be joining a small team, you will need to be a team player who thrives in a fast-paced environment, with a focus on rapidly moving research developments into products. Bringing skills into the team is as important as a can-do attitude. We strongly encourage versatility and knowledge transfer within the team, so that we can share efficiently what needs to be done to meet our commitments to the rest of the company.
Key Responsibilities
Research and development of improved speed and accuracy across our range of world-leading ASR products and related features
Delivering the software that provides an easy-to-use, feature-rich ASR product for our customers
Enhancing our machine learning framework that robustly builds any language with the best possible performance
Taking data all the way from its raw form through to a finished model
Working within a team in an agile environment
Working closely with other technical teams and the product team to deliver on the company’s technical vision
Requirements
Essential
Commercial experience in ASR or a related field
Graduate degree in Statistics, Engineering, Mathematics, or Computer Science
Expertise in modern speech recognition, including WFSTs, lattice processing, neural net (RNN / DNN / LSTM), acoustic and language models, decoding
Experience working with standard speech and/or ML toolkits, e.g. Kaldi, KenLM, TensorFlow, etc.
Solid Python programming skills
Experience using Unix/Linux
Drive to help those around you learn and improve every day
Excellent teamwork and communications skills
Analytical mind-set with a data-driven approach to making decisions and attention to detail
Desirable
Postgraduate degree in related discipline
Experience of working in an Agile framework
Comprehensive knowledge of machine learning and statistical modelling
Experience in deep machine learning and related toolkits, e.g. Theano, Torch, etc.
Deep expertise in Python and/or C++ software development
Experience working effectively with software engineering teams or as a Software Engineer
Salary
Competitive salary (dependent on experience), flexible working and some awesome benefits & perks.
Live for the wow | Build authentic relationships | Be the adventure
Innovation is what we do. We build, we iterate, we develop the next thing that delivers that wow moment. We see value in building long-term, authentic relationships that last and are based on trust and honesty. With our customers, our colleagues, our leaders, our suppliers or within our local community. Our journey should be fun and exciting. We will celebrate our successes and learn from our mistakes together along the way. We embrace learning and change to grow naturally and organically as a company and individuals. We trust, we’re honest, kind and respectful.