
ISCApad #258

Tuesday, December 10, 2019 by Chris Wellekens

6 Jobs
6-1(2019-06-01) PhD position in NLP at LORIA, Nancy, France
Automatic classification using deep learning of hate speech posted on the Internet
 
Supervisors: Irina Illina, MdC, HDR, Dominique Fohr, CR CNRS
Team: Multispeech, LORIA-INRIA, France
Contact: illina@loria.fr, dominique.fohr@loria.fr
Duration of PhD thesis: 3 years
Deadline to apply: June 30th, 2019
Required skills: background in statistics and natural language processing, and programming skills (Perl, Python). Candidates should email a detailed CV with diplomas.
Keywords: hate speech, social media, natural language processing.
 
The rapid development of the Internet and social networks has brought great benefits to people in their daily lives. Unfortunately, it has also led to an increase in hate speech and terrorism, which are among the most common and powerful threats on a global scale. Hate speech is a type of offensive communication that expresses an ideology of hate, often using stereotypes. It can target different societal characteristics such as gender, religion, race or disability, and it is the subject of various national and international legal frameworks. Hate speech is a form of terrorism and often follows a terrorist incident or event.
 
Social networks are incredibly popular today. Twitter, LinkedIn, Facebook and YouTube are used as standard tools for communicating ideas, beliefs and feelings. Only a small percentage of people use these networks for harmful activities such as hate speech and terrorism, but the impact of this small group of users is extremely damaging. For years, social media companies such as Twitter, Facebook and YouTube have invested hundreds of millions of dollars each year in detecting, classifying and moderating hate. But these efforts mainly rely on manually reviewing content to identify and remove offensive material, which is extremely expensive.
 
This thesis aims at designing automatic and evolving methods for the classification of hate speech in social media. Despite the studies already published on this subject, the results show that the task remains very difficult. We will use semantic content analysis methodologies from natural language processing (NLP) and methodologies based on deep neural networks (DNN), which have revolutionized the field of artificial intelligence. During this thesis, we will develop a research protocol to classify hate speech in text as hateful, aggressive, insulting, ironic, neutral, etc. This problem is set in the context of multi-label classification.
 
In addition, the problem of word obfuscation in hate messages will need to be addressed. People who want to post hate speech on the Internet know that they risk being censored by rudimentary automatic moderation systems, so they try to disguise their words by altering their spelling or written form.
 
Among the crucial points of this thesis are the choice of the DNN architecture and the relevant representation of the data, i.e., the text of the Internet message. The system designed will be validated on real social network streams.
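To make the multi-label setting concrete, here is a minimal Python/PyTorch sketch (an illustrative baseline only, not the architecture the thesis will develop; the label set, vocabulary size and dimensions are assumptions). Unlike multi-class classification, each message gets an independent probability per label, so a message can for instance be both hateful and ironic:

import torch
import torch.nn as nn

LABELS = ['hateful', 'aggressive', 'insulting', 'ironic', 'neutral']

class MultiLabelHateClassifier(nn.Module):
    # Toy multi-label classifier: embeddings -> BiGRU -> one logit per label.
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, len(LABELS))

    def forward(self, token_ids):
        h, _ = self.rnn(self.emb(token_ids))
        return self.out(h.mean(dim=1))  # mean-pool over time

model = MultiLabelHateClassifier()
# BCEWithLogitsLoss treats each label independently, which is exactly
# what distinguishes multi-label from single-label classification.
criterion = nn.BCEWithLogitsLoss()
tokens = torch.randint(1, 10000, (8, 40))              # 8 messages, 40 tokens each
targets = torch.randint(0, 2, (8, len(LABELS))).float()
loss = criterion(model(tokens), targets)
loss.backward()

In practice, the obfuscation issue mentioned above would also argue for subword- or character-level inputs rather than a fixed word vocabulary.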
 
Skills
Strong background in mathematics, machine learning (DNN), and statistics.
Profiles with strong experience in natural language processing are welcome.
 
Excellent English writing and speaking skills are required in any case.
 
References:
Gröndahl, T., Pajola, L., Juuti, M., Conti, M., Asokan, N. (2018). All You Need is 'Love': Evading Hate-speech Detection. arXiv preprint arXiv:1808.09115.
Wiegand, M., Klakow, D. (2008). Optimizing Language Models for Polarity Classification. In Proceedings of ECIR, pp. 612-616.
Wiegand, M., Ruppenhofer, J. (2015). Opinion Holder and Target Extraction based on the Induction of Verbal Categories. In Proceedings of CoNLL, pp. 215-225.
Wiegand, M., Ruppenhofer, J., Schmidt, A., Greenberg, C. (2018). Inducing a Lexicon of Abusive Words - A Feature-Based Approach. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Wiegand, M., Wolf, M., Ruppenhofer, J. (2017) Negation Modeling for German Polarity Classification. In Proceedings of GSCL.
Zhang, Z., Luo, L. (2018). Hate speech detection: a solved problem? The Challenging Case of Long Tail on Twitter. arxiv.org/pdf/1803.03662

6-2(2019-06-07) PhD grant at ISIR and STMS, Paris, France

 

Multimodal modeling of expressivity and alignment for human-machine interaction

Thesis director: Catherine Pelachaud (ISIR)

Co-supervisor: Nicolas Obin (STMS)

Context

This thesis takes place in a context of particularly rich development of communication interfaces between humans and machines. For example, the emergence and democratization of personal assistants (smartphones, home assistants, chatbots) are making interaction with machines a daily reality for more and more people. This practice tends to grow and to spread to a large number of human uses and practices: from reception agents (today, a few Pepper robots, more for demonstration purposes than for real use), to remote consultation, or agents embedded in autonomous vehicles. The expansion of these uses calls for extending the modalities of interaction and improving the quality of the interaction with the machine: today, voice is the preferred modality of interaction, and interaction scenarios remain very limited (information requests, question answering, no real sustained interaction). The main limitations are, on the one hand, low expressivity: the agent's behavior is still often monomodal (voice only, as with the Alexa or Google Home assistants) and remains very monotonous, which greatly limits the acceptability, duration and quality of the interaction; and, on the other hand, the agent's behavior is little or not at all adapted to the interlocutor, which reduces the human's engagement in the interaction. In human-human interaction, alignment phenomena (e.g., tone of voice, speed of body movement) are cues of mutual understanding and engagement in the interaction (Pickering and Garrod, 1999; Castellano et al., 2012). Engagement is marked by nonverbal social behaviors at specific moments of the interaction: these can be feedback signals (indicating that one is in phase with the interactant), a form of imitation (for example, a smile calls for another smile, the tone of voice takes up elements of the interactant's), or signals synchronized with those of the interactant (turn-taking management). This thesis aims at modeling the agent's behavior as a function of the user's, so that the agent can display its attentional engagement in order to maintain the interaction and make its messages more comprehensible. The adaptation of the agent's behavior will take place at different behavioral levels (prosodic, lexical, behavioral, imitation, turn-taking...). Human-machine interaction, with strong application potential in many domains, is an example of the necessary interdisciplinarity between digital humanities, robotics, and artificial intelligence.

Objective

The objective of the thesis is to better understand and model the mechanisms that govern multimodal (voice and gesture) interaction between a human and a machine, in order to overcome technological barriers and design a conversational agent able to adapt naturally and coherently to a human interactant. The agent should be:

1) Expressive (Léon, 1993): able to display varied and coherent expression in order to maintain the interlocutor's attention, emphasize important points, improve the quality of the interaction and extend its duration (beyond one or two speaking turns);

2) Aligned with the interlocutor's multimodal behavior (Pickering and Garrod, 1999; Castellano et al., 2012; Clavel et al., 2016): that is, able to adapt its behavior according to the interlocutor's, in order to reinforce the latter's engagement in the interaction.

As a first step, the thesis will design a unified neural architecture for the generative modeling of the agent's multimodal behavior. The agent's expressivity, both prosodic (Obin, 2011; Obin, 2015) and gestural (Pelachaud, 2009), will be modeled with recurrent neural architectures that are now commonly used for voice and gesture (Bahdanau et al., 2014; Wang, 2017; Robinson & Obin, 2019). The thesis will focus on two essential aspects of modeling the agent's behavior: the development of architectures structured over several temporal scales, to improve the modeling of prosodic and gestural variability at the sentence level and at the discourse level (Le Moine & Obin, 2019), and the learning of coherent multimodal behavior by developing shared multimodal attention mechanisms applied to the synchrony of the generated prosodic and gestural profiles (He, 2018).
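To make the idea of a shared multimodal attention mechanism more concrete, here is a minimal, hypothetical PyTorch sketch; it is not the architecture the thesis will develop, and all feature dimensions and sequence lengths are illustrative assumptions. Two recurrent encoders process prosodic and gestural feature streams, and a single attention module with shared weights aligns both streams against the same queries, one simple way to encourage synchronized prosodic and gestural contexts:

import torch
import torch.nn as nn

class SharedMultimodalAttention(nn.Module):
    # Two GRU encoders (prosody, gesture) + one attention module whose
    # weights are shared across modalities.
    def __init__(self, prosody_dim=16, gesture_dim=32, hidden=64):
        super().__init__()
        self.prosody_enc = nn.GRU(prosody_dim, hidden, batch_first=True)
        self.gesture_enc = nn.GRU(gesture_dim, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)

    def forward(self, prosody, gesture, queries):
        # prosody: (B, Tp, prosody_dim); gesture: (B, Tg, gesture_dim)
        # queries: (B, Tq, hidden), e.g. decoder states driving generation
        hp, _ = self.prosody_enc(prosody)
        hg, _ = self.gesture_enc(gesture)
        ctx_p, _ = self.attn(queries, hp, hp)  # same attention weights ...
        ctx_g, _ = self.attn(queries, hg, hg)  # ... applied to each modality
        return torch.cat([ctx_p, ctx_g], dim=-1)

model = SharedMultimodalAttention()
ctx = model(torch.randn(2, 50, 16), torch.randn(2, 80, 32), torch.randn(2, 10, 64))
print(ctx.shape)  # torch.Size([2, 10, 128])

Sharing the attention weights is only one possible coupling; modality-specific attention with an explicit synchrony loss would be another.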

As a second step, the thesis will tackle the alignment of the agent's behavior with the human's. The thesis will focus in particular on interactive learning and learning by imitation to coherently adapt the agent's multimodal behavior to the human (Weber, 2018; Mancini, 2019), using available dialogue databases (such as NoXi (collected at ISIR and annotated in terms of engagement), IEMOCAP (USC, Carlos Busso), and Gest-IS (Edinburgh University, Katya Saint-Amard)) to learn the relation between the interlocutors' prosodic and behavioral profiles, as well as their adaptation over the course of the interaction.

The thesis will be co-supervised by Catherine Pelachaud, from the PIRoS team at ISIR, which specializes in human-machine interaction and conversational agents, and by Nicolas Obin, from the Sound Analysis and Synthesis (AS) team at STMS, which specializes in generative modeling of speech signals. The PhD student will also benefit from the knowledge, know-how and tools available at STMS and ISIR (for example, the ircamTTS speech synthesizer developed at STMS and the GRETA platform developed at ISIR) and from the computing facilities of STMS (computing server, GPUs).

Bibliography

(Bevacqua et al., 2012) Elisabetta Bevacqua, Etienne de Sevin, Sylwia Julia Hyniewska, Catherine Pelachaud, A listener model: Introducing personality traits, Journal on Multimodal User Interfaces, special issue Interacting ECAs, Elisabeth André, Marc Cavazza and Catherine Pelachaud (Guest Editors), July 2012, 6(1-2), pp 27-38.

(Castellano et al., 2012) G. Castellano, M. Mancini, C. Peters, P. W. McOwan. Expressive copying behavior for social agents: a perceptual analysis. IEEE Trans Syst, Man Cybern, Part A: Syst Hum 42(3), 2012.

(Clavel et al., 2016) Chloé Clavel, Angelo Cafaro, Sabrina Campano, and Catherine Pelachaud, Fostering user engagement in face-to-face human-agent interactions, in A. Esposito and L. Jain (Eds), Toward Robotic Socially Believable Behaving Systems - Volume I: Modeling Social Signals, Springer Series on Intelligent Systems Reference Library (ISRL), 2016

(Glas and Pelachaud, 2015) N. Glas, C. Pelachaud, Definitions of Engagement in Human-Agent Interaction, workshop ENHANCE, in International Conference on Affective Computing and Intelligent Interaction (ACII), 2015.

(Hall et al., 2005) L. Hall, S. Woods, R. Aylett, L. Newall, A. Paiva. Achieving empathic engagement through affective interaction with synthetic characters. Affective computing and intelligent interaction, 2005.

(He, 2018) Xiaodong He, Deep Attention Mechanism for Multimodal Intelligence: Perception, Reasoning, & Expression across Language & Vision, Microsoft Research, AI NEXTCon, 2018.

(Le Moine & Obin, 2019) Clément Le Moine, Modélisation neuronale de l'expressivité pour la transformation de la voix, Master's internship report, 2019.

(Léon, 1993) P. Léon. Précis de phonostylistique : Parole et expressivité. Paris:Nathan, 1993.

(Obin, 2011) N. Obin. MeLos: Analysis and Modelling of Speech Prosody and Speaking Style, PhD. Thesis, Ircam-Upmc, 2011.

(Obin, 2015) N. Obin, C. Veaux, P. Lanchantin. Exploiting Alternatives for Text-To-Speech Synthesis: From Machine to Human, in Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis. Chapter 3: Control of Prosody in Speech Synthesis, p.189-202, Springer Verlag, February, 2015.

(Ochs et al., 2008) M. Ochs, C. Pelachaud, D. Sadek, An Empathic Virtual Dialog Agent to Improve Human-Machine Interaction, Seventh International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Estoril Portugal, May 2008.

(Paiva et al., 2017) A. Paiva, I. Leite, H. Boukricha, I. Wachsmuth, 'Empathy in Virtual Agents and Robots: A Survey', ACM Trans. Interact. Intell. Syst. (2017), 7(3), 11:1-11:40.

(Pelachaud, 2009) C. Pelachaud, Studies on Gesture Expressivity for a Virtual Agent, Speech Communication, special issue in honor of Björn Granstrom and Rolf Carlson, 51 (2009) 630-639.

(Poggi, 2007) I. Poggi. Mind, hands, face and body: a goal and belief view of multimodal communication. Weidler, Berlin, 2007.

(Robinson & Obin, 2019) C. Robinson, N. Obin, A. Roebel. Sequence-to-sequence modelling of F0 for speech emotion conversion, in IEEE International Conference on Audio, Signal, and Speech Processing (ICASSP), 2019.

(Sadoughi et al., 2017) Najmeh Sadoughi, Yang Liu, and Carlos Busso, 'Meaningful head movements driven by emotional synthetic speech,' Speech Communication, vol. 95, pp. 87-99, December 2017.

(Sidner and Dzikovska, 2002) C. L. Sidner, M. Dzikovska. Human-robot interaction: engagement between humans and robots for hosting activities. In: IEEE int conf on multimodal interfaces, 2002.

(Wang, 2017) Xin Wang, Shinji Takaki, Junichi Yamagishi. An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis, Interspeech, 2017

(Wang, 2018) Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Fei Ren, Ye Jia, Rif A. Saurous. « Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis », 2018.

(Weber, 2018) K. Weber, H. Ritschel, I. Aslan, F. Lingenfelser, E. André, How to Shape the Humor of a Robot - Social Behavior Adaptation Based on Reinforcement Learning, ACM International Conference on Multimodal Interaction, 2018.

(Mancini, 2019) M. Mancini, B. Biancardi, S. Dermouche, P. Lerner, C. Pelachaud, Managing Agent’s Impression Based on User’s Engagement Detection, Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents, 2019.


6-3(2019-06-14) PhD position: Privacy preserving and personalized transformations for speech recognition, Inria Nancy and Univ. Le Mans, France

 

Thesis title

Privacy preserving and personalized transformations for speech recognition

This PhD thesis fits within the scope of a collaborative project (funded by the French National Research Agency) involving several French teams, including the MULTISPEECH team of Inria Nancy - Grand Est and the LIUM (Laboratoire d'Informatique de l'Université du Mans).

This PhD position is in collaboration between the Multispeech team of the LORIA laboratory (Nancy) and Le Mans University. The thesis will be co-supervised by Denis Jouvet (https://members.loria.fr/DJouvet/) and Anthony Larcher (https://lium.univlemans.fr/team/anthony-larcher/). The selected candidate is expected to spend time in both teams over the course of the PhD.

Scientific Context

Over the last decade, great progress has been made in automatic speech recognition [Saon et al., 2017; Xiong et al., 2017]. This is due to the maturity of machine learning techniques (e.g., advanced forms of deep learning), to the availability of very large datasets, and to the increase in computational power. Consequently, the use of speech recognition is now spreading in many applications, such as virtual assistants (for instance Apple's Siri, Google Now, Microsoft's Cortana, or Amazon's Alexa), which collect, process and store personal speech data in centralized servers, raising serious concerns regarding the privacy of their users' data. Embedded speech recognition frameworks have recently been introduced to address privacy issues during the recognition phase: in this case, a (pretrained) speech recognition model is shipped to the user's device so that the processing can be done locally without the user sharing their data. However, speech recognition technology still has limited performance in adverse conditions (e.g., noisy environments, reverberated speech, strong accents, etc.) and thus there is a need for performance improvement. This can only be achieved by using large speech corpora that are representative of the actual users and of the various usage conditions. There is therefore a strong need to share speech data for improved training that is beneficial to all users, while preserving the privacy of the users, which means at least keeping the speaker identity and voice characteristics private¹.

¹ Note that when sharing data, users may also want not to share data conveying private information at the linguistic level (e.g., phone numbers, person names, ...). Such privacy aspects also need to be taken into account, but they are out of the scope of this thesis.

Missions: (objectives, approach, etc.)

Within this context, the objective of the proposed thesis is twofold. First, it aims at finding a privacy preserving transform of the speech data; second, it will investigate the use of additional personalized transforms that can be applied on the user's terminal to increase speech recognition performance.

In the proposed approach, the device of each user will not share its raw speech data, but a privacy preserving transformation of the user's speech data. In such an approach, some private computations will be handled locally, while some cross-user computations may be carried out on a server using the transformed speech data, which protects the speaker identity and some of his/her features (gender, sentiment, emotions...). More specifically, this will rely on representation learning to separate the features of the user data that can expose private information from generic ones useful for the task of interest, i.e., here, the recognition of the linguistic content. We will build upon ideas of Generative Adversarial Networks (GANs) for proposing such a privacy preserving transform. In recent years, GANs have become more and more widely used in deep learning. They typically rely on both a generative network and a discriminative network, where the generator aims to output samples that the discriminator cannot distinguish from the true samples [Goodfellow et al., 2014; Creswell et al., 2018]. They have also been used as autoencoders [Makhzani et al., 2015], which are made of three main blocks: encoder, generator and discriminator. In our case, the discriminators shall focus on discriminating between speakers and/or between voice-related classes (defined according to gender, emotions, etc.). The training objective will be to maximize the speech recognition performance (using the privacy preserving transformed signal) while minimizing the available speaker or voice-related information measured by the discriminator.
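Purely as an illustration of this adversarial objective (all module names and dimensions are assumptions, not the project's implementation), here is a minimal PyTorch sketch: an encoder produces the transformed representation, an ASR head is trained to recover the linguistic content from it, a speaker head is trained to recover the speaker identity, and a gradient reversal layer makes the encoder minimize the ASR loss while maximizing the speaker loss:

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; negated gradient in the backward pass.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

encoder = nn.GRU(40, 256, batch_first=True)   # 40-dim acoustic features (assumed)
asr_head = nn.Linear(256, 50)                 # e.g. 50 phone classes (assumed)
spk_head = nn.Linear(256, 100)                # e.g. 100 training speakers (assumed)

feats = torch.randn(4, 200, 40)               # (batch, frames, features)
phones = torch.randint(0, 50, (4, 200))
speakers = torch.randint(0, 100, (4,))

h, _ = encoder(feats)                         # candidate privacy-preserving representation
asr_loss = nn.functional.cross_entropy(asr_head(h).transpose(1, 2), phones)
spk_logits = spk_head(GradReverse.apply(h, 1.0).mean(dim=1))
spk_loss = nn.functional.cross_entropy(spk_logits, speakers)

# Through the reversal layer, one backward pass trains the speaker head to
# extract speaker information while training the encoder to remove it.
(asr_loss + spk_loss).backward()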

As devices are getting more and more personal, there are opportunities to make speech recognition more personalized. This includes two aspects: adapting the model parameters to the speaker (and to the device), and introducing personalized transforms that help hide the speaker's voice identity. Both aspects will be investigated. Voice conversion approaches provide examples of transforms aiming at modifying the voice of a speaker so that it sounds like the voice of another target speaker [e.g., Chen et al., 2014; Mohammadi & Kain, 2014]. Similar approaches can thus be applied to map speaker-specific features to those of a standard (or average) speaker, which would help conceal the speaker's identity. To take advantage of the increasingly personal usage of terminals, speaker- and environment-specific adaptation will be investigated to improve speech recognition performance. Collaborative learning mixing speech and speaker recognition has been shown to benefit both tasks [Liu et al. 2018; Garimella et al. 2015] and provides a way to combine both types of information in a single framework. This approach will be compared to the adaptation of deep neural network-based models [e.g., Abdel-Hamid & Jiang, 2013] to best handle different amounts of adaptation data.

Skills and profile:

Master in machine learning or in computer science

Background in statistics, and in deep learning

Experience with deep learning tools is a plus

Good computer skills (preferably in Python)

Experience in speech and/or speaker recognition is a plus

Bibliography:

[Abdel-Hamid & Jiang, 2013] Abdel-Hamid, O., & Jiang, H. Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code. In ICASSP-2013, IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 7942-7946, 2013.

[Chen et al., 2014] Chen, L. H., Ling, Z. H., Liu, L. J., & Dai, L. R. Voice conversion using deep neural networks with layer-wise generative training. TASLP-2014, IEEE/ACM Transactions on Audio, Speech and Language Processing, 22(12), pp. 1859-1872, 2014.

[Creswell et al., 2018] Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., and Bharath, A. A. Generative adversarial networks: An overview. IEEE Signal Processing Magazine 35, 1, 53-65, 2018.

[Garimella et al. 2015] Garimella, S., Mandal, A., Strom, N., Hoffmeister, B., Matsoukas, S., & Parthasarathi, S. H. K., Robust i-vector based adaptation of DNN acoustic model for speech recognition. In INTERSPEECH, 2015.

[Goodfellow et al., 2014] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672-2680, 2014.

[Liu et al. 2018] Y. Liu, L. He, J. Liu, and M. Johnson, Speaker Embedding Extraction with Phonetic Information, in INTERSPEECH, pp. 2247-2251, 2018.

[Makhzani et al., 2015] Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.

[Mohammadi & Kain, 2014] Mohammadi, S. H., & Kain, A. Voice conversion using deep neural networks with speaker-independent pre-training. In SLT-2014, Spoken Language Technology Workshop , pp. 19-23, 2014.

[Saon et al., 2017] G. Saon, G. Kurata, T. Sercu, K. Audhkhasi, S. Thomas, D. Dimitriadis, X. Cui, B. Ramabhadran, M. Picheny, L.-L. Lim, B. Roomi, and P. Hall. English conversational telephone speech recognition by humans and machines. Technical report, arXiv:1703.02136, 2017.

[Xiong et al., 2017] W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, and G. Zweig. Achieving human parity in conversational speech recognition. Technical report, arXiv:1610.05256, 2017.

Additional information:

Supervision and contact:

Denis Jouvet (denis.jouvet@loria.fr; https://members.loria.fr/DJouvet/)

Anthony Larcher (anthony.larcher@univ-lemans.fr; https://lium.univlemans.fr/team/anthony-larcher/)

Additional link:

Ecole Doctorale IAEM Lorraine (http://iaem.univ-lorraine.fr/)

Duration: 3 years

Starting date: autumn 2019

The candidates are required to provide the following documents in a single PDF or ZIP file:

- CV

- A cover/motivation letter describing their interest in the topic

- Degree certificates and transcripts for Bachelor and Master (or the last 5 years)

- Master thesis (or equivalent) if it is already completed, or a description of the work in progress otherwise

- The publications (or web links) of the candidate, if any (it is not expected that they have any)

In addition, one recommendation letter from the person who supervises (or supervised) the Master thesis (or research project or internship) should be sent directly by its author to the prospective PhD advisor.


6-4(2019-06-16) PhD position: Hybrid Bayesian and deep neural modeling for weakly supervised learning of sensory-motor speech representations, University of Grenoble-Alpes, France

Open fully-funded PhD position: “Hybrid Bayesian and deep neural modeling for weakly supervised learning of sensory-motor speech representations”

The Deep-COSMO project, part of the new AI institute in Grenoble, is welcoming applications for a 3-year, fully funded PhD scholarship starting October 1st, 2019 at GIPSA-lab (Grenoble, France)

TOPIC: Representation learning, speech production and perception, Bayesian cognitive models, generative neural networks

RESEARCH FIELD: Computer Science, Cognitive Science, Machine Learning, Artificial Intelligence, Speech Processing

SUPERVISION: J. Diard (LPNC); T. Hueber, L. Girin, J.-L. Schwartz (GIPSA-Lab)

IDEX PROJECT TITLE: Multidisciplinary Institute for Artificial Intelligence – Speech chair (P. Perrier)

SCIENTIFIC DEPARTMENT (LABORATORY’S NAME): GIPSA-lab

DOCTORAL SCHOOL: MSTII (maths and computer science) or EEATS (signal processing) or EDISCE (cognitive science), depending on the candidate’s profile and career plan

TYPE of CONTRACT: 3-year doctoral contract

JOB STATUS: Full time

HOURS PER WEEK: 35

SALARY: between 1770 € and 2100 € gross per month (depending on complementary activity or not)

OFFER STARTING DATE: October 1st, 2019

SUBJECT DESCRIPTION:

General objective

How can a child learn to speak from hearing sounds, without any motor instruction provided by his/her environment? The general objective of this PhD project is to develop a computational agent able to learn speech representations from raw speech data in a weakly supervised configuration. This agent will involve an articulatory model of the human vocal tract, an articulatory-to-acoustic synthesis system, and a learning architecture combining deep learning algorithms and developmental principles inspired from cognitive sciences. This PhD will be part of the “Speech communication” chair of the Multidisciplinary Institute for Artificial Intelligence in Grenoble (MIAI).

Method

This work will capitalize on two bricks of research recently developed in Grenoble. First, a Bayesian computational model of speech communication called COSMO (Communicating about Objects using SensoriMotor Operations) (Moulin-Frier et al., 2012, 2015; Laurent et al., 2017; Barnaud et al., 2019) was jointly developed by GIPSA and LPNC. This model associates speech production and speech perception models in a single architecture. The random variables in COSMO represent the signals and the sensori-motor processes involved in the speech production/perception loop. COSMO learns their probability distributions from speech examples provided by the environment, and is then able to perceive and produce speech sounds associated to speech categories. So far, COSMO was mostly tested on synthetic data. One of the main challenges is now to confront COSMO to real-world data.

Second, we will also capitalize on a set of computational models for automatic processing and learning of sensory-motor distributions in speech developed at GIPSA. This comprises a set of transfer-learning algorithms (Hueber et al., 2015; Girin et al., 2017) aiming at adapting acoustic-articulatory knowledge from one speaker to another speaker, using a limited amount of data, possibly incomplete and noisy, together with a set of deep neural networks able to process raw articulatory data (Hueber et al., 2016; Tatulli & Hueber, 2017).

The first step will consist in designing, implementing and testing a “deep” version of COSMO, in which some of the probability distributions are implemented by generative neural models (e.g., VAE, GAN). This choice is motivated by the ability of such techniques to deal with raw, noisy and complex data, as well as their flexibility in terms of transfer learning. The second stage will consist in reformulating the speech communication agent entirely in an end-to-end neural architecture.
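As a small illustration of the kind of generative neural building block mentioned above, here is a minimal VAE sketch in PyTorch; the idea of modeling 40-dimensional acoustic frames and all dimensions are assumptions for illustration, not the actual Deep-COSMO design. A module like this could implement one of the learned probability distributions of a COSMO-like agent:

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=40, z_dim=8, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z), mu, logvar

def elbo_loss(x, x_hat, mu, logvar):
    # Negative evidence lower bound: reconstruction + KL regularization
    recon = nn.functional.mse_loss(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

vae = VAE()
x = torch.randn(16, 40)                  # toy batch of acoustic frames
x_hat, mu, logvar = vae(x)
elbo_loss(x, x_hat, mu, logvar).backward()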

Outputs

The system will be tested in terms of both efficiency of the learning process – hence its ability to generate realistic speech sequences after convergence – and coherence of the motor strategies discovered by the computational agent, in spite of the fact that no motor data will be provided for learning. The outputs are (1) theoretical – for better understanding the cognitive processes at hand in speech development and speech communication; (2) technical – for integrating knowledge about speech production and cognitive processes in a machine learning architecture; and (3) technological – for proposing a new generation of autonomous speech technologies.

ELIGIBILITY CRITERIA

Applicants must have:

- A Master's degree (or be about to earn one) or a university degree equivalent to a European Master's (5-year duration), in Computer Science, Cognitive Science, Signal Processing or Applied Mathematics.

- Solid skills in machine learning or probabilistic modeling + general knowledge in natural language processing and/or speech processing (an affinity for cognitive sciences and speech sciences is welcome).

- Very good programming skills (mostly in Python).

- Good oral and written communication in English.

- Ability to work autonomously and in collaboration with supervisors and other team members.

SELECTION PROCESS

Applicants will have to send their CV + an application letter in English + a copy of their last diploma to: Jean-Luc.Schwartz@gipsa-lab.fr, Thomas.Hueber@gipsa-lab.fr.

Letters of recommendation are welcome. Contacting the supervisors before preparing a complete application is welcome too.

Applications will be evaluated as they are received: the position is open until it is filled, with a deadline on July 10th, 2019.

 


6-5(2019-06-16) PhD thesis proposal Incremental sequence-to-sequence mapping for speech generation using deep neural networks, GIPSALab, Grenoble, France

PhD thesis proposal

Incremental sequence-to-sequence mapping for speech generation using deep neural networks

June 17, 2019

1 Context and objectives

In recent years, deep neural networks have been widely used to address sequence-to-sequence (S2S) learning. S2S models can solve many tasks where source and target sequences have different lengths, such as automatic speech recognition, machine translation, speech translation, text-to-speech synthesis, etc. Recurrent, convolutional and transformer architectures, coupled with attention models, have shown their ability to capture and model complex temporal dependencies between a source and a target sequence of multidimensional discrete and/or continuous data. Importantly, end-to-end training alleviates the need to previously extract handcrafted features from the data, by learning hierarchical representations directly from raw data (e.g. character strings, video, speech waveforms, etc.).

The most common models are composed of an encoder that reads the full input sequence (i.e. from its beginning to its end) before the decoder produces the corresponding output sequence. This implies a latency equal to the length of the input sequence. In particular, for a text-to-speech (TTS) system, the speech waveform is usually synthesized from a complete text utterance (e.g. a sequence of words with explicit begin/end-of-utterance markers). Such an approach cannot be used in a truly interactive scenario, in particular by a speech-handicapped person to communicate orally. Indeed, the interlocutor has to wait for the complete utterance to be typed before being able to listen to the synthetic voice, hence limiting the dynamics and naturalness of the interaction.

The goal of this project is to develop a general methodology for incremental sequence-to-sequence mapping, with application to interactive speech technologies. It will require the development of end-to-end classification and regression neural models able to deliver chunks of output data on-the-fly, from only a partial observation of input data. The goal is to learn an efficient policy that leads to an optimal trade-off between (variable) latency and accuracy of the decoding process. Possible strategies to decode the output data as soon as possible include: (i) predicting online 'the future' of the output sequence from 'the past and present' of the input sequence, with an acceptable tolerance to possible errors, or (ii) learning automatically from the data an optimal 'waiting policy' that prevents the model from outputting data when the uncertainty is too high. The developed methodology will be applied to address two speech processing problems: (i) incremental text-to-speech synthesis, in which speech is synthesized while the user is typing the text (possibly with a variable latency), and (ii) incremental speech enhancement/inpainting, in which portions of the speech signal are unintelligible because of sudden noise or speech production disorders, and must be replaced on-the-fly with reconstructed portions.
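To make the 'waiting policy' idea concrete, here is a minimal Python sketch of a fixed wait-k policy, the simplest deterministic baseline known from the simultaneous machine translation literature; it is used here purely as an illustration, since the thesis targets learned, uncertainty-dependent policies. The decoder starts emitting after k input symbols have been read and then alternates one read with one write (the toy one-output-per-input assumption keeps the example short):

def wait_k_decode(input_stream, k, decode_step):
    # Fixed wait-k policy: read k source symbols, then alternate READ/WRITE.
    # decode_step(prefix, n_out) returns output chunk number n_out given the
    # source prefix observed so far.
    prefix, outputs = [], []
    for symbol in input_stream:
        prefix.append(symbol)                                  # READ
        if len(prefix) >= k:
            outputs.append(decode_step(prefix, len(outputs)))  # WRITE
    while len(outputs) < len(prefix):                          # source done: flush
        outputs.append(decode_step(prefix, len(outputs)))
    return outputs

# Toy 'model': upper-case the n-th observed character (stands in for synthesis)
print(wait_k_decode('incremental', k=3,
                    decode_step=lambda prefix, n: prefix[n].upper()))

A learned policy would replace the fixed test len(prefix) >= k with a decision based on the model's uncertainty about the next output chunk.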

2 Work plan

The proposed work plan is the following:

- Bibliographic work on S2S neural models in the context of speech recognition, speech synthesis and machine translation, as well as their incremental (low-latency) variations.

- Investigating new architectures, losses, and training strategies toward incremental S2S models.

- Implementing and evaluating the proposed techniques in the context of end-to-end neural TTS systems (the baseline system may be a neural TTS trained with past information/left-context only).

- Implementing and evaluating the proposed techniques in the context of speech enhancement/inpainting, first on simulated noisy speech and then on pathological speech.

3 Requirements

We are looking for an outstanding and highly motivated PhD candidate to work on this subject. The following requirements are mandatory:

- Engineering degree and/or a Master's degree in Computer Science, Signal Processing or Applied Mathematics.

- Solid skills in Machine Learning. General knowledge in natural language processing and/or speech processing.

- Excellent programming skills (mostly in Python and deep learning frameworks).

- Good oral and written communication in English.

- Ability to work autonomously and in collaboration with supervisors and other team members.

4 Work context

Grenoble Alpes Univ. offers an excellent research environment with ample computing facilities, as well as remarkable surroundings to explore over the weekends. The PhD project will be funded by the Grenoble Artificial Intelligence Institute (MIAI). The PhD candidate will work both at GIPSA-lab (CRISSP team) and LIG-lab (GETALP team). The duration of the PhD is 3 years. The salary is between 1770 and 2100 euros gross per month (depending on complementary activity or not).

5 How to apply?

Applications should include a detailed CV; a copy of the last diploma; at least two references (people likely to be contacted); a cover letter of one page; a one-page summary of the Master thesis; and the two last transcripts of grades (Master or engineering school). Applications should be sent to thomas.hueber@gipsa-lab.fr, laurent.girin@gipsa-lab.fr and laurent.besacier@imag.fr. Applications will be evaluated as they are received: the position is open until it is filled, with a deadline on July 10th, 2019.


6-6(2019-06-20) Post-doc position, CNRS and Unv.Aix-Marseille, Aix-en-Provence, France


 POST-DOC POSITION (18 months) - Forensic Voice Comparison (VoxCrim project): ability, limitations and specificities of listeners in speaker identification tasks
Laboratoire Parole et Langage (CNRS and Aix-Marseille Université), Aix-en-Provence, France


CONTEXT
The post-doc will be carried out within the framework of the VoxCrim project, funded by an ANR (Agence nationale de la recherche) grant (2017-2021, https://anr.fr/Project-ANR-17-CE39-0016). VoxCrim focuses on national security and legal/justice applications and aims to provide a validated scientific objective framework for all types of forensic voice comparison methods (automatic and phonetic). The goal is to develop certified standards to determine the specific areas in which voice comparison methods are applicable. The project includes two complementary subject areas: 1. the proposal of methodological standards to homogenize the expertise of voice comparison in a judicial environment, 2. the development of basic research in the fields of automatic speech processing and phonetics (speaker characteristics in the production and perception of speech).
The post-doc will participate in the second subject area (speaker characteristics in the production and perception of speech). Two questions need to be answered: What are the abilities and limits of listeners in speaker identification tasks? Which cues do listeners use to identify speakers?

TASKS
The main objective will be to conduct perception experiments aimed at assessing the ability of listeners in several speaker identification tasks.
The post-doctoral fellow will:
-    design experimental protocols
-    create and manipulate acoustic stimuli
-    run experiments and collect data
-    process data and perform statistical analysis
Finally, results will be presented at conferences and published in international journals.

WORK ENVIRONMENT
The postdoctoral fellow will work at the Laboratoire Parole et Langage (http://www.lpl-aix.fr/), a laboratory whose research interests are extremely varied (including linguistics, phonetics, neuroscience, psycholinguistics, sociolinguistics, and computer science). He or she will benefit from this stimulating environment and interact with all the members of the laboratory (faculty members, other post-docs, engineers, doctoral students, etc.). He or she will have the opportunity to discover all the projects of the laboratory.

EXPECTED PROFILE

The postdoctoral fellow will have a PhD in the speech sciences and/or in psychoacoustics (auditory measurements, audio signal processing). A strong background in data processing and statistics is also required. A good command of French and English will also be appreciated.

18 months. Beginning in autumn 2019
Monthly salary: ~€1900 net (depending on experience)
Location: Laboratoire Parole et Langage (http://www.lpl-aix.fr/), Aix Marseille Université, CNRS UMR 7309, Aix-en-Provence, France
For additional information: christine.meunier@univ-amu.fr

- Application: CV and cover letter
- Send application as soon as possible to christine.meunier@univ-amu.fr


Supervisors: Dr. Christine Meunier, CNRS Researcher and Dr. Alain Ghio, Research Engineer - Laboratoire Parole et Langage, Aix-en-Provence, France.


6-7(2019-06-21) Research engineer, LIG, Univ. de Grenoble-Alpes, France

RECRUITMENT OF A RESEARCH ENGINEER IN NATURAL LANGUAGE PROCESSING AND IN THE DEVELOPMENT OF A WEB-BASED HCI INTERFACE

Start of contract: October 2019
Duration: 7 months

Salary: €2000 gross/month


Profile:
- Master's degree or PhD in natural language processing (NLP)
- Training in language sciences will be appreciated
- Operational skills in software engineering (version control, testing, code quality) and Python

- Skills in C/C++ would be a plus

- Experience in automatic speech processing is required, as well as a good level of French

- Experience with Meteor, Firepad, Node.js, MongoDB or Firebase would be a plus

This position requires the ability to work both within a team and autonomously.

Knowledge of the linguistic context of deafness would be an additional asset.

*Description of the project and missions*
Within the framework of the MANES project (Médiation et Accessibilité Numérique pour les Étudiants Sourds: digital mediation and accessibility for deaf students), led by François Portet (LIG), Isabelle Estève (LIDILEM) and Marion Fabre (ECP) and partly funded by PULSALYS, IDEX Lyon-Saint Etienne, we are recruiting a research engineer on a 7-month fixed-term contract.

The general objective of the project is to develop a real-time captioning system that makes the teacher's spoken discourse accessible to deaf students, so as to foster the individual appropriation of knowledge through note-taking. The technological realization and the written-language processing abilities of deaf audiences will be the two axes of this project.

The engineer's mission consists, on the one hand, in developing, evaluating and improving prototypes based on the latest scientific advances, and in merging them into a real-time automatic captioning prototype built on the Kaldi platform.

On the other hand, it consists in designing an HCI interface for the real-time automatic transcription of the teacher's speech and the projection of the ongoing captions, integrating the above-mentioned prototype.

The candidate will be in charge of preparing the prototype for classroom experiments and of the interface adjustments arising from these experiments.

Missions in natural language processing:
- Review of the state of the art of automatic captioning systems
- Semi-automatic transcription tests and verification of oral excerpts from lectures
- Adaptation of the real-time neural-network-based automatic transcription system (Kaldi)
- Real-time processing of the transcriptions for captioning adapted to deaf audiences

The functional requirements envisioned for the implementation on the Kaldi platform are: real-time spotting of keywords and synonyms (already available in Kaldi) and the development of new functionalities: segmentation and simplification. Adaptations for offline use will also have to be considered.


Missions in development:
- Implementation of a web application allowing the real-time transcription of the teacher's speech and the projection of the resulting transcription (video projector, with a possible extension to a mobile interface)
- Development of the student interface: storage and retrieval of the written trace for later rework and modification
- Development of the teacher interface: parameterization of the key elements of the course
- Documentation: description and user manual of the HCI interface

*Work environment*

The project is led by the Education, Cultures et Politiques laboratory (ECP, EA 4571), Université Lumière Lyon 2, in collaboration with the Laboratoire de Linguistique et Didactique des Langues Étrangères et Maternelles (LIDILEM), Université Grenoble-Alpes, and the Laboratoire d'Informatique de Grenoble (LIG).

The position will be physically hosted at the Laboratoire d'Informatique de Grenoble, UMR CNRS, within the GETALP team. The GETALP team (https://lig-getalp.imag.fr) gathers more than 40 researchers, engineers and students in the field of automated multilingual language and speech processing.

*Application*
Send a CV, a cover letter and 1 to 3 recommendation letters to Marion.Fabre@univ-lyon2.fr, François.portet@imag.fr, Benjamin.Lecouteux@imag.fr and isabelle.esteve@univ-grenoble-alpes.fr.
Applications will be reviewed as they are received; candidates selected for a job interview will be contacted in early July. Please apply as soon as possible, and before July 1st, 2019, midnight.

 

 


6-8(2019-06-21) Post doc at LIUM, Univ. du Mans, Le Mans, France

Post-doc position open
------------------------------------
The Speech and Language Technology Group in Le Mans University is looking for
a post-doc scientist to develop autonomous systems

Keywords: Deep Learning, lifelong, autonomous systems, unsupervised learning,
active-learning, interactive-learning

Context
------------------------------------
The LST team from LIUM (Le Mans University) is focusing on autonomous systems'
behavior for the tasks of speaker diarization and machine translation. The
ALLIES project (a European CHIST-ERA collaborative project) aims at developing
evaluation protocols, metrics and scenarios for lifelong learning autonomous
systems. The goal is to enable auto-adaptable systems that can also
auto-evaluate in order to sustain their performance across time. Autonomous
systems can rely on human domain experts via active and interactive learning
processes to be defined within the ALLIES project.

Missions
------------------------------------
Develop an autonomous system for speaker diarization by integrating lifelong
learning, active and interactive learning components. The research work will be
related to some of the following topics:
    - unsupervised adaptation
    - unsupervised evaluation
    - active learning (based on the unsupervised evaluation process, the
        autonomous system is free to request additional knowledge from the
        human domain expert; a toy sketch follows this list)
    - interactive learning (a human domain expert provides specific knowledge
        to the autonomous system; this information must be taken into account
        by the system)
    - Performance will be analyzed using the protocols, metrics and scenarios
        developed for the ALLIES project.
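As a toy illustration of the active-learning component mentioned above (a
hedged sketch: the confidence measure, the threshold and all component names
are placeholder assumptions, not the ALLIES protocol), the Python loop below
processes each incoming recording, queries the human expert only when its own
unsupervised confidence estimate is low, and otherwise adapts on its own
hypothesis:

import random

def lifelong_diarization(recordings, diarize, confidence, ask_expert,
                         adapt, threshold=0.7):
    # Toy lifelong loop: self-adapt when confident, query the expert when not.
    # diarize / confidence / ask_expert / adapt stand in for real components.
    for rec in recordings:
        hypothesis = diarize(rec)
        if confidence(rec, hypothesis) < threshold:
            labels = ask_expert(rec)      # active learning: costly but reliable
        else:
            labels = hypothesis           # unsupervised self-training
        adapt(rec, labels)                # lifelong model update

# Toy run with stand-in components
lifelong_diarization(
    recordings=range(5),
    diarize=lambda r: f'hyp-{r}',
    confidence=lambda r, h: random.random(),
    ask_expert=lambda r: f'expert-{r}',
    adapt=lambda r, labels: print(f'adapting on {labels}'),
)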

Participation in the ALLIES benchmarking evaluation for speaker diarization:
during the ALLIES project, LIUM is organizing two international evaluation
campaigns (one for speaker diarization, jointly organized with Albayzin, and
the second one for machine translation, jointly with WMT). The benchmarking
evaluation will serve to validate the approaches developed during the
post-doc.

Dissemination
The research will be published in the major conferences and journals

------------------------------------
Duration: 12 months
Salary: €2,365.14 per month (after taxes)

Start: as soon as possible, January 2020 at the latest

Location: LIUM, Le Mans University

Supervisors: Anthony Larcher (anthony.larcher@univ-lemans.fr) and
Loïc Barrault (loic.barrault@univ-lemans.fr)

Expected competences:
    - PhD in machine learning and deep learning
    - Experience in speech processing is a plus
    - Fluent in Python
    - Familiar with a deep learning toolkit (PyTorch, TensorFlow)

ALLIES website: https://projets-lium.univ-lemans.fr/allies/


6-9(2019-06-22) Head of AI (M/F), R&D team manager, Zaion, Paris, France

 

ZAION is a fast-growing, innovative company specialized in conversational robot technology: callbots and chatbots integrating Artificial Intelligence.

ZAION has developed a solution that builds on more than 20 years of experience in Customer Relations. This technologically disruptive solution has been received very favorably internationally, and we already count 18 active clients (GENERALI, MNH, APRIL, CROUS, EUROP ASSISTANCE, PRO BTP...).

We are currently among the only companies in the world to offer this type of solution entirely geared towards performance. Joining us means taking part in an exciting adventure within an ambitious team aiming to become the reference on the conversational robot market.

As part of its growth, ZAION is looking for its Head of AI (M/F). As manager of the R&D team, your role is strategic for the development and expansion of the company. You will develop a solution that detects emotions in conversations. We want to extend the cognitive capabilities of our callbots so that they can detect the emotions of their interlocutors (joy, stress, anger, sadness...) and adapt their answers accordingly.

Your main missions:

- Participate in the creation of ZAION's R&D unit and, upon arrival, lead your first project on emotion recognition in the voice

- Build, adapt and evolve our services for detecting emotion in the voice

- Analyze large databases of conversations to extract the emotionally relevant ones

- Build a database of conversations labeled with emotional tags

- Train and evaluate machine learning models for emotion classification

- Deploy your models in production

- Continuously improve the system for detecting emotions in the voice

Required qualifications and prior experience:

- You have at least 5 years of experience as a Data Scientist / Machine Learning engineer applied to audio, and an appetite for team management

- Engineering school or Master's degree in computer science, or a PhD in computer science or mathematics, with solid skills in signal processing (preferably audio)

- Solid theoretical background in machine learning and in the relevant mathematical fields (clustering, classification, matrix factorization, Bayesian inference, deep learning...)

- Experience deploying machine learning models in a production environment would be a plus

- You master one or more of the following: Python, machine learning / deep learning frameworks (PyTorch, TensorFlow, scikit-learn, Keras) and JavaScript

- You master audio signal processing techniques

- Proven experience in labeling large databases (preferably audio) is essential

- Your personality: a leader, autonomous and passionate about your work, you know how to lead a team in project mode

- You speak English fluently

Please send your application to: alegentil@zaion.ai


6-10(2019-06-23) 3 open roles at Speechmatics, Cambridge, UK

1. SPEECH RECOGNITION INTERN

Location: Cambridge, UK

Contact: careers@speechmatics.com

 

“As an intern at Speechmatics I have worked on projects that use real machine learning to deliver real value to people across the world. There are few places where the machine learning being used is at the bleeding edge of the field, but Speechmatics is one of them. The company has an amazing culture that allows you to grow as a programmer and as a person. If you want to be a part of a fast-growing machine learning company where you, personally, will make a difference, then Speechmatics could well be the place for you!”

 

  • Sam Ringer, Machine Learning Engineer (previously R&D Intern), Speechmatics 



Background

Speech technology is one of the most popular discussion items at the moment, yet speech interaction is limited to “Alexa, turn on the light”, or “Siri, where is the nearest coffee shop?” We are taking speech technology to the next level using our expertise in machine learning and speech-to-text technology to enable our customers to use conversational speech recognition. Our solutions power subtitling on TV, content discovery for videos, compliance solutions in banks, improve efficiency of meetings, and many other use-cases. Our mission is to improve human communication with a global speech engine, that works and put speech back at the heart of communication.

At Speechmatics you’ll be working with some of the smartest minds in the industry, working on cutting-edge projects and deploying the latest machine learning techniques to disrupt the market, providing customers with the best speech technology available, all whilst immersed in a progressive and great company culture. You can enjoy benefits including share options, healthcare, life assurance, Bike Doctor, massages, regular BBQs, Brew Dogs in the fridge, no red tape, a top-end laptop and much more. We’re building a company that truly strives to be world-leading and we’re looking for people who wholeheartedly believe they can be additive to our culture, bring new ideas to the table and get stuff done. If that’s you, carry on reading.



The Opportunity

The Speechmatics Engineering team develops and maintains speech-oriented products and services that will be used by businesses worldwide and is responsible for the complete product development cycle for these products. In this internship, you’ll help to support fundamental speech and language processing research to improve our performance and language coverage as well as helping to build products and features to delight our users.

Because you will be joining a rapidly expanding team, you will need to be a team player who thrives in a fast-paced environment, with a focus on investigating ideas and rapidly moving research developments into products. We strongly encourage versatility and knowledge transfer within and across teams. You will be expected to learn fast and feel emboldened to ask for support as you need it.

Prior experience of speech recognition is desirable, although Speechmatics has a team of speech recognition engineers who will collaborate and share any specialised knowledge required. If you are enthusiastic about speech recognition and machine learning in general, with the drive to deliver the best possible technology solutions, then we want to hear from you!

Our internships are not time constrained to specific dates – we can work out mutually agreeable start and end dates as part of the application process.



Key Responsibilities

  • Exploring and evaluating research ideas

  • Increasing and improving our language coverage

  • Prototyping new and improved features

  • Helping the company to take your R&D through to production

  • Communicating your work internally



Requirements

Essential

  • Team player

  • Enthusiasm for speech recognition and machine learning

  • Technical understanding of speech recognition or related discipline

  • Ability to rapidly deliver on ideas

  • Competent in Python and/or C/C++

  • Have or be studying towards a degree involving speech recognition, machine learning / computer science or related field



Desirable

  • Practical experience of ASR and ML packages such as Kaldi, HTK or TensorFlow

  • Commercial experience of speech recognition

  • Software development experience



Salary

Competitive salary (dependent on experience), flexible working and some awesome benefits & perks.



Interested?

Get in touch! Send your CV and covering letter to careers@speechmatics.com.

 

2. SPEECH RECOGNITION ENGINEER

Location: Cambridge, UK

Contact: careers@speechmatics.com

 

'As a Speech Recognition Engineer at Speechmatics, I work on solving a multitude of problems related to improving the accuracy and delivering new features for a global automatic speech recognition engine. As a member of the speech team, I work across every aspect of speech and implement the latest research in acoustic and language modelling. The team is supportive and also rich in terms of skills and backgrounds. Speechmatics offer progressive and rewarding opportunities in one of the best speech technology companies in the world.' 

 

  • André Mansikkaniemi, Speech Recognition Engineer at Speechmatics 



Background

Speech technology is one of the most popular discussion items at the moment, yet speech interaction is limited to “Alexa, turn on the light”, or “Siri, where is the nearest coffee shop?” We are taking speech technology to the next level using our expertise in machine learning and speech-to-text technology to enable our customers to use conversational speech recognition. Our solutions power subtitling on TV, content discovery for videos, compliance solutions in banks, improve efficiency of meetings, and many other use-cases. Our mission is to improve human communication with a global speech engine, that works and put speech back at the heart of communication.

At Speechmatics you’ll be working with some of the smartest minds in the industry, working on cutting-edge projects and deploying the latest machine learning techniques to disrupt the market, providing customers with the best speech technology available, all whilst immersed in a progressive and great company culture. You can enjoy benefits including share options, healthcare, life assurance, Bike Doctor, massages, regular BBQs, Brew Dogs in the fridge, no red tape, a top-end laptop and much more. We’re building a company that truly strives to be world-leading and we’re looking for people who wholeheartedly believe they can be additive to our culture, bring new ideas to the table and get stuff done. If that’s you, carry on reading.



The Opportunity 

We are looking for a talented speech engineer to help us build the best speech technology for anybody, anywhere, in any language. You will be part of a team that is working on our core ASR capabilities to improve our speed and accuracy and develop novel features that we can support in all languages. Your work will feed into our ground-breaking framework to support the building of ASR models in every language pack published by the company. You will be responsible for keeping our system the most accurate and useful commercial speech recognition system available. 

As you will be joining a small team, you will need to be a team player who thrives in a fast-paced environment, with a focus on rapidly moving research developments into products. Bringing skills into the team is as important as a can-do attitude. We strongly encourage versatility and knowledge transfer within the team, so that we can share efficiently what needs to be done to meet our commitments to the rest of the company. 



Key Responsibilities 

  • Research and development of improved speed and accuracy across our range of world-leading ASR products and related features

  • Delivering the software that provides an easy-to-use, feature-rich ASR product for our customers

  • Enhancing our machine learning framework that robustly builds any language with the best possible performance

  • Taking data all the way from its raw form through to a finished model

  • Working within a team in an agile environment

  • Working closely with other technical teams and product team to deliver on the company’s technical vision

 

Requirements 

Essential 

  • Graduate degree in Statistics, Engineering, Mathematics, or Computer Science 

  • Knowledge of key natural language processing or related technologies, such as speech recognition, text-to-speech or natural language understanding 

  • Experience working with standard speech and/or ML toolkits, e.g. Kaldi, KenLM, TensorFlow, etc. 

  • Solid Python programming skills

  • Experience using Unix/Linux 

  • Quick and enthusiastic learner

  • Excellent teamwork and communication skills

  • Analytical mind-set with a data-driven approach to making decisions and attention to detail 

 

Desirable 

  • Postgraduate degree in related discipline 

  • Commercial work experience in ASR or a related field 

  • Experience of working in an Agile framework 

  • Expertise in modern speech recognition, including WFSTs, lattice processing, neural net (RNN / DNN / LSTM), acoustic and language models, decoding 

  • Comprehensive knowledge of machine learning and statistical modelling 

  • Experience in deep machine learning and related toolkits, e.g. Theano, Torch, etc. 

  • Deep expertise in Python and/or C++ software development 

  • Experience working effectively with software engineering teams or as a Software Engineer 



Salary

Competitive salary (dependent on experience), flexible working and some awesome benefits & perks.



Interested?

Get in touch! Send your CV and covering letter to careers@speechmatics.com.

 

3. SENIOR SPEECH RECOGNITION ENGINEER

Location: Cambridge, UK

Contact: careers@speechmatics.com

 

'As a Speech Recognition Engineer at Speechmatics, I work on solving a multitude of problems related to improving the accuracy and delivering new features for a global Automatic Speech Recognition engine. As a member of the speech team, I work across every aspect of speech and implement the latest research in acoustic and language modelling. The team is supportive and also rich in terms of skills and backgrounds. Speechmatics offer progressive and rewarding opportunities in one of the best speech technology companies in the world.'

 

  • André Mansikkaniemi, Speech Recognition Engineer, Speechmatics



Background

Speech technology is one of the most popular discussion items at the moment, yet speech interaction is limited to “Alexa, turn on the light”, or “Siri, where is the nearest coffee shop?” We are taking speech technology to the next level, using our expertise in machine learning and speech-to-text technology to enable our customers to use conversational speech recognition. Our solutions power subtitling on TV, content discovery for videos, and compliance solutions in banks, improve the efficiency of meetings, and serve many other use-cases. Our mission is to improve human communication with a global speech engine that works, and to put speech back at the heart of communication.

At Speechmatics you’ll be working with some of the smartest minds in the industry, working on cutting-edge projects and deploying the latest machine learning techniques to disrupt the market, providing customers with the best speech technology available, all whilst immersed in a progressive and great company culture. You can enjoy benefits including share options, healthcare, life assurance, Bike Doctor, massages, regular BBQs, BrewDogs in the fridge, no red tape, a top-end laptop and much more. We’re building a company that truly strives to be world-leading and we’re looking for people who wholeheartedly believe they can be additive to our culture, bring new ideas to the table and get stuff done. If that’s you, carry on reading.



The Opportunity

We are looking for a talented speech engineer to help us build the best speech technology for anybody, anywhere, in any language. You will be part of a team that is working on our core ASR capabilities to improve our speed and accuracy and develop novel features that we can support in all languages. Your work will feed into our ground-breaking framework to support the building of ASR models in every language pack published by the company. You will be responsible for keeping our system the most accurate and useful commercial speech recognition system available.

As you will be joining a small team, you will need to be a team player who thrives in a fast-paced environment, with a focus on rapidly moving research developments into products. Bringing skills into the team is as important as a can-do attitude. We strongly encourage versatility and knowledge transfer within the team, so that we can share efficiently what needs to be done to meet our commitments to the rest of the company.

 

Key Responsibilities

  • Research and development of improved speed and accuracy across our range of world-leading ASR products and related features

  • Delivering the software that provides an easy-to-use, feature-rich ASR product for our customers

  • Enhancing our machine learning framework that robustly builds any language with the best possible performance

  • Taking data all the way from its raw form through to a finished model

  • Working within a team in an agile environment

  • Working closely with other technical teams and product team to deliver on the company’s technical vision



Requirements

Essential

  • Commercial experience in ASR or a related field

  • Graduate degree in Statistics, Engineering, Mathematics, or Computer Science

  • Expertise in modern speech recognition, including WFSTs, lattice processing, neural net (RNN / DNN / LSTM), acoustic and language models, decoding

  • Experience working with standard speech and/or ML toolkits, e.g. Kaldi, KenLM, TensorFlow, etc.

  • Solid Python programming skills

  • Experience using Unix/Linux

  • Drive to help those around you learn and improve every day

  • Excellent teamwork and communication skills

  • Analytical mind-set with a data-driven approach to making decisions and attention to detail

 

Desirable

  • Postgraduate degree in related discipline

  • Experience of working in an Agile framework

  • Comprehensive knowledge of machine learning and statistical modelling

  • Experience in deep machine learning and related toolkits, e.g. Theano, Torch, etc.

  • Deep expertise in Python and/or C++ software development

  • Experience working effectively with software engineering teams or as a Software Engineer



Salary

Competitive salary (dependent on experience), flexible working and some awesome benefits & perks.



Interested?

Get in touch! Send your CV and covering letter to careers@speechmatics.com.

 

More about Speechmatics’ culture



Live for the wow | Build authentic relationships | Be the adventure



Innovation is what we do. We build, we iterate, we develop the next thing that delivers that wow moment. We see value in building long-term, authentic relationships that last and are based on trust and honesty: with our customers, our colleagues, our leaders, our suppliers, and within our local community. Our journey should be fun and exciting. We will celebrate our successes and learn from our mistakes together along the way. We embrace learning and change to grow naturally and organically as a company and as individuals. We trust; we're honest, kind and respectful.

 


6-11(2019-07-11) Three-year Early Stage Researcher PhD position

Applications are invited for a three-year Early Stage Researcher PhD position in speech technology for pathological speech.

Description

The thesis focuses on studying the link between the internal representations of Deep Neural Networks (DNNs) and the subjective representation of speech intelligibility. We propose to explore the saliency detection capabilities of DNNs when used in a regression task for predicting speech intelligibility scores as given by human experts. By saliency, we mean retrieving which frequency bands are important and used by a DNN to make its predictions. The final expectation is to identify regions of interest in the speech signal, both in time and frequency, that characterise the level of speech impairment.

The experiments will be carried out on various samples of speech produced by 150 people (100 patients and 50 healthy controls). This database was recorded within the INCA C2SI project and contains speech from patients treated for cancer of the oral cavity or pharynx. It also contains various metadata such as the location of the tumor, the impairment in terms of severity and intelligibility as assessed by human experts, self-evaluation questionnaires on the patient's quality of life, etc. Various tasks were recorded, such as a sustained vowel, read speech, nonsense words, prosodic exercises, picture description, etc. There will also be the possibility to extend the work to another corpus composed of the voices of patients suffering from Parkinson's disease.

At first, the PhD student will build on the various analyses and descriptions produced during the C2SI project, which sought to correlate the impact of the tumor with communication ability. Those results will help establish the human representation of the impact of the disease. Then, a DNN will be modeled to fit the data, taking care of the data sparsity. The last part of the work will explore the internal representation of the DNN, investigating what parts of the signal help it make a decision about the impact of the disease; this is the final goal of the thesis: studying the automatic representation that lies in the model the student will propose.
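
As a rough illustration of the saliency idea described above (the posting does not prescribe a method), one common approach is to differentiate the predicted score with respect to the input. The minimal sketch below assumes a hypothetical trained PyTorch regression model and a (frames x bands) spectrogram tensor; neither exists in the posting itself.

import torch

def band_saliency(model, spectrogram):
    """Gradient-based saliency: one value per frequency band.

    `model` and `spectrogram` are hypothetical placeholders for a trained
    intelligibility-regression network and a (frames, n_bands) input.
    """
    x = spectrogram.detach().clone().requires_grad_(True)
    score = model(x.unsqueeze(0)).squeeze()  # scalar intelligibility score
    score.backward()                         # gradient of score w.r.t. input
    # Average absolute gradients over time: bands with large values are the
    # ones the network relies on most for its prediction.
    return x.grad.abs().mean(dim=0)          # shape: (n_bands,)

Bands with consistently large gradient magnitude are candidates for the regions of interest the thesis aims to identify; gradient saliency is only one of several possible techniques.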

This work is funded by the TAPAS project (https://www.tapas-etn-eu.org), a Horizon 2020 Marie Skłodowska-Curie Actions Innovative Training Network / European Training Network (MSCA-ITN-ETN) project that aims to transform the well-being of people across Europe with debilitating speech pathologies (e.g., due to stroke, Parkinson's, etc.). These groups face communication problems that can lead to social exclusion. They are now being further marginalised by a new wave of speech technology that is increasingly woven into everyday life but which is not robust to atypical speech.

The supervision of the PhD will take place at the IRIT laboratory, within the SAMoVA team in Toulouse. SAMoVA does research in the domain of 'analysis, modeling and structuring of audiovisual content'. The application areas are diverse: speech processing, language identification, speaker verification, and speech and music indexing. The researchers' expertise covers novel machine learning and audio processing technologies and is now focused on deep learning methods, leading to several publications in international conferences.

Eligibility Criteria

Early Stage Researchers (ESRs) shall, at the time of recruitment by the host organization, be in the first four years (full-time equivalent research experience) of their research careers.

- The ESR may be a national of a Member State, of an Associated Country or of any Third Country.

- The ESR must not have resided or carried out her/his main activity (work, studies, etc.) in the country of her/his host organization for more than 12 months in the 3 years immediately prior to her/his recruitment.

- Holds a Master's degree or equivalent, which formally entitles the holder to embark on a Doctorate.

- Does not hold a PhD degree.

Duration of recruitment: 36 months

Contact: Julie Mauclair (mauclair@irit.fr)

 

 


6-12(2019-07-17) Chief Technical Officer (CTO) at ELDA

Chief Technical Officer (CTO)

Under the supervision of the CEO, the responsibilities of the Chief Technical Officer (CTO) include planning and supervising technical development of tools, software components or applications for language resource production and management.
He/she will be in charge of managing the current language resources production workflows and co-ordinating ELDA's participation in R&D projects, while also being hands-on whenever required by the language resource production and management team. He/she will liaise with external partners at all phases of the projects (submission to calls for proposals, building and management of project teams) within the framework of international, publicly- or privately-funded projects.

This position offers excellent opportunities for creative and motivated candidates wishing to participate actively in the Language Engineering field.

Profile:
•    PhD in Computer Science, Natural Language Processing, or equivalent
•    Experience in Natural Language Processing (speech processing, data mining, machine translation, etc.)
•    Familiarity with open source and free software
•    Knowledge of a statically typed functional programming language (OCaml preferred) is a plus
•    Good level in English, with strong writing and documentation skills in English
•    Dynamic and communicative, flexible to work on different tasks in parallel
•    Ability to work independently and as part of a multidisciplinary team
•    Citizenship (or residency papers) of a European Union country
•    Good level in Python, knowledge of Django would be a plus
•    Proficiency in classic shell scripting in a Linux environment (POSIX tools, Bash, awk)

Salary: Commensurate with qualifications and experience (between 45K€ and 55K€).
Other benefits: complementary health insurance and meal vouchers

Applicants should email a cover letter addressing the points listed above together with a curriculum vitae to: job@elda.org

ELDA is acting as the distribution agency of the European Language Resources Association (ELRA). ELRA was established in February 1995, with the support of the European Commission, to promote the development and exploitation of Language Resources (LRs). Language Resources include all data necessary for language engineering, such as monolingual and multilingual lexica, text corpora, speech databases and terminology. The role of this non-profit membership Association is to promote the production of LRs, to collect and to validate them and, foremost, make them available to users. The association also gathers information on market needs and trends.

For further information about ELDA/ELRA, visit: www.elra.info


6-13(2019-07-19) Two Post-doctoral positions at Le Mans University, France

 2 Post-doctoral positions at Le Mans University on deep learning approaches for speech processing

*Place of work* Le Mans University, Le Mans – France

*Starting date* From now to June 2020

*Salary* between €2,300 and €2,600/month

*Duration* 12 months and 24 months (can be combined into a 36-month position)

****************************************
1st position
****************************************

* Context *
The LST team from LIUM (Le Mans University) is focusing on autonomous systems' behavior
for the task of speaker diarization and machine translation.
The ALLIES project (European Chist-ERA collaborative project) aims at developing
evaluation protocols, metrics and scenarios for lifelong learning autonomous systems.
The goal is to enable auto-adaptable systems that can also auto-evaluate in order to
sustain their performance across time. Autonomous systems can rely on human domain
experts via active and interactive learning processes to be defined within the ALLIES project.

* Missions *
Develop an autonomous system for speaker diarization by integrating lifelong learning,
active and interactive learning components. The research work will be related to some of the following topics:
- unsupervised adaptation
- unsupervised evaluation
- active learning (based on the unsupervised evaluation process, the autonomous
   system is free to request additional knowledge from the human domain expert)
- interactive learning (a human domain expert provides specific knowledge to
   the autonomous system; this information must be taken into account by the system)
Performance will be analyzed using protocols, metrics and scenarios developed for the ALLIES project.
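
As a toy illustration of the active-learning component listed above (our own sketch, not the ALLIES protocol), the function below scores the system's confidence from speaker-embedding margins and selects only the most ambiguous segments for the human expert; `embed` and `centroids` are hypothetical per-segment embeddings and current speaker-cluster centroids.

import numpy as np

def query_for_labels(embed, centroids, budget=10):
    """Pick the `budget` segments the diarization system is least sure about.

    embed: (n_segments, dim) speaker embeddings.
    centroids: (k, dim) current speaker-cluster centroids, with k >= 2.
    """
    # Distance from every segment to every speaker cluster
    d = np.linalg.norm(embed[:, None, :] - centroids[None, :, :], axis=-1)
    d.sort(axis=1)
    margin = d[:, 1] - d[:, 0]          # small margin = ambiguous speaker
    return np.argsort(margin)[:budget]  # segment indices to show the expert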

Participation in the ALLIES benchmarking evaluation for speaker diarization.
During the ALLIES project, LIUM is organizing two international evaluation
campaigns (one for speaker diarization, jointly organized with Albayzin, and the
second one for machine translation, jointly with WMT).
The benchmarking evaluation will serve to validate the approaches developed during the post-doc.

* Dissemination *
The research will be published in the major conferences and journals

* Duration * 12 months
* Salary * €2,365.14 (after taxes)

* Start * as soon as possible, latest January 2020

* Supervisors * Anthony Larcher (anthony.larcher@univ-lemans.fr) and Loïc Barrault (loic.barrault@univ-lemans.fr)

Expected competences:
    - PhD in Machine Learning and Deep Learning
    - Experience in speech processing is a plus
    - Fluent in Python
    - Familiar with a deep learning toolkit (PyTorch, TensorFlow)

ALLIES website: https://projets-lium.univ-lemans.fr/allies/

****************************************
2nd position
****************************************

* Context *
The LST team from LIUM (Le Mans University) is focusing on evolutive end-to-end
neural networks for speaker recognition. The Extensor project (French ANR funded)
aims at developing novel architectures for end-to-end speaker recognition as well as
explaining the behavior of those networks. The focus of Extensor is threefold:
get rid of the legacy of Bayesian system architectures and explore the wider opportunities offered by deep learning;
explore real end-to-end architectures exploiting the raw signal instead of classical features (such as MFCCs or filterbanks);
develop tools for explainability in speaker recognition.
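
To give a concrete flavour of the second point, a SincNet-style front end replaces fixed features with band-pass filters whose two cut-off frequencies are the only learnable parameters. The sketch below is our illustration (frequencies in cycles/sample are arbitrary), not the project's code:

import numpy as np

def sinc_bandpass(f1, f2, length=251):
    """Band-pass kernel built as the difference of two low-pass sinc filters."""
    n = np.arange(length) - (length - 1) / 2
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return h * np.hamming(length)  # window to reduce spectral ripple

# A small filterbank applied directly to the raw waveform; in SincNet the
# cut-offs f1/f2 would be trained by backpropagation rather than fixed.
filters = [sinc_bandpass(f, f + 0.02) for f in np.arange(0.01, 0.4, 0.05)]
waveform = np.random.randn(16000)  # stand-in for one second of raw speech
features = np.stack([np.convolve(waveform, h, mode='same') for h in filters])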

* Missions *
Develop end-to-end speaker recognition systems based on state-of-the-art approaches (x-vectors, SincNet, etc.)
Develop evolutive architectures making use of existing genetic algorithms and study their behavior.
Participate in the three hackathons organized by the Extensor project in order to develop
tools for evolutive neural network architectures and explainability for speaker recognition.
Dissemination: the research will be published in the major conferences and journals

* Duration * 24 months
* Salary * €2,600 (after taxes)

* Start * as soon as possible, latest June 2020

* Location * LIUM, Le Mans University

* Supervisor * Anthony Larcher (anthony.larcher@univ-lemans.fr)

Expected competences:
    - PhD in Machine Learning and Deep Learning
    - Experience in speech processing is a plus
    - Fluent in Python
    - Familiar with a deep learning toolkit (PyTorch, TensorFlow)

 

--

Anthony Larcher, Maître de Conférences, HDR / Associate Professor
Director of the Institut Informatique Claude Chappe
Co-head of the Computer Science specialty
Head of the Human-System Interfaces option. Tel. +33 (0)2 43 83 38 30
Avenue Olivier Messiaen, 72085 LE MANS Cedex 09, univ-lemans.fr


6-14(2019-07-20) Three-year Early Stage Researcher PhD position, IRIT, Toulouse, France

Applications are invited for a three-year Early Stage Researcher PhD position in speech technology for pathological speech.

Description

The thesis focuses on studying the link between the internal representations of Deep Neural Networks (DNNs) and the subjective representation of speech intelligibility. We propose to explore the saliency detection capabilities of DNNs when used in a regression task for predicting speech intelligibility scores as given by human experts. By saliency, we mean retrieving which frequency bands are important and used by a DNN to make its predictions. The final expectation is to identify regions of interest in the speech signal, both in time and frequency, that characterise the level of speech impairment.

The experiments will be carried out on various samples of speech produced by 150 people (100 patients and 50 healthy controls). This database was recorded within the INCA C2SI project and contains speech from patients treated for cancer of the oral cavity or pharynx. It also contains various metadata such as the location of the tumor, the impairment in terms of severity and intelligibility as assessed by human experts, self-evaluation questionnaires on the patient's quality of life, etc. Various tasks were recorded, such as a sustained vowel, read speech, nonsense words, prosodic exercises, picture description, etc. There will also be the possibility to extend the work to another corpus composed of the voices of patients suffering from Parkinson's disease.

At first, the PhD student will build on the various analyses and descriptions produced during the C2SI project, which sought to correlate the impact of the tumor with communication ability. Those results will help establish the human representation of the impact of the disease. Then, a DNN will be modeled to fit the data, taking care of the data sparsity. The last part of the work will explore the internal representation of the DNN, investigating what parts of the signal help it make a decision about the impact of the disease; this is the final goal of the thesis: studying the automatic representation that lies in the model the student will propose.

This work is funded by the TAPAS project (https://www.tapas-etn-eu.org), a Horizon 2020 Marie Skłodowska-Curie Actions Innovative Training Network / European Training Network (MSCA-ITN-ETN) project that aims to transform the well-being of people across Europe with debilitating speech pathologies (e.g., due to stroke, Parkinson's, etc.). These groups face communication problems that can lead to social exclusion. They are now being further marginalised by a new wave of speech technology that is increasingly woven into everyday life but which is not robust to atypical speech.

The supervision of the PhD will take place at the IRIT laboratory, within the SAMoVA team in Toulouse. SAMoVA does research in the domain of 'analysis, modeling and structuring of audiovisual content'. The application areas are diverse: speech processing, language identification, speaker verification, and speech and music indexing. The researchers' expertise covers novel machine learning and audio processing technologies and is now focused on deep learning methods, leading to several publications in international conferences.

Eligibility Criteria

Early Stage Researchers (ESRs) shall, at the time of recruitment by the host organization, be in the first four years (full-time equivalent research experience) of their research careers.

- The ESR may be a national of a Member State, of an Associated Country or of any Third Country.

- The ESR must not have resided or carried out her/his main activity (work, studies, etc.) in the country of her/his host organization for more than 12 months in the 3 years immediately prior to her/his recruitment.

- Holds a Master's degree or equivalent, which formally entitles the holder to embark on a Doctorate.

- Does not hold a PhD degree.

Duration of recruitment: 36 months.

Contact: Julie Mauclair (mauclair@irit.fr)


6-15(2019-07-23) PhD position at LORIA-INRIA, Nancy, France
Automatic classification using deep learning of hate speech posted on the Internet


Supervisors: Irina Illina, MdC, HDR, Dominique Fohr, CR CNRS
Team: Multispeech, LORIA-INRIA, France
Contact: illina@loria.fr, dominique.fohr@loria.fr
Duration of PhD Thesis : 3 years
Deadline to apply : August  15th, 2019
Required skills: background in statistics, natural language processing and computer programming skills (Perl, Python). Candidates should email a detailed CV with diploma.

Keywords: hate speech, social media, natural language processing.

The rapid development of the Internet and social networks has brought great benefits to women and men in their daily lives. Unfortunately, the dark side of these benefits has led to an increase in hate speech and terrorism as the most common and powerful threats on a global scale. Hate speech is a type of offensive communication mechanism that expresses an ideology of hatred often using stereotypes. Hate speech can target different societal characteristics such as gender, religion, race, disability, etc. Hate speech is the subject of different national and international legal frameworks. Hate speech is a type of terrorism and often follows a terrorist incident or event.


Social networks are incredibly popular today. Nowadays, Twitter, LinkedIn, Facebook and YouTube are used as a standard tool for communicating ideas, beliefs and feelings. Only a small percentage of people use part of the network for unhealthy activities such as hate speech and terrorism. But the impact of this low percentage of users is extremely damaging. For years, social media companies such as Twitter, Facebook and YouTube have invested hundreds of millions of dollars each year in the task of detecting, classifying and moderating hate. But these efforts are mainly based on manually revising the content to identify and remove offensive content, which is extremely expensive.

This thesis aims at designing automatic and evolving methods for the classification of hate speech in the field of social media. Despite the studies already published on this subject, the results show that the task remains very difficult. We will use semantic content analysis methodologies from natural language processing (NLP) and methodologies based on deep learning (DNN), which is the revolution in the field of artificial intelligence. During this thesis, we will develop a research protocol to classify hate speech in text as hateful, aggressive, insulting, ironic, neutral, etc. This type of problem falls in the context of multi-label classification.

In addition, the problem of obfuscation of words in hate messages will need to be addressed. People who want to write hate speech on the Internet know that they risk being censored by rudimentary automatic systems of moderation. So, users try to obscure their words by deliberately altering or disguising the spelling of words.

Among the crucial points of this thesis are the choice of the DNN architecture and the relevant representation of the data, i.e. the text of the internet message. The system designed will be validated on real flows of social networks.
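
For concreteness, a minimal multi-label setup of the kind described above might look as follows in Keras (our sketch; the label set and sizes are illustrative assumptions, not the thesis design). Each message receives one independent sigmoid score per label, so several labels can be active at once:

import tensorflow as tf

NUM_LABELS = 5      # e.g. hateful, aggressive, insulting, ironic, neutral
VOCAB_SIZE = 20000  # illustrative vocabulary size
MAX_LEN = 100       # tokens per message after padding/truncation

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128, input_length=MAX_LEN),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(NUM_LABELS, activation='sigmoid'),  # one score per label
])
# Binary cross-entropy treats each label as its own yes/no decision,
# which is what distinguishes multi-label from single-label classification.
model.compile(optimizer='adam', loss='binary_crossentropy')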

Skills

Strong background in mathematics, machine learning (DNN), statistics

The following profiles are welcome: strong experience with natural language processing.

Excellent English writing and speaking skills are required in any case.

References :
Gröndahl, T., Pajola, L., Juuti, M., Conti, M., Asokan, N. (2018). All You Need is 'Love': Evading Hate-speech Detection. arXiv preprint arXiv:1808.09115.
Wiegand, M., Klakow, D. (2008). Optimizing Language Models for Polarity Classification. In Proceedings of ECIR, pp. 612-616.
Wiegand, M., Ruppenhofer, J. (2015). Opinion Holder and Target Extraction based on the Induction of Verbal Categories. In Proceedings of CoNLL, pp. 215-225.
Wiegand, M., Ruppenhofer, J., Schmidt, A., Greenberg, C. (2018). Inducing a Lexicon of Abusive Words – A Feature-Based Approach. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Wiegand, M., Wolf, M., Ruppenhofer, J. (2017) Negation Modeling for German Polarity Classification. In Proceedings of GSCL.
Zhang Z., Luo L. (2018). Hate speech detection: a solved problem? The Challenging Case of Long Tail on Twitter. arxiv.org/pdf/1803.03662

6-16(2019-07-29) PhD position, Vrije Universiteit Brussel, Belgium

PhD position in

Agent Based Modeling of

Cognitively Plausible Emergent Behavior

 

In the context of seed funding for AI research in Flanders, prof. Bart de Boer is looking for a PhD student for the origins of language group of the AI-lab of the Vrije Universiteit Brussel.

 

PhD position offered

We offer a four year PhD position funded by a scholarship with a yearly bench fee. The PhD work will consist of building an agent-based simulation in which we can investigate emergence of behavior in a cognitively realistic setting. This means that the agents are not fully rational and that they show behavior similar to that of humans, and that interests of agents are not necessarily always aligned. The modeling will primarily focus on emergence of speech, but the simulation should be general enough that it can be easily adapted to other areas, such as traffic or economic interactions.

 

What we are looking for

We are looking for an enthusiastic student with a degree in artificial intelligence, cognitive science, linguistics or equivalent and who has experience programming agent-based or cognitive models, preferably in Python or C++. Knowledge of speech and speech processing is a bonus. The starting date is negotiable, but preferably no later than September 2019.

 

How to apply

Send a recent CV detailing your academic record and your programming experience, as well as a letter of motivation, to prof. Bart de Boer. At this stage we ask you not to send copies of your diplomas or letters of reference; these we will request directly if we decide to further pursue your application. If you have any questions, please email prof. Bart de Boer.

 

Links

Context: https://ai.vub.ac.be/node/1687

Email Bart de Boer: bart@ai.vub.ac.be


6-17(2019-07-29) Visiting postdoc at Vrije Universiteit Brussel, Belgium

Visiting postdoc in

Cognitively Plausible Emergent Behavior

 

In the context of seed funding for AI research in Flanders, prof. Bart de Boer is looking for a short-term (three to six months) visiting postdoc for the origins of language group of the AI-lab of the Vrije Universiteit Brussel.

 

Position offered

We offer a three-six months visiting postdoc position funded by a scholarship and with a bench fee. The work should consist of agent-based simulation, or of experiments to investigate emergence of behavior in a cognitively realistic setting. This means that in a computer simulation, the agents are not fully rational and that they show behavior similar to that of humans, and that interests of agents are not necessarily always aligned. Experiments should focus on factors that are typical for human settings, but that are generally idealized away, such as altruism, conflicts of interests and other 'non-rational' behaviors. We are most interested in modeling emergence of speech, but we welcome applications proposing other areas, such as traffic or economic interactions.

 

What we are looking for

We are looking for an enthusiastic postdoc with a track record in artificial intelligence, cognitive science, linguistics or equivalent and who has either experience programming agent-based or cognitive models, or who has experience with the interaction between computer models and experiments. The starting date is negotiable, but preferably no later than September 2019.

 

How to apply

Send a recent CV detailing your academic record and your programming experience, as well as a letter of motivation, to prof. Bart de Boer. Be sure to include a short (1-page) outline of your proposed project in the letter of motivation, as well as a short planning. At this stage we ask you not to send copies of your diplomas or letters of reference; these we will request directly if we decide to further pursue your application. If you have any questions, please email prof. Bart de Boer.

 

Links

Context: https://ai.vub.ac.be/node/1687

Email Bart de Boer: bart@ai.vub.ac.be

 

 


6-18(2019-08-02) Research engineer or Post-doc at Eurecom, Inria, LIA, France

EURECOM (Nice, France), Inria (Nancy, France) and LIA (Avignon, France) are opening an
18-month Research Engineer or Postdoc position on speaker de-identification and voice
privacy.

For more information and to apply:
https://jobs.inria.fr/public/classic/en/offres/2019-01937


6-19(2019-08-02) Ph.D. position in Softbank robotics and Telecom-Paris, France

Ph.D. position in Softbank robotics and Telecom-Paris
 
Subject: Automatic multimodal recognition of users' social behaviors
in human-robot interactions (HRI)

*Places of work* Softbank Robotics [SB] (Paris 15e) & Telecom Paris [TP], Palaiseau (Paris outskirts)

*Starting date* December 2019

*Funding* CIFRE http://www.anrt.asso.fr/fr/cifre-7843

*Context*
The research activity of the Ph.D. candidate will contribute to:
- Softbank Robotics' robot software NAOqi, within the Expressivity team, which is responsible for ensuring an expressive, natural and fun interaction with our robots.
- the Social Computing topic [SocComp.] of the S2a team [SSA] at Telecom-ParisTech, in close collaboration with other researchers and Ph.D. students of the team.

* Candidate profile*
As a minimum requirement, the successful candidate should have:
•    A master's degree in one or more of the following areas: human-agent interaction, deep learning, computational linguistics, cognitive sciences, affective computing, reinforcement learning, natural language processing, speech processing
•    Excellent programming skills (preferably in Python)
•    Excellent command of English
•    Very good communication skills, commitment, an independent working style, as well as initiative and team spirit

Given the multidisciplinary aspect of the subject, priority will be given to multidisciplinary profiles. The applicant's interest in social robotics is required.

*Keywords* Human-Machine Interaction, Social Robotics, Deep Learning, Social Computing, Natural Language Processing, Speech Processing, Computer Vision, Multimodality

*Supervision*:
Industrial: Marine Chamoux (Softbank Robotics)
Academic: Chloé Clavel [Clavel], Giovanna Varni [Varni] (Telecom-Paris)

*How to apply*
Applications should be sent as soon as possible (the first review of applications will be made in early September). The application should be formatted as **a single pdf file** and should include:
•    A complete and detailed curriculum vitae
•    A letter of motivation
•    The academic credentials and the transcript of grades
•    The contact details of two referees

The pdf file should be sent to the three supervisors: mchamoux@softbankrobotics.com, chloe.clavel@telecom-paristech.fr, giovanna.varni@telecom-paristech.fr



*Description*
Social robotics, and more broadly human-agent interaction, is a field of human-machine interaction for which the integration of social behaviors is expected to have great potential. 'Socio-emotional behaviors' (emotions, social stances) thus include the role and the reactions of the user towards the robot during an interaction. These behaviors can be expressed differently depending:
- on the user (age, emotional state, ...): some users may have a dominant behavior with the robot, considering it a tool to achieve a goal. Others are more cooperative with the robot and can be more friendly with it. Still others try to trap or 'troll' the robot.
- on the interaction context (users do not behave in the same way when interacting with a Pepper robot selling toys or with a Pepper acting as a bank secretary). Besides, in each of these situations, the robot must be able to adapt its behavior and to provide a coherent interaction between the user and the robot, avoiding confusion and frustration.

This Ph.D. will focus on multimodal modeling for the prediction of the user's socio-emotional behaviors during interactions with a robot, and on building an engine that is robust to real-life scenarios and different contexts. In particular, the Ph.D. candidate will address the following points:
- the encoding of contextual multimodal representations relevant for the modeling of socio-emotional behavior. Thanks to the robot, we have access to a lot of information on context (market, robot intention, demographics, multi- or mono-user interaction, etc.) that could be combined with our multimodal representation.
- the development and evaluation of models that take advantage of the complementarity of modalities in order to monitor the evolution of the user's socio-emotional behaviors during the interaction (e.g., taking into account the inherent sequentiality of the interaction structure).
The models will be based on sequential neural approaches (recurrent networks) that integrate attention models, as a continuation of the work done in [Hemamou] and [Ben-Youssef19].

Selected references of the team:
[Hemamou] L. Hemamou, G. Felhi, V. Vandenbussche, J.-C. Martin, C. Clavel. HireNet: a Hierarchical Attention Model for the Automatic Analysis of Asynchronous Video Job Interviews. In AAAI 2019.
[Garcia] Alexandre Garcia, Chloé Clavel, Slim Essid, Florence d'Alché-Buc. Structured Output Learning with Abstention: Application to Accurate Opinion Prediction. ICML 2018.
[Clavel&Callejas] Clavel, C., Callejas, Z. Sentiment analysis: from opinion mining to human-agent interaction. IEEE Transactions on Affective Computing, 7.1 (2016) 74-93.
[Langlet] C. Langlet and C. Clavel. Improving social relationships in face-to-face human-agent interactions: when the agent wants to know user's likes and dislikes. In ACL 2015.
[Maslowski] Irina Maslowski, Delphine Lagarde, and Chloé Clavel. In-the-wild chatbot corpus: from opinion analysis to interaction problem detection. ICNLSSP 2017.
[Ben-Youssef17] Atef Ben-Youssef, Chloé Clavel, Slim Essid, Miriam Bilac, Marine Chamoux, and Angelica Lim. UE-HRI: a new dataset for the study of user engagement in spontaneous human-robot interactions. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 464-472. ACM, 2017.
[Ben-Youssef19] Atef Ben-Youssef, Chloé Clavel, Slim Essid. Early Detection of User Engagement Breakdown in Spontaneous Human-Humanoid Interaction. IEEE Transactions on Affective Computing, 2019.
[Varni] Varni, G., Hupont, I., Clavel, C., Chetouani, M. Computational Study of Primitive Emotional Contagion in Dyadic Interactions. IEEE Transactions on Affective Computing, 2017.

[SB] https://www.softbankrobotics.com/emea/fr
[TP] https://www.telecom-paristech.fr/eng/  
[SocComp.] https://www.tsi.telecom-paristech.fr/recherche/themes-de-recherche/analyse-automatique-des-donnees-sociales-social-computing/
[SSA] http://www.tsi.telecom-paristech.fr/ssa/#
[Clavel] https://clavel.wp.imt.fr/publications/
[Varni] https://sites.google.com/site/gvarnisite/




6-20(2019-08-03) Speech scientist at ETS Research

Speech scientist at ETS Research:

 

https://etscareers.pereless.com/index.cfm?fuseaction=83080.viewjobdetail&CID=83080&JID=290092


6-21(2019-08-12) Several positions in Forensic Speech Science or Forensic Data Science: Aston University, Birmingham, UK

Positions in Forensic Speech Science or Forensic Data Science:

- One Lecturer or Senior Lecturer

- Two Postdoctoral Researchers

 

Aston University, Birmingham, UK

 


Aston University has recently been awarded GBP 5.4 M from Research England's Expanding Excellence in England (E3) Fund. The money is being used to expand the existing Centre for Forensic Linguistics into the substantially larger Aston Institute for Forensic Linguistics (AIFL). As part of the expansion, we are building a research team with expertise in forensic speech science and in forensic data science. In addition to conducting research in forensic speech science, members of the team will work on forensic inference and statistics more broadly, and on quantitative-measurement and statistical-model based approaches in other branches of forensic science. The latter potentially include but are not limited to: fingerprints, face, gait, ballistics, blood pattern analysis, and linguistics. The Forensic Speech Science Laboratory and the Centre for Forensic Data Science will be headed by Dr Geoffrey Stewart Morrison, and, in addition to the affiliation with AIFL, will be affiliated with the Computer Science Department in the School of Engineering and Applied Science.

 

We are looking to recruit the following positions:

 

Lecturer or Senior Lecturer in Forensic Speech Science or Forensic Data Science

Reference:    R190354


Salary:    Grade 9 £40,792 – £48,677 or Grade 10 £50,132 – £58,089

Contract Type:    Continuing 

Basis:    Full time

Closing Date:    23.59 hours BST on September 30, 2019

Interview Date:    To be confirmed

 

Two Postdoctoral Researchers in Forensic Speech Science or Forensic Data Science

Reference:    R190353


Salary:   Grade 8 £33,199 – £39,609 or Grade 9 £40,792 – £48,677

Contract Type:    Fixed term (3 years) 

Basis:    Full time or part time

Closing Date:    23.59 hours BST on September 30, 2019

Interview Date:    To be confirmed

 

The Lecturer or Senior Lecturer position will be a full-time permanent position and will include teaching and administrative responsibilities. The position is costed as a Grade 9 Lecturer, but an exceptionally well qualified and experienced successful applicant could potentially be appointed as a Grade 10 Senior Lecturer. Note: 'Lecturer' is equivalent to North American 'Assistant Professor', 'Senior Lecturer' is equivalent to North American 'Associate Professor', and 'Reader / Associate Professor' is an occasionally used additional rank between Senior Lecturer and Professor.

 

The Postdoctoral Researcher positions may be filled as full-time appointments (preferred) or via a combination of part-time appointments. The Postdoctoral Researcher positions will be fixed-term, but the plan is to build a team that will be successful in obtaining additional research funding that will sustain these positions.

 

All new team members must have a commitment to solving forensic problems. Previous experience working on forensic problems would be advantageous, but not essential. A background in forensic speech science, in other branches of forensic science, and/or in forensic inference and statistics would be advantageous, but not essential. At least one of the new team members must have a strong background in state-of-the-art automatic speaker recognition, with an ability to implement systems. Other useful backgrounds for members of the team would include biometrics, machine learning, natural language processing, and acoustic phonetics.

 

Candidates may apply for both the Lecturer / Senior Lecturer and the Research Associate positions. If positions are not filled after this round of recruitment, we will initiate another round of recruitment.

 

We also welcome enquiries from individuals who have obtained or are applying for their own postdoctoral fellowships, e.g., Marie Skłodowska-Curie Fellowships. For suitable candidates we would assist with the application process.

 

Potential candidates are encouraged to contact Dr Geoffrey Stewart Morrison to seek more information about these positions.

Tel:    +44 121 204 3901

e-mail:    g.s.morrison@aston.ac.uk

 

Dr Morrison will be attending Interspeech in September and would be happy to meet informally with potential applicants there.

 

Please visit our website http://www.aston.ac.uk/jobs for further information and to apply online.

 

Aston University is an equal opportunities employer and welcomes applications from all sections of the community.


6-22(2019-08-14) Postdoc at KTH, Stockholm, Sweden
We are looking for a postdoc to conduct research in a multidisciplinary expedition project funded by Wallenberg AI, Autonomous Systems and Software Program (WASP), Sweden's largest individual research program, addressing compelling research topics that promise disruptive innovations in AI, autonomous systems and software for several years to come.
 
The project combines Formal Methods and Human-Robot Interaction with the goal of moving from conventional correct-by-design control with simple, static human models towards the synthesis of correct-by-design and socially acceptable controllers that consider complex human models based on empirical data. Two demonstrators, an autonomous driving scenario and a mobile robot navigation scenario in crowded social spaces, are planned to showcase the advances made in the project.
 
The focus of this position is on the development of data-driven models of human behavior that can be integrated with formal methods-based systems to better reflect real-world situations, as well as in the evaluation of the social acceptability of such systems. 
 
The candidate will work under the supervision of Assistant Prof. Iolanda Leite (https://iolandaleite.com/) and in close collaboration with another postdoctoral researcher working in the field of formal synthesis.
 
This is a two-year position. The starting date is open for discussion, but ideally, we would like the selected candidate to start ASAP.
 
 
QUALIFICATIONS
 
Candidates should have completed, or be near completion of, a Doctoral degree with a strong international publication record in areas such as (but not limited to) human-robot interaction, social robotics, multimodal perception, and artificial intelligence. Familiarity with formal methods, game theory, and control theory is an advantage.
 
Documented written and spoken English and programming skills are required. Experience with experimental design and statistical analysis is an important asset. Applicants must be strongly motivated, be able to work independently and possess good levels of cooperative and communicative abilities.
 
We look for candidates who are excited about being a part of a multidisciplinary team.
 
 
HOW TO APPLY
 
The application should include:
 
1. Curriculum vitae.
2. Transcripts from University/ University College.
3. A brief description of the candidate's research interests, including previous research and future goals (max 2 pages).
4. Contact of two references. We will contact the references only for selected candidates.
 
The application documents should be uploaded using KTH's recruitment system:
 
 
The application deadline is ** September 13, 2019 ** 
 
-----------------
Iolanda Leite
Assistant Professor
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science
Division of Robotics, Perception and Learning (RPL)

Teknikringen 33, 4th floor, room 3424, SE-100 44 Stockholm, Sweden
Phone: +46-8 790 67 34
https://iolandaleite.com
 

6-23(2019-08-17) Fully funded PhD position at IDIAP, Martigny, Valais, Switzerland.

There is a fully funded PhD position open at Idiap Research Institute on spiking neural
architectures for speech prosody.

The research will build on work done recently at Idiap on creating tools for
physiologically plausible modelling of speech. The current 'toolbox' contains rudimentary
muscle models and means to drive these using conventional (deep) neural networks. The
main focus of the work will involve use of spiking neural networks such as the 'integrate
and fire' type that is broadly representative of those found in biological systems.
Whilst we have focused so far on prosody (actually intonation), the application is
open-ended; the focus is on the neural modelling. A key problem to be solved will be that of
training of the spiking networks, especially with the recurrence that is usual in such
networks. We hope to be able to train and use spiking networks as easily as conventional
backpropagation networks, and to shed light on current understanding of how biological
spiking networks learn (e.g., via spike timing-dependent plasticity).
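
For readers unfamiliar with the 'integrate and fire' model mentioned above, a minimal leaky integrate-and-fire simulation looks like this (our sketch; the constants are arbitrary demonstration values, not Idiap's):

import numpy as np

dt, tau = 1e-3, 20e-3                 # time step and membrane time constant (s)
v_rest, v_thresh, v_reset = 0.0, 1.0, 0.0
v, spike_times = v_rest, []
drive = np.abs(np.random.randn(1000)) * 2.0  # stand-in input drive (R*I, in volts)

for t, i_in in enumerate(drive):
    # tau * dv/dt = -(v - v_rest) + R*I, integrated with forward Euler
    v += (dt / tau) * ((v_rest - v) + i_in)
    if v >= v_thresh:                 # threshold crossing emits a spike...
        spike_times.append(t * dt)
        v = v_reset                   # ...and the membrane potential resets

The hard threshold is non-differentiable, which is precisely why training such networks is the key problem the position aims to address.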

For more information, and to apply, please follow this link:
 http://www.idiap.ch/education-and-jobs/job-10263

Idiap is located in Martigny in French speaking Switzerland, but functions in English and
hosts many nationalities.  PhD students are registered at EPFL. All positions offer quite
generous salaries.  Martigny has a distillery and a micro-brewery and is close to all
manner of skiing, hiking and mountain life.

There are other open positions on Idiap's main page
 https://www.idiap.ch/en/join-us/job-opportunities


6-24(2019-08-18) PhD positions at IRIT, Toulouse, France

Applications are invited for a three-year Early Stage Researcher PhD position in speech technology for pathological speech.

Description
The thesis focuses on studying the link between the internal representations of Deep Neural Networks (DNNs) and the subjective representation of speech intelligibility. We propose to explore the saliency detection capabilities of DNNs when used in a regression task for predicting speech intelligibility scores as given by human experts. By saliency, we mean retrieving which frequency bands are important and used by a DNN to make its predictions.
 
The final expectation is to identify regions of interest in the speech signal, both in time and frequency, that characterise the level of speech impairment.
 
The experiments will be carried out on various samples of speech produced by 150 people (100 patients and 50 healthy controls). This database was recorded within the INCA C2SI project and contains speech from patients treated for cancer of the oral cavity or pharynx. It also contains various metadata such as the location of the tumor, the impairment in terms of severity and intelligibility as assessed by human experts, self-evaluation questionnaires on the patient's quality of life, etc. Various tasks were recorded, such as a sustained vowel, read speech, nonsense words, prosodic exercises, picture description, etc.
There will also be the possibility to extend the work to another corpus composed of the voices of patients suffering from Parkinson's disease.
 
At first, the PhD student will build on the various analyses and descriptions produced during the C2SI project, which sought to correlate the impact of the tumor with communication ability. Those results will help establish the human representation of the impact of the disease. Then, a DNN will be modeled to fit the data, taking care of the data sparsity. The last part of the work will explore the internal representation of the DNN, investigating what parts of the signal help it make a decision about the impact of the disease; this is the final goal of the thesis: studying the automatic representation that lies in the model the student will propose.
 
This work is funded by the TAPAS project (https://www.tapas-etn-eu.org), a Horizon 2020 Marie Skłodowska-Curie Actions Innovative Training Network / European Training Network (MSCA-ITN-ETN) project that aims to transform the well-being of people across Europe with debilitating speech pathologies (e.g., due to stroke, Parkinson's, etc.). These groups face communication problems that can lead to social exclusion. They are now being further marginalised by a new wave of speech technology that is increasingly woven into everyday life but which is not robust to atypical speech.
 
 
The supervision of the PhD will take place at the IRIT laboratory, within the SAMoVA team in Toulouse. SAMoVA does research in the domain of 'analysis, modeling and structuring of audiovisual content'. The application areas are diverse: speech processing, language identification, speaker verification, and speech and music indexing. The researchers' expertise covers novel machine learning and audio processing technologies and is now focused on deep learning methods, leading to several publications in international conferences.
 

Eligibility Criteria:

 

Early Stage Researchers (ESRs) shall, at the time of recruitment by the host organization, be in the first four years (full-time equivalent research experience) of their research careers.

- The ESR may be a national of a Member State, of an Associated Country or of any Third Country.
- The ESR must not have resided or carried out her/his main activity (work, studies, etc.) in the country of her/his host organization for more than 12 months in the 3 years immediately prior to her/his recruitment.
- Holds a Master's degree or equivalent, which formally entitles the holder to embark on a Doctorate.
- Does not hold a PhD degree.


Duration of recruitment: 36 months.

 
Applications can be made through the website: https://www.tapas-etn-eu.org/positions/recruitment
Contact: Julie Mauclair (mauclair@irit.fr)

6-25(2019-08-25) Post-doc position at INRIA Rennes, France

Post-doc position: Pattern mining for Neural Networks debugging: application to speech recognition

Advisors: Elisa Fromont & Alexandre Termier, IRISA/INRIA RBA – Lacodam team (Rennes)

Irina Illina & Emmanuel Vincent, LORIA/INRIA – Multispeech team (Nancy)
firstname.lastname@inria.fr

Location: INRIA RBA, team Lacodam (Rennes)

Keywords: discriminative pattern mining, neural networks analysis, explainability of black
box models, speech recognition.

Deadline to apply: September 30th, 2019

Context:

Understanding the inner workings of deep neural networks (DNNs) has attracted a lot of attention in the past years [1, 2], and most problems were detected and analyzed using visualization techniques [3, 4]. Those techniques help to understand what an individual neuron or a layer of neurons is computing. We would like to go beyond this by focusing on groups of neurons which are commonly highly activated when a network is making wrong predictions on a set of examples. In the same line as [1], where the authors theoretically link how a training example affects the predictions for a test example using so-called 'influence functions', we would like to design a tool to 'debug' neural networks by identifying, using symbolic data mining methods, (connected) parts of the neural network architecture associated with erroneous or uncertain outputs.

In the context of speech recognition, this is especially important. A speech recognition system contains two main parts: an acoustic model and a language model. Nowadays both models are trained with deep neural network (DNN) based algorithms and use very large learning corpora to train a large number of DNN hyperparameters. There are many works on automatically tuning these hyperparameters. However, this induces a huge computational cost and does not empower the human designers. It would be much more efficient to provide human designers with understandable clues about the reasons for the bad performance of the system, in order to benefit from their creativity to quickly reach more promising regions of the hyperparameter search space.

Description of the position:

This position is funded in the context of the HyAIAI 'Hybrid Approaches for Interpretable AI' INRIA project lab (https://www.inria.fr/en/research/researchteams/inria-project-labs). With this position, we would like to go beyond the current common visualization techniques that help to understand what an individual neuron or a layer of neurons is computing, by focusing on groups of neurons that are commonly highly activated when a network is making wrong predictions on a set of examples. Tools such as activation maximization [8] can be used to identify such neurons. We propose to use discriminative pattern mining and, to begin with, the DiffNorm algorithm [6] in conjunction with the LCM algorithm [7] to identify the discriminative activation patterns among the identified neurons.

The data will be provided by the MULTISPEECH team and will consist of two deep architectures as representatives of acoustic and language models [9, 10]. Furthermore, the training data from which the model parameters ultimately derive will be provided. We will also extend our results by performing experiments with supervised and unsupervised learning to compare the features learned by these networks and to perform qualitative comparisons of the solutions learned by various deep architectures. Identifying 'faulty' groups of neurons could lead to the decomposition of the DL network into 'blocks' encompassing several layers. 'Faulty' blocks may be the first to be modified in the search for a better design.
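
As a toy version of this idea (not DiffNorm or LCM themselves, which the project would build on), one can binarize hidden activations and keep the small neuron sets that co-activate far more often on wrongly predicted examples than on correct ones; `acts` and `wrong` below are hypothetical inputs.

from collections import Counter
from itertools import combinations

def discriminative_patterns(acts, wrong, thresh=0.5, size=2, min_ratio=3.0):
    """Find neuron sets over-represented in erroneous predictions.

    acts: n_examples x n_neurons activation matrix (any nested sequence).
    wrong: booleans, True where the network's prediction is wrong.
    """
    active = [frozenset(j for j, a in enumerate(row) if a > thresh) for row in acts]
    bad, good = Counter(), Counter()
    for neurons, is_wrong in zip(active, wrong):
        for pattern in combinations(sorted(neurons), size):
            (bad if is_wrong else good)[pattern] += 1
    n_bad = max(sum(wrong), 1)
    n_good = max(len(wrong) - sum(wrong), 1)
    # Keep patterns whose relative support among errors dwarfs their
    # relative support among correct predictions.
    return [p for p, c in bad.items()
            if c / n_bad > min_ratio * (good[p] / n_good + 1e-9)]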

The recruited person will benefit from the expertise of the LACODAM team in pattern mining and deep learning (https://team.inria.fr/lacodam/) and of the expertise of the MULTISPEECH team (https://team.inria.fr/multispeech/) in speech analysis, language processing and deep learning. We would ideally like to recruit a post-doc for 1 year (with possibly one additional year) with the following preferred skills:
• Some knowledge of (or interest in) speech recognition
• Knowledge of pattern mining (discriminative pattern mining is a plus)
• Knowledge of machine learning in general and deep learning in particular
• Good programming skills in Python (for Keras and/or TensorFlow)
• Very good English (understanding and writing)

See the INRIA web site for details on the post-doc position.

The candidates should send a CV, the names of 2 referees and a cover letter to the four researchers (firstname.lastname@inria.fr) mentioned above. Please indicate whether you are applying for the post-doc or the PhD position. The selected candidates will be interviewed in September for an expected start in October-November 2019.

Bibliography:

[1] Pang Wei Koh, Percy Liang: Understanding Black-box Predictions via Influence Functions. ICML 2017: pp 1885-1894 (best paper).

[2] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals: Understanding deep learning requires rethinking generalization. ICLR 2017.

[3] Anh Mai Nguyen, Jason Yosinski, Jeff Clune: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. CVPR 2015: pp 427-436.

[4] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus: Intriguing properties of neural networks. ICLR 2014.

[5] Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi: Deep Text Classification Can be Fooled. IJCAI 2018: pp 4208-4215.

[6] Kailash Budhathoki and Jilles Vreeken. The difference and the norm: characterising similarities and differences between databases. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 206-223. Springer, 2015.

[7] Takeaki Uno, Tatsuya Asai, Yuzo Uchida, and Hiroki Arimura. LCM: An efficient algorithm for enumerating frequent closed item sets. In FIMI, volume 90. Citeseer, 2003.

[8] Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. Visualizing higher-layer features of a deep network. University of Montreal, 1341(3):1, 2009.

[9] G. Saon, H.-K. J. Kuo, S. Rennie, M. Picheny: The IBM 2015 English conversational telephone speech recognition system, Proc. Interspeech, pp. 3140-3144, 2015.

[10] W. Xiong, L. Wu, F. Alleva, J. Droppo, X. Huang, A. Stolcke: The Microsoft 2017 Conversational Speech Recognition System, IEEE ICASSP, 2018.

Back  Top

6-26(2019-08-28) Speech technologist/linguist at Cobaltspeech.

Cobalt Speech & Language (http://www.cobaltspeech.com/ ) is looking for a speech technologist/linguist to help find and create language resources for a project in French Canadian.

The project is short-term (<2 months) and part-time (~5-8h a week), which makes it ideal for a student to gain experience with the speech industry.

The following skills are required:
  - Native French (does not have to be Canadian French, though that is desirable)
  - Able to communicate in English
  - Basic understanding of speech technology and linguistics
  - Ability to run a python script.

For more information, please contact Rasmus Dall (rasmus@cobaltspeech.com)

Back  Top

6-27(2019-09-04)PhD thesis proposal, GIPSA Lab Grenoble France

PhD thesis proposal

Incremental sequence-to-sequence mapping for speech generation using deep neural networks

September 4, 2019

1 Context and objectives

In recent years, deep neural networks have been widely used to address sequence-to-sequence (S2S) learning. S2S models can solve many tasks where source and target sequences have different lengths, such as automatic speech recognition, machine translation, speech translation, text-to-speech synthesis, etc. Recurrent, convolutional and transformer architectures, coupled with attention models, have shown their ability to capture and model complex temporal dependencies between a source and a target sequence of multidimensional discrete and/or continuous data. Importantly, end-to-end training alleviates the need to previously extract handcrafted features from the data by learning hierarchical representations directly from raw data (e.g. character string, video, speech waveform, etc.).

The most common models are composed of an encoder that reads the full input sequence (i.e. from its beginning to its end) before the decoder produces the corresponding output sequence. This implies a latency equal to the length of the input sequence. In particular, for a text-to-speech (TTS) system, the speech waveform is usually synthesized from a complete text utterance (e.g. a sequence of words with explicit begin/end-of-utterance markers). Such an approach cannot be used in a truly interactive scenario, in particular by a speech-handicapped person to communicate orally. Indeed, the interlocutor has to wait for the complete utterance to be typed before being able to listen to the synthetic voice, hence limiting the dynamics and naturalness of the interaction.

The goal of this project is to develop a general methodology for incremental sequence-to-sequence mapping, with application to interactive speech technologies. It will require the development of end-to-end classification and regression neural models able to deliver chunks of output data on-the-fly, from only a partial observation of input data. The goal is to learn an efficient policy that leads to an optimal trade-off between (variable) latency and accuracy of the decoding process. Possible strategies to decode the output data as soon as possible include: (i) predicting online 'the future' of the output sequence from 'the past and present' of the input sequence, with an acceptable tolerance to possible errors, or (ii) learning automatically from the data an optimal 'waiting policy' that prevents the model from outputting data when the uncertainty is too high. The developed methodology will be applied to address two speech processing problems: (i) incremental text-to-speech synthesis, in which speech is synthesized while the user is typing the text (possibly with a variable latency), and (ii) incremental speech enhancement/inpainting, in which portions of the speech signal are unintelligible because of sudden noise or speech production disorders, and must be replaced on-the-fly with reconstructed portions.
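As an illustration of the second family of strategies, here is a minimal Python sketch of a fixed-latency 'wait-k' policy, borrowed from simultaneous machine translation: the decoder waits until k source tokens have arrived, then emits one output token per new input token. The toy character-level 'model' is a stand-in for a real S2S network; all names are illustrative.

# Minimal sketch of a wait-k incremental decoding loop.
def toy_model(source_prefix, n_outputs):
    # Pretend S2S model: map the first n_outputs source tokens.
    return [c.upper() for c in source_prefix[:n_outputs]]

def wait_k_decode(source_stream, k=3):
    seen, emitted = [], []
    for token in source_stream:          # tokens arrive one by one
        seen.append(token)
        if len(seen) >= k:               # latency budget reached
            target = toy_model(seen, len(seen) - k + 1)
            while len(emitted) < len(target):
                emitted.append(target[len(emitted)])
                yield emitted[-1]        # emit on-the-fly
    # flush the remaining outputs once the source has ended
    for tok in toy_model(seen, len(seen))[len(emitted):]:
        yield tok

print(''.join(wait_k_decode("bonjour", k=3)))  # -> 'BONJOUR'

A learned waiting policy, as targeted by the project, would replace the fixed k with a data-driven decision on when the model is confident enough to emit.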

2 Work plan

The proposed work plan is the following:

- Bibliographic work on S2S neural models, in the context of speech recognition, speech synthesis, and machine translation, as well as their incremental (low-latency) variations.
- Investigating new architectures, losses, and training strategies toward incremental S2S models.
- Implementing and evaluating the proposed techniques in the context of end-to-end neural TTS systems (the baseline system may be a neural TTS trained with past information/left-context only).
- Implementing and evaluating the proposed techniques in the context of speech enhancement/inpainting, first on simulated noisy speech and then on pathological speech.

3 Requirements

We are looking for an outstanding and highly motivated PhD candidate to work on this subject. The following requirements are mandatory:

- Engineering degree and/or a Master's degree in Computer Science, Signal Processing or Applied Mathematics.
- Solid skills in machine learning. General knowledge in natural language processing and/or speech processing.
- Excellent programming skills (mostly in Python and deep learning frameworks).
- Good oral and written communication in English.
- Ability to work autonomously and in collaboration with supervisors and other team members.


4 Work context

Grenoble Alpes Univ. offers an excellent research environment with ample computing facilities, as well as remarkable surroundings to explore over the weekends. The PhD project will be funded by the Grenoble Artificial Intelligence Institute (MIAI). The PhD candidate will work both at GIPSA-lab (CRISSP team) and LIG-lab (GETALP team). The duration of the PhD is 3 years. The salary is between 1770 and 2100 euros gross per month (depending on complementary activity or not).

5 How to apply?

Applications should include a detailed CV; a copy of the last diploma; at least two references (people likely to be contacted); a cover letter of one page; a one-page summary of the Master thesis; and the two last transcripts of marks (Master or engineering school). Applications should be sent to thomas.hueber@gipsa-lab.fr, laurent.girin@gipsa-lab.fr and laurent.besacier@imag.fr. Applications will be evaluated as they are received: the position is open until it is filled.

Back  Top

6-28(2019-09-04) Postdoc proposal, GIPSA Lab Grenoble, France

Postdoc proposal

Spontaneous Speech Recognition. Application to Situated Corpora in French.

September 4, 2019

1 Postdoc Subject

The goal of the project is to advance the state of the art in spontaneous automatic speech recognition (ASR). Recent advances in ASR show excellent performance on tasks such as read speech ASR (Librispeech) and TV shows (MGB challenge), but what about spontaneous communicative speech?

This postdoc project would leverage existing transcribed corpora in French (more than 300 hours) recorded in everyday communication (speech recordings inside a family, in a shop, during an interview, etc.). One impact of the project would be the automation of transcription on very challenging data in order to feed linguistic and phonetic studies at scale.

Research topics:

- End-to-end ASR models
- Spontaneous speech ASR
- Colloquial speech transcription
- Data augmentation for spontaneous and colloquial language modelling (see the sketch below)
- Transcribing situated corpora
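As a toy illustration of what such augmentation could look like, here is a minimal Python sketch that injects fillers and word repetitions into written French so that a language model sees data closer to spontaneous speech; the filler inventory and probabilities are made-up illustrative choices, not project specifications.

# Illustrative sketch: augment written text with disfluencies.
import random

FILLERS = ["euh", "ben", "quoi", "hein", "tu vois"]

def spontaneify(sentence, p_filler=0.15, p_repeat=0.1, seed=None):
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        if rng.random() < p_filler:      # insert a filler before the word
            out.append(rng.choice(FILLERS))
        out.append(word)
        if rng.random() < p_repeat:      # simulate a word repetition
            out.append(word)
    return " ".join(out)

print(spontaneify("je vais au marché demain matin", seed=1))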

2 Requirements

We are looking for an outstanding and highly motivated postdoc candidate to work on this subject. The following requirements are mandatory:

- PhD degree in natural language processing or speech processing.
- Excellent programming skills (mostly in Python and deep learning frameworks).
- Interest in pluridisciplinary research (speech technology and speech science).
- Good oral and written communication in English (French is a plus but not mandatory).
- Ability to work autonomously and in collaboration with other team members.

3 Work context

Grenoble Alpes Univ. offers an excellent research environment with ample computing facilities, as well as remarkable surroundings to explore over the weekends. The postdoc project will be funded by the Grenoble Artificial Intelligence Institute (MIAI). The candidate will work both at LIG-lab (GETALP team) and LIDILEM-lab. The duration of the postdoc is 18 months.

4 How to apply?

Applications should include a detailed CV; a copy of the last diploma; at least two references (people likely to be contacted); a cover letter of one page; and a one-page summary of the PhD thesis. Applications should be sent to laurent.besacier@imag.fr. Applications will be evaluated as they are received: the position is open until it is filled.

Back  Top

6-29(2019-09-24) VOXCRIM 2019, Ecully France

VOXCRIM 2019

TUESDAY, SEPTEMBER 24, 2019, from 9:30 am to 5:00 pm

Talks and round table: cross-disciplinary perspectives on forensic voice comparison.

Registration before September 13:
voxcrim@interieur.gouv.fr
04 72 86 85 22

Service Central de la Police Technique et Scientifique
31 avenue Franklin Roosevelt
69130 ECULLY

 

Back  Top

6-30(2019-09-05) Post-doctoral position at IDIAP, Martigny, Switzerland

The Social Computing Group at Idiap is seeking a creative and motivated postdoctoral
researcher to work on deep learning methods for behavioral analysis from video and audio
data. This is an opening for a researcher with experience in deep learning applied to
dynamic human behavior (from voice, body, or face), in the context of a project funded by
Innosuisse, the Swiss funding agency for promotion of innovation.

The position offers the opportunity to do exciting work on deep learning and social
behavior. The researcher will work with Prof. Daniel Gatica-Perez and his research group.
Candidates should hold a PhD degree in computer science or engineering, with proven
experience in deep learning and a strong publication record.

Salaries are competitive and the starting date is immediate. Interviews will start upon
reception of applications until the positions are filled.

Interested candidates are invited to submit a cover letter, a detailed CV, and the names
of three references through Idiap's online recruitment system:

https://www.idiap.ch/en/join-us/job-opportunities
Position: Postdoctoral researcher in deep learning for social behavior analysis

Interested candidates can also contact Prof. Daniel Gatica-Perez (gatica@idiap.ch).

About Idiap Research Institute

Idiap is an independent, not-for-profit, research institute recognized and funded by the
Swiss Federal Government, the State of Valais, and the City of Martigny. Idiap is an
equal opportunity employer, and offers competitive salaries and excellent working
conditions in a dynamic and multicultural environment.

Idiap is located in the town of Martigny in Valais, a scenic region in the south of
Switzerland, surrounded by the highest mountains of Europe, and offering exceptional
quality of life, exciting recreational activities, including hiking, climbing and skiing,
as well as varied cultural activities, all within close proximity to Lausanne and Geneva.
English is the official working language.

Back  Top

6-31(2019-09-09) Postdoctoral Research Fellow/Senior Research Fellow, University of Tampere, Finland

Postdoctoral Research Fellow/Senior Research Fellow
(speech and language technology, cognitive science)

Tampere University and Tampere University of Applied Sciences create a unique environment for multidisciplinary, inspirational and high-impact research and education. Our university community has its competitive edges in technology, health and society. www.tuni.fi/en

The Speech and Cognition research group (SPECOG) is part of the Computing Sciences Unit of Tampere University within the Faculty of Information Technology and Communication Sciences. SPECOG focuses on interdisciplinary research at the intersection of speech technology and cognitive sciences. We apply advanced signal processing and machine learning methods to computational modeling of human language learning and perception, and study how human-like information processing principles can be applied in autonomous next-generation artificial intelligence (AI) systems. The group also conducts research and development in speech and language technology and in medical signal processing and machine learning. SPECOG collaborates with several internationally leading research groups within and across disciplinary boundaries, including joint research with computer scientists, psychologists, brain researchers, and linguists. The group is also closely affiliated with the audio and machine vision research groups of Tampere University.

More information on SPECOG: http://www.cs.tut.fi/sgn/specog/index.html

Job description

We are inviting applications for the position of a postdoctoral research fellow or senior research fellow in the areas of speech and language technology and cognitive science. The work will be conducted as a member of the SPECOG research group led by Asst. Prof. Okko Räsänen. We are looking for candidates who are interested in human and/or machine language processing, and who are willing to contribute to our highly cross-disciplinary research efforts in understanding language learning in humans and autonomous computational systems. Our current focus is on machine learning algorithms for unsupervised language learning from purely acoustic or audiovisual data (sometimes also known as zero-resource speech processing). However, we also consider candidates with a strong independent research agenda in complementary areas of speech and language technology.

In this position, the candidate is expected to:

1) carry out world-class research on a topic related to SPECOG focus areas,
2) work in close collaboration with other members of the research group, and
3) help to advise undergraduate and/or PhD projects on the relevant topics (with flexibility according to personal interests and career aspirations).

Requirements

The candidate should hold a doctoral degree (e.g., PhD or D.Sc. (Tech.)) in language technology, computer science, electrical engineering, cognitive science, or another relevant area. Candidates who have already completed their doctoral research work but have not yet received their doctoral certificate may also apply.

A successful candidate has strong expertise in signal processing and machine learning (e.g., deep learning), ideally from the context of speech technology. Applicants with a background in natural language processing (NLP) or cognitive science are also considered. Experience or interest in linguistics, neuroscience, or statistics is considered an advantage. Fluent programming (Python, Matlab, R, C++ or similar) and English skills are required.

Potential candidates must be capable of carrying out independent research at the highest international level. Competence must be demonstrated through several existing publications in internationally recognized peer-reviewed journals and conferences.

We offer

The position will be filled for a fixed-term period of two years, starting as soon as possible (but not extending the contract beyond the end of December 2021). A trial period of 6 months is applied to all new employees. The exact starting date is negotiable.

We offer a competitive academic salary, typically between 3500–4000 € for a starting postdoc depending on the experience of the candidate, and 4000–4500 € for a senior research fellow with several years of existing postdoctoral research experience in academia or industry. In addition, the position comes with extensive benefits such as occupational healthcare, excellent sports facilities, flexible working hours, and several restaurants and cafés on the campus with staff discounts. Traveling costs and daily allowances related to presenting peer-reviewed work at major international conferences are also normally covered.

How to apply

Send the application through the online portal at https://tuni.rekrytointi.com/paikat/?o=A_A&jid=301

We will accept applications until the position has been filled, but no later than 30th of November 2019 at 23.59 (GMT+3). Note that we will start evaluating the applicants already on 1st of October 2019, and the position may be filled as soon as a suitable candidate is found. We reserve the opportunity to recruit the candidate through other channels or to decide not to fill the position in case a suitable candidate is not found during the process.

The application should contain the following documents (all in .pdf format):

- A free-form letter of motivation for the position in question (max. 1 page)
- Academic CV with contact information
- A list of publications
- A copy of the doctoral degree certificate
- A letter or letters of recommendation (max. 3)

Please name all the documents as surname_CV.pdf, surname_list_of_publications.pdf, etc. Only the applications sent through the university application portal and containing the requested attachments in the instructed format will be considered in the recruitment process.

The most promising candidates will be interviewed in person or via Skype before the final decision.

For more information about the position, please contact Assistant Professor Okko Räsänen (firstname.surname@tuni.fi; no umlauts) by email.

About the research environment

Finland is among the most stable, free and safe countries in the world, based on prominent ratings by various agencies. It is also ranked as one of the top countries as far as social progress is concerned. Tampere is counted among the major academic hubs in the Nordic countries and offers a dynamic living environment. The Tampere region is one of the most rapidly growing urban areas in Finland and home to a vibrant knowledge-intensive entrepreneurial community. The city is an industrial powerhouse that enjoys a rich cultural scene and a reputation as a centre of Finland’s information society. Despite its growth, living in Tampere is highly affordable, with two-room apartment rents starting from approx. 550 €. In addition, the excellent public transport network enables quick, easy and cheap transportation around the city of Tampere and the university campuses.

Read more about Finland and Tampere:

https://www.visitfinland.com/about-finland/
https://finland.fi/
http://julkaisut.valtioneuvosto.fi/bitstream/handle/10024/161193/MEAEguide_18_2018_TervetuloaSuomeen_Eng_PDFUA.pdf
https://visittampere.fi/en/

Back  Top

6-32(2019-09-09) Doctoral Researcher, University of Tampere, Finland

Doctoral Researcher
(speech and language technology, cognitive science)

Tampere University and Tampere University of Applied Sciences create a unique environment for multidisciplinary, inspirational and high-impact research and education. Our university community has its competitive edges in technology, health and society. www.tuni.fi/en

The Speech and Cognition research group (SPECOG) is part of the Computing Sciences Unit of Tampere University within the Faculty of Information Technology and Communication Sciences. SPECOG focuses on interdisciplinary research at the intersection of speech technology and cognitive sciences. We apply advanced signal processing and machine learning methods to computational modeling of human language learning and perception, and study how human-like information processing principles can be applied in autonomous next-generation artificial intelligence (AI) systems. The group also conducts research and development in speech and language technology and in medical signal processing and machine learning. SPECOG collaborates with several internationally leading research groups within and across disciplinary boundaries, including joint research with computer scientists, psychologists, brain researchers, and linguists. The group is also closely affiliated with the audio and machine vision research groups of Tampere University.

More information on SPECOG: http://www.cs.tut.fi/sgn/specog/index.html

Job description

We are inviting applications for the position of a doctoral researcher (doctoral student) in the areas of speech and language technology and cognitive science. The work will be conducted as a member of the SPECOG research group led by Asst. Prof. Okko Räsänen. We are looking for candidates who are interested in human and/or machine language processing, and who are willing to contribute to our highly cross-disciplinary research efforts in understanding language learning in humans and autonomous computational systems. Our current focus is on machine learning algorithms for unsupervised language learning from purely acoustic or audiovisual data (sometimes also known as zero-resource speech processing). However, we also consider candidates with interest in complementary areas of speech and language technology.

In this position, the candidate is expected to:

1) carry out research on a mutually agreed topic,
2) complete a doctoral degree, including the mandatory course studies for a D.Sc. (Tech.) degree,
3) participate in the doctoral program, and
4) be available for assisting tasks in teaching and research group activities (max. 15% of working time).

Requirements

The candidate should hold a master’s degree in language technology, computer science, electrical engineering, mathematics, cognitive science, or another relevant technical area. Candidates who have already completed their master’s studies but are graduating during 2019 may also apply. Exceptional master’s students of Tampere University who are close to graduation can also be considered for the position. In this case, the candidate is first employed as a Research Assistant to carry out a master’s thesis project (6 months) on the topic, with the possibility, upon a successful thesis project, to continue to doctoral studies.

A successful candidate has experience in signal processing and/or machine learning (e.g., deep learning), ideally from the context of speech technology. Applicants with a background in natural language processing (NLP) or cognitive science are also considered. Experience or interest in linguistics, neuroscience, or statistics is considered an advantage but not required. Good command of programming (Python, Matlab, R, C++ or similar) and English skills are required.

Potential candidates must be capable of carrying out independent research work but are also good team players. Previous research experience, such as research internships or other research projects, is considered a significant advantage.

We offer

The position will be filled for a fixed-term period of two years with a view to extension. A trial period of 6 months is applied to all new employees. The position starts in January 2020 or as soon as possible, with a negotiable exact starting date. The target completion time for doctoral studies is 4 years.

We offer a starting salary of 2300 € for a starting doctoral researcher, with later increases based on demonstrated progress through scientific publications and acquired study credits. In addition, the position comes with extensive benefits such as occupational healthcare, excellent sports facilities, flexible working hours, and several restaurants and cafés on the campus with staff discounts. Traveling costs and daily allowances related to presenting peer-reviewed work at major international conferences are also normally covered.

How to apply

Send your application through the online portal at https://tuni.rekrytointi.com/paikat/?o=A_A&jid=299

We will accept applications until 15th of November 2019 at 23.59 (GMT+3). We reserve the opportunity to recruit the candidate through other channels or to decide not to fill the position in case a suitable candidate is not found during the process.

The application should contain the following documents (all in .pdf format):

- A free-form letter of motivation for the position in question (max. 1 page)
- Complete CV with contact information and a list of publications (if any)
- A copy of the master’s degree certificate
- An English language certificate of proficiency (for non-native and non-Finnish applicants)

Please name all the documents as surname_CV.pdf, surname_list_of_publications.pdf, etc. Only the applications sent through the university application portal and containing the requested attachments in the instructed format will be considered in the recruitment process.

The most promising candidates will be interviewed in person or via Skype before the final decision.

For more information about the position, please contact Assistant Professor Okko Räsänen (firstname.surname@tuni.fi; no umlauts) by email.

About the research environment

Finland is among the most stable, free and safe countries in the world, based on prominent ratings by various agencies. It is also ranked as one of the top countries as far as social progress is concerned. Tampere is counted among the major academic hubs in the Nordic countries and offers a dynamic living environment. The Tampere region is one of the most rapidly growing urban areas in Finland and home to a vibrant knowledge-intensive entrepreneurial community. The city is an industrial powerhouse that enjoys a rich cultural scene and a reputation as a centre of Finland’s information society. Despite its growth, living in Tampere is highly affordable, with private-market two-room apartment rents starting from approx. 550 €. In addition, the excellent public transport network enables quick, easy and cheap transportation around the city of Tampere and the university campuses.

Read more about Finland and Tampere:

https://www.visitfinland.com/about-finland/
https://finland.fi/
http://julkaisut.valtioneuvosto.fi/bitstream/handle/10024/161193/MEAEguide_18_2018_TervetuloaSuomeen_Eng_PDFUA.pdf
https://visittampere.fi/en/

Back  Top

6-33(2019-09-09) Postdoc position at IRIT, Toulouse, France

READYNOV project: AUDIOCAP

Hearing and disability in noise: towards restoring speech intelligibility

Type of position: POSTDOC

Research context: restoring speech intelligibility in noise for elderly people via hearing aids.

Keywords: speech, noise, intelligibility

Missions:

Prediction of speech intelligibility in noise:
- getting familiar with an automatic speech recognition system for French,
- acoustic modeling in noise.

Development of a tool for separating speech from noise, based on the application of time-frequency filters. This tool will be 'tuned' so as to favor speech intelligibility (a sketch of the underlying idea is given below).
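As a rough illustration of the time-frequency filtering idea, here is a minimal Python sketch using an 'oracle' binary mask computed from known clean signals; a deployed system would of course have to estimate the mask from the noisy mixture alone, e.g. with a DNN. Signals and parameters are toy choices.

# Minimal sketch of speech/noise separation with a time-frequency mask.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 220 * t)             # toy 'speech'
noise = 0.5 * np.random.default_rng(0).standard_normal(fs)
mix = speech + noise

_, _, S = stft(speech, fs, nperseg=512)
_, _, N = stft(noise, fs, nperseg=512)
_, _, M = stft(mix, fs, nperseg=512)

mask = (np.abs(S) > np.abs(N)).astype(float)     # ideal binary mask
_, enhanced = istft(mask * M, fs, nperseg=512)   # keep speech-dominated bins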

Skills:
- software development
- signal processing
- machine learning (deep learning)

Location: IRIT, 118 route de Narbonne, 31062 TOULOUSE

Dates and duration: 12 to 18 months, starting October 1st, 2019

Salary: between 1900 and 2400 € net per month, depending on experience

Documents to provide:
- detailed CV
- cover letter
- one-page summary of the PhD thesis

Contact: Julien PINQUIER, pinquier@irit.fr

Back  Top

6-34(2019-09-05) R/D position at Zaion, Paris France

ZAION is a fast-growing innovative company specialized in conversational robot technology: callbots and chatbots integrating artificial intelligence.

ZAION has developed a solution building on more than 20 years of experience in customer relations. This technologically disruptive solution has been very well received internationally and we already count 18 active clients (GENERALI, MNH, APRIL, CROUS, EUROP ASSISTANCE, PRO BTP, etc.). We are currently among the only companies in the world to offer this type of solution entirely focused on performance. Joining us means taking part in an exciting and innovative adventure within an ambitious team, with the goal of becoming the reference on the conversational robot market.

As part of its development, ZAION is recruiting a Data Scientist / Machine Learning engineer applied to audio (M/F). Within the R&D team, your role is strategic for the development and expansion of the company. You will develop a solution that detects emotions in conversations. We want to extend the cognitive abilities of our callbots so that they can detect the emotions of their interlocutors (joy, stress, anger, sadness, etc.) and adapt their answers accordingly.

Your main missions:

- Take part in the creation of ZAION's R&D department and lead, upon arrival, your first project on emotion recognition in the voice
- Build, adapt and evolve our services for detecting emotion in the voice
- Analyze large databases of conversations to extract the emotionally relevant ones
- Build a database of conversations labeled with emotional tags
- Train and evaluate machine learning models for emotion classification (a sketch of a baseline pipeline is given below)
- Deploy your models to production
- Continuously improve the voice emotion detection system
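As a minimal illustration of such a classification pipeline, here is a classical baseline sketch (mean MFCC features plus an SVM) in Python; a production system would rather rely on deep models trained on a large labeled corpus, and the 'data' below is synthetic stand-in material, not real emotional speech.

# Minimal sketch: mean MFCC features + SVM for utterance classification.
import numpy as np
import librosa
from sklearn.svm import SVC

def utterance_features(y, sr=16000):
    # Mean MFCC vector: one fixed-size descriptor per utterance.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
# Toy stand-ins: a tone and noise play the role of two emotion classes.
class_a = [np.sin(2 * np.pi * 200 * t) + 0.1 * rng.standard_normal(16000)
           for _ in range(10)]
class_b = [rng.standard_normal(16000) for _ in range(10)]
X = np.array([utterance_features(y) for y in class_a + class_b])
labels = [0] * 10 + [1] * 10

clf = SVC().fit(X, labels)
print(clf.predict(X[[0, 12]]))  # expected: [0 1]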

Required qualifications and prior experience:

- At least 2 years of experience as a Data Scientist / Machine Learning engineer applied to audio
- Engineering school or Master's degree in computer science, or a PhD in computer science/mathematics, with solid skills in signal processing (preferably audio)
- Solid theoretical background in machine learning and in the relevant mathematical areas (clustering, classification, matrix factorization, Bayesian inference, deep learning, etc.)
- Experience deploying machine learning models in a production environment is a plus
- Proficiency in one or more of the following: Python, machine learning / deep learning frameworks (PyTorch, TensorFlow, scikit-learn, Keras) and JavaScript
- Mastery of audio signal processing techniques
- Proven experience in labeling large databases (preferably audio) is essential
- Personality: a leader, autonomous, passionate about your work, able to lead a team in project mode
- Fluent English

Please send your application to: alegentil@zaion.ai

Back  Top

6-35(2019-09-15) Post-doc and research engineer at INSA, Rouen, Normandy, France

Post-doctoral position (1 year): Perception for interaction and social navigation

Research Engineer (1 year): Social Human-Robot Interactions

Laboratory: LITIS, INSA Rouen Normandy, France

Project: INCA (Natural Interactions with Artificial Companions)

Summary:

The emergence of interactive robots and connected objects has led to the appearance of symbiotic systems made up of human users, virtual agents and robots in social interactions. However, two major scientific difficulties remain unsolved: on the one hand, the recognition of human activity remains inaccurate, both at the operational level (location, mapping and identification of objects and users) and at the cognitive level (recognition and tracking of users' intentions); on the other hand, interaction involves different modalities that must be adapted according to the context, the user and the situation. The INCA project aims at developing artificial companions (interactive robots and virtual agents) with a particular focus on social interactions. Our goal is to develop new models and algorithms for intelligent companions capable of (1) perceiving and representing an environment (real, virtual or mixed) consisting of objects, robots and users; (2) interacting with users in a natural way to assess their needs, preferences, and engagement; (3) learning models of user behavior and (4) generating semantically adequate and socially appropriate responses.

 

Post-doctoral position in perception for interaction and social navigation (1 position)

The candidate will work to ensure that a robot can recognize the physical content of the scene surrounding it: recognize itself, static and dynamic objects (users and other robots), and finally predict the movement of dynamic elements. The integration of data from different sensors should allow the mapping of an unknown environment and the estimation of the position of the robot. First, VSLAM (Visual Simultaneous Localization And Mapping) techniques (Saputra 2018) will be used to map the scene. The regions (or points) of interest detected could then be used to detect obstacles. In order to distinguish between static and dynamic objects, methods separating the background from the foreground of the scene (Kajo et al., 2018) will be used. Finally, recent techniques of the FlowNet 2.0 type (Eddy et al., 2017) for motion prediction on a video sequence should make it possible to predict the next movement of a dynamic object and to apprehend its behavior.
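As a small illustration of the motion-estimation building block, here is a minimal Python sketch computing dense optical flow with OpenCV's classical Farneback method, a non-learned baseline for FlowNet-style approaches; the two frames are synthetic and all parameters are illustrative.

# Minimal sketch: dense optical flow between two synthetic frames.
import numpy as np
import cv2

# A bright square that moves 5 pixels to the right between frames.
prev = np.zeros((64, 64), dtype=np.uint8)
prev[20:30, 20:30] = 255
curr = np.roll(prev, 5, axis=1)

# Farneback dense optical flow: flow[y, x] = (dx, dy) per pixel.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

dx = flow[20:30, 20:35, 0].mean()  # average motion over the object area
print(f"estimated horizontal shift: {dx:.1f} px (ground truth: 5)")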

Profile: the candidate must have strong skills in mobile robotics and navigation techniques (VSLAM, ORB-SLAM, optical flow, stereovision, etc.) and strong programming abilities under ROS or any other programming language compatible with robotics. Machine learning and deep learning skills will be highly appreciated.

Research Engineer in Social Human-Robot Interactions (1 position)

The hired research engineer will work closely with the INCA research staff (permanent, PhD and post-doctoral members) and other project partners. This will mainly involve administering the project's Pepper robots, developing the necessary tools, integrating the algorithms developed with the AgentSlang platform (https://agentslang.github.io/), and joining the team created to participate in RoboCup 2020 in Bordeaux, @Home league.

Profile: Computer Sciences / Robotics Engineer

  • Good level in programming (ROS, Python, possibly Java)

  • Strong knowledge in robotics

  • Experiences in some of the following areas would be a plus (non-exhaustive list): machine learning, human-machine social interactions, scene perception, spatio-temporal and semantic representation, natural language dialogue.

Duration and remuneration: 1 year, 2480 euros/month (gross salary)

Applications should be sent to: alexandre.pauchet@insa-rouen.fr

  • Curriculum vitae

  • Cover letter

  • Recommendation letters

  • Recently graduated students: transcripts

Back  Top

6-36(2019-09-20) Poste ATER, Paris Sorbonne, France

An ATER (temporary teaching and research associate) position in Computer Science is available at the Faculty of Letters of Sorbonne Université. Applications can be submitted at http://lettres.sorbonne-universite.fr/ater

Back  Top

6-37(2019-09-21) Post-doc/PhD position, LORIA, Nancy, France
Post-doc/PhD position: Pattern mining for neural network debugging, with application to speech recognition
 
Advisors: Elisa Fromont & Alexandre Termier, IRISA/INRIA RBA, Lacodam team (Rennes);
Irina Illina & Emmanuel Vincent, LORIA/INRIA, Multispeech team (Nancy)

firstname.lastname@inria.fr

Location: INRIA RBA, team Lacodam (Rennes)

Deadline to apply: October 30th, 2019
 
Starting date: December 2019 - January 2020
 
Keywords: discriminative pattern mining, neural networks analysis, explainability of blackbox models, speech recognition.

Context:
Understanding the inner workings of deep neural networks (DNN) has attracted a lot of attention in the past years [1, 2] and most problems were detected and analyzed using visualization techniques [3, 4]. Those techniques help to understand what an individual neuron or a layer of neurons is computing. We would like to go beyond this by focusing on groups of neurons which are commonly highly activated when a network is making wrong predictions on a set of examples. In the same line as [1], where the authors theoretically link how a training example affects the predictions for a test example using so-called 'influence functions', we would like to design a tool to 'debug' neural networks by identifying, using symbolic data mining methods, (connected) parts of the neural network architecture associated with erroneous or uncertain outputs.

In the context of speech recognition, this is especially important. A speech recognition system contains two main parts: an acoustic model and a language model. Nowadays both models are trained with deep neural network (DNN) based algorithms and use very large learning corpora to tune a large number of DNN hyperparameters. Many works aim to tune these hyperparameters automatically. However, this induces a huge computational cost, and does not empower the human designers. It would be much more efficient to provide human designers with understandable clues about the reasons for the bad performance of the system, in order to benefit from their creativity to quickly reach more promising regions of the hyperparameter search space.

Description of the position:

This position is funded in the context of the HyAIAI ('Hybrid Approaches for Interpretable AI') INRIA project lab (https://www.inria.fr/en/research/researchteams/inria-project-labs). With this position, we would like to go beyond the current common visualization techniques that help to understand what an individual neuron or a layer of neurons is computing, by focusing on groups of neurons that are commonly highly activated when a network is making wrong predictions on a set of examples. Tools such as activation maximization [8] can be used to identify such neurons. We propose to use discriminative pattern mining, and, to begin with, the DiffNorm algorithm [6] in conjunction with the LCM algorithm [7] to identify the discriminative activation patterns among the identified neurons.
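As a rough illustration of the neuron-identification step, here is a minimal PyTorch sketch of activation maximization [8] on a toy fully-connected network: gradient ascent on the input finds a pattern that maximally excites one chosen hidden neuron. The network and all names are illustrative stand-ins, not the project's actual models.

# Minimal sketch of activation maximization on a toy network.
import torch

torch.manual_seed(0)
# Toy two-layer network standing in for an acoustic or language model.
net = torch.nn.Sequential(
    torch.nn.Linear(20, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 10))
hidden = torch.nn.Sequential(*list(net.children())[:2])  # up to the ReLU

def maximize_activation(layer, neuron, steps=200, lr=0.1):
    # Gradient ascent on the input to excite one chosen hidden neuron.
    x = torch.zeros(1, 20, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -layer(x)[0, neuron] + 1e-3 * x.norm()  # maximize activation
        loss.backward()
        opt.step()
    return x.detach()

pattern = maximize_activation(hidden, neuron=5)
print(hidden(pattern)[0, 5].item())  # activation reached by the found input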

The data will be provided by the MULTISPEECH team and will consist of two deep architectures, representative of acoustic and language models [9, 10]. Furthermore, the training data from which the model parameters ultimately derive will be provided. We will also extend our results by performing experiments with supervised and unsupervised learning to compare the features learned by these networks and to perform qualitative comparisons of the solutions learned by various deep architectures. Identifying 'faulty' groups of neurons could lead to the decomposition of the DL network into 'blocks' encompassing several layers. 'Faulty' blocks may be the first to be modified in the search for a better design.

The recruited person will benefit from the expertise of the LACODAM team in pattern mining and deep learning (https://team.inria.fr/lacodam/) and of the expertise of the MULTISPEECH team (https://team.inria.fr/multispeech/) in speech analysis, language processing and deep learning. We would ideally like to recruit a 1-year (with possibly one additional year) post-doc with the following preferred skills:

- Some knowledge of (or interest in) speech recognition
- Knowledgeable in pattern mining (discriminative pattern mining is a plus)
- Knowledgeable in machine learning in general and deep learning in particular
- Good programming skills in Python (for Keras and/or TensorFlow)
- Very good English (understanding and writing)

However, good PhD applications will also be considered and, in this case, the position will last 3 years. The position will be funded by INRIA (https://www.inria.fr/en/). See the INRIA web site for the post-doc and PhD salaries.

The candidates should send a CV, the names of 2 referees and a cover letter to the four researchers (firstname.lastname@inria.fr) mentioned above. Please indicate whether you are applying for the post-doc or the PhD position. The selected candidates will be interviewed for an expected start in December 2019 - January 2020.

Bibliography:
[1] Pang Wei Koh, Percy Liang: Understanding Black-box Predictions via Influence Functions. ICML 2017: pp 1885-1894 (best paper).
[2] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals: Understanding deep learning requires rethinking generalization. ICLR 2017.
[3] Anh Mai Nguyen, Jason Yosinski, Jeff Clune: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. CVPR 2015: pp 427-436.
[4] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus: Intriguing properties of neural networks. ICLR 2014.
[5] Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi: Deep Text Classification Can be Fooled. IJCAI 2018: pp 4208-4215.
[6] Kailash Budhathoki and Jilles Vreeken. The difference and the norm: characterising similarities and differences between databases. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 206-223. Springer, 2015.
[7] Takeaki Uno, Tatsuya Asai, Yuzo Uchida, and Hiroki Arimura. LCM: An efficient algorithm for enumerating frequent closed item sets. In FIMI, volume 90. Citeseer, 2003.
[8] Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. Visualizing higher-layer features of a deep network. University of Montreal, 1341(3):1, 2009.
[9] G. Saon, H.-K. J. Kuo, S. Rennie, M. Picheny: The IBM 2015 English conversational telephone speech recognition system?, Proc. Interspeech, pp. 3140-3144, 2015.
[10] W. Xiong, L. Wu, F. Alleva, J. Droppo, X. Huang, A. Stolcke : The Microsoft 2017 Conversational Speech Recognition System, IEEE ICASSP, 2018.
 
Back  Top

6-38(2019-09-22) Postdoc position at Grenoble Alps University, Grenoble, France

Postdoc proposal

Spontaneous Speech Recognition. Application to Situated Corpora in French.

September 4, 2019

1 Postdoc Subject

The goal of the project is to advance the state of the art in spontaneous automatic speech recognition (ASR). Recent advances in ASR show excellent performance on tasks such as read speech ASR (Librispeech) and TV shows (MGB challenge), but what about spontaneous communicative speech?

This postdoc project would leverage existing transcribed corpora in French (more than 300 hours) recorded in everyday communication (speech recordings inside a family, in a shop, during an interview, etc.). One impact of the project would be the automation of transcription on very challenging data in order to feed linguistic and phonetic studies at scale.

Research topics:

- End-to-end ASR models
- Spontaneous speech ASR
- Colloquial speech transcription (see the evaluation sketch below)
- Data augmentation for spontaneous and colloquial language modelling
- Transcribing situated corpora
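Since progress on all of these topics is typically measured with the word error rate (WER), here is a minimal self-contained Python sketch of its computation via word-level Levenshtein alignment; the reference and hypothesis strings below are made up.

# Minimal sketch: word error rate (WER), the standard ASR metric.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

ref = "il y a des enregistrements dans la famille"
hyp = "il y a des enregistrement dans la des famille"
print(f"WER = {wer(ref, hyp):.2f}")  # 1 substitution + 1 insertion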

2 Requirements

We are looking for an outstanding and highly motivated postdoc candidate to work on this subject. The following requirements are mandatory:

- PhD degree in natural language processing or speech processing.
- Excellent programming skills (mostly in Python and deep learning frameworks).
- Interest in pluridisciplinary research (speech technology and speech science).
- Good oral and written communication in English (French is a plus but not mandatory).
- Ability to work autonomously and in collaboration with other team members.

3 Work context

Grenoble Alpes Univ. offers an excellent research environment with ample computing facilities, as well as remarkable surroundings to explore over the weekends. The postdoc project will be funded by the Grenoble Artificial Intelligence Institute (MIAI). The candidate will work both at LIG-lab (GETALP team) and LIDILEM-lab. The duration of the postdoc is 18 months.

4 How to apply?

Applications should include a detailed CV; a copy of the last diploma; at least two references (people likely to be contacted); a cover letter of one page; and a one-page summary of the PhD thesis. Applications should be sent to laurent.besacier@imag.fr. Applications will be evaluated as they are received: the position is open until it is filled.

Back  Top

6-39(2019-09-22) PhD thesis proposal at Grenble Alps University, Grenoble, France

PhD thesis proposal

Incremental sequence-to-sequence mapping for speech generation using deep neural networks

September 4, 2019

1 Context and objectives

In recent years, deep neural networks have been widely used to address sequence-to-sequence (S2S) learning. S2S models can solve many tasks where source and target sequences have different lengths, such as automatic speech recognition, machine translation, speech translation, text-to-speech synthesis, etc. Recurrent, convolutional and transformer architectures, coupled with attention models, have shown their ability to capture and model complex temporal dependencies between a source and a target sequence of multidimensional discrete and/or continuous data. Importantly, end-to-end training alleviates the need to previously extract handcrafted features from the data by learning hierarchical representations directly from raw data (e.g. character string, video, speech waveform, etc.).

The most common models are composed of an encoder that reads the full input sequence (i.e. from its beginning to its end) before the decoder produces the corresponding output sequence. This implies a latency equal to the length of the input sequence. In particular, for a text-to-speech (TTS) system, the speech waveform is usually synthesized from a complete text utterance (e.g. a sequence of words with explicit begin/end-of-utterance markers). Such an approach cannot be used in a truly interactive scenario, in particular by a speech-handicapped person to communicate orally. Indeed, the interlocutor has to wait for the complete utterance to be typed before being able to listen to the synthetic voice, hence limiting the dynamics and naturalness of the interaction.

The goal of this project is to develop a general methodology for incremental sequence-to-sequence mapping, with application to interactive speech technologies. It will require the development of end-to-end classification and regression neural models able to deliver chunks of output data on-the-fly, from only a partial observation of input data. The goal is to learn an efficient policy that leads to an optimal trade-off between (variable) latency and accuracy of the decoding process. Possible strategies to decode the output data as soon as possible include: (i) predicting online 'the future' of the output sequence from 'the past and present' of the input sequence, with an acceptable tolerance to possible errors, or (ii) learning automatically from the data an optimal 'waiting policy' that prevents the model from outputting data when the uncertainty is too high. The developed methodology will be applied to address two speech processing problems: (i) incremental text-to-speech synthesis, in which speech is synthesized while the user is typing the text (possibly with a variable latency), and (ii) incremental speech enhancement/inpainting, in which portions of the speech signal are unintelligible because of sudden noise or speech production disorders, and must be replaced on-the-fly with reconstructed portions.
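As a rough, deliberately non-incremental illustration of the inpainting idea, here is a minimal Python sketch that fills a corrupted span of a magnitude spectrogram by linear interpolation between its intact neighbors; a real system would instead predict the missing frames with a neural model, on the fly. Signal, frame indices and parameters are toy choices.

# Minimal (non-incremental) sketch of spectrogram-domain inpainting.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 300 * t)              # toy 'speech'
x[6000:8000] = 0.0                           # corrupted, unintelligible span

_, _, X = stft(x, fs, nperseg=512)           # hop = 256 samples
mag, phase = np.abs(X), np.angle(X)

lo, hi = 22, 33                              # frames covering the gap
w = np.linspace(0.0, 1.0, hi - lo)[None, :]  # interpolation weights
mag[:, lo:hi] = (1 - w) * mag[:, [lo - 1]] + w * mag[:, [hi]]

_, y = istft(mag * np.exp(1j * phase), fs, nperseg=512)
# y now contains the signal with the gap smoothly (if crudely) filled.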

2 Work plan

The proposed work plan is the following:

- Bibliographic work on S2S neural models, in the context of speech recognition, speech synthesis, and machine translation, as well as their incremental (low-latency) variations.
- Investigating new architectures, losses, and training strategies toward incremental S2S models.
- Implementing and evaluating the proposed techniques in the context of end-to-end neural TTS systems (the baseline system may be a neural TTS trained with past information/left-context only).
- Implementing and evaluating the proposed techniques in the context of speech enhancement/inpainting, first on simulated noisy speech and then on pathological speech.

3 Requirements

We are looking for an outstanding and highly motivated PhD candidate to work on this subject. The following requirements are mandatory:

- Engineering degree and/or a Master's degree in Computer Science, Signal Processing or Applied Mathematics.
- Solid skills in machine learning. General knowledge in natural language processing and/or speech processing.
- Excellent programming skills (mostly in Python and deep learning frameworks).
- Good oral and written communication in English.
- Ability to work autonomously and in collaboration with supervisors and other team members.

4 Work context

Grenoble Alpes Univ. offers an excellent research environment with ample computing facilities, as well as remarkable surroundings to explore over the weekends. The PhD project will be funded by the Grenoble Artificial Intelligence Institute (MIAI). The PhD candidate will work both at GIPSA-lab (CRISSP team) and LIG-lab (GETALP team). The duration of the PhD is 3 years. The salary is between 1770 and 2100 euros gross per month (depending on complementary activity or not).

5 How to apply?

Applications should include a detailed CV; a copy of the last diploma; at least two references (people likely to be contacted); a cover letter of one page; a one-page summary of the Master thesis; and the two last transcripts of marks (Master or engineering school). Applications should be sent to thomas.hueber@gipsa-lab.fr, laurent.girin@gipsa-lab.fr and laurent.besacier@imag.fr. Applications will be evaluated as they are received: the position is open until it is filled.

Back  Top

6-40(2019-10-18) Journées d’étude sur la convergence, LPL, Aix en Provence, France

Workshop on convergence (Journées d'étude sur la convergence)

October 18-19, 2019
Laboratoire Parole et Langage, Aix-en-Provence

Organized by:
Le Cercle Linguistique d'Aix-en-Provence (CLAIX)
The Systèmes & Usages team of the Laboratoire Parole et Langage (LPL)

Contacts: Sibylle Kriegel and Sophie Herment (LPL/AMU)

Program

Friday, October 18, 2019

9:30-10:15 Welcome
10:15-11:15 Debra Ziegeler, invited speaker (U. Sorbonne Nouvelle): The future of already in Singapore English: a matter of selective convergence
11:15-11:45 Coffee break
11:45-12:30 Diana Lewis (AMU, LPL): Grammaticalisation de lexème, de construction : deux cas de développement adverbial en anglais
12:30-14:15 Lunch
14:15-15:00 James German (AMU, LPL): Linguistic adaptation as an automatic response to socio-indexical cues
15:00-15:45 Daniel Véronique (AMU, LPL): L'« agglutination nominale » dans les langues créoles françaises : un exemple de convergence ?
15:45-16:15 Coffee break
16:15-17:00 Chady Shimeen-Khan (U. Paris Descartes, CEPED): Convergences et divergences à des fins discursives à travers l'usage des marqueurs discursifs chez les jeunes Mauriciens plurilingues
17:00-17:45 Sibylle Kriegel (AMU, LPL): Créolisation et convergence : l'expression du corps comme marque du réfléchi
17:45-18:15 Charles Zaremba: Le CLAIX, Cercle Linguistique d'Aix-en-Provence, a retrospective

Saturday, October 19

10:00-10:45 Akissi Béatrice Boutin (ILA-UFHB, Abidjan-Cocody): Réanalyses avec et sans convergence dans le plurilinguisme ivoirien
10:45-11:30 Massinissa Garaoun (AMU): Convergence linguistique et cycles : le cas de la négation en arabe maghrébin et en berbère
11:30-12:00 Coffee break
12:00-12:45 Nicolas Tournadre (AMU, LACITO): Phénomènes de copie et de convergence dans les langues du Tibet et de l'Himalaya
12:45-13:30 Cyril Aslanov (AMU, LPL): Convergence and secondary entropy in a macrodiachronic perspective

Back  Top

6-41(2019-10-05) Post-doc position at the Laboratoire national de métrologie et d'essais (LNE), Trappes, France

A post-doctoral position is open within the 'Evaluation of artificial intelligence systems' activity of LNE:

https://www.lne.fr/fr/offre-emploi/post-doc-evaluation-systemes-evolutifs-locuteur-traduction

The successful candidate will join a fast-growing team specialized in the evaluation of AI systems, as well as an ambitious European project on continuously evolving language processing systems (for translation and diarization). Characterizing the performance of intelligent systems that can improve themselves during use, both on their own and through interaction with human users, is a real challenge that this post-doctorate proposes to take up.

Back  Top

6-42(2019-10-15) Post-doc position, Portugal

An open full-time post-doc position for 30 months in the context of our research project DyNaVoiceR, which is supported by FCT (the Portuguese Foundation for Science and Technology).

The official announcement can be found at:

English version:
http://www.eracareers.pt/opportunities/index.aspx?task=showAnuncioOportunities&jobId=118038&idc=1e

Portuguese version:
http://www.eracareers.pt/opportunities/index.aspx?task=showAnuncioOportunities&jobId=118038&lang=pt&idc=1e

The salary level is quite attractive: 2,128.34 Euros per month (14 salaries per year).

Application deadline is October 15.

Back  Top

6-43(2019-10-10) Research engineer, Laboratoire national de métrologie et d'essais (LNE), Trappes, France

 

Research Engineer in Natural Language Processing (M/F)

Permanent position (CDI)
Location: Laboratoire national de métrologie et d'essais (LNE), Trappes
Reference: ML/ITAL/DEC

A leader in the world of measurement and reference standards, with a strong reputation in France and internationally, LNE supports industrial innovation and positions itself as an important player for a more competitive economy and a safer society.

At the crossroads of science and industry since its creation in 1901, LNE offers its expertise to all economic players involved in product quality and safety. As the pilot of French metrology, our research is at the heart of this public service mission and one of the keys to the success of companies. We strive to meet the industrial and academic need for ever more accurate measurements, under increasingly extreme conditions or on the most emerging concepts such as autonomous vehicles, nanotechnologies or additive manufacturing.

Missions:

You will join a team of six engineer-doctors, regularly supported by post-docs, PhD students and interns, specialized in the evaluation and qualification of artificial intelligence systems. This team is historically recognized for its expertise in the evaluation of natural language processing systems, and the proposed position is intended to strengthen this expertise in a context of strong growth.

In recent years, the team has diversified the application domains of its evaluation expertise, addressing subjects such as medical devices, collaborative industrial robots, autonomous vehicles, etc. The team builds on the diverse yet focused know-how of its experts (NLP, imaging, robotics, etc.) to jointly provide a satisfactory answer to the question of the evaluation and certification of intelligent systems, an imperative condition for their acceptability and today a priority for public authorities.

It is within the framework of the progressive establishment of an evaluation center for intelligent systems, with a national and international scope, that the team seeks to attract the best profiles in each AI specialty. The main missions of this future center are the development of new evaluation protocols, the qualification and certification of intelligent systems, the organization of challenges (benchmarking campaigns), the provision of experimental resources, the development and organization of the sector, and the definition of principles, policies, doctrines and standards to this end.

As a research engineer-doctor in NLP, your primary field of intervention will be language processing (text and speech). You may also be asked to intervene in other areas of information processing (for example image processing, including optical character recognition), and beyond, depending on priorities and your own skills and affinities.

The position can evolve over the medium and long term, as it aims at training technical experts of at least national stature who will eventually lead the growth and oversight of their specialty, subject to the regulatory framework and the general orientations of LNE or its principals.

Dans un premier temps, vous couvrirez les missions suivantes :

- Contribution à la R&D et aux actions structurantes (60%) :

 

Inventaire technique et commercial du besoin et de l’offre, priorisation des marchés et champs techniques à investir

Identification et définition des grandeurs à mesurer, des métriques afférentes, des protocoles d’évaluation et des moyens d’essais nécessaires

Structuration des données de la discipline (TAL) au sein de référentiels et selon des nomenclatures à bâtir

Programmation et conduite d’essais à des fins expérimentales, de recherche itérative et d’étalonnage

Constitution et animation d’un réseau de chercheurs des secteurs public et privé, national et étranger, en appui aux présentes missions

Contribution au montage et à l’exécution de projets de recherche nationaux et européens et de coopérations internationales

Participation aux travaux de planification du LNE : investissements, RH, budgets annuels, perspectives pluriannuelles

Publication et présentation des résultats scientifiques

Encadrement éventuel de doctorants, post-doctorants, stagiaires

- Contribution aux prestations commerciales en TAL (40%) :

Ingénierie linguistique générale (manipulation des données, analyse statistique, etc.)

Prise en charge du besoin client et reformulation dans le cadre d’une offre technique et commerciale

Organisation des tâches pour la réalisation de la prestation, estimation des ressources nécessaires, négociation

Réalisation de ces tâches en coordination avec l’équipe

Production/rédaction des livrables

Présentation des résultats au client

Profil :

Vous êtes titulaire soit d’un doctorat, soit d’un diplôme d’ingénieur avec un minimum de trois ans d’expérience professionnelle, en informatique ou en sciences du langage, avec une spécialisation en traitement automatique de la langue (TAL) et plus généralement en intelligence artificielle. Les expériences professionnelles ou académiques passées en développement et/ou test logiciel, en analyse statistique, ainsi qu’en traitement de la parole ou de l’image seront particulièrement appréciées.

Vous disposez également d’un bon niveau d’anglais et de programmation (C++ et/ou Python), ainsi que d’une expérience en utilisation de Linux.

Dans le cadre de votre prise de poste, vous pourriez être amené.e à suivre des formations complémentaires (par exemple en intelligence artificielle et en cybersécurité).

Vous saurez être à l’initiative, en disposant d’une large autonomie et d’un potentiel de créativité vous permettant d’occuper pleinement votre espace de responsabilité dans un objectif d’excellence. Vous êtes capable de défendre un leadership de par la qualité et la clarté de vos argumentaires.

Déplacements fréquents en région parisienne (une fois par semaine), en province (une à deux fois par mois) et occasionnels dans le monde (une fois par trimestre) dans le cadre de prestations, réunions ou conférences.


6-44(2019-10-13) Postdoctoral Researcher , IRISA, Rennes, France


Postdoctoral Researcher in Multilingual Speech Processing

CONTEXT: The Expression research team focuses on expressiveness in human-centered data. In this context, the team has a strong activity in the field of speech processing, especially text-to-speech (TTS). This activity is reflected in regular publications in top international conferences and journals, with contributions in topics like machine learning (including deep learning), natural language processing, and speech processing. The Expression team takes part in multiple collaborative projects. Among these, the current position is part of a large European H2020 project focusing on the social integration of migrants in Europe. Team website: https://www-expression.irisa.fr/

PROFILE

Main tasks:
1. Design multilingual TTS models (acoustic models, grapheme-to-phoneme, prosody, text normalization...)
2. Take part in porting the team's TTS system to embedded environments
3. Develop spoken language skill assessment methods

Secondary tasks:
1. Collect speech data
2. Define use cases with the project partners

Environment: The successful candidate will join a team of researchers and engineers working on the same topics.

Required qualification: PhD in computer science or signal processing

Skills:
- Statistical machine learning and deep learning
- Speech processing and/or natural language processing
- Strong object-oriented programming skills
- Android and/or iOS programming is a strong plus

CONTRACT
Duration: 22 months, full time
Salary: competitive, depending on experience
Starting date: January 1st, 2020

APPLICATION & CONTACTS
Send a cover letter, a resume, and references by email to: Arnaud Delhay, arnaud.delhay@irisa.fr; Gwénolé Lecorvé, gwenole.lecorve@irisa.fr; Damien Lolive, damien.lolive@irisa.fr.
Application deadline: November 15th, 2019. Applications will be processed as they arrive.


6-45(2019-10-16) Position in Machine Learning/AI at ReadSpeaker, The Netherlands

ReadSpeaker has a job opening for a Machine Learning / AI person working on text-to-speech research and development. Job ad can be found here:


https://www.readspeaker.com/careers/machine-learning-ai-for-text-to-speech-synthesis-research-and-development-to-deliver-business-solutions/


6-46(2019-10-18) FULLY FUNDED FOUR-YEAR PHD STUDENTSHIPS, University of Edinburgh, Scotland

FULLY FUNDED FOUR-YEAR PHD STUDENTSHIPS

UKRI CENTRE FOR DOCTORAL TRAINING IN NATURAL LANGUAGE PROCESSING

at the University of Edinburgh's School of Informatics and School of Philosophy, Psychology and Language Sciences.

Applications are now sought for the CDT's second cohort of students to start in September 2020

Deadlines:
* Non EU/UK : 29th November 2019
* EU/UK : 31st January 2020.

The CDT in NLP offers unique, tailored doctoral training comprising both taught courses and a doctoral dissertation. Both components run concurrently over four years.

Each student will take a set of courses designed to complement their existing expertise and give them an interdisciplinary perspective on NLP. They will receive full funding for four years, plus a generous allowance for travel, equipment and research costs.

The CDT brings together researchers in NLP, speech, linguistics, cognitive science and design informatics from across the University of Edinburgh. Students will be supervised by a team of over 40 world-class faculty and will benefit from cutting edge computing and experimental facilities, including a large GPU cluster and eye-tracking, speech, virtual reality and visualisation labs.

The CDT involves over 20 industrial partners, including Amazon, Facebook, Huawei, Microsoft, Mozilla, Reuters, Toshiba, and the BBC. Close links also exist with the Alan Turing Institute and the Bayes Centre.

A wide range of research topics fall within the remit of the CDT:

  • Natural language processing and computational linguistics

  • Speech technology

  • Dialogue, multimodal interaction, language and vision

  • Information retrieval and visualization, computational social science

  • Computational models of human cognition and behaviour, including language and speech processing

  • Human-Computer interaction, design informatics, assistive and educational technology

  • Psycholinguistics, language acquisition, language evolution, language variation and change

  • Linguistic foundations of language and speech processing

The second cohort of CDT students will start in September 2020 and is now open to applications.

Around 12 studentships are available, covering maintenance at the research council rate (https://www.ukri.org/skills/funding-for-research-training, currently £15,009 per year) and tuition fees. Studentships are available for UK, EU and non-EU nationals. Individuals in possession of other funding scholarships or industry funding are also welcome to apply - please provide details of your funding source on your application.

Applicants should have an undergraduate or master's degree in computer science, linguistics, cognitive science, AI, or a related discipline. We particularly encourage applications from women, minority groups and members of other groups that are underrepresented in technology.

Further details including the application procedure can be found at: https://edin.ac/cdt-in-nlp

Application Deadlines
In order to ensure full consideration for funding, completed applications (including all supporting documents) need to be received by:

29th November 2019 (non EU/UK) or 31st January 2020 (EU/UK).

CDT in NLP Open Days
Find out more about the programme by attending the PG Open Day at the School of Informatics or by joining one of the CDT in NLP Virtual Open Days:

Enquiries can be made to the CDT admissions team at cdt-nlp-info@inf.ed.ac.uk


6-47(2019-10-19) Postdoctoral Scholar, University of Southern California, USA
Open Position - Postdoctoral Scholar for Multimodal Machine Learning and Natural Language Processing
 

The University of Southern California's Institute for Creative Technologies (ICT) is an off-campus research facility, located on a creative business campus in the 'Silicon Beach' neighborhood of Playa Vista. We are world leaders in innovative training and education solutions, computer graphics, computer simulations, and immersive experiences for decision-making, cultural awareness, leadership and health. ICT employees are encouraged to develop themselves both professionally and personally, through workshops, invited guest talks, movie nights, social events, various sports teams, a private gym and a personal trainer. The atmosphere at ICT is informal and flexible, while encouraging initiative, personal responsibility and a high work ethic.

We are looking for an accomplished recent PhD graduate to work on a challenging yet exciting NIH-funded 4-year research project.  The project seeks to understand the process and success of Motivational Interviewing (MI). Specifically, our project will address shortcomings of current MI coding systems by introducing a novel computational framework that leverages our recent advances in automatic verbal and nonverbal behavior analyses as well as multimodal machine learning. Our framework aims to jointly analyze verbal (i.e., what is being said), nonverbal (i.e., how something is said), and dyadic (i.e., in what interpersonal context something is said) behavior to better identify in-session patient behavior that is predictive of post-session alcohol use. The project is heavily focused on machine learning, NLP, and data mining; it requires no data collection as all data has already been collected.

We are looking to add a talented machine learning (NLP, CV, or signal processing focus) Postdoctoral Research Associate to our interdisciplinary team of machine learning scientists, affective computing experts, and psychiatrists.  Join our team's mission to better understand therapy processes and predict outcomes!

Responsibilities include:

• Design and implement state-of-the-art NLP machine learning algorithms to automatically code dyadic MI therapy sessions and predict behavior change in patients.
• Push the envelope on current NLP and multimodal machine learning algorithms to better understand the MI process and outcome.
• Conduct statistical analysis on verbal, nonverbal and dyadic behavioral patterns to describe their relationship with the MI process and outcome.
• Write and lead authorship of high impact conference (ACL, EMNLP, ICMI, CVPR, ICASSP, and Interspeech) and journal papers (PAMI, TAFFC, and TASLP).
• Support and lead graduate, undergraduate students, and summer interns to preprocess and annotate multimodal MI data.
 

Work collaboratively with:               

• Domain experts in Motivational Interviewing research, to automatically derive meaningful insights for MI experts.
• Computer scientists across departments at the highly accomplished and interdisciplinary USC Institute for Creative Technologies.


  Have fun & learn while working at ICT with a great team and an incredible mission!


Minimum Education: PhD in computer science or engineering with a focus on NLP, CV signal processing or multimodal machine learning. 

Minimum Experience: At least 1 year of experience working with data comprising human verbal and/or nonverbal behavior. 
Minimum Field of Expertise: Directly related education in research specialization with advanced knowledge of equipment, procedures, and analysis methods. 
Skills:
• Comfortable with machine learning frameworks such as PyTorch or TensorFlow
• Excellent programming skills in Python or C++
• Analysis, assessment and evaluation
• Written and oral communication skills
• Organization and planning
• Problem identification and resolution
• Project management
• Research
 
 
 

6-48(2019-11-03) Research engineer, IRIT, Toulouse, France

Within the framework of the ALAIA joint laboratory, IRIT (SAMoVA team, https://www.irit.fr/SAMOVA/site/) is recruiting a research engineer on a fixed-term contract to join its research team, work in the field of AI applied to foreign language learning, and collaborate with the company Archean Technologie (http://www.archean.tech/archean-labs-en.html).

Position: Research engineer
Duration: 12 to 18 months
Starting date: possible from December 1st, 2019
Field: speech processing, machine learning, automatic pronunciation analysis
Location: Institut de Recherche en Informatique de Toulouse (Université Paul Sabatier) - SAMoVA team
Profile: PhD in computer science, machine learning or audio processing
Contact: Isabelle Ferrané (isabelle.ferrane@irit.fr)
Application: CV, abstract of the PhD thesis, cover letter, references/contacts
Full details: https://www.irit.fr/SAMOVA/site/assets/files/engineer/ALAIA_ResearchEngineerPosition(1).pdf
Salary: depending on experience


6-49(2019-11-05) Annotator/Transcriber (M/F) at ZAION, Paris, France

ZAION (https://www.zaion.ai) is a fast-growing innovative company specialized in conversational bot technology: callbots and chatbots embedding Artificial Intelligence.

ZAION has developed a solution that builds on more than 20 years of experience in Customer Relations. This technologically disruptive solution has been very well received internationally, and we already count 12 active clients (GENERALI, MNH, APRIL, CROUS, EUROP ASSISTANCE, PRO BTP, ...).

We are currently among the only companies in the world to offer a solution of this kind entirely geared towards performance. Joining us means taking part in a great adventure within a dynamic team that aims to become the reference on the conversational robot market.

Within our Artificial Intelligence activity, to support its ongoing innovations in the automatic identification of sentiments and emotions in conversational telephone interactions, we are recruiting an Annotator/Transcriber (M/F).

Main missions:

  • ANNOTATE accurately the exchanges between a customer and their advisor according to tags explained in a guide,
  • work meticulously from audio and text documents in French,
  • quickly become familiar with a dedicated annotation tool,
  • know collaborative work tools,
  • use your cultural, linguistic and grammatical knowledge to report with great precision not only the conversation between two speakers on a given subject, but also the segmentation of what they say.

Candidate profile:
  • be a native speaker with impeccable spelling,
  • have a very good command of Mac OR Windows OR Linux environments, and demonstrate rigour, listening skills and discretion.

Fixed-term contract (full time), based in Paris (75017).

If interested, please contact Anne le Gentil (HR) at the following address: alegentil@zaion.ai, attaching a CV to your email.

6-50(2019-11-05) Data Scientist / Machine Learning applied to Audio (M/F) at Zaion, Paris, France

ZAION is a fast-growing innovative company specialized in conversational bot technology: callbots and chatbots embedding Artificial Intelligence.

ZAION has developed a solution that builds on more than 20 years of experience in Customer Relations. This technologically disruptive solution has been very well received internationally, and we already count 18 active clients (GENERALI, MNH, APRIL, CROUS, EUROP ASSISTANCE, PRO BTP, ...).

We are currently among the only companies in the world to offer a solution of this kind entirely geared towards performance. Joining us means taking part in an exciting adventure within an ambitious team aiming to become the reference on the conversational robot market.

As part of its development, ZAION is recruiting its Data Scientist / Machine Learning engineer applied to Audio (M/F). Within the R&D team, your role is strategic for the development and expansion of the company. You will develop a solution for detecting emotions in conversations. We want to extend the cognitive capabilities of our callbots so that they can detect the emotions of their interlocutors (joy, stress, anger, sadness, ...) and adapt their answers accordingly.

Main missions:

- Take part in the creation of ZAION's R&D unit and, on arrival, lead your first project on emotion recognition in the voice

- Build, adapt and evolve our services for detecting emotion in the voice

- Analyse large databases of conversations to extract the emotionally relevant ones

- Build a database of conversations labelled with emotional tags

- Train and evaluate machine learning models for emotion classification

- Deploy your models in production

- Continuously improve the system for detecting emotions in the voice

Required qualifications and previous experience:

- You have at least 2 years of experience as a Data Scientist / Machine Learning engineer applied to audio

- Engineering school or Master's degree in computer science, or a PhD in computer science/mathematics, with solid skills in signal processing (preferably audio)

- Solid theoretical background in machine learning and in the relevant mathematical fields (clustering, classification, matrix factorization, Bayesian inference, deep learning...)

- Experience deploying machine learning models in a production environment would be a plus

- You master one or more of the following languages/frameworks: Python, machine learning / deep learning frameworks (PyTorch, TensorFlow, scikit-learn, Keras) and JavaScript

- You master audio signal processing techniques

- Proven experience in labelling large databases (preferably audio) is essential

- Your personality: a leader, autonomous, passionate about your work, you know how to lead a team in project mode

- You speak English fluently

Please send your application to: alegentil@zaion.ai

Best regards,

Anne le Gentil, HR

alegentil@zaion.ai / 06 62 33 98 64

https://www.linkedin.com/company/zaion-callbot/


6-51(2019-11-25) Internship offer, INRIA Bordeaux, France

M2 internship offer (Computer science / signal processing)

Deep learning for the classification between Parkinson's disease and multiple system atrophy from voice signal analysis

Parkinson's disease (PD) and multiple system atrophy (MSA) are neurodegenerative diseases. MSA belongs to the group of atypical parkinsonian disorders. In the early stages of the disease, the symptoms of PD and MSA are very similar, especially for MSA-P, where the parkinsonian syndrome predominates. The differential diagnosis between MSA-P and PD can be very difficult in the early stages of the disease, while early diagnostic certainty is important for the patient because of the diverging prognoses. Despite recent efforts, no valid objective marker is currently available to guide the clinician in this differential diagnosis. The need for such markers is therefore very high in the neurology community, especially given the severity of the MSA prognosis.

It is established that speech disorders, commonly called dysarthria, are an early symptom common to both diseases, with a different origin in each. We are therefore conducting research that uses dysarthria, through digital processing of patients' voice recordings, as a vector to distinguish between PD and MSA-P. We are currently coordinating a research project on this topic with clinical partners, neurologists and ENT specialists, from the university hospitals of Bordeaux and Toulouse. Within this project, we have a database of voice recordings of PD and MSA-P patients (and of healthy subjects).

The goal of this internship is to explore recent deep learning techniques to perform the classification between PD and MSA-P. The first step of the internship will consist in implementing a baseline system using standard tools, following the methodology described in [1]. The latter addresses the classification between PD and healthy subjects and uses spectrogram 'chunks' as input to a convolutional neural network (CNN). This methodology will be applied to the PD vs. MSA-P task using our database. The CNN will be implemented with Keras-TensorFlow (https://www.tensorflow.org/guide/keras). The extraction of voice signal parameters will be carried out with Matlab and the Praat software (http://www.fon.hum.uva.nl/praat/). This step will allow the intern to assimilate the building blocks of deep learning and of pathological voice analysis.
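As a minimal illustration of this first step, the following sketch shows the kind of small Keras CNN involved, assuming fixed-size mel-spectrogram chunks (128 bands x 126 frames) and one binary PD vs. MSA-P label per chunk; these dimensions and the aggregation rule are assumptions made for the example, not specifications of [1].

from tensorflow.keras import layers, models

def build_chunk_cnn(input_shape=(128, 126, 1)):
    # Small 2-D CNN classifying one spectrogram chunk at a time.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # P(MSA-P | chunk)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# A recording-level decision can then average the per-chunk probabilities,
# e.g. model.predict(chunks).mean() > 0.5 (illustrative aggregation rule).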

 

The second step of the internship will consist in developing a deep neural network (DNN) that takes as input acoustic representations dedicated to the PD vs. MSA-P task and developed by our team. This will involve:

  • building the right dataset

  • defining the right class of DNN to use

  • building the right DNN architecture

  • choosing the right objective function to optimize

  • analysing and comparing the classification performances

This step will require a deeper understanding of the theoretical and algorithmic aspects of deep learning.

Prerequisites: A good knowledge of standard machine learning techniques and of their conceptualization is necessary. A good level in Python programming is also necessary. Knowledge of signal/image processing and/or deep learning would be an advantage. A test will be carried out to check these prerequisites.

Internship supervisor: Khalid Daoudi (khalid.daoudi@inria.fr)

Location: GeoStat team (https://geostat.bordeaux.inria.fr)

INRIA Bordeaux Sud-Ouest (https://www.inria.fr/centre/bordeaux)

Duration: 4 to 6 months, starting from February 2020

Compensation: standard internship allowance (~580 euros/month)

Candidates should send a detailed CV as well as the name and contact details of at least one reference to khalid.daoudi@inria.fr.

The internship could lead to a PhD offer.

[1] Convolutional neural network to model articulation impairments in patients with Parkinson's disease, J. C. Vásquez-Correa, J. R. Orozco-Arroyave, and E. Nöth, in Proceedings of INTERSPEECH 2017.


6-52(2019-11-15) 13 PhD studentships at UKRI Centre for Doctoral Training (CDT), University of Sheffield, UK

UKRI Centre for Doctoral Training (CDT) in Speech and Language Technologies (SLT) and their Applications 

 

Department of Computer Science

Faculty of Engineering 

University of Sheffield

 

Fully-funded 4-year PhD studentships for research in Speech and Language Technologies (SLT) and their Applications

** Apply now for September 2020 intake. Up to 13 studentships available **

Deadline for applications: 31 January 2020. 

What makes the SLT CDT different:

  • Unique Doctor of Philosophy (PhD) with Integrated Postgraduate Diploma (PGDip) in SLT Leadership. 

  • Bespoke cohort-based training programme running over the entire four years providing the necessary skills for academic and industrial leadership in the field, based on elements covering core SLT skills, research software engineering (RSE), ethics, innovation, entrepreneurship, management, and societal responsibility.  

  • The centre is a world-leading hub for training scientists and engineers in SLT - two core areas within artificial intelligence (AI) which are experiencing unprecedented growth and will continue to do so over the next decade.

  • Setting that fosters interdisciplinary approaches, innovation and engagement with real world users and awareness of the social and ethical consequences of work in this area.

 

The benefits:

  • Four-year fully-funded studentship covering all fees and an enhanced stipend (£17,000 pa)

  • Generous personal allowance for research-related travel, conference attendance, specialist equipment, etc.

  • A full-time PhD with integrated PGDip incorporating 6 months of foundational SLT training prior to starting your research project 

  • Supervision from a team of over 20 internationally leading SLT researchers, covering all core areas of modern SLT research, and a broader pool of over 50 academics in cognate disciplines with interests in SLTs and their application

  • Every PhD project underpinned by a real-world application, directly supported by one of over 30 industry partners. Partners include Google, Amazon, Microsoft, Nuance, NHS Digital and many more

  • A dedicated CDT workspace within a collaborative and inclusive research environment hosted by the Department of Computer Science

  • Work and live in Sheffield - a cultural centre on the edge of the Peak District National Park which is in the top 10 most affordable and safest UK university cities.

 

About you:

We are looking for students from a wide range of backgrounds interested in Speech and Language Technologies. 

  • High-quality (ideally first class) undergraduate or masters (ideally distinction) degree in a relevant discipline. Suitable backgrounds include (but not limited to) computer science, informatics, engineering, linguistics, speech and language processing, mathematics, cognitive science, AI, physics, or a related discipline. 

  • Regardless of background, you must be able to demonstrate mathematical aptitude (minimally to A-Level standard or equivalent) and experience of programming.

  • We particularly encourage applications from groups that are underrepresented in technology.

  • Candidates must satisfy the UKRI funding eligibility criteria. Students must have settled status in the UK and have been 'ordinarily resident' in the UK for at least 3 years prior to the start of the studentship. Full details of eligibility criteria can be found on our website.

 

Applying:

Applications are now sought for the September 2020 intake. Up to 13 studentships available.

 

We operate a staged admissions process, with application deadlines throughout the year. 

The first deadline for applications is 31 January 2020. The second deadline is 31 May 2020. 

Applications will be reviewed within 4 weeks of each deadline and short-listed applicants will be invited to interview. Interviews will be held in Sheffield.

In some cases, because of the high volume of applications we receive, we may need more time to assess your application. If this is the case, we will let you know if we intend to do this.

We may be able to consider applications received after 31 May 2020 if places are still available. Equally, all places may be allocated after the first deadline therefore we encourage you to apply early.

 

See our website for full details and guidance on how to apply: slt-cdt.ac.uk 

For an informal discussion about your application please contact us by email at: sltcdt-enquiries@sheffield.ac.uk

 

By replying to this email or contacting sltcdt-enquiries@sheffield.ac.uk you consent to being contacted by the University of Sheffield in relation to the CDT. You are free to withdraw your permission in writing at any time.


6-53(2019-11-21) Scholarships in French Studies (MA and PhD) at Western University, Canada

 


Scholarships in French Studies (MA and PhD) at Western University

The Department of French Studies at Western University (London, Canada) is now accepting applications for admission to its Master's and PhD programs for the 2020-2021 academic year, in the fields of linguistics and literature. Western University is recognized as one of the major research universities in Ontario, and the Department of French Studies has actively contributed to this reputation for more than 50 years.

The faculty and the graduate student body form a diverse international community. We offer the possibility of pursuing a research program in formal linguistics (syntax, morphology, phonology and semantics) as well as in sociolinguistics. We also offer training in literature across all centuries and all areas of French and Francophone literature, the fields in which our students conduct their research.

Information about the faculty's research interests is available here: https://www.uwo.ca/french/people/faculty/index.html.

The list of theses and dissertations completed since 2003 is available here: https://www.uwo.ca/french/graduate/thesis/index.html.

Deadline for the first call, giving access to funding from September 2020: February 1st, 2020.

Canadian and international candidates accepted into the PhD program receive a four-year scholarship covering tuition fees, as well as an annual teaching assistantship worth at least $13,000. The same funding is offered for one year to Canadian students accepted into the Master's program. International students accepted into the Master's program receive a lump sum of $3,000 for the duration of the program.

In addition to graduate scholarships, the Department of French Studies offers students who maintain a high-quality academic record financial support for research trips or conference participation, as well as the possibility of replacing the teaching assistantship with a research scholarship of equivalent value. Several students in our PhD program also benefit from a joint supervision (cotutelle) agreement with a French university.

For more information about the financial support offered by our institution, please contact the Department of French Studies directly or see the following link: http://www.uwo.ca/french/graduate/finances/index.html.

We also offer an excellent teaching assistant training program as well as several professional development activities.

Graduate program director: François Poiré (fpoire@uwo.ca)

Graduate program assistant: Chrisanthi Ballas (frgrpr@uwo.ca)

Contact: http://www.uwo.ca/french/graduate/programs/index.html

Reference URL: http://www.uwo.ca/french/graduate




6-54(2019-11-22) Master R2 Internship, Loria-Inria, Nancy, France

Master R2 Internship in Natural Language Processing: weakly supervised learning for hate speech detection

Supervisors: Irina Illina, MdC, Dominique Fohr, CR CNRS

Team: Multispeech, LORIA-INRIA

Contact: illina@loria.fr, dominique.fohr@loria.fr

Duration: 5-6 months

Deadline to apply: March 1st, 2020

Required skills: background in statistics, natural language processing and computer program skills (Perl, Python). Candidates should email a detailed CV with diploma

Motivations and context

Recent years have seen a tremendous development of Internet and social networks. Unfortunately, the dark side of this growth is an increase in hate speech. Only a small percentage of people use the Internet for unhealthy activities such as hate speech. However, the impact of this low percentage of users is extremely damaging.

Hate speech is the subject of different national and international legal frameworks. Manual monitoring and moderating the Internet and the social media content to identify and remove hate speech is extremely expensive. This internship aims at designing methods for automatic learning of hate speech detection systems on the Internet and social media data. Despite the studies already published on this subject, the results show that the task remains very difficult (Schmidt et al., 2017; Zhang et al., 2018).

In text classification, text documents are usually represented in a so-called vector space and then assigned to predefined classes through supervised machine learning. Each document is represented as a numerical vector, computed from the words of the document. How to numerically represent the terms in an appropriate way is a basic problem in text classification tasks and directly affects the classification accuracy. Developments in neural networks led to a renewed interest in the field of distributional semantics, more specifically in learning word embeddings (representations of words in a continuous space). Computational efficiency was one big factor that popularized word embeddings. Word embeddings capture syntactic as well as semantic properties of words (Mikolov et al., 2013). As a result, they outperformed several other word vector representations on different tasks (Baroni et al., 2014).

Our methodology for hate speech detection builds on recent approaches to text classification with neural networks and word embeddings. In this context, fully connected feed-forward networks, Convolutional Neural Networks (CNN) and also Recurrent/Recursive Neural Networks (RNN) have been applied. On the one hand, the approaches based on CNN and RNN capture rich compositional information and have outperformed the state-of-the-art results in text classification; on the other hand, they are computationally intensive and require a huge corpus of training data.
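To make this concrete, here is a minimal sketch of such a classifier, assuming pre-trained (e.g., fastText-style) word vectors feeding a 1-D CNN that scores each comment as hate or not hate; the vocabulary size, sequence length and the random stand-in embedding matrix are placeholders for the example, not the team's actual configuration.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB, DIM, MAXLEN = 50000, 300, 100   # vocabulary size, vector size, comment length

# Stand-in for a matrix of pre-trained word vectors (one row per vocabulary word).
embedding_matrix = np.random.rand(VOCAB, DIM).astype("float32")

model = models.Sequential([
    layers.Input(shape=(MAXLEN,)),      # a comment as a sequence of word ids
    layers.Embedding(VOCAB, DIM,
                     embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
                     trainable=False),  # frozen pre-trained embeddings
    layers.Conv1D(128, 5, activation="relu"),   # composes local word contexts
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),      # P(hate | comment)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])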

To train these DNN hate speech detection systems, it is necessary to have a very large corpus of training data. This training data must contain several thousands of social media comments, and each comment should be labeled as hate or not hate. It is easy to automatically collect social media and Internet comments. However, it is time-consuming and very costly to label a huge corpus. Of course, for several hundred comments this work can be performed manually by human annotators, but it is not feasible for a huge corpus of comments. In this case, weakly supervised learning can be used: the idea is to train a deep neural network with a limited amount of labelled data.

The goal of this master internship is to develop a methodology for weakly supervised learning of a hate speech detection system using social network data (Twitter, YouTube, etc.).

Objectives

In our Multispeech team, we have developed a baseline system for automatic hate speech detection. This system is based on fastText and BERT embeddings (Bojanowski et al., 2017; Devlin et al., 2018) and on CNN/RNN models. During this internship, the master student will work on this system in the following directions:

  • Study of the state-of-the-art approaches in the field of weakly supervised learning;
  • Implementation of a baseline method of weakly supervised learning for our system;
  • Development of a new methodology for weakly supervised learning. Two cases will be studied. In the first case, we train the hate speech detection system using a small labeled corpus and then proceed incrementally: we use this first system to label more data, retrain the system and use it to label new data, and so on (a minimal sketch of this scheme is given after this list). In the second case, we consider learning with noisy labels (labels that may be incorrect or given by several annotators who disagree).
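For the first case, a minimal sketch of the incremental (self-training) loop could look as follows; model, X_labeled, y_labeled and X_pool are illustrative placeholders (any classifier exposing scikit-learn-style fit/predict_proba would do), and the confidence threshold is an assumption made for the example.

import numpy as np

def self_train(model, X_labeled, y_labeled, X_pool, rounds=5, threshold=0.9):
    # Iteratively grow the labeled set with the model's own confident predictions.
    for _ in range(rounds):
        model.fit(X_labeled, y_labeled)
        probs = model.predict_proba(X_pool)          # shape: (n_pool, n_classes)
        confident = probs.max(axis=1) >= threshold
        if not confident.any():                      # nothing trustworthy left
            break
        X_labeled = np.concatenate([X_labeled, X_pool[confident]])
        y_labeled = np.concatenate([y_labeled, probs[confident].argmax(axis=1)])
        X_pool = X_pool[~confident]                  # keep the rest unlabeled
    return model

The threshold trades the precision of the pseudo-labels against the amount of new data added at each round.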

References

Baroni, M., Dinu, G., and Kruszewski, G. "Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors". In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Volume 1, pages 238-247, 2014.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. "Enriching word vectors with subword information". Transactions of the Association for Computational Linguistics, 5:135-146, 2017.

Dai, A. M. and Le, Q. V. "Semi-supervised sequence learning". In Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., and Garnett, R., editors, Advances in Neural Information Processing Systems 28, pages 3061-3069. Curran Associates, Inc., 2015.

Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. "BERT: Pre-training of deep bidirectional transformers for language understanding", arXiv:1810.04805v1, 2018.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. "Distributed representations of words and phrases and their compositionality". In Advances in Neural Information Processing Systems 26, pages 3111-3119. Curran Associates, Inc., 2013.

Schmidt, A., Wiegand, M. "A survey on hate speech detection using natural language processing", Workshop on Natural Language Processing for Social Media, 2017.

Zhang, Z., Luo, L. "Hate speech detection: a solved problem? The challenging case of long tail on Twitter". arxiv.org/pdf/1803.03662, 2018.

 


6-55(2019-11-25) Annotator/Transcriber, ZAION, Paris, France

ZAION (https://www.zaion.ai) is a fast-growing innovative company specialized in conversational bot technology: callbots and chatbots embedding Artificial Intelligence.

ZAION has developed a solution that builds on more than 20 years of experience in Customer Relations. This technologically disruptive solution has been very well received internationally, and we already count 12 active clients (GENERALI, MNH, APRIL, CROUS, EUROP ASSISTANCE, PRO BTP, ...).

We are currently among the only companies in the world to offer a solution of this kind entirely geared towards performance. Joining us means taking part in a great adventure within a dynamic team that aims to become the reference on the conversational robot market.

Within our Artificial Intelligence activity, to support its ongoing innovations in the automatic identification of sentiments and emotions in conversational telephone interactions, we are recruiting an Annotator/Transcriber (M/F).

Main missions:

  • ANNOTATE accurately the exchanges between a customer and their advisor according to tags explained in a guide,
  • work meticulously from audio and text documents in French,
  • quickly become familiar with a dedicated annotation tool,
  • know collaborative work tools,
  • use your cultural, linguistic and grammatical knowledge to report with great precision not only the conversation between two speakers on a given subject, but also the segmentation of what they say.

Candidate profile:
  • be a native speaker with impeccable spelling,
  • have a very good command of Mac OR Windows OR Linux environments, and demonstrate rigour, listening skills and discretion.

Fixed-term contract (full or part time), based in Paris (75017).

If interested, please contact Anne le Gentil (HR) at the following address: alegentil@zaion.ai, attaching a CV to your email.

6-56(2019-12-02) 2 faculty positions, Université Paris-Saclay, France

Two faculty positions (one full professor, one associate professor) will be opened for competition by Université Paris-Saclay in section 27 (computer science) in the 2020 round, with profiles in Language Processing, Speech being the priority, and with research carried out at LIMSI.

The two profiles are detailed here:

https://www.limsi.fr/fr/limsi-emplois/offres-de-postes-chercheurs-et-enseignants-chercheurs

Do not hesitate to get in touch if one of the positions interests you (dir@limsi.fr), and to let people around you know about these positions.


6-57(2019-12-03) Ph studentships, University of Glasgow, UK

The School of Computing Science at the University of Glasgow is offering studentships and excellence bursaries for PhD study. The following sources of funding are available:

 

* EPSRC DTA awards: open to UK or EU applicants who have lived in the UK for at least 3 years (see https://epsrc.ukri.org/skills/students/help/eligibility/) - covers fees and living expenses

* College of Science and Engineering Scholarship: open to all applicants (UK, EU and International) - covers fees and living expenses

* Centre for Doctoral Training in Socially Intelligent Artificial Agents: open to UK or EU applicants who have lived in the UK for at least 3 years through a national competition – see https://socialcdt.org

* China Scholarship Council Scholarship nominations: open to Chinese applicants – covers fees and living expenses

* Excellence Bursaries: full fee discount for UK/EU applicants; partial discount for international applicants

* Further scholarships (contact potential supervisor for details): open to UK or EU applicants

 

Whilst the above funding is open to students in all areas of computing science, applications in the area of Human-Computer Interaction are welcomed. 

 

Please find below a list of Available supervisors in HCI and their research areas.

 

Available supervisors and their research topics:  

* Prof Stephen Brewster (http://mig.dcs.gla.ac.uk/): Multimodal Interaction, MR/AR/VR, Haptic feedback. Email: Stephen.Brewster@glasgow.ac.uk

* Prof Matthew Chalmers (https://www.gla.ac.uk/schools/computing/staff/matthewchalmers/): mobile and ubiquitous computing, focusing on ethical systems design and healthcare applications. Email: Matthew.Chalmers@glasgow.ac.uk

* Prof Alessandro Vinciarelli (http://www.dcs.gla.ac.uk/vincia/): Social Signal Processing. Email: Alessandro.Vinciarelli@glasgow.ac.uk
* Dr Mary Ellen Foster (http://www.dcs.gla.ac.uk/~mefoster/): Social Robotics, Conversational Interaction, Natural Language Generation. Email: MaryEllen.Foster@glasgow.ac.uk
* Dr Euan Freeman (http://euanfreeman.co.uk/): Interaction Techniques, Haptics, Gestures, Pervasive Displays. Email: Euan.Freeman@glasgow.ac.uk

* Dr Fani Deligianni (http://fdeligianni.site/): Characterising uncertainty, eye-tracking, EEG, bimanual teleoperations. Email: fadelgr@gmail.com

* Dr Helen C. Purchase (http://www.dcs.gla.ac.uk/~hcp/): Visual Communication, Information Visualisation, Visual Aesthetics. Email: Helen.Purchase@glasgow.ac.uk

* Dr Mohamed Khamis (http://mkhamis.com/): Human-centered Security and Privacy, Eye Tracking and Gaze-based Interaction, Interactive Displays. Email: Mohamed.Khamis@glasgow.ac.uk

 

The closing date for applications is 31 January 2020.  For more information about how to apply, see https://www.gla.ac.uk/schools/computing/postgraduateresearch/prospectivestudents.  This web page includes information about the research proposal, which is required as part of your application.

 

Applicants are strongly encouraged to contact a potential supervisor and discuss an application before the submission deadline.

 


6-58(2019-12-03) Researcher position at LIMSI, Orsay, France

LIMSI is recruiting a researcher (fixed-term contract) in natural language processing and machine translation (M/F). All the details of the offer are here:

https://emploi.cnrs.fr/Offres/CDD/UPR3251-FRAYVO-002/Default.aspx


6-59(2019-12-06) Final-year Engineering or Master 2 internship, INA, Bry-sur-Marne, France

Automatic segmentation and detection of conflict situations in political interviews

Final-year Engineering or Master 2 internship - Academic year 2019-2020

Keywords: machine learning, diarization, digital humanities, political speech, expressivity

Context

The Institut national de l'audiovisuel (INA) is a French public industrial and commercial institution (EPIC) whose main mission is to archive and promote the French audiovisual heritage (radio, television and web media). INA also carries out missions in scientific research, training and production.

This internship is part of the OOPAIP project (ontology and tools for the annotation of political interventions), a cross-disciplinary project carried out by INA and the CESSP (Centre européen de sociologie et de science politique) of Université Paris 1 Panthéon-Sorbonne. The objective is to design new approaches for detailed, qualitative and quantitative analyses of mediatized political interventions in France. Part of the project concerns the study of the dynamics of conflictual interactions in political interviews and debates, which requires a fine-grained description and a large corpus in order to generalize the models. The technological challenges concern the performance of algorithms for segmentation into speakers and into speaking styles. Improving their precision, and adding the detection of overlapped speech, measures of vocal effort and expressive elements, will make it possible to optimize the manual annotation work.

Internship objectives

The internship mainly aims at improving the automatic segmentation of political interviews in order to support research in political science. The corresponding research theme we will focus on is the identification of conflict situations. In this context, we will be particularly interested in the detection of hubbub (overlapped speech). At a finer level, we would like to extract descriptors of the speech signal correlated with the level of conflict in the exchanges, based, for example, on the activation level (an intermediate level between the signal and expressivity [Rilliard et al., 2018]) or on vocal effort [Liénard, 2019].

The internship will initially rely on two corpora totalling 30 political interviews finely annotated in speaking turns, produced within the OOPAIP project. It will start with a state of the art of diarization (segmentation and clustering into speakers [Broux et al., 2019]) and of overlapped speech detection [Chowdhury et al., 2019]. The next step will be to propose solutions based on recent frameworks to improve the localization of speaking-turn boundaries, in particular when the frequency of speaker changes is high, the limiting case being the hubbub situation.

The second part of the internship will address a finer measurement of the conflict level of the exchanges, through the search for the most relevant descriptors and the design of learning architectures to model it.
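As an illustration of one candidate descriptor, the sketch below computes a one-third-octave long-term average spectrum (LTAS) and summarizes it by its spectral slope, in the spirit of the vocal-effort measure of [Liénard, 2019]; the band layout, the 1-4 kHz slope summary and the 16 kHz mono input are assumptions made for the example, not the paper's exact procedure.

import numpy as np
from scipy.signal import welch

def third_octave_ltas(signal, sr, f_min=100.0, n_bands=18):
    """Return the centre frequencies and band levels (dB) of the LTAS."""
    freqs, psd = welch(signal, fs=sr, nperseg=2048)
    centres = f_min * 2.0 ** (np.arange(n_bands) / 3.0)   # one-third-octave spacing
    levels = []
    for fc in centres:
        lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)     # band edges
        band = psd[(freqs >= lo) & (freqs < hi)]
        levels.append(10 * np.log10(band.sum() + 1e-12))  # band power in dB
    return centres, np.asarray(levels)

# Example: slope of the LTAS between 1 and 4 kHz as a crude vocal-effort proxy.
sr = 16000
x = np.random.randn(sr * 5)          # stand-in for a 5-second speech segment
fc, lv = third_octave_ltas(x, sr)
sel = (fc >= 1000) & (fc <= 4000)
slope = np.polyfit(np.log2(fc[sel]), lv[sel], 1)[0]
print(f"LTAS slope in 1-4 kHz: {slope:.2f} dB/octave")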

The programming language used for this internship will be Python. The intern will have access to INA's computing resources (servers and clusters), as well as to a powerful desktop machine with 2 recent-generation GPUs.

Valorisation of the internship

Different strategies for promoting the intern's work will be considered, depending on the maturity of the work carried out:

- Release of the analysis tools under an open-source licence via INA's GitHub repository: https://github.com/ina-foss

- Writing of scientific publications

Internship conditions

The internship will take place over a period of 4 to 6 months, within INA's Research department. It will be located on the Bry 2 site, 18 Avenue des frères Lumière, 94360 Bry-sur-Marne. The intern will be supervised by Marc Evrard (mevrard@ina.fr).

Allowance: about 550 euros per month.

Candidate profile

- Final-year student of a five-year degree (bac+5) in computer science and AI.

- Proficiency in Python and experience with ML libraries (scikit-learn, TensorFlow, PyTorch).

- Keen interest in the humanities and social sciences, digital humanities, and political science in particular.

- Ability to carry out a literature review based on scientific articles written in English.

Bibliography

Broux, P. A., Desnous, F., Larcher, A., Petitrenaud, S., Carrive, J., & Meignier, S. (2018). "S4D: Speaker Diarization Toolkit in Python". In Interspeech 2018.

Chowdhury, S. A., Stepanov, E. A., Danieli, M., Riccardi, G. (2019). "Automatic classification of speech overlaps: Feature representation and algorithms", Computer Speech & Language, vol. 55, pp. 145-167.

Liénard, J.-S. (2019). "Quantifying vocal effort from the shape of the one-third octave long-term-average spectrum of speech", J. Acoust. Soc. Am. 146 (4), October 2019.

Rilliard, A., d'Alessandro, C., & Evrard, M. (2018). "Paradigmatic variation of vowels in expressive speech: Acoustic description and dimensional analysis". The Journal of the Acoustical Society of America, 143(1), 109-122.


6-60(2019-12-07) Internship at IRCAM, Paris, France

Deep Disentanglement of Speaker Identity and Phonetic Content for Voice Conversion

Dates: 01/02/2020 to 30/06/2020

Laboratory: STMS Lab (IRCAM / CNRS / Sorbonne Université)

Location: IRCAM - Sound Analysis and Synthesis team

Supervisors: Nicolas Obin, Axel Roebel

Contact: Nicolas.Obin@ircam.fr, Axel.Roebel@ircam.fr

Context:

Voice identity conversion consists in modifying the characteristics of a 'source' voice to reproduce the characteristics of a 'target' voice to be imitated, given a collection of examples of the 'target' voice. The voice identity conversion task has become widely popular in recent years with the appearance of 'deep fakes', the objective being to transpose to the speech domain the successes achieved in the image domain. Current research lines thus rely on neural architectures such as sequence-to-sequence models, generative adversarial networks (GANs, [Goodfellow et al., 2014]) and their variants for learning from non-parallel data (Cycle-GAN [Kaneko and Kameoka, 2017] or AttGAN [He et al., 2019]). The major challenges of identity conversion include the ability to learn identity transformations efficiently from small databases (a few minutes of speech) and to separate the factors of variability of speech, so as to modify only the identity of a speaker without modifying or degrading the linguistic and expressive content of the voice.
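To fix ideas on the Cycle-GAN family mentioned above, here is a minimal sketch of the cycle-consistency term that makes learning from non-parallel data possible: features converted from the source to the target voice and back should recover the input. gen_s2t and gen_t2s stand for the two generators and are illustrative placeholders; the adversarial (discriminator) terms of the full objective are omitted.

import tensorflow as tf

def cycle_consistency_loss(gen_s2t, gen_t2s, mel_source, lambda_cyc=10.0):
    fake_target = gen_s2t(mel_source)       # source -> target voice
    reconstructed = gen_t2s(fake_target)    # back to the source domain
    # L1 reconstruction error: content must survive the round trip,
    # while the adversarial terms push fake_target towards the target identity.
    return lambda_cyc * tf.reduce_mean(tf.abs(mel_source - reconstructed))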

Objectives:

The work carried out in this internship will concern the extension of the neural voice identity conversion system currently developed within the ANR project TheVoice (https://www.ircam.fr/projects/pages/thevoice/). The main focus of the internship will be to efficiently integrate linguistic content information into the existing neural conversion system. This objective involves the following tasks:

- Development of a representation of phonetic information (e.g., in the form of Phonetic PosteriorGrams [Sun et al., 2016]) and its integration into the current conversion system.

- Application and further development of techniques for disentangling speaker identity and phonetic content when learning the conversion [Mathieu et al., 2016; Hamidreza et al., 2019].

- Evaluation of the results by comparison with state-of-the-art conversion systems, on reference databases such as VCC2018 or LibriSpeech.

The problems addressed during the internship will be selected at its beginning, after an orientation phase and a literature review. The solutions developed during the internship will be integrated into IRCAM's voice identity conversion system, with possible industrial and professional exploitation. For example, the identity conversion system developed at IRCAM has been used in professional production projects to recreate the voices of historical figures: Marshal Pétain in the documentary 'Juger Pétain' in 2012, and Louis de Funès in the film 'Pourquoi j'ai pas mangé mon père' by Jamel Debbouze in 2015.

The internship will build on the expertise of the Sound Analysis and Synthesis team of the STMS laboratory (IRCAM/CNRS/Sorbonne Université) in speech signal processing and neural network learning, and on extensive experience in voice identity conversion [Villavicencio et al., 2009; Huber, 2015].

Expected skills:

- Command of machine learning, in particular learning with neural networks;

- Command of digital audio signal processing (time-frequency analysis, parametric analysis of audio signals, etc.);

- Good command of Python programming and of the TensorFlow environment;

- Autonomy, teamwork, productivity, rigour and methodology.

Remuneration:

Internship allowance according to the law in force, plus social benefits

Application deadline:

20/12/2019

Bibliography:

[Goodfellow et al., 2014] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, "Generative Adversarial Networks," arXiv:1406.2661 [cs, stat], 2014.

[Hamidreza et al., 2019] Seyed Hamidreza Mohammadi, Taehwan Kim, "One-shot Voice Conversion with Disentangled Representations by Leveraging Phonetic Posteriorgrams," Interspeech 2019.

[He et al., 2019] Z. He, W. Zuo, M. Kan, S. Shan, and X. Chen, "AttGAN: Facial attribute editing by only changing what you want," IEEE Transactions on Image Processing, vol. 28, no. 11, 2019.

[Huber, 2015] S. Huber, "Voice Conversion by modelling and transformation of extended voice characteristics," PhD thesis, Université Pierre et Marie Curie (Paris VI), 2015.

[Kaneko and Kameoka, 2017] Takuhiro Kaneko and Hirokazu Kameoka, "Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks," arXiv:1711.11293 [cs, eess, stat], 2017.

[Mathieu et al., 2016] Michael Mathieu, Junbo Zhao, Pablo Sprechmann, Aditya Ramesh, Yann LeCun, "Disentangling factors of variation in deep representations using adversarial training," NIPS 2016.

[Sun et al., 2016] Lifa Sun, Kun Li, Hao Wang, Shiyin Kang, and Helen Meng, "Phonetic posteriorgrams for many-to-one voice conversion without parallel data training," in 2016 IEEE International Conference on Multimedia and Expo (ICME), 2016, pp. 1-6.

[Villavicencio et al., 2009] Villavicencio, F., Röbel, A., and Rodet, X. (2009). "Applying improved spectral modelling for high quality voice conversion." In Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 4285-4288.


6-61(2019-12-07) Assistant engineer in production, LPL, Aix-en-Provence, France

Job category:

Assistant engineer in production, data processing and surveys, BAP D (Data in the humanities and social sciences):

Mission:

Within the experimental platform of the Laboratoire Parole et Langage (LPL), the successful candidate will be responsible for technical coordination and for welcoming and supporting experiments, in collaboration with the sector heads (audio-video, articulography/physiology, neurophysiology/eye-tracking).

Activities:

Welcome participants and collect their personal data in compliance with current legislation (GDPR)
Recruit participants for experiments
Liaise with external researchers
Monitor and reorder consumables
Book experimental rooms and equipment, draw up testing schedules, arrange appointments
Support the set-up of the experimental apparatus together with the sector head
Keep laboratory notebooks up to date
Run the ongoing campaigns to recruit volunteers
Help write methodological notes on the operations carried out
Keep disciplinary and methodological knowledge up to date and maintain the bibliography of a field of study

Skills:

Command of experimental techniques, methods and protocols in the humanities and social sciences
Knowledge of measurement and statistics
Collaboration with researchers in the design, set-up and running of experiments
Teamwork with the other technical staff (ITA) working on the platform
Strong interpersonal skills when dealing with investigators of varied profiles (from Master's students to foreign researchers, including the laboratory's own researchers and PhD students) and with all categories of participants, from school-age children to adults and elderly people, some of whom may present various pathologies
Knowledge of, and compliance with, the legislation on research involving human subjects, as well as health and safety rules
A good command of spoken English (level B2 of the Common European Framework of Reference for Languages) is essential
Long-term archiving of research data (basic notions)

The call is open until 17 January, but applications will be reviewed as they arrive. Please feel free to circulate this information to anyone who may be interested.


6-62(2019-12-07) 1-year post-doc/engineer position at LIA, Avignon, France

A 1-year post-doc/engineer position at LIA (Avignon, France), in the Vocal Interaction Group

Multimodal man-robot interface for social spaces

keywords: AI, ML, DNN, RL, NLP, dialogue, vision, robotics

Desired starting date: March 2020.
==================================================================
## Work description

### Project Summary

Automation and optimisation of *verbal interactions of a socially-competent robot*,
guided by its *multimodal perceptions*

Facing a steady increase in the ageing population and in the prevalence of chronic diseases,
social robots are promising tools to include in the health care system. Yet existing
assistive robots are not well suited to such contexts: their communication abilities
cannot handle social spaces (several metres, groups of people), only face-to-face
individual interactions in quiet environments. To overcome these limitations, and
ultimately to achieve natural man-robot interaction, the work will pursue several
objectives.

First and foremost, we intend to leverage the rich information available in the audio and
visual data flows coming from humans to extract verbal and non-verbal features. These
features will be used to enhance the robot's decision-making ability so that it can
smoothly take speech turns and switch from interaction with a group of people to
face-to-face dialogue and back. Secondly, online and continual learning of the resulting
system will be investigated.

Outcomes of the project will be implemented on a commercially available social robot
(most likely a Pepper) and validated in several in-situ use cases. A large-scale data
collection will complement the in-situ tests to fuel further research. The essential
competencies for our overall objectives lie in dialogue systems / NLP, yet knowledge
of vision and robotics will also be necessary. In any case, a good command of
deep-learning techniques and tools is mandatory, including reinforcement learning for
dialogue strategy training (a toy sketch of which follows).
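
As a toy illustration of the reinforcement-learning side of the work (an assumed, simplified stand-in for the project's actual system), the following tabular Q-learning sketch learns a turn-taking policy from simulated multimodal observations; states, actions and rewards are invented for the example:

import random
from collections import defaultdict

ACTIONS = ["keep_silent", "take_turn", "address_group"]
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2
Q = defaultdict(float)  # Q[(state, action)]

def observe():
    # Stand-in for multimodal perception: (someone_speaking, group_size)
    return (random.random() < 0.5, random.choice(["one", "several"]))

def reward(state, action):
    speaking, group = state
    if speaking and action == "take_turn":
        return -1.0  # interrupted a speaker
    if not speaking and action == "take_turn" and group == "one":
        return 1.0   # smooth face-to-face turn exchange
    if not speaking and action == "address_group" and group == "several":
        return 1.0   # appropriate group address
    return 0.0

state = observe()
for _ in range(5000):
    # Epsilon-greedy action selection, then a standard Q-learning update.
    action = (random.choice(ACTIONS) if random.random() < EPS
              else max(ACTIONS, key=lambda a: Q[(state, a)]))
    r = reward(state, action)
    nxt = observe()
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
    state = nxt

# Learned policy for each of the four toy states.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)])
       for s in {(sp, g) for sp in (True, False) for g in ("one", "several")}})

In the actual project, the tabular state would be replaced by learned representations of the audio-visual features and the policy by a DNN, but the decision-making loop has the same shape.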

### Requirements

- Master or PhD in Computer Science, Machine Learning, Computational Linguistics,
Mathematics, Engineering or related fields
- Expertise in NLP / dialogue systems. Strong knowledge of current NLP / interactive /
speech techniques is expected. Previous experience with dialogue and interaction and/or
vision data is a strong plus.
- Knowledge of vision and/or robotics is a plus.
- Strong programming skills: a Python/C++ programmer of DNN models (preferably with PyTorch)
- Expertise in Unix environments
- Good spoken and written command of English is required. *French is optional.*
- Good writing skills, as evidenced by publications at top venues (e.g., ACL, EMNLP,
SIGDIAL, etc.), are a plus for the post-doc.

## Place

Bordered by the left bank of the Rhône, Avignon is one of the most beautiful cities in
Provence, and was for a time the capital of Christendom in the Middle Ages. The important
remains of this rich past give the city its unique atmosphere: dozens of churches and
chapels, the « Palais des Papes » (palace of the popes, the most important Gothic palace
in Europe), the Saint-Bénezet bridge, the « pont d'Avignon » of worldwide fame through
the song, the ramparts that still encircle the entire city, and ten museums ranging from
ancient times to contemporary art.

Of the city's 94,787 inhabitants, about 12,000 live in the old town centre surrounded by
its medieval ramparts. Avignon is not only the birthplace of the most prestigious
festival of contemporary theatre and a European Capital of Culture in 2000, but also the
largest city and capital of the département of Vaucluse. The region offers a high quality
of urban life at still comparatively modest cost, and numerous monuments and natural
beauty spots are within easy reach: Avignon is the ideal base for visiting Provence.

LIA is the computer science lab of Avignon University: http://lia.univ-avignon.fr.

## Conditions

Net monthly salary: €1,500-2,100 (depending on the candidate's experience). Basic
healthcare coverage included (https://en.wikipedia.org/wiki/Health_care_in_France).

The position carries no direct teaching load but, if desired, teaching BSc- or MSc-level
courses is a possibility (paid extra hours), as is supervising student dissertation
projects.

Initial employment is for 12 months; an extension is possible. For an engineer, a shift
to a PhD position is possible.

## Applications

No deadline: applications are possible until the position is filled.

To apply, send the following documents *as a single PDF* to
fabrice.lefevre@univ-avignon.fr:

* Statement of research interests that motivates your application
* CV, including the list of publications if any
* Scans of transcripts and academic degree certificates
* MSc/PhD dissertation and/or any other writing samples
* Coding samples or links to your contributions to public code repositories, if any
* Names, affiliations, and contact details of up to three people who can provide
reference letters for you


6-63 Postdoctoral Fellowship, University of Connecticut Health, Farmington, CT, USA

Postdoctoral Fellowship, Speech Processing in Noise

University of Connecticut Health

Location: Farmington, CT

Start Date: January 2020, or thereafter

Duration: Initially 1 year, with potential for extension

Salary: Depends on experience, based on the NIH range; benefits include health care, retirement contributions, and paid leave for vacation, personal days, holidays and sickness.

Application Process: Please send your résumé, a one-page cover letter describing your research interests and experience, a list of publications (copies of the most relevant, optional), and contact information for three references to Dr Insoo Kim (ikim@uchc.edu).

A Postdoctoral Fellowship is available in the Division of Occupational Medicine, Department of Medicine, at the University of Connecticut Health to investigate algorithms for improving speech intelligibility in environmental noise. The work will involve simulating the noise of machines from known frequency spectra and creating speech-in-noise test files in MATLAB for replay to subjects in listening tests; the test files may be processed electronically to improve intelligibility before the psychoacoustic testing (a rough sketch of this pipeline follows). The position requires knowledge of, and practical experience with, speech or audio digital signal processing; proficiency with MATLAB and Simulink simulations; and familiarity with psychoacoustic testing of speech intelligibility in noise and with the development of embedded systems or digital signal processors.
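
As a rough illustration of this pipeline (a sketch in Python/NumPy rather than the MATLAB used in the project; the machine spectrum, test signal and SNR below are placeholders): white noise is spectrally shaped to a known machine spectrum, then mixed with speech at a chosen SNR to build a speech-in-noise test file.

import numpy as np
from scipy.io import wavfile

def shaped_noise(target_mag, n_samples, rng):
    """White noise spectrally shaped to a known machine magnitude spectrum."""
    spec = np.fft.rfft(rng.standard_normal(n_samples))
    # Interpolate the coarse machine spectrum onto the FFT bin grid.
    mag = np.interp(np.linspace(0, 1, spec.size),
                    np.linspace(0, 1, target_mag.size), target_mag)
    return np.fft.irfft(spec * mag, n=n_samples)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio equals snr_db."""
    gain = np.sqrt(np.mean(speech**2) / (np.mean(noise**2) * 10**(snr_db / 10)))
    return speech + gain * noise

fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 220 * t)     # stand-in; a real test would read a speech WAV
machine_mag = np.linspace(1.0, 0.1, 64)  # placeholder low-pass-like machine spectrum
noise = shaped_noise(machine_mag, speech.size, np.random.default_rng(0))
mixed = mix_at_snr(speech, noise, snr_db=0.0)
wavfile.write("speech_in_noise.wav", fs,
              (mixed / np.max(np.abs(mixed)) * 32767).astype(np.int16))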

The Fellow will participate in ongoing research projects involving speech processing. He/she will be responsible for implementing the algorithms for improving speech communication in noise, conducting all psychoacoustic tests used to establish proof of concept, and analysing and interpreting the data. The Fellow will also have opportunities to supervise graduate and undergraduate students.

Candidates should have good oral and written English communication skills, be capable of independent work as part of a multi-disciplinary team, be able to work on multiple projects at the same time, publish results in academic journals, and participate in grant proposal preparation. They should have a Ph.D. degree in Acoustics, Electrical, Computer, or Biomedical Engineering, or a related field, with appropriate experience. The initial appointment is for a period of one year with potential for further extension. The review of applications will start immediately and will continue until the position is filled.


6-64(2019-12-08) PhD studentship, Utrecht University, The Netherlands

The Social and Affective Computing group at the Utrecht University Department of Information and Computing Sciences is looking for a PhD candidate to conduct research on explainable and accountable affective computing for mental healthcare scenarios. The five-year position includes 70% research time and 30% teaching time. The post presents an excellent opportunity to develop an academic profile as a competent researcher and able teacher.

Affective computing has great potential for clinician support systems, but it needs to produce insightful, explainable, and accountable results. Cross-corpus and cross-task generalization of approaches, as well as efficient and effective ways of leveraging multimodality, are among the main challenges in the field. Furthermore, data are scarce and class imbalance is to be expected. While addressing these issues, precision needs to be complemented by interpretability. Potential investigation areas include, for example, depression, bipolar disorder, and dementia.

The PhD candidate is expected to bridge the research efforts in cross-corpus, cross-task multimodal affect recognition with explainable/accountable machine learning, aiming at efficient, effective and interpretable predictions on a data-scarce and sensitive target problem. The candidate is also expected to be involved in teaching activities within the Department of Information and Computing Sciences. Teaching activities may include supporting senior teaching staff, conducting tutorials, and supervising student projects and theses. These activities will contribute to the development of the candidate's didactic skills.

We are looking for candidates with:

  • a Master's degree in computer science/engineering, mathematics, and/or fields related to the project focus;
  • interest or experience in processing audio/acoustics, vision/video or natural language;
  • interest or experience in machine learning, affective computing, information fusion, multimodal interaction;
  • demonstrable coding skills in high-level scripting languages such as MATLAB, Python or R;
  • excellent English oral and written skills.

The ideal candidate should express a strong interest in research in affective computing and teaching within the ICS department. The Department finds gender balance specifically and diversity in a broader sense very important; therefore women are especially encouraged to apply. Applicants are encouraged to mention any personal circumstances that need to be taken into account in their evaluation, for example parental leave or military service.

 

We offer an exciting opportunity to contribute to an ambitious and international education programme with highly motivated students and to conduct your own research project at a renowned research university. You will receive appropriate training, personal supervision, and guidance for both your research and teaching tasks, which will provide an excellent start to an academic career.

The candidate is offered a position for five years (1.0 FTE). The gross salary starts at €2,325 and increases to €2,972 per month for full-time employment (scale P according to the Collective Labour Agreement of Dutch Universities). Salaries are supplemented with a holiday bonus of 8% and a year-end bonus of 8.3% per year. In addition, Utrecht University offers excellent secondary conditions, including an attractive retirement scheme, (partly paid) parental leave and flexible employment conditions (multiple choice model). More information about working at Utrecht University can be found here.

Application deadline is 01.01.2020.

 Further information and application procedure can be found here.
 
 


6-65(2019-12-09) Postdoc , IRISA, Rennes, France
IRISA (France) is looking for a 30-month postdoctoral researcher on the topic of Natural Language Processing for Kids, starting in spring 2020.

 



