(2019-08-02) Research Engineer or Post-doc at EURECOM, Inria, LIA, France
EURECOM (Nice, France), Inria (Nancy, France) and LIA (Avignon, France) are opening an 18-month Research Engineer or Postdoc position on speaker de-identification and voice privacy.
(2019-08-02) Ph.D. position in Softbank robotics and Telecom-Paris, France
Ph.D. position in Softbank Robotics and Telecom Paris. Subject: Automatic multimodal recognition of users' social behaviors in human-robot interactions (HRI)
*Places of work* Softbank Robotics [SB] (Paris 15e) & Telecom Paris [TP], Palaiseau (Paris outskirts)
*Context* The research activity of the Ph.D. candidate will contribute to:
- Softbank Robotics' robot software NAOqi, within the Expressivity team responsible for ensuring an expressive, natural and fun interaction with our robots.
- the Social Computing topic [SocComp.] of the S2a team [SSA] at Telecom-ParisTech, in close collaboration with other researchers and Ph.D. students of the team.
*Candidate profile* As a minimum requirement, the successful candidate should have:
- A master's degree in one or more of the following areas: human-agent interaction, deep learning, computational linguistics, cognitive sciences, affective computing, reinforcement learning, natural language processing, speech processing
- Excellent programming skills (preferably in Python)
- Excellent command of English
- Very good communication skills, commitment, an independent working style, as well as initiative and team spirit
Given the multidisciplinary aspect of the subject, priority will be given to multidisciplinary profiles. The Ph.D. applicant's interest in social robotics is required.
*Keywords* Human-Machine Interaction, Social Robotics, Deep Learning, Social Computing, Natural Language Processing, Speech Processing, Computer Vision, Multimodality
*How to apply* Applications should be sent as soon as possible (the first review of applications will be made in early September). The application should be formatted as **a single pdf file** and should include:
- A complete and detailed curriculum vitae
- A letter of motivation
- The academic credentials and the transcript of grades
- The contact information of two referees
*Description* Social robotics, and more broadly human-agent interaction, is a field of human-machine interaction for which the integration of social behaviors is expected to have great potential. 'Socio-emotional behaviors' (emotions, social stances) thus include the role and the reactions of the user towards the robot during an interaction. These behaviors can be expressed differently depending:
- on the user (age, emotional state, ...): some users may behave dominantly with the robot, considering it a tool to achieve a goal; others are more cooperative and friendly with it; still others try to trap or 'troll' the robot;
- on the interaction context (users do not behave in the same way when interacting with a Pepper selling toys as with a Pepper bank secretary).
Besides, in each of these situations, the robot must be able to adapt its behavior and to provide a coherent interaction between the user and the robot, avoiding confusion and frustration.
This Ph.D. will focus on multimodal modeling for the prediction of the user's socio-emotional behaviors during interactions with a robot, and on building an engine that is robust to real-life scenarios and different contexts. In particular, the Ph.D. candidate will address the following points:
- the encoding of contextual multimodal representations relevant for the modeling of socio-emotional behavior. Thanks to the robot, we have access to a lot of contextual information (market, robot intention, demographics, multi- or mono-user interaction, etc.) that could be combined with our multimodal representation;
- the development and evaluation of models that take advantage of the complementarity of modalities in order to monitor the evolution of the user's socio-emotional behaviors during the interaction (e.g. taking into account the inherent sequentiality of the interaction structure). The models will be based on sequential neural approaches (recurrent networks) that integrate attention models, as a continuation of the work done in [Hemamou] and [Ben-Youssef19].
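As a rough illustration of the attention mechanisms mentioned above, a single additive-attention pooling step over a sequence of feature frames can be sketched as follows. All dimensions and weights below are synthetic placeholders, not taken from [Hemamou] or any trained model:

```python
import numpy as np

# Toy attention pooling: each frame of a sequence gets a learned relevance
# score, the scores are normalized with a softmax, and the sequence is
# summarized as the weighted sum of its frames. Weights are random here.
rng = np.random.default_rng(0)
seq_len, feat_dim, att_dim = 6, 5, 4
H = rng.normal(size=(seq_len, feat_dim))   # one feature vector per frame
W = rng.normal(size=(feat_dim, att_dim))   # attention projection (placeholder)
v = rng.normal(size=att_dim)               # attention query vector (placeholder)

scores = np.tanh(H @ W) @ v                      # one relevance score per frame
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the sequence
context = weights @ H                            # attention-weighted summary
```

In a real model, `H` would come from a recurrent encoder over the multimodal stream and `W`, `v` would be trained end to end with the behavior predictor.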
Selected references of the team:
[Hemamou] L. Hemamou, G. Felhi, V. Vandenbussche, J.-C. Martin, C. Clavel. HireNet: a Hierarchical Attention Model for the Automatic Analysis of Asynchronous Video Job Interviews. AAAI 2019.
[Garcia] A. Garcia, C. Clavel, S. Essid, F. d'Alché-Buc. Structured Output Learning with Abstention: Application to Accurate Opinion Prediction. ICML 2018.
[Clavel&Callejas] C. Clavel, Z. Callejas. Sentiment analysis: from opinion mining to human-agent interaction. IEEE Transactions on Affective Computing, 7(1):74-93, 2016.
[Langlet] C. Langlet, C. Clavel. Improving social relationships in face-to-face human-agent interactions: when the agent wants to know user's likes and dislikes. ACL 2015.
[Maslowski] I. Maslowski, D. Lagarde, C. Clavel. In-the-wild chatbot corpus: from opinion analysis to interaction problem detection. ICNLSSP 2017.
[Ben-Youssef17] A. Ben-Youssef, C. Clavel, S. Essid, M. Bilac, M. Chamoux, A. Lim. UE-HRI: a new dataset for the study of user engagement in spontaneous human-robot interactions. ICMI 2017, pages 464-472.
[Ben-Youssef19] A. Ben-Youssef, C. Clavel, S. Essid. Early Detection of User Engagement Breakdown in Spontaneous Human-Humanoid Interaction. IEEE Transactions on Affective Computing, 2019.
[Varni] G. Varni, I. Hupont, C. Clavel, M. Chetouani. Computational Study of Primitive Emotional Contagion in Dyadic Interactions. IEEE Transactions on Affective Computing, 2017.
(2019-08-12) Several positions in Forensic Speech Science or Forensic Data Science: Aston University, Birmingham, UK
Positions in Forensic Speech Science or Forensic Data Science:
- One Lecturer or Senior Lecturer
- Two Postdoctoral Researchers
Aston University, Birmingham, UK
Aston University has recently been awarded GBP 5.4 M from Research England's Expanding Excellence in England (E3) Fund. The money is being used to expand the existing Centre for Forensic Linguistics into the substantially larger Aston Institute for Forensic Linguistics (AIFL). As part of the expansion, we are building a research team with expertise in forensic speech science and in forensic data science. In addition to conducting research in forensic speech science, members of the team will work on forensic inference and statistics more broadly, and on quantitative-measurement and statistical-model based approaches in other branches of forensic science. The latter potentially include but are not limited to: fingerprints, face, gait, ballistics, blood pattern analysis, and linguistics. The Forensic Speech Science Laboratory and the Centre for Forensic Data Science will be headed by Dr Geoffrey Stewart Morrison, and, in addition to the affiliation with AIFL, will be affiliated with the Computer Science Department in the School of Engineering and Applied Science.
We are looking to recruit the following positions:
Lecturer or Senior Lecturer in Forensic Speech Science or Forensic Data Science
Closing Date: 23.59 hours BST on September 30, 2019
Interview Date: To be confirmed
The Lecturer or Senior Lecturer position will be a full-time permanent position and will include teaching and administrative responsibilities. The position is costed as a Grade 9 Lecturer, but an exceptionally well qualified and experienced successful applicant could potentially be appointed as a Grade 10 Senior Lecturer. Note: 'Lecturer' is equivalent to North American 'Assistant Professor', 'Senior Lecturer' is equivalent to North American 'Associate Professor', and 'Reader / Associate Professor' is an occasionally used additional rank between Senior Lecturer and Professor.
The Postdoctoral Researcher positions may be filled as full-time appointments (preferred) or via a combination of part-time appointments. The Postdoctoral Researcher positions will be fixed-term, but the plan is to build a team that will be successful in obtaining additional research funding that will sustain these positions.
All new team members must have a commitment to solving forensic problems. Previous experience working on forensic problems would be advantageous, but not essential. A background in forensic speech science, in other branches of forensic science, and/or in forensic inference and statistics would be advantageous, but not essential. At least one of the new team members must have a strong background in state-of-the-art automatic speaker recognition, with an ability to implement systems. Other useful backgrounds for members of the team would include biometrics, machine learning, natural language processing, and acoustic phonetics.
Candidates may apply for both the Lecturer / Senior Lecturer and the Research Associate positions. If positions are not filled after this round of recruitment, we will initiate another round of recruitment.
We also welcome enquiries from individuals who have obtained or are applying for their own postdoctoral fellowships, e.g., Marie Sklodowska-Curie Fellowships. For suitable candidates we would assist with the application process.
Potential candidates are encouraged to contact Dr Geoffrey Stewart Morrison to seek more information about these positions.
Tel: +44 121 204 3901
e-mail: g.s.morrison@aston.ac.uk
Dr Morrison will be attending Interspeech in September and would be happy to meet informally with potential applicants there.
We are looking for a postdoc to conduct research in a multidisciplinary expedition project funded by Wallenberg AI, Autonomous Systems and Software Program (WASP), Sweden's largest individual research program, addressing compelling research topics that promise disruptive innovations in AI, autonomous systems and software for several years to come.
The project combines Formal Methods and Human-Robot Interaction with the goal of moving from conventional correct-by-design control with simple, static human models towards the synthesis of correct-by-design and socially acceptable controllers that consider complex human models based on empirical data. Two demonstrators, an autonomous driving scenario and a mobile robot navigation scenario in crowded social spaces, are planned to showcase the advances made in the project.
The focus of this position is on the development of data-driven models of human behavior that can be integrated with formal methods-based systems to better reflect real-world situations, as well as in the evaluation of the social acceptability of such systems.
The candidate will work under the supervision of Assistant Prof. Iolanda Leite (https://iolandaleite.com/) and in close collaboration with another postdoctoral researcher working in the field of formal synthesis.
This is a two-year position. The starting date is open for discussion, but ideally, we would like the selected candidate to start ASAP.
QUALIFICATIONS
Candidates should have completed, or be near completion of, a Doctoral degree with a strong international publication record in areas such as (but not limited to) human-robot interaction, social robotics, multimodal perception, and artificial intelligence. Familiarity with formal methods, game theory, and control theory is an advantage.
Documented written and spoken English and programming skills are required. Experience with experimental design and statistical analysis is an important asset. Applicants must be strongly motivated, be able to work independently and possess good levels of cooperative and communicative abilities.
We look for candidates who are excited about being a part of a multidisciplinary team.
HOW TO APPLY
The application should include:
1. Curriculum vitae.
2. Transcripts from University/ University College.
3. A brief description of the candidate's research interests, including previous research and future goals (max 2 pages).
4. Contact information for two references. We will contact the references only for selected candidates.
The application documents should be uploaded using KTH's recruitment system:
The application deadline is ** September 13, 2019 **
----------------- Iolanda Leite Assistant Professor KTH Royal Institute of Technology School of Electrical Engineering and Computer Science Division of Robotics, Perception and Learning (RPL)
Teknikringen 33, 4th floor, room 3424, SE-100 44 Stockholm, Sweden Phone: +46-8 790 67 34 https://iolandaleite.com
(2019-08-17) Fully funded PhD position at IDIAP, Martigny, Valais, Switzerland.
There is a fully funded PhD position open at Idiap Research Institute on spiking neural architectures for speech prosody.
The research will build on work done recently at Idiap on creating tools for physiologically plausible modelling of speech. The current 'toolbox' contains rudimentary muscle models and means to drive these using conventional (deep) neural networks. The main focus of the work will involve use of spiking neural networks such as the 'integrate and fire' type that is broadly representative of those found in biological systems. Whilst we have focused so far on prosody (actually intonation), the application is open ended; the focus is on the neural modelling. A key problem to be solved will be that of training of the spiking networks, especially with the recurrence that is usual in such networks. We hope to be able to train and use spiking networks as easily as conventional backpropagation networks, and to shed light on current understanding of how biological spiking networks learn (e.g., via spike timing-dependent plasticity).
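For readers unfamiliar with the 'integrate and fire' model mentioned above, a minimal leaky integrate-and-fire neuron can be sketched in a few lines. All constants below are illustrative placeholders, unrelated to the Idiap toolbox:

```python
# Minimal leaky integrate-and-fire (LIF) neuron: the membrane potential
# integrates the input current, decays with a leak term, and the neuron
# emits a spike (then resets) whenever the potential crosses a threshold.

def simulate_lif(input_current, dt=1.0, tau=10.0, v_rest=0.0,
                 v_threshold=1.0, v_reset=0.0):
    """Return the list of time indices at which the neuron spikes."""
    v = v_rest
    spikes = []
    for t, i_in in enumerate(input_current):
        # Euler step of  tau * dv/dt = -(v - v_rest) + i_in
        v += dt * (-(v - v_rest) + i_in) / tau
        if v >= v_threshold:
            spikes.append(t)
            v = v_reset  # fire and reset
    return spikes

# A constant supra-threshold input produces regular, periodic spiking.
spike_times = simulate_lif([1.5] * 100)
```

The training difficulty the posting mentions is visible even here: the spike is a hard threshold event, so gradients do not flow through it the way they do through a conventional activation function.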
Idiap is located in Martigny in French speaking Switzerland, but functions in English and hosts many nationalities. PhD students are registered at EPFL. All positions offer quite generous salaries. Martigny has a distillery and a micro-brewery and is close to all manner of skiing, hiking and mountain life.
(2019-08-18) PhD positions at IRIT, Toulouse, France
Applications are invited for a three-year Early Stage Researcher (ESR) PhD position in speech technology for pathological speech.
Description
The thesis focuses on studying the link between the internal representations of Deep Neural Networks (DNNs) and the subjective representation of speech intelligibility. We propose to explore the saliency detection capabilities of DNNs when used in a regression task for predicting speech intelligibility scores as given by human experts. By saliency, we mean retrieving which frequency bands are important and used by a DNN to make its predictions.
The final expectation is to identify regions of interest in the speech signal, both in time and frequency, that characterise the level of speech impairment.
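One common way to probe such regions of interest is gradient-based saliency: the magnitude of the predicted score's gradient with respect to each input frequency band indicates how sensitive the model is to that band. This is only one possible definition of saliency, not necessarily the one the thesis will adopt, and the tiny regressor below is entirely synthetic:

```python
import numpy as np

# Toy gradient-based saliency for an intelligibility regressor:
# a one-hidden-layer network maps per-band energies to a score, and
# |d(score)/d(input)| gives a per-band sensitivity. Weights are random
# placeholders, not a trained model.
rng = np.random.default_rng(0)
n_bands, n_hidden = 8, 4
W1 = rng.normal(size=(n_hidden, n_bands))
b1 = np.zeros(n_hidden)
w2 = rng.normal(size=n_hidden)

def predict(x):
    """score = w2 . tanh(W1 x + b1)"""
    return w2 @ np.tanh(W1 @ x + b1)

def band_saliency(x):
    # Manual backprop: d(score)/dx = W1^T (w2 * (1 - h^2)), h = tanh(W1 x + b1)
    h = np.tanh(W1 @ x + b1)
    return np.abs(W1.T @ (w2 * (1.0 - h ** 2)))

x = rng.normal(size=n_bands)        # one frame of per-band energies
sal = band_saliency(x)
top_band = int(np.argmax(sal))      # band the model is most sensitive to
```

Averaging such per-frame maps over time would give the time-frequency regions of interest the posting describes; in practice the gradient would come from an autodiff framework rather than manual backprop.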
The experiments will be conducted on various samples of speech produced by 150 people (100 patients and 50 healthy controls). This database was recorded within the INCA C2SI project and contains speech from patients treated for cancer of the oral cavity or pharynx. It also contains various metadata such as the location of the tumor, the impairment in terms of severity and intelligibility as assessed by human experts, and self-evaluation questionnaires on the patient's quality of life. Various tasks were recorded, such as a sustained vowel, read speech, nonsense words, prosodic exercises, picture description, etc.
There will also be the possibility to extend the work to another corpus, composed of the voices of patients suffering from Parkinson's disease.
At first, the PhD candidate will build on the various analyses and descriptions produced during the C2SI project, which sought to correlate the impact of the tumor with communication ability. Those results will help characterize the human representation of the impact of the disease. Then, a DNN model will be fitted to the data, taking care of data sparsity. The last part of the work, and the final goal of the thesis, will be to explore the internal representation of the DNN proposed by the student, investigating which parts of the signal help the model make a decision on the impact of the disease.
This work is funded by the TAPAS project (https://www.tapas-etn-eu.org), a Horizon 2020 Marie Skłodowska-Curie Actions European Training Network (MSCA-ITN-ETN) project that aims to transform the well-being of people across Europe with debilitating speech pathologies (e.g., due to stroke, Parkinson's, etc.). These groups face communication problems that can lead to social exclusion. They are now being further marginalised by a new wave of speech technology that is increasingly woven into everyday life but which is not robust to atypical speech.
The supervision of the PhD will take place at the IRIT laboratory, within the SAMoVA team in Toulouse. SAMoVA does research in the domain of 'analysis, modeling and structuring of audiovisual content'. The application areas are diverse: speech processing, language identification, speaker verification, and speech and music indexing. The researchers' expertise covers novel machine learning and audio processing technologies and is now focused on deep learning methods, leading to several publications in international conferences.
Eligibility Criteria:
Early Stage Researchers (ESRs) shall, at the time of recruitment by the host organization, be in the first four years (full-time equivalent research experience) of their research careers.
- The ESR may be a national of a Member State, of an Associated Country or of any Third Country.
- The ESR must not have resided or carried out her/his main activity (work, studies, etc.) in the country of her/his host organization for more than 12 months in the 3 years immediately prior to her/his recruitment.
- Holds a Master's degree or equivalent, which formally entitles the holder to embark on a Doctorate.
- Does not hold a PhD degree.
Keywords: discriminative pattern mining, neural network analysis, explainability of black-box models, speech recognition.
Deadline to apply: September 30th, 2019
Context:
Understanding the inner workings of deep neural networks (DNNs) has attracted a lot of attention in the past years [1, 2], and most problems were detected and analyzed using visualization techniques [3, 4]. Those techniques help to understand what an individual neuron or a layer of neurons is computing. We would like to go beyond this by focusing on groups of neurons which are commonly highly activated when a network is making wrong predictions on a set of examples. In the same line as [1], where the authors theoretically link how a training example affects the predictions for a test example using the so-called 'influence functions', we would like to design a tool to 'debug' neural networks by identifying, using symbolic data mining methods, (connected) parts of the neural network architecture associated with erroneous or uncertain outputs.
In the context of speech recognition, this is especially important. A speech recognition system contains two main parts: an acoustic model and a language model. Nowadays both are trained with deep neural network (DNN) based algorithms on very large training corpora, and involve a large number of DNN hyperparameters. There is much work on automatically tuning these hyperparameters. However, this induces a huge computational cost and does not empower the human designers. It would be much more efficient to provide human designers with understandable clues about the reasons for the bad performance of the system, so as to benefit from their creativity to quickly reach more promising regions of the hyperparameter search space.
Description of the position:
This position is funded in the context of the HyAIAI 'Hybrid Approaches for Interpretable AI' INRIA project lab (https://www.inria.fr/en/research/researchteams/inria-project-labs). With this position, we would like to go beyond the current common visualization techniques that help to understand what an individual neuron or a layer of neurons is computing, by focusing on groups of neurons that are commonly highly activated when a network is making wrong predictions on a set of examples. Tools such as activation maximization [8] can be used to identify such neurons. We propose to use discriminative pattern mining, and, to begin with, the DiffNorm algorithm [6] in conjunction with the LCM algorithm [7], to identify the discriminative activation patterns among the identified neurons.
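To make the idea concrete, here is a deliberately naive, brute-force sketch of discriminative pattern mining over neuron-activation sets: each example is reduced to the set of its highly activated neurons, and we look for small groups of neurons that co-occur often in wrong predictions but rarely in correct ones. The actual project would rely on LCM [7] and DiffNorm [6]; the transactions below are made up:

```python
from itertools import combinations

# Each transaction is the set of highly activated neuron ids for one example,
# split by whether the network's prediction was wrong or right (toy data).
wrong = [{1, 2, 5}, {1, 2, 7}, {1, 2, 5, 9}, {1, 2, 5}]
right = [{3, 4}, {1, 4, 6}, {2, 8}, {5, 6}]

def support(pattern, transactions):
    """Fraction of transactions containing every neuron in the pattern."""
    return sum(pattern <= t for t in transactions) / len(transactions)

def discriminative_patterns(wrong, right, size=2, min_gap=0.5):
    """Brute-force search for neuron pairs much more frequent in errors."""
    neurons = sorted(set().union(*wrong))
    found = []
    for combo in combinations(neurons, size):
        gap = support(set(combo), wrong) - support(set(combo), right)
        if gap >= min_gap:
            found.append((combo, gap))
    return sorted(found, key=lambda kv: -kv[1])

patterns = discriminative_patterns(wrong, right)
# Here the pair (1, 2) is active in every wrong prediction and no correct one.
```

Enumerating all combinations is exponential in the number of neurons, which is exactly why closed-itemset miners such as LCM are needed at realistic scale.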
The data will be provided by the MULTISPEECH team and will consist of two deep architectures as representatives of acoustic and language models [9, 10]. Furthermore, the training data from which the model parameters ultimately derive will be provided. We will also extend our results by performing experiments with supervised and unsupervised learning to compare the features learned by these networks and to perform qualitative comparisons of the solutions learned by various deep architectures. Identifying 'faulty' groups of neurons could lead to the decomposition of the DL network into 'blocks' encompassing several layers. 'Faulty' blocks may be the first to be modified in the search for a better design.
The recruited person will benefit from the expertise of the LACODAM team in pattern mining and deep learning (https://team.inria.fr/lacodam/) and from the expertise of the MULTISPEECH team (https://team.inria.fr/multispeech/) in speech analysis, language processing and deep learning. We would ideally like to recruit a 1-year (with possibly one additional year) post-doc with the following preferred skills:
- Some knowledge of (or interest in) speech recognition
- Knowledge of pattern mining (discriminative pattern mining is a plus)
- Knowledge of machine learning in general and deep learning in particular
- Good programming skills in Python (for Keras and/or TensorFlow)
- Very good English (understanding and writing)
See the INRIA web site for the post-doc page.
The candidates should send a CV, 2 names of referees and a cover letter to the four researchers (firstname.lastname@inria.fr) mentioned above. Please indicate if you are applying for the post-doc or the PhD position. The selected candidates will be interviewed in September for an expected start in October-November 2019.
Bibliography:
[1] Pang Wei Koh, Percy Liang: Understanding Black-box Predictions via Influence Functions. ICML 2017: pp 1885-1894 (best paper).
[2] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals: Understanding deep learning requires rethinking generalization. ICLR 2017.
[3] Anh Mai Nguyen, Jason Yosinski, Jeff Clune: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. CVPR 2015: pp 427-436.
[4] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus: Intriguing properties of neural networks. ICLR 2014.
[5] Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi: Deep Text Classification Can be Fooled. IJCAI 2018: pp 4208-4215.
[6] Kailash Budhathoki and Jilles Vreeken. The difference and the norm: characterising similarities and differences between databases. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 206-223. Springer, 2015.
[7] Takeaki Uno, Tatsuya Asai, Yuzo Uchida, and Hiroki Arimura. LCM: An efficient algorithm for enumerating frequent closed item sets. In FIMI, volume 90. Citeseer, 2003.
(2019-08-28) Speech technologist/linguist at Cobaltspeech.
Cobalt Speech & Language (http://www.cobaltspeech.com/ ) is looking for a speech technologist/linguist to help find and create language resources for a project in French Canadian.
The project is short-term (<2 months) and part-time (~5-8 h/week), which is ideal for a student looking to gain experience in the speech industry.
The following skills are required:
- Native French (does not have to be Canadian, though that is desirable)
- Able to communicate in English
- Basic understanding of speech technology and linguistics
- Ability to run a Python script
(2019-09-05) Post-doctoral position at IDIAP, Martigny, Switzerland
The Social Computing Group at Idiap is seeking a creative and motivated postdoctoral researcher to work on deep learning methods for behavioral analysis from video and audio data. This is an opening for a researcher with experience in deep learning applied to dynamic human behavior (from voice, body, or face), in the context of a project funded by Innosuisse, the Swiss funding agency for promotion of innovation.
The position offers the opportunity to do exciting work on deep learning and social behavior. The researcher will work with Prof. Daniel Gatica-Perez and his research group. Candidates must hold a PhD degree in computer science or engineering, with proven experience in deep learning and a strong publication record.
Salaries are competitive and the starting date is immediate. Interviews will start upon reception of applications and continue until the positions are filled.
Interested candidates are invited to submit a cover letter, a detailed CV, and the names of three references through Idiap's online recruitment system:
Interested candidates can also contact Prof. Daniel Gatica-Perez (gatica@idiap.ch).
About Idiap Research Institute
Idiap is an independent, not-for-profit, research institute recognized and funded by the Swiss Federal Government, the State of Valais, and the City of Martigny. Idiap is an equal opportunity employer, and offers competitive salaries and excellent working conditions in a dynamic and multicultural environment.
Idiap is located in the town of Martigny in Valais, a scenic region in the south of Switzerland, surrounded by the highest mountains of Europe, and offering exceptional quality of life, exciting recreational activities, including hiking, climbing and skiing, as well as varied cultural activities, all within close proximity to Lausanne and Geneva. English is the official working language.
ZAION is an innovative, fast-growing company specialized in conversational-robot technology: callbots and chatbots integrating Artificial Intelligence.
ZAION has developed a solution that draws on more than 20 years of experience in Customer Relations. This technologically disruptive solution has been very well received internationally, and we already count 18 active clients (GENERALI, MNH, APRIL, CROUS, EUROP ASSISTANCE, PRO BTP, ...).
We are currently among the only companies in the world to offer a solution of this kind, entirely geared towards performance. Joining us means taking part in an exciting adventure within an ambitious team aiming to become the reference on the conversational-robot market.
As part of its growth, ZAION is recruiting a Data Scientist / Machine Learning Engineer applied to Audio (M/F). Within the R&D team, your role is strategic to the development and expansion of the company. You will develop a solution for detecting emotions in conversations. We want to augment the cognitive capabilities of our callbots so that they can detect the emotions of their interlocutors (joy, stress, anger, sadness, ...) and adapt their answers accordingly.
Main responsibilities:
- Take part in the creation of ZAION's R&D department and, upon arrival, lead your first project on emotion recognition in the voice
- Build, adapt and evolve our voice emotion detection services
- Analyze large databases of conversations to extract the emotionally relevant ones
- Build a database of conversations labeled with emotional tags
- Train and evaluate machine learning models for emotion classification
- Deploy your models in production
- Continuously improve the voice emotion detection system
Required qualifications and prior experience:
- You have at least 2 years of experience as a Data Scientist / Machine Learning Engineer applied to Audio
- Graduate of an engineering school or holder of a Master's degree in computer science, or a PhD in computer science or mathematics, with solid skills in signal processing (preferably audio)
- Solid theoretical background in machine learning and the relevant mathematical domains (clustering, classification, matrix factorization, Bayesian inference, deep learning, ...)
- Experience deploying machine learning models in a production environment would be a plus
- You master one or more of the following: Python, machine learning / deep learning frameworks (PyTorch, TensorFlow, scikit-learn, Keras) and Javascript
- You master audio signal processing techniques
- Proven experience in labeling large databases (preferably audio) is essential
- Your personality: a leader, autonomous, passionate about your work, able to lead a team in project mode
- You speak English fluently
Please send your application to: alegentil@zaion.ai
(2019-09-15) Post-doc and research engineer at INSA, Rouen, Normandy, France
Post-doctoral position (1 year): Perception for interaction and social navigation
Research Engineer (1 year): Social Human-Robot Interactions
Laboratory: LITIS, INSA Rouen Normandy, France
Project: INCA (Natural Interactions with Artificial Companions)
Summary:
The emergence of interactive robots and connected objects has led to the appearance of symbiotic systems made up of human users, virtual agents and robots in social interactions. However, two major scientific difficulties remain unsolved: on the one hand, the recognition of human activity remains inaccurate, both at the operational level (location, mapping and identification of objects and users) and at the cognitive level (recognition and tracking of users' intentions); on the other hand, interaction involves different modalities that must be adapted according to the context, the user and the situation. The INCA project aims at developing artificial companions (interactive robots and virtual agents) with a particular focus on social interactions. Our goal is to develop new models and algorithms for intelligent companions capable of (1) perceiving and representing an environment (real, virtual or mixed) consisting of objects, robots and users; (2) interacting with users in a natural way to assess their needs, preferences, and engagement; (3) learning models of user behavior and (4) generating semantically adequate and socially appropriate responses.
Post-doctoral position in perception for interaction and social navigation (1 position)
The candidate will work to ensure that a robot can recognize the physical content of the scene surrounding it: localize itself, recognize static and dynamic objects (users and other robots), and finally predict the movement of the dynamic elements. The integration of data from different sensors should allow the mapping of an unknown environment and the estimation of the position of the robot. First, VSLAM (Visual Simultaneous Localization And Mapping) techniques (Saputra 2018) will be used to map the scene. The detected regions (or points) of interest can then be used to detect obstacles. In order to distinguish between static and dynamic objects, methods for separating the background from the foreground of the scene (Kajo et al., 2018) will be used. Finally, recent techniques such as FlowNet 2.0 (Ilg et al., 2017) for motion prediction on a video sequence should make it possible to predict the next movement of a dynamic object and to apprehend its behavior.
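As a toy illustration of the background/foreground separation step, a running-average background model can be sketched in a few lines: static pixels converge to the background estimate while moving objects show up as large deviations. This is only a placeholder for the far richer methods cited above, and the synthetic scene below stands in for real camera frames:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential moving average of frames: the background model."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=0.2):
    """Pixels deviating strongly from the background are foreground."""
    return np.abs(frame - bg) > thresh

# Synthetic 8x8 scene: a dark static background plus a bright moving block.
bg = np.zeros((8, 8))
for t in range(50):
    frame = np.zeros((8, 8))
    frame[2:4, t % 6:(t % 6) + 2] = 1.0   # moving object
    mask = foreground_mask(bg, frame)     # detect dynamic pixels
    bg = update_background(bg, frame)     # slowly absorb the static scene
```

Because the object keeps moving, its pixels never dominate the average, so `bg` stays close to the static scene and the mask keeps tracking the block.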
Profile: the candidate must have strong skills in mobile robotics and navigation techniques (VSLAM, OrbSlam, optical flow, stereovision...) and strong programming abilities under ROS or any other environment compatible with robotics. Machine learning and deep learning skills will be highly appreciated.
Research Engineer in Social Human-Robot Interactions (1 position)
The hired research engineer will work closely with the INCA research staff (permanent, PhD and post-doctoral members) and other project partners. The work will mainly involve administering the project's Pepper robots, developing the necessary tools, integrating the algorithms developed with the AgentSlang platform (https://agentslang.github.io/), and joining the team created to participate in RoboCup 2020 in Bordeaux, @Home league.
Profile: Computer Sciences / Robotics Engineer
Good level in programming (ROS, Python, possibly Java)
Strong knowledge in robotics
Experiences in some of the following areas would be a plus (non-exhaustive list): machine learning, human-machine social interactions, scene perception, spatio-temporal and semantic representation, natural language dialogue.
Duration and remuneration: 1 year, 2,480 euros/month (gross salary)
Understanding the inner workings of deep neural networks (DNNs) has attracted a lot of attention in recent years [1, 2], and most problems have been detected and analyzed using visualization techniques [3, 4]. Those techniques help to understand what an individual neuron or a layer of neurons is computing. We would like to go beyond this by focusing on groups of neurons which are commonly highly activated when a network is making wrong predictions on a set of examples. In the same line as [1], where the authors theoretically link how a training example affects the predictions for a test example using so-called 'influence functions', we would like to design a tool to 'debug' neural networks by identifying, using symbolic data mining methods, (connected) parts of the neural network architecture associated with erroneous or uncertain outputs.
In the context of speech recognition, this is especially important. A speech recognition system contains two main parts: an acoustic model and a language model. Nowadays both models are trained with deep neural network (DNN) based algorithms and use very large learning corpora to tune a large number of DNN hyperparameters. Many works aim to tune these hyperparameters automatically. However, this induces a huge computational cost and does not empower the human designers. It would be much more efficient to provide human designers with understandable clues about the reasons for the bad performance of the system, in order to benefit from their creativity to quickly reach more promising regions of the hyperparameter search space.
Description of the position:
This position is funded in the context of the HyAIAI 'Hybrid Approaches for Interpretable AI' INRIA project lab (https://www.inria.fr/en/research/researchteams/inria-project-labs). With this position, we would like to go beyond the current common visualization techniques that help to understand what an individual neuron or a layer of neurons is computing, by focusing on groups of neurons that are commonly highly activated when a network is making wrong predictions on a set of examples. Tools such as activation maximization [8] can be used to identify such neurons. We propose to use discriminative pattern mining and, to begin with, the DiffNorm algorithm [6] in conjunction with the LCM algorithm [7] to identify discriminative activation patterns among the identified neurons.
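As a toy illustration of the idea, one can binarize activations and search for small sets of neurons that co-activate much more often on misclassified examples than on correct ones. The brute-force sketch below is only an illustrative stand-in for the DiffNorm [6] / LCM [7] machinery, with all thresholds being arbitrary assumptions:

```python
from itertools import combinations
import numpy as np

def discriminative_neuron_sets(acts, errors, k=2, threshold=0.5, min_gap=0.3):
    """Brute-force search for k-neuron sets whose joint activation is far
    more frequent on erroneous than on correct examples.
    acts:   (n_examples, n_neurons) activation matrix
    errors: boolean mask marking misclassified examples"""
    active = acts > threshold                    # binarize activations
    err, ok = active[errors], active[~errors]
    patterns = []
    for idx in combinations(range(acts.shape[1]), k):
        cols = list(idx)
        f_err = err[:, cols].all(axis=1).mean()  # joint support on the error set
        f_ok = ok[:, cols].all(axis=1).mean() if len(ok) else 0.0
        if f_err - f_ok >= min_gap:              # discriminative enough?
            patterns.append((idx, float(f_err), float(f_ok)))
    return patterns
```

Real pattern miners avoid this exponential enumeration, which is exactly what efficient closed-itemset algorithms such as LCM [7] are for.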
The data will be provided by the MULTISPEECH team and will consist of two deep architectures, representative of acoustic and language models [9, 10]. The training data, from which the model parameters ultimately derive, will also be provided. We will also extend our results by performing experiments with supervised and unsupervised learning, to compare the features learned by these networks and to perform qualitative comparisons of the solutions learned by various deep architectures. Identifying 'faulty' groups of neurons could lead to the decomposition of the DL network into 'blocks' encompassing several layers. 'Faulty' blocks may be the first to be modified in the search for a better design.
The recruited person will benefit from the expertise of the LACODAM team in pattern mining and deep learning (https://team.inria.fr/lacodam/) and from the expertise of the MULTISPEECH team (https://team.inria.fr/multispeech/) in speech analysis, language processing and deep learning. We would ideally like to recruit a post-doc for 1 year (with possibly one additional year) with the following preferred skills:
- Some knowledge of (or interest in) speech recognition
- Knowledgeable in pattern mining (discriminative pattern mining is a plus)
- Knowledgeable in machine learning in general and deep learning in particular
- Good programming skills in Python (for Keras and/or TensorFlow)
- Very good English (understanding and writing)
However, good PhD applications will also be considered and, in this case, the position will last 3 years. The position will be funded by INRIA (https://www.inria.fr/en/). See the INRIA web site for the post-doc and PhD wages.
The candidates should send a CV, 2 names of referees and a cover letter to the four researchers (firstname.lastname@inria.fr) mentioned above. Please indicate if you are applying for the post-doc or the PhD position. The selected candidates will be interviewed in June for an expected start in September 2019.
Bibliography:
[1] Pang Wei Koh, Percy Liang: Understanding Black-box Predictions via Influence Functions. ICML 2017: pp 1885-1894 (best paper).
[2] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals: Understanding deep learning requires rethinking generalization. ICLR 2017.
[3] Anh Mai Nguyen, Jason Yosinski, Jeff Clune: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. CVPR 2015: pp 427-436.
[4] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus: Intriguing properties of neural networks. ICLR 2014.
[5] Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi: Deep Text Classification Can be Fooled. IJCAI 2018: pp 4208-4215.
[6] Kailash Budhathoki and Jilles Vreeken. The difference and the norm: characterising similarities and differences between databases. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 206-223. Springer, 2015.
[7] Takeaki Uno, Tatsuya Asai, Yuzo Uchida, and Hiroki Arimura. LCM: An efficient algorithm for enumerating frequent closed item sets. In FIMI, volume 90. Citeseer, 2003.
[8] Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. Visualizing higher-layer features of a deep network. University of Montreal, 1341(3):1, 2009.
[9] G. Saon, H.-K. J. Kuo, S. Rennie, M. Picheny: The IBM 2015 English conversational telephone speech recognition system, Proc. Interspeech, pp. 3140-3144, 2015.
[10] W. Xiong, L. Wu, F. Alleva, J. Droppo, X. Huang, A. Stolcke : The Microsoft 2017 Conversational Speech Recognition System, IEEE ICASSP, 2018.
10h15-11h15 Debra Ziegeler, invited speaker (U. Sorbonne Nouvelle): The future of already in Singapore English: a matter of selective convergence
11h15-11h45 Coffee break
11h45-12h30 Diana Lewis (AMU, LPL): Grammaticalisation of lexemes and of constructions: two cases of adverbial development in English
12h30-14h15 Lunch
14h15-15h00 James German (AMU, LPL): Linguistic adaptation as an automatic response to socio-indexical cues
15h00-15h45 Daniel Véronique (AMU, LPL): 'Nominal agglutination' in French creole languages: an example of convergence?
15h45-16h15 Coffee break
16h15-17h00 Chady Shimeen-Khan (U. Paris Descartes, CEPED): Convergences and divergences for discursive purposes through the use of discourse markers among young plurilingual Mauritians
17h00-17h45 Sibylle Kriegel (AMU, LPL): Creolisation and convergence: the expression of the body as a reflexive marker
17h45-18h15 Charles Zaremba: The CLAIX (Cercle Linguistique d'Aix-en-Provence), a retrospective
Saturday 19 October
10h-10h45 Akissi Béatrice Boutin (ILA-UFHB, Abidjan-Cocody): Reanalyses with and without convergence in Ivorian plurilingualism
10h45-11h30 Massinissa Garaoun (AMU): Linguistic convergence and cycles: the case of negation in Maghrebi Arabic and Berber
11h30-12h Coffee break
12h-12h45 Nicolas Tournadre (AMU, LACITO): Copying and convergence phenomena in the languages of Tibet and the Himalayas
12h45-13h30 Cyril Aslanov (AMU, LPL): Convergence and secondary entropy in a macrodiachronic perspective
The successful candidate will join a fast-growing team specialised in the evaluation of AI systems, as well as an ambitious European project on evolving language processing systems (in translation and diarization). Characterising the performance of intelligent systems capable of self-improving during use, both on their own and through interaction with their human users, is a genuine challenge that this post-doc proposes to take up.
(2019-10-10) Research engineer, Laboratoire national de métrologie et d'essais, Trappes, France
Research Engineer in Natural Language Processing (M/F)
Permanent position (CDI)
Location: Laboratoire national de métrologie et d'essais (LNE), Trappes
Reference: ML/ITAL/DEC
A leader in the world of measurement and references, enjoying a strong reputation in France and internationally, the LNE supports industrial innovation and positions itself as an important player for a more competitive economy and a safer society.
At the crossroads of science and industry since its creation in 1901, the LNE offers its expertise to all economic actors involved in product quality and safety.
As the pilot of French metrology, our research is at the heart of this public-service mission and one of the keys to the success of companies.
We strive to meet the industrial and academic need for ever more accurate measurements, under increasingly extreme conditions or on the most emergent concepts such as autonomous vehicles, nanotechnologies and additive manufacturing.
Missions:
You will join a team of six engineer-doctors, regularly accompanied by post-docs, PhD students and interns, specialised in the evaluation and qualification of artificial-intelligence systems. This team is historically recognised for its expertise in the evaluation of natural language processing systems, and the proposed position is intended to reinforce this expertise in a context of strong growth.
In recent years, the team has diversified the application domains of its evaluation expertise, addressing subjects such as medical devices, collaborative industrial robots, autonomous vehicles, etc. The team capitalises on the diverse yet focused know-how of its experts (NLP, imaging, robotics, etc.) to jointly provide a satisfactory answer to the question of the evaluation and certification of intelligent systems, an imperative condition for their acceptability and currently a priority of public authorities.
It is within the framework of the progressive establishment of an evaluation centre for intelligent systems, with a national and international vocation, that the team seeks to attract the best profiles in each AI speciality. The main missions of this future centre are the development of new evaluation protocols, the qualification and certification of intelligent systems, the organisation of challenges (benchmarking campaigns), the provision of experimental resources, the development and organisation of the sector, and the definition of principles, policies, doctrines and standards to this end.
As a research engineer-doctor in NLP, your primary field of intervention will be language processing (text and speech). You may also be asked to intervene in other information-processing domains (for example image processing, including optical character recognition), and beyond, depending on priorities and your own skills and affinities.
The position can evolve over the medium and long term, as it aims to train technical experts of at least national stature who are expected to lead the growth and oversight of their speciality themselves, subject to the regulatory framework and the general orientations of the LNE or its principals.
Initially, you will cover the following missions:
- Contribution to R&D and structuring actions (60%):
Technical and commercial inventory of needs and supply, prioritisation of the markets and technical fields to invest in
Identification and definition of the quantities to measure, the related metrics, the evaluation protocols and the necessary test facilities
Structuring of the discipline's data (NLP) within repositories and according to nomenclatures to be built
Planning and running of tests for experimental, iterative-research and calibration purposes
Building and animating a network of researchers from the public and private sectors, in France and abroad, in support of these missions
Contribution to the setup and execution of national and European research projects and international cooperations
Participation in the LNE's planning work: investments, HR, annual budgets, multi-year perspectives
Publication and presentation of scientific results
Possible supervision of PhD students, post-docs and interns
- Contribution to commercial services in NLP (40%):
General linguistic engineering (data manipulation, statistical analysis, etc.)
Handling of client needs and their reformulation within a technical and commercial offer
Organisation of the tasks needed to deliver the service, estimation of the necessary resources, negotiation
Carrying out these tasks in coordination with the team
Production/writing of the deliverables
Presentation of the results to the client
Profile:
You hold either a PhD, or an engineering degree with at least three years of professional experience, in computer science or language sciences, with a specialisation in natural language processing (NLP) and more generally in artificial intelligence. Past professional or academic experience in software development and/or testing, statistical analysis, as well as speech or image processing will be particularly appreciated.
You also have a good level of English and of programming (C++ and/or Python), as well as experience using Linux.
As part of your onboarding, you may be asked to follow complementary training (for example in artificial intelligence and cybersecurity).
You know how to take the initiative, with broad autonomy and creative potential allowing you to fully occupy your area of responsibility with a goal of excellence. You are able to assert leadership through the quality and clarity of your arguments.
Frequent travel in the Paris region (once a week), elsewhere in France (once or twice a month) and occasionally worldwide (once a quarter) for services, meetings or conferences.
(2019-10-13) Postdoctoral Researcher , IRISA, Rennes, France
Postdoctoral Researcher in Multilingual Speech Processing
CONTEXT The Expression research team focuses on expressiveness in human-centered data. In this context, the team has a strong activity in the field of speech processing, especially text-to-speech (TTS). This activity is reflected in regular publications in top international conferences and journals, presenting contributions in topics like machine learning (including deep learning), natural language processing, and speech processing. The Expression team takes part in multiple collaborative projects. Among those, the current position is part of a large European H2020 project focusing on the social integration of migrants in Europe. Team's website: https://www-expression.irisa.fr/
PROFILE Main tasks: 1. Design multilingual TTS models (acoustic models, grapheme-to-phoneme, prosody, text normalization...) 2. Take part in porting the team’s TTS system for embedded environments 3. Develop spoken language skill assessment methods
Secondary tasks: 1. Collect speech data 2. Define use cases with the project partners
Environment: The successful candidate will join a team of other researchers and engineers working on the same topics.
Required qualification: PhD in computer science or signal processing Skills: • Statistical machine learning and deep learning • Speech processing and/or natural language processing • Strong object-oriented programming skills • Android and/or iOS programming are a strong plus
CONTRACT Duration: 22 months, full time. Salary: competitive, depending on experience. Starting date: January 1, 2020.
APPLICATION & CONTACTS Send a cover letter, a resume, and references by email to: • Arnaud Delhay, arnaud.delhay@irisa.fr ; • Gwénolé Lecorvé, gwenole.lecorve@irisa.fr ; • Damien Lolive, damien.lolive@irisa.fr . Application deadline: November 15, 2019. Applications will be processed on a daily basis.
(2019-10-18) FULLY FUNDED FOUR-YEAR PHD STUDENTSHIPS, University of Edinburgh, Scotland
FULLY FUNDED FOUR-YEAR PHD STUDENTSHIPS
UKRI CENTRE FOR DOCTORAL TRAINING IN NATURAL LANGUAGE PROCESSING
at the University of Edinburgh's School of Informatics and School of Philosophy, Psychology and Language Sciences.
Applications are now sought for the CDT's second cohort of students, to start in September 2020
Deadlines: * Non EU/UK : 29th November 2019 * EU/UK : 31st January 2020.
The CDT in NLP offers unique, tailored doctoral training comprising both taught courses and a doctoral dissertation. Both components run concurrently over four years.
Each student will take a set of courses designed to complement their existing expertise and give them an interdisciplinary perspective on NLP. They will receive full funding for four years, plus a generous allowance for travel, equipment and research costs.
The CDT brings together researchers in NLP, speech, linguistics, cognitive science and design informatics from across the University of Edinburgh. Students will be supervised by a team of over 40 world-class faculty and will benefit from cutting edge computing and experimental facilities, including a large GPU cluster and eye-tracking, speech, virtual reality and visualisation labs.
The CDT involves over 20 industrial partners, including Amazon, Facebook, Huawei, Microsoft, Mozilla, Reuters, Toshiba, and the BBC. Close links also exist with the Alan Turing Institute and the Bayes Centre.
A wide range of research topics fall within the remit of the CDT:
Natural language processing and computational linguistics
Speech technology
Dialogue, multimodal interaction, language and vision
Information retrieval and visualization, computational social science
Computational models of human cognition and behaviour, including language and speech processing
Human-Computer interaction, design informatics, assistive and educational technology
Psycholinguistics, language acquisition, language evolution, language variation and change
Linguistic foundations of language and speech processing
The second cohort of CDT students will start in September 2020; applications are now open.
Around 12 studentships are available, covering maintenance at the research council rate (https://www.ukri.org/skills/funding-for-research-training , currently £15,009 per year) and tuition fees. Studentships are available for UK, EU and non-EU nationals. Individuals in possession of other funding scholarships or industry funding are also welcome to apply; please provide details of your funding source on your application.
Applicants should have an undergraduate or master's degree in computer science, linguistics, cognitive science, AI, or a related discipline. We particularly encourage applications from women, minority groups and members of other groups that are underrepresented in technology.
Application Deadlines In order to ensure full consideration for funding, completed applications (including all supporting documents) need to be received by:
29th November 2019 (non EU/UK) or 31st January 2020 (EU/UK).
CDT in NLP Open Days Find out more about the programme by attending the PG Open Day at the School of Informatics or by joining one of the CDT in NLP Virtual Open Days:
The University of Southern California's Institute for Creative Technologies (ICT) is an off-campus research facility, located on a creative business campus in the 'Silicon Beach' neighborhood of Playa Vista. We are world leaders in innovative training and education solutions, computer graphics, computer simulations, and immersive experiences for decision-making, cultural awareness, leadership and health. ICT employees are encouraged to develop themselves both professionally and personally, through workshops, invited guest talks, movie nights, social events, various sports teams, a private gym and a personal trainer. The atmosphere at ICT is informal and flexible, while encouraging initiative, personal responsibility and a high work ethic.
We are looking for an accomplished recent PhD graduate to work on a challenging yet exciting NIH-funded 4-year research project. The project seeks to understand the process and success of Motivational Interviewing (MI). Specifically, our project will address shortcomings of current MI coding systems by introducing a novel computational framework that leverages our recent advances in automatic verbal and nonverbal behavior analyses as well as multimodal machine learning. Our framework aims to jointly analyze verbal (i.e., what is being said), nonverbal (i.e., how something is said), and dyadic (i.e., in what interpersonal context something is said) behavior to better identify in-session patient behavior that is predictive of post-session alcohol use. The project is heavily focused on machine learning, NLP, and data mining; it requires no data collection as all data has already been collected.
We are looking to add a talented machine learning (NLP, CV, or signal processing focus) Postdoctoral Research Associate to our interdisciplinary team of machine learning scientists, affective computing experts, and psychiatrists. Join our team's mission to better understand therapy processes and predict outcomes!
Responsibilities include:
- Design and implement state-of-the-art NLP machine learning algorithms to automatically code dyadic MI therapy sessions and predict behavior change in patients.
- Push the envelope on current NLP and multimodal machine learning algorithms to better understand the MI process and outcome.
- Conduct statistical analysis on verbal, nonverbal and dyadic behavioral patterns to describe their relationship with the MI process and outcome.
- Write and lead authorship of high-impact conference (ACL, EMNLP, ICMI, CVPR, ICASSP, and Interspeech) and journal papers (PAMI, TAFFC, and TASLP).
- Support and lead graduate and undergraduate students and summer interns to preprocess and annotate multimodal MI data.
Work collaboratively with:
- Domain experts in MI research, to automatically derive meaningful insights for MI researchers.
- Computer scientists across departments at the highly accomplished and interdisciplinary USC Institute for Creative Technologies.
Have fun & learn while working at ICT with a great team and an incredible mission!
Minimum Education: PhD in computer science or engineering with a focus on NLP, CV, signal processing, or multimodal machine learning.
Minimum Experience: At least 1 year of experience working with data comprising human verbal and/or nonverbal behavior.
Minimum Field of Expertise: Directly related education in research specialization with advanced knowledge of equipment, procedures, and analysis methods.
Skills: comfortable with machine learning frameworks such as PyTorch or TensorFlow; excellent programming skills in Python or C++; analysis; assessment/evaluation; written and oral communication; organization; planning; problem identification and resolution; project management; research.
(2019-11-03) Research engineer, IRIT, Toulouse, France
Within the framework of the ALAIA joint laboratory, IRIT (SAMoVA team, https://www.irit.fr/SAMOVA/site/) is recruiting a research engineer on a fixed-term contract to join its research team, work in the field of AI applied to foreign language learning, and collaborate with the company Archean Technologie (http://www.archean.tech/archean-labs-en.html).
Position: research engineer. Duration: 12 to 18 months. Starting date: possible from December 1, 2019. Field: speech processing, machine learning, automatic pronunciation analysis. Location: Institut de Recherche en Informatique de Toulouse (Université Paul Sabatier), SAMoVA team. Sought profile: PhD in computer science, machine learning or audio processing. Contact: Isabelle Ferrané (isabelle.ferrane@irit.fr). Application file: CV, thesis abstract, cover letter, recommendations/contacts. Details: https://www.irit.fr/SAMOVA/site/assets/files/engineer/ALAIA_ResearchEngineerPosition(1).pdf. Salary: according to experience.
(2019-11-05) Annotator/Transcriber (M/F) at ZAION, Paris, France
ZAION (https://www.zaion.ai) is a fast-growing innovative company specialised in conversational-robot technology: callbots and chatbots integrating Artificial Intelligence.
ZAION has developed a solution that builds on more than 20 years of experience in Customer Relations. This technologically disruptive solution has been very favourably received internationally, and we already count 12 active clients (GENERALI, MNH, APRIL, CROUS, EUROP ASSISTANCE, PRO BTP...).
We are currently among the only companies in the world to offer a solution of this type entirely geared towards performance. Joining us means taking part in a great adventure within a dynamic team that aims to become the reference on the conversational-robot market.
Within our Artificial Intelligence activity, to support its constant innovations in the automatic identification of sentiments and emotions in conversational telephone interactions, we are recruiting an Annotator/Transcriber (M/F):
Main missions:
annotate accurately the exchanges between a client and their advisor according to tags explained in a guide,
work meticulously from audio and text documents in French,
quickly become familiar with a dedicated annotation tool,
know collaborative working tools,
use their cultural, linguistic and grammatical knowledge to render with great precision not only the conversation between two interlocutors on a given subject but also the segmentation of their speech.
Candidate profile:
be a native speaker with impeccable spelling,
have a very good command of Mac, Windows or Linux environments; be rigorous, attentive and discreet.
Fixed-term contract (full time), based in Paris (75017)
If interested, please contact Anne le Gentil (HR) at the following address: alegentil@zaion.ai, attaching a CV to the email.
(2019-11-05) Data Scientist / Machine Learning applied to Audio (M/F), at ZAION, Paris, France
ZAION is a fast-growing innovative company specialised in conversational-robot technology: callbots and chatbots integrating Artificial Intelligence.
ZAION has developed a solution that builds on more than 20 years of experience in Customer Relations. This technologically disruptive solution has been very favourably received internationally, and we already count 18 active clients (GENERALI, MNH, APRIL, CROUS, EUROP ASSISTANCE, PRO BTP...).
We are currently among the only companies in the world to offer a solution of this type entirely geared towards performance. Joining us means taking part in an exciting adventure within an ambitious team that aims to become the reference on the conversational-robot market.
As part of its development, ZAION is recruiting its Data Scientist / Machine Learning applied to Audio (M/F). Within the R&D team, your role is strategic for the development and expansion of the company. You will develop a solution that detects emotions in conversations. We want to extend the cognitive capabilities of our callbots so that they can detect the emotions of their interlocutors (joy, stress, anger, sadness...) and adapt their answers accordingly.
Your main missions:
- Take part in creating ZAION's R&D unit and, on arrival, lead your first project on emotion recognition in the voice
- Build, adapt and evolve our voice emotion detection services
- Analyse large databases of conversations to extract the emotionally relevant ones
- Build a database of conversations labelled with emotional tags
- Train and evaluate machine learning models for emotion classification
- Deploy your models to production
- Continuously improve the voice emotion detection system
Required qualifications and prior experience:
- You have at least 2 years of experience as a Data Scientist / Machine Learning engineer applied to audio
- Graduate of an engineering school or holder of a Master's in computer science, or a PhD in computer science/mathematics, with solid skills in signal processing (preferably audio)
- Solid theoretical background in machine learning and the relevant mathematical domains (clustering, classification, matrix factorisation, Bayesian inference, deep learning...)
- Experience deploying machine learning models in a production environment would be a plus
- You master one or more of the following: Python, machine learning / deep learning frameworks (PyTorch, TensorFlow, scikit-learn, Keras) and JavaScript
- You master audio signal processing techniques
- Proven experience in labelling large databases (preferably audio) is essential
- Your personality: a leader, autonomous, passionate about your work, able to lead a team in project mode
- You speak English fluently
Please send your application to: alegentil@zaion.ai
(2019-11-25) Internship offer, INRIA Bordeaux, France
M2 internship offer (computer science / signal processing)
Deep learning for classification between Parkinson's disease and multiple system atrophy by vocal signal analysis
Parkinson's disease (PD) and multiple system atrophy (MSA) are neurodegenerative diseases. MSA belongs to the group of atypical parkinsonian disorders. In the early stages of the disease, the symptoms of PD and MSA are very similar, especially for MSA-P, where the parkinsonian syndrome predominates. The differential diagnosis between MSA-P and PD can be very difficult in the early stages, while early diagnostic certainty is important for the patient because of the diverging prognoses. Despite recent efforts, no valid objective marker is currently available to guide the clinician in this differential diagnosis. The need for such markers is therefore very high in the neurology community, especially given the severity of the MSA prognosis.
It is established that speech disorders, commonly called dysarthria, are an early symptom common to both diseases, with different origins. We are thus conducting research that uses dysarthria, through digital processing of patients' voice recordings, as a vector to distinguish between PD and MSA-P. We currently coordinate a research project on this topic with clinical partners, neurologists and ENT specialists, from the university hospitals of Bordeaux and Toulouse. Within this project we have a database of voice recordings of PD and MSA-P patients (and of healthy subjects).
Le but de ce stage est d?explorer des techniques récentes de Deep Leaning pour effectuer la classification entre MP et AMS-P. La première étape du stage consistera en l?implémentation d?un système baseline utilisant des outils standards et en se basant sur la méthodologie décrite dans [1]. Cette dernière traite la classification entre MP et les sujets saints et utilise des «chunks » de spectrogrammes comme entrée à un réseau neuronale convolutionnel (CNN). Cette méthodologie sera appliquée à la tâche MP vs AMS-P en utilisant notre base de données. L?implémentation du CNN se fera avec Keras-Tensorflow (https://www.tensorflow.org/guide/keras). L?extraction des paramètres du signal vocal sera effectuée par Matlab et le logiciel Praat (http://www.fon.hum.uva.nl/praat/). Cette étape permettra au stagiaire d?assimiler les briques de base du Deep Learning et de l?analyse la voix pathologique.
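To illustrate the spectrogram-chunk idea in this baseline, here is a minimal sketch of the chunking step; the chunk width, hop, and spectrogram sizes are hypothetical choices for illustration, not values taken from [1]:

```python
import numpy as np

def chunk_spectrogram(spec, chunk_width=64, hop=32):
    """Cut a (freq_bins, time_frames) spectrogram into fixed-width,
    overlapping chunks along the time axis, to be fed to a CNN."""
    starts = range(0, spec.shape[1] - chunk_width + 1, hop)
    return np.stack([spec[:, s:s + chunk_width] for s in starts])

# A fake 128-bin, 300-frame spectrogram stands in for real features
spec = np.random.rand(128, 300)
chunks = chunk_spectrogram(spec)  # shape: (8, 128, 64)
```

Each chunk then serves as one training example for the CNN; a patient-level decision is typically obtained by aggregating the chunk-level predictions (e.g. by majority vote or averaging).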
The second step of the internship will be to develop a deep neural network (DNN) that takes as input acoustic representations dedicated to the PD vs. MSA-P task and developed by our team. This will involve:
building the right dataset
choosing the right class of DNN to use
designing the right DNN architecture
defining the right objective function to optimize
analysing and comparing classification performance
This step will require a deeper understanding of the theoretical and algorithmic aspects of deep learning.
Prerequisites: A good knowledge of standard machine learning techniques and their underlying concepts is necessary, as is a good level of Python programming. Knowledge of signal/image processing and/or deep learning would be an advantage. A test will be given to check these prerequisites.
Internship supervisor: Khalid Daoudi (khalid.daoudi@inria.fr)
(2019-11-15) 13 PhD studentships at UKRI Centre for Doctoral Training (CDT), University of Sheffield, UK
UKRI Centre for Doctoral Training (CDT) in Speech and Language Technologies (SLT) and their Applications
Department of Computer Science
Faculty of Engineering
University of Sheffield
Fully-funded 4-year PhD studentships for research in Speech and Language Technologies (SLT) and their Applications
** Apply now for September 2020 intake. Up to 13 studentships available **
Deadline for applications: 31 January 2020.
What makes the SLT CDT different:
Unique Doctor of Philosophy (PhD) with Integrated Postgraduate Diploma (PGDip) in SLT Leadership.
Bespoke cohort-based training programme running over the entire four years providing the necessary skills for academic and industrial leadership in the field, based on elements covering core SLT skills, research software engineering (RSE), ethics, innovation, entrepreneurship, management, and societal responsibility.
The centre is a world-leading hub for training scientists and engineers in SLT, two core areas within artificial intelligence (AI) which are experiencing unprecedented growth and will continue to do so over the next decade.
Setting that fosters interdisciplinary approaches, innovation and engagement with real world users and awareness of the social and ethical consequences of work in this area.
The benefits:
Four-year fully-funded studentship covering all fees and an enhanced stipend (£17,000 pa)
Generous personal allowance for research-related travel, conference attendance, specialist equipment, etc.
A full-time PhD with integrated PGDip incorporating 6 months of foundational SLT training prior to starting your research project
Supervision from a team of over 20 internationally leading SLT researchers, covering all core areas of modern SLT research, and a broader pool of over 50 academics in cognate disciplines with interests in SLTs and their application
Every PhD project underpinned by a real-world application, directly supported by one of over 30 industry partners. Partners include Google, Amazon, Microsoft, Nuance, NHS Digital and many more
A dedicated CDT workspace within a collaborative and inclusive research environment hosted by the Department of Computer Science
Work and live in Sheffield, a cultural centre on the edge of the Peak District National Park and one of the top 10 most affordable and safest UK university cities.
About you:
We are looking for students from a wide range of backgrounds interested in Speech and Language Technologies.
High-quality (ideally first class) undergraduate or masters (ideally distinction) degree in a relevant discipline. Suitable backgrounds include (but are not limited to) computer science, informatics, engineering, linguistics, speech and language processing, mathematics, cognitive science, AI, physics, or a related discipline.
Regardless of background, you must be able to demonstrate mathematical aptitude (minimally to A-Level standard or equivalent) and experience of programming.
We particularly encourage applications from groups that are underrepresented in technology.
Candidates must satisfy the UKRI funding eligibility criteria. Students must have settled status in the UK and have been "ordinarily resident" in the UK for at least 3 years prior to the start of the studentship. Full details of eligibility criteria can be found on our website.
Applying:
Applications are now sought for the September 2020 intake. Up to 13 studentships available.
We operate a staged admissions process, with application deadlines throughout the year.
The first deadline for applications is 31 January 2020. The second deadline is 31 May 2020.
Applications will be reviewed within 4 weeks of each deadline and short-listed applicants will be invited to interview. Interviews will be held in Sheffield.
In some cases, because of the high volume of applications we receive, we may need more time to assess your application. If this is the case, we will let you know if we intend to do this.
We may be able to consider applications received after 31 May 2020 if places are still available. Equally, all places may be allocated after the first deadline therefore we encourage you to apply early.
See our website for full details and guidance on how to apply: slt-cdt.ac.uk
By replying to this email or contacting sltcdt-enquiries@sheffield.ac.uk you consent to being contacted by the University of Sheffield in relation to the CDT. You are free to withdraw your permission in writing at any time.
(2019-11-21) Scholarships in French Studies (MA and PhD) at Western University, Canada
Scholarships in French Studies (MA and PhD) at Western University
The Department of French Studies at Western University (London, Canada) is now accepting applications for admission for the 2020-2021 academic year to its Master's and doctoral programmes, in the fields of linguistics and literature. Western University is recognized as one of the major research universities in Ontario, and the Department of French Studies has actively contributed to maintaining that reputation for over 50 years.
The faculty and the graduate students together form a diverse international community. We offer the opportunity to pursue a research programme in formal linguistics (syntax, morphology, phonology and semantics) as well as in sociolinguistics. We also offer training in literature across all centuries and all areas of French and Francophone literature, fields in which our students conduct their research.
Deadline for the first call, giving access to funding from September 2020: 1 February 2020
Successful Canadian and international candidates for the doctoral programme receive a four-year scholarship covering tuition fees as well as an annual teaching assistantship worth at least $13,000. The same funding is offered for one year to Canadian students admitted to the Master's programme. International students admitted to the Master's programme receive a lump sum of $3,000 for the duration of the programme.
In addition to graduate scholarships, the Department of French Studies offers students who maintain a strong academic record financial support for research travel and conference participation, as well as the possibility of replacing the teaching assistantship with a research fellowship of equivalent value. Several students in our doctoral programme also benefit from a joint-supervision (cotutelle) arrangement with a French university.
For more information about the financial support offered by our institution, please contact the Department of French Studies directly or see: http://www.uwo.ca/french/graduate/finances/index.html .
We also offer an excellent teaching assistant training programme, as well as a range of professional development activities.
Graduate Program Director: François Poiré (fpoire@uwo.ca)
Graduate Program Assistant: Chrisanthi Ballas (frgrpr@uwo.ca)
Required skills: background in statistics, natural language processing and programming skills (Perl, Python). Candidates should email a detailed CV with diplomas.
Motivations and context
Recent years have seen a tremendous development of the Internet and social networks. Unfortunately, the dark side of this growth is an increase in hate speech. Only a small percentage of people use the Internet for harmful activities such as hate speech; however, the impact of this small group of users is extremely damaging.
Hate speech is the subject of various national and international legal frameworks. Manually monitoring and moderating Internet and social media content to identify and remove hate speech is extremely expensive. This internship aims at designing methods for automatically learning hate speech detection systems from Internet and social media data. Despite the studies already published on this subject, the results show that the task remains very difficult (Schmidt et al., 2017; Zhang et al., 2018).
In text classification, documents are usually represented in a so-called vector space and then assigned to predefined classes through supervised machine learning. Each document is represented as a numerical vector computed from the words of the document. How to represent the terms numerically in an appropriate way is a basic problem in text classification tasks and directly affects classification accuracy. Developments in neural networks led to a renewed interest in the field of distributional semantics, more specifically in learning word embeddings (representations of words in a continuous space). Computational efficiency was one big factor in the popularization of word embeddings. Word embeddings capture syntactic as well as semantic properties of words (Mikolov et al., 2013). As a result, they have outperformed several other word vector representations on different tasks (Baroni et al., 2014).
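One simple way to turn word embeddings into a document vector is to average them; the tiny 3-dimensional embedding table below is purely illustrative (a real system would load pretrained vectors such as fastText):

```python
import numpy as np

# Toy embedding table; the words and 3-d vectors are illustrative only
embeddings = {
    "hate": np.array([0.9, 0.1, 0.0]),
    "love": np.array([0.0, 0.9, 0.1]),
    "you":  np.array([0.1, 0.1, 0.8]),
}

def doc_vector(tokens, emb):
    """Represent a document as the mean of its word embeddings;
    out-of-vocabulary words are skipped."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:  # no known word: fall back to the zero vector
        return np.zeros(len(next(iter(emb.values()))))
    return np.mean(vecs, axis=0)

v = doc_vector(["hate", "you", "unknownword"], embeddings)
```

The resulting fixed-size vector can then be fed to any standard classifier; CNN/RNN approaches instead keep the per-word vectors as a sequence to preserve word order.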
Our methodology for hate speech detection builds on recent approaches to text classification with neural networks and word embeddings. In this context, fully connected feed-forward networks, convolutional neural networks (CNN) and also recurrent/recursive neural networks (RNN) have been applied. On the one hand, CNN- and RNN-based approaches capture rich compositional information and have outperformed the state-of-the-art results in text classification; on the other hand, they are computationally intensive and require huge corpora of training data.
Training these DNN hate speech detection systems requires a very large corpus of training data, containing several thousand social media comments, each labeled as hate or not hate. It is easy to automatically collect social media and Internet comments; however, labelling a huge corpus is time consuming and very costly. For several hundred comments this work can be performed manually by human annotators, but it is not feasible for a huge corpus of comments. In this case weakly supervised learning can be used: the idea is to train a deep neural network with a limited amount of labelled data.
The goal of this master internship is to develop a methodology for weakly supervised learning of a hate speech detection system using social network data (Twitter, YouTube, etc.).
Objectives
In our Multispeech team, we have developed a baseline system for automatic hate speech detection. This system is based on fastText and BERT embeddings (Bojanowski et al., 2017; Devlin et al., 2018) and on the CNN/RNN methodology. During this internship, the master student will work on this system in the following directions:
Study of the state-of-the-art approaches in the field of weakly supervised learning;
Implementation of a baseline method of weakly supervised learning for our system;
Development of a new methodology for weakly supervised learning. Two cases will be studied. In the first case, we train the hate speech detection system using a small labeled corpus and then proceed incrementally: we use this first system to label more data, retrain the system, use it to label new data, and so on. In the second case, we address learning with noisy labels (labels that may be incorrect or given by several annotators who disagree).
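The incremental procedure in the first case is a self-training loop; a minimal sketch follows, where a nearest-centroid classifier and toy 1-d data stand in for the real DNN and comment embeddings:

```python
import numpy as np

def centroid_fit(X, y):
    """Tiny stand-in classifier: one centroid per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def centroid_predict(model, X):
    """Assign each row of X to the class of its nearest centroid."""
    classes = sorted(model)
    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

def self_train(X_lab, y_lab, X_unlab, rounds=3):
    """Train on labeled data, pseudo-label the unlabeled pool,
    fold the pseudo-labels in, and retrain."""
    X, y = X_lab, y_lab
    for _ in range(rounds):
        model = centroid_fit(X, y)
        pseudo = centroid_predict(model, X_unlab)
        X = np.vstack([X_lab, X_unlab])
        y = np.concatenate([y_lab, pseudo])
    return centroid_fit(X, y)

# Toy separable data: 2 labeled points, 4 unlabeled points
X_lab = np.array([[0.0], [1.0]]); y_lab = np.array([0, 1])
X_unlab = np.array([[0.1], [0.2], [0.9], [1.1]])
model = self_train(X_lab, y_lab, X_unlab)
```

A practical system would fold in only high-confidence pseudo-labels at each round rather than the whole unlabeled pool, since confidently wrong pseudo-labels are a major failure mode of self-training.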
References
Baroni, M., Dinu, G., and Kruszewski, G. "Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors". In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Volume 1, pages 238-247, 2014.
Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. "Enriching word vectors with subword information". Transactions of the Association for Computational Linguistics, 5:135-146, 2017.
Dai, A. M. and Le, Q. V. "Semi-supervised sequence learning". In Advances in Neural Information Processing Systems 28, pages 3061-3069. Curran Associates, Inc., 2015.
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", arXiv:1810.04805v1, 2018.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. "Distributed representations of words and phrases and their compositionality". In Advances in Neural Information Processing Systems 26, pages 3111-3119. Curran Associates, Inc., 2013.
Schmidt, A., Wiegand, M. "A Survey on Hate Speech Detection using Natural Language Processing", Workshop on Natural Language Processing for Social Media, 2017.
Zhang, Z., Luo, L. "Hate speech detection: a solved problem? The Challenging Case of Long Tail on Twitter". arxiv.org/pdf/1803.03662, 2018.
(2019-11-25) Annotator/Transcriber, ZAION, Paris, France
ZAION (https://www.zaion.ai) is a fast-growing innovative company specialized in conversational-robot technology: callbots and chatbots incorporating Artificial Intelligence.
ZAION has developed a solution built on more than 20 years of experience in Customer Relations. This technologically disruptive solution has been very well received internationally, and we already count 12 active clients (GENERALI, MNH, APRIL, CROUS, EUROP ASSISTANCE, PRO BTP, etc.).
We are currently among the only companies in the world to offer a solution of this kind entirely geared towards performance. Joining us means taking part in a great adventure within a dynamic team that aims to become the benchmark in the conversational-robot market.
Within our Artificial Intelligence activity, to support its ongoing innovations in the automatic identification of sentiments and emotions in telephone conversations, we are recruiting an Annotator/Transcriber (M/F):
Main duties:
annotate accurately the exchanges between a customer and their advisor, following the tags described in a guide,
work meticulously from audio and text documents in French,
quickly become familiar with a dedicated annotation tool,
be familiar with collaborative work tools,
use your cultural, linguistic and grammatical knowledge to render with great precision not only the conversation between two speakers on a given topic but also the segmentation of their speech.
Candidate profile:
be a native speaker with impeccable spelling,
have a very good command of Mac, Windows or Linux environments, and demonstrate rigour, attentiveness and discretion.
Fixed-term contract (full- or part-time), based in Paris (75017)
If interested, please contact Anne le Gentil/HR at: alegentil@zaion.ai, attaching a CV to your email.
(2019-12-02) 2 faculty positions, Université Paris-Saclay, France
Two faculty positions (one Full Professor and one Associate Professor) will be opened for competition by Université Paris-Saclay in section 27 (Computer Science) during the 2020 campaign, with profiles in Language Processing, with priority given to Speech; the research will be carried out at LIMSI.
(2019-12-03) PhD studentships, University of Glasgow, UK
The School of Computing Science at the University of Glasgow is offering studentships and excellence bursaries for PhD study. The following sources of funding are available:
* College of Science and Engineering Scholarship: open to all applicants (UK, EU and International) - covers fees and living expenses
* Centre for Doctoral Training in Socially Intelligent Artificial Agents: open to UK or EU applicants who have lived in the UK for at least 3 years through a national competition – see https://socialcdt.org
* China Scholarship Council Scholarship nominations: open to Chinese applicants – covers fees and living expenses
* Excellence Bursaries: full fee discount for UK/EU applicants; partial discount for international applicants
* Further scholarships (contact potential supervisor for details): open to UK or EU applicants
Whilst the above funding is open to students in all areas of computing science, applications in the area of Human-Computer Interaction are welcomed.
Please find below a list of available supervisors in HCI and their research areas.
* Prof Alessandro Vinciarelli (http://www.dcs.gla.ac.uk/vincia/): Social Signal Processing. Email: Alessandro.Vinciarelli@glasgow.ac.uk
* Dr Mary Ellen Foster (http://www.dcs.gla.ac.uk/~mefoster/): Social Robotics, Conversational Interaction, Natural Language Generation. Email: MaryEllen.Foster@glasgow.ac.uk
* Dr Euan Freeman (http://euanfreeman.co.uk/): Interaction Techniques, Haptics, Gestures, Pervasive Displays. Email: Euan.Freeman@glasgow.ac.uk
* Dr Fani Deligianni (http://fdeligianni.site/): Characterising uncertainty, eye-tracking, EEG, bimanual teleoperations. Email: fadelgr@gmail.com
* Dr Helen C. Purchase (http://www.dcs.gla.ac.uk/~hcp/): Visual Communication, Information Visualisation, Visual Aesthetics. Email: Helen.Purchase@glasgow.ac.uk
* Dr Mohamed Khamis (http://mkhamis.com/): Human-centered Security and Privacy, Eye Tracking and Gaze-based Interaction, Interactive Displays. Email: Mohamed.Khamis@glasgow.ac.uk
(2019-12-07) Assistant engineer in production, LPL, Aix-en-Provence, France
Job type:
Assistant engineer in production, data processing and surveys, BAP D (Data in the Humanities and Social Sciences):
Mission:
Within the experimental platform of the Laboratoire Parole et Langage (LPL), the appointee will be responsible for technical coordination and for welcoming and supporting experiments, in collaboration with the sector leads (audio-video, articulography/physiology, neurophysiology/eye-tracking).
Activities:
Welcome participants and collect their personal information in compliance with current legislation (GDPR)
Recruit participants for experiments
Liaise with external researchers
Monitor and restock consumables
Manage bookings of experimental spaces and equipment, schedule experiment sessions, arrange appointments
Support the setup of the experimental apparatus together with the sector lead
Maintain laboratory notebooks
Run the ongoing campaigns for recruiting volunteers
Contribute to writing methodological notes on the operations carried out
Keep disciplinary and methodological knowledge up to date and compile the bibliography of a field of study
Skills:
Mastery of experimental techniques, methods and protocols in the humanities and social sciences
Knowledge of measurement and statistics
Collaboration with researchers in designing, setting up and running experiments
Teamwork with the other technical staff working on the platform
Strong interpersonal skills when interacting with investigators of varied backgrounds (from Master's students to foreign researchers, as well as the laboratory's own researchers and PhD students) and with all categories of participants, from school-age children to adults and elderly people, some of whom may present various pathologies
Knowledge of and compliance with legislation on research involving human subjects, as well as health and safety rules
A good command of spoken English (level B2 of the Common European Framework of Reference for Languages) is essential
Long-term archiving of research data (basic knowledge)
The call is open until 17 January, but applications will be reviewed on a rolling basis. Please feel free to forward this information to anyone who may be interested.
Starting job date (desired): March 2020.

## Work description

### Project Summary
Automation and optimisation of *verbal interactions of a socially-competent robot*, guided by its *multimodal perceptions*
Facing a steady increase in the ageing population and the prevalence of chronic diseases, social robots are promising tools to include in the health care system. Yet existing assistive robots are not well suited to this context, as their communication abilities cannot handle social spaces (several metres and groups of persons), only face-to-face individual interactions in quiet environments. In order to overcome these limitations, and eventually aiming at natural human-robot interaction, the objectives of the work will be multifold.
First and foremost we intend to leverage the rich information available with audio and visual flows of data coming from humans to extract verbal and non-verbal features. These features will be used to enhance the robot's decision-making ability such that it can smoothly take speech turns and switch from interaction with a group of people to face-to-face dialogue and back. Secondly online and continual learning of the advanced system will be investigated.
Outcomes of the project will be implemented on a commercially available social robot (most likely a Pepper) and validated with several in-situ use cases. A large-scale data collection will complement the in-situ tests to fuel further research. The essential competencies for our overall objectives lie in dialogue systems / NLP, yet knowledge of vision and robotics would also be necessary. In any case, a good command of deep learning techniques and tools is mandatory (including reinforcement learning for dialogue strategy training).
### Requirements
- Master or PhD in Computer Science, Machine Learning, Computational Linguistics, Mathematics, Engineering or related fields
- Expertise in NLP / dialogue systems. Strong knowledge of current NLP / interactive / speech techniques is expected. Previous experience with dialogue and interaction and/or vision data is a strong plus.
- Knowledge of vision and/or robotics is a plus.
- Strong programming skills: Python/C++ programming of DNN models (preferably with PyTorch)
- Expertise in Unix environments
- Good spoken and written command of English is required. *French is optional.*
- For the post-doc position, good writing skills, as evidenced by publications at top venues (e.g., ACL, EMNLP, SIGDIAL), are a plus.
## Place
Bordered by the left bank of the Rhône, Avignon is one of the most beautiful cities in Provence, and was for a time the capital of Christendom in the Middle Ages. The substantial remains of a past rich in history give the city its unique atmosphere: dozens of churches and chapels; the "Palais des Papes" (Palace of the Popes), the most important Gothic palace in Europe; the Saint-Bénezet bridge, the "pont d'Avignon" of worldwide fame through the song commemorating it; the ramparts that still encircle the entire city; and ten museums spanning ancient times to contemporary art.
Of the city's 94,787 inhabitants, about 12,000 live in the old town centre surrounded by its medieval ramparts. Avignon is not only the birthplace of the most prestigious festival of contemporary theatre and a European Capital of Culture in 2000, but also the largest city and capital of the département of Vaucluse. The region offers a high quality of urban life at comparatively modest cost. In addition, the area around Avignon offers numerous monuments and sites of natural beauty that are easy to reach in a very short time: Avignon is the ideal base for visiting Provence.
The position carries no direct teaching load, but if desired, teaching BSc or MSc level courses is a possibility (paid extra hours), as is supervision of student dissertation projects.
Initial employment is 12 months; extension is possible. For an engineer, a shift to a PhD position is possible.
## Applications
No deadline: applications are possible until the position is filled.
* Statement of research interests that motivates your application
* CV, including the list of publications if any
* Scans of transcripts and academic degree certificates
* MSc/PhD dissertation and/or any other writing samples
* Coding samples or links to your contributions to public code repositories, if any
* Names, affiliations, and contact details of up to three people who can provide reference letters for you
(2019-12-08) PhD sudentship, Utrecht University, The Netherlands
The Social and Affective Computing group at the Utrecht University Department of Information and Computing Sciences is looking for a PhD candidate to conduct research on explainable and accountable affective computing for mental healthcare scenarios. The five-year position includes 70% research time and 30% teaching time. The post presents an excellent opportunity to develop an academic profile as a competent researcher and able teacher.
Affective computing has great potential for clinician support systems, but it needs to produce insightful, explainable, and accountable results. Cross-corpus and cross-task generalization of approaches, as well as efficient and effective ways of leveraging multimodality are some of the main challenges in the field. Furthermore, data are scarce, and class-imbalance is expected. While addressing these issues, precision needs to be complemented by interpretability. Potential investigation areas include for example depression, bipolar disorder, and dementia.
The PhD candidate is expected to bridge the research efforts in cross-corpus, cross-task multimodal affect recognition with explainable/accountable machine learning for the aim of efficient, effective and interpretable predictions on a data-scarce and sensitive target problem. The candidate is also expected to be involved in teaching activities within the department of Information and Computing Sciences. Teaching activities may include supporting senior teaching staff, conducting tutorials, and supervising student projects and theses. These activities will contribute to the development of the candidate's didactic skills.
We are looking for candidates with:
a Master's degree in computer science/engineering, mathematics, and/or fields related to the project focus;
interest or experience with processing of audio/acoustics, vision/video or natural language;
interest or experience with machine learning, affective computing, information fusion, multimodal interaction;
demonstrable coding skills in high-level scripting languages such as MATLAB, python or R;
excellent English oral and writing skills.
The ideal candidate should express a strong interest in research in affective computing and teaching within the ICS department. The Department finds gender balance specifically and diversity in a broader sense very important; therefore women are especially encouraged to apply. Applicants are encouraged to mention any personal circumstances that need to be taken into account in their evaluation, for example parental leave or military service.
We offer an exciting opportunity to contribute to an ambitious and international education programme with highly motivated students and to conduct your own research project at a renowned research university. You will receive appropriate training, personal supervision, and guidance for both your research and teaching tasks, which will provide an excellent start to an academic career.
The candidate is offered a position for five years (1.0 FTE). The gross salary starts at €2,325 and increases to €2,972 (scale P according to the Collective Labour Agreement Dutch Universities) per month for a full-time employment. Salaries are supplemented with a holiday bonus of 8% and a year-end bonus of 8.3% per year. In addition, Utrecht University offers excellent secondary conditions, including an attractive retirement scheme, (partly paid) parental leave and flexible employment conditions (multiple choice model). More information about working at Utrecht University can be found here.
Application deadline is 01.01.2020.
Further information and application procedure can be found here.
(2019-12-15) PhD grant at the University of Glasgow, Scotland, UK
The School of Computing Science at the University of Glasgow is offering studentships and excellence bursaries for PhD study. The following sources of funding are available:
* EPSRC DTA awards: open to UK or EU applicants who have lived in the UK for at least 3 years (see https://epsrc.ukri.org/skills/students/help/eligibility/) - covers fees and living expenses
* College of Science and Engineering Scholarship: open to all applicants (UK, EU and International) - covers fees and living expenses
* Centre for Doctoral Training in Socially Intelligent Artificial Agents: open to UK or EU applicants who have lived in the UK for at least 3 years through a national competition – see https://socialcdt.org
* China Scholarship Council Scholarship nominations: open to Chinese applicants – covers fees and living expenses
* Excellence Bursaries: full fee discount for UK/EU applicants; partial discount for international applicants
* Further scholarships (contact potential supervisor for details): open to UK or EU applicants
Whilst the above funding is open to students in all areas of computing science, applications in the area of Human-Computer Interaction are welcomed.
Available supervisors in HCI and their research areas:
* Prof Stephen Brewster (http://mig.dcs.gla.ac.uk/): Multimodal Interaction, MR/AR/VR, Haptic feedback. Email: Stephen.Brewster@glasgow.ac.uk
* Prof Matthew Chalmers (https://www.gla.ac.uk/schools/computing/staff/matthewchalmers/): mobile and ubiquitous computing, focusing on ethical systems design and healthcare applications. Email: Matthew.Chalmers@glasgow.ac.uk
* Prof Alessandro Vinciarelli (http://www.dcs.gla.ac.uk/vincia/): Social Signal Processing. Email: Alessandro.Vinciarelli@glasgow.ac.uk
* Dr Mary Ellen Foster (http://www.dcs.gla.ac.uk/~mefoster/): Social Robotics, Conversational Interaction, Natural Language Generation. Email: MaryEllen.Foster@glasgow.ac.uk
* Dr Euan Freeman (http://euanfreeman.co.uk/): Interaction Techniques, Haptics, Gestures, Pervasive Displays. Email: Euan.Freeman@glasgow.ac.uk
* Dr Fani Deligianni (http://fdeligianni.site/): Characterising uncertainty, eye-tracking, EEG, bimanual teleoperations. Email: fadelgr@gmail.com
* Dr Helen C. Purchase (http://www.dcs.gla.ac.uk/~hcp/): Visual Communication, Information Visualisation, Visual Aesthetics. Email: Helen.Purchase@glasgow.ac.uk
* Dr John Williamson (https://www.johnhw.com/): Probabilistic user interfaces, Bayesian interaction, motion correlation interfaces, rich and robust human sensing systems. Email: johnh.williamson@glasgow.ac.uk
* Dr Mohamed Khamis (http://mkhamis.com/): Human-centered Security and Privacy, Eye Tracking and Gaze-based Interaction, Interactive Displays. Email: Mohamed.Khamis@glasgow.ac.uk
(2019-12-19) Postdoc at Bielefeld University, Germany
The Faculty of Linguistics and Literary Studies at Bielefeld University offers a full-time post-doctoral research position in phonetics for 3 years (German pay scale E13 TV-L, non-permanent).
The Bielefeld phonetics group is well known for its research on phenomena in spontaneous interaction, prosody, multimodal speech and spoken human-machine interaction. The Bielefeld campus offers a wide range of options for intra- and interdisciplinary networking and further qualification.
Your responsibilities:
- conduct independent research in phonetics, with a visible focus on modeling or speech technology (65%).
- teach 2 classes (3 hours = 4 teaching units per week) per semester in the degree programmes offered by the linguistics department, including supervising BA and MA theses and conducting exams (25%).
- organizational tasks that are part of the self-administration of the university (10%).
Necessary qualifications:
- a Master's degree in a relevant discipline (e.g., phonetics, linguistics, computer science, computational linguistics)
- a doctoral degree in a relevant discipline
- a research focus in phonetics or speech technology
- state-of-the-art knowledge in statistical methods or programming skills
- knowledge in generating and analyzing speech data with state-of-the-art tools
- publications
- teaching experience
- a co-operative and team oriented attitude
- an interest in spontaneous, interactive, potentially multimodal data
Preferable qualifications:
- experience in the acquisition of third party funding
Remuneration
Salary will be paid according to Remuneration level 13 of the Wage Agreement for Public Service in the Federal States (TV-L). As stipulated in § 2 (1) sentence 1 of the WissZeitVG (fixed-term employment), the contract will end after three years. In accordance with the provisions of the WissZeitVG and the Agreement on Satisfactory Conditions of Employment, the length of contract may differ in individual cases. The employment is designed to encourage further academic qualification. In principle, this full-time position may be changed into a part-time position, as long as this does not conflict with official needs.
Bielefeld University is particularly committed to equal opportunities and the career development of its employees. It offers attractive internal and external training and further training programmes. Employees have the opportunity to use a variety of health, counselling, and prevention programmes. Bielefeld University places great importance on a work-family balance for all its employees.
Application Procedure
For full consideration, your application should be received either by post (see postal address below) or by email, as a single PDF document, sent to alexandra.kenter@uni-bielefeld.de by January 8th, 2020. Please mark your application with the identification code: wiss19299. To apply, please provide the usual documents (CV including information about your academic education and degrees, professional experience, publications, conference contributions, and further relevant skills and abilities). The application may be written in German or English.
(2019-12-22) Postdoctoral Researcher, University of Toulouse Jean Jaures, France
Postdoctoral Researcher in psycholinguistics, neurolinguistics, corpus linguistics, clinical linguistics. Full-time position, fixed-term 1 year (with possibility of a one-year extension). Application deadline: 05/01/2020. Starting date: 01/02/2020 (flexible).
The Octogone-Lordat Lab (University of Toulouse Jean Jaurès, France : https://octogone.univ-tlse2.fr/) offers a post-doctoral position for 1 year, with a possibility of 1 year extension.
The neuropsycholinguistic study of language processing is the major topic of our lab, focusing on typical language use, language disorders and rehabilitation processes (as in aphasia), at the intersection of linguistics, psycholinguistics and neuroscience.
The post-doc will contribute to the project 'Aphasia, Discourse Analysis and Interactions', funded by the European Regional Development Fund and the Région Occitanie - France. A strong background in linguistics, psycholinguistics or neurolinguistics and cognitive science, as well as methodological skills for data collection in corpus linguistics and clinical linguistics, is required. The post-doc will actively contribute to the development of a new database focusing on typical and atypical language in aphasia. Along with the project supervisors, the post-doc will be involved in all activities related to the project (e.g. IRB approval, GDPR conformity, etc.), including data collection, coding and analyses from various perspectives. Attested experience with empirical and experimental methods (corpus linguistics) is appreciated, as is a strong research interest in clinical issues. The post-doc will also coordinate the work of trainees and students involved in the project, and contribute significantly to the publication of the findings. The applicant should have completed at least a PhD in Linguistics, Neuropsychology, Cognitive Science or a related field, and demonstrate a high proficiency level in French according to the CEFRL. Good skills in spoken and written academic English are also required.
This is a full-time position starting in February 2020 (flexible). Gross annual salary: min. €28,000 to €32,000 (before 15% to 25% taxes and social-security deductions; INM 528 to 564 in accordance with the public-sector pay scale).
The application should include a CV, a statement of motivations, a link to the PhD thesis, PhD Viva report (if available), plus 2 scanned letters of recommendation.
Deadline for application: 05/01/2020. Applications should be sent to: Dr. Halima Sahraoui, sahraoui@univ-tlse2.fr; Prof. Barbara Köpke, bkopke@univ-tlse2.fr
For further questions and application submission, please feel free to contact us.
Octogone-Lordat (EA 4156), https://octogone.univ-tlse2.fr/, Université de Toulouse 2, Maison de la Recherche - E126, 5 Allées Antonio Machado, 31058 Toulouse Cedex 9, France
About city life in Toulouse: https://www.toulouse-visit.com/
(2020-01-10) Pre-doc research contract (3 months, extendable to one year), University of the Basque Country, Leioa (Bizkaia), Spain
Applications are welcome for one graduate (pre-doc) research contract for the study, development, integration and evaluation of machine learning software tools, and the production of language resources for Automatic Speech Recognition (ASR) tasks. The contract will be funded by an Excellence Group Grant from the Government of the Basque Country. Initially, the contract is for 3 months but, if performance is satisfactory, it will be extended to at least one year (or even more, depending on the available budget), with a gross salary of around €30,000/year. The workplace is located in the Faculty of Science and Technology (ZTF/FCT) of the University of the Basque Country (UPV/EHU) in Leioa (Bizkaia), Spain.
PROFILE
We seek graduate (pre-doc) candidates with a genuine interest in computer science and speech technology. Knowledge and skills are required in any (preferably all) of the following topics: machine learning (specifically deep learning), programming in Python, Java and/or C++, and signal processing. A Master's degree in scientific and/or technological disciplines (especially computer science, artificial intelligence, machine learning and/or signal processing) will be highly valued. All candidates are expected to have excellent analysis and abstraction skills. Experience and interest in dataset construction will also be a plus.
RESEARCH ENVIRONMENT
The Faculty of Science and Technology (ZTF/FCT) of the University of the Basque Country (https://www.ehu.eus/es/web/ztf-fct) is a very active and highly productive academic centre, with nearly 400 professors, around 350 pre-doc and post-doc researchers and more than 2,500 students distributed across 9 degree programmes.
The research work will be carried out at the Department of Electricity and Electronics of ZTF/FCT in the Leioa Campus of UPV/EHU. The research group hosting the contract (GTTS, http://gtts.ehu.es) has deep expertise in speech processing applications (ASR, speaker recognition, spoken language recognition, spoken term detection, etc.) and language resource design and collection. If the candidate is interested in pursuing a research career, the contract would be compatible with master studies on the topics mentioned above or even a Ph.D. Thesis project within our research group, and further financing options (grants, other projects) could be explored.
The nearby city of Bilbao has become an international destination, with the Guggenheim Bilbao Museum as its main attractor. Still, though sparkling with visitors from worldwide, Bilbao is a peaceful, very enjoyable medium-size city with plenty of services and leisure options, and mild weather, not so rainy as the evergreen hills surrounding the city might suggest.
APPLICATION
Applications including the candidate's CV and a letter of motivation (at most 1 page) explaining their interest in this position and how their education and skills fit the profile should be sent by e-mail, using the subject 'GTTS research contract APPLICATION ref. 1/2020', to Germán Bordel (german.bordel@ehu.eus) by Wednesday, January 29, 2020. The contract will start as soon as the position is filled.
(2020-01-14) Post-doctoral fellow, at Yamagishi-lab, National Institute of Informatics, Tokyo, Japan.
Post-doctoral fellow at Yamagishi-lab, National Institute of Informatics, Japan. Starting from April 1st, 2020 (the starting date is negotiable); 1 year with potential for extension. The research topic can be speech synthesis, voice conversion, speaker recognition, speaker verification, speech privacy and security, or signal processing. Further details: https://www.nii.ac.jp/en/about/recruit/2020/0110.html.
Zaion is now present in 4 European countries and has around thirty large enterprise clients. In production, its callbots handle on average more than 15,000 calls per day. Based in Levallois, Zaion has a team of 35.
Joining us means taking part in an exciting, innovative adventure within a dynamic team whose ambition is to become the reference on the conversational-robot market.
To support this growth, we are recruiting our Head of AI (M/F). As manager of the R&D team, your role is strategic for the company's development and expansion. You will work on AI solutions that automate the handling of phone calls using natural language processing and the detection of emotions in the voice.
Your main tasks:
- Take part in creating ZAION's R&D unit and lead our AI and voice-solution projects (detection of emotions in the voice)
- Build, adapt and evolve our voice emotion detection services
- Analyse large databases of conversations to extract the emotionally relevant ones
- Build a database of conversations labelled with emotion tags
- Train and evaluate machine learning models for emotion classification
- Deploy your models to production
- Continuously improve the voice emotion detection system
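The train-and-evaluate step in the list above amounts to a standard supervised audio-classification pipeline. A minimal sketch follows; the random stand-in features (in place of real MFCCs from call recordings), the three-class setup and the scikit-learn classifier are illustrative assumptions, not Zaion's actual stack:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for per-utterance acoustic features (e.g. mean MFCC vectors);
# in a real system these would be extracted from labelled call recordings.
n_utterances, n_features = 400, 13
X = rng.normal(size=(n_utterances, n_features))
y = rng.integers(0, 3, size=n_utterances)   # 3 hypothetical emotion classes
X[y == 1] += 1.5                            # shift classes so they are separable
X[y == 2] -= 1.5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"held-out accuracy: {acc:.2f}")
```

In a production setting the held-out evaluation would run on a speaker-disjoint test set, and the fitted model would then be serialised and served behind the call-handling pipeline.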
Required qualifications and prior experience:
- At least 5 years' experience as a Data Scientist / Machine Learning engineer applied to audio, and 2 years' experience managing a team
- Engineering school diploma or Master's degree in computer science, or a PhD in computer science or mathematics, with solid skills in signal processing (preferably audio)
- Solid theoretical background in machine learning and the relevant mathematical fields (clustering, classification, matrix factorisation, Bayesian inference, deep learning, etc.)
- Experience deploying machine learning models in a production environment is a plus
- Proficiency in one or more of the following: Python, machine learning / deep learning frameworks (PyTorch, TensorFlow, scikit-learn, Keras) and JavaScript
- Mastery of audio signal processing techniques
- Proven experience labelling large databases (preferably audio) is essential
- Your personality: a leader, autonomous and passionate about your work, you know how to run a team in project mode
- Fluent English
Please send your application to: alegentil@zaion.ai
(2020-01-18) Engineer (fixed-term contract), IRIT, Toulouse, France
IRIT (Institut de Recherche en Informatique de Toulouse) is recruiting an engineer on a fixed-term contract for the collaborative project LinTo (PIA - Programme d'Investissements d'Avenir), a conversational assistant designed to operate in a professional context and to offer services related to the conduct of meetings.
The position sits at the interface of the work carried out by the SAMoVA team (automatic speech and audio processing) and the MELODI team (natural language processing).
The recruit will work in close collaboration with the IRIT members already involved in the LinTO project, acting as a support engineer both for the research activities and for the integration tasks that will ultimately be carried out on the LinTO platform developed within the project by LINAGORA, the project leader.
Practical information:
Position: engineer. Profile: Master's degree or PhD in computer science. Domain: automatic speech or language processing, machine learning. Duration: 12 months, starting February/March 2020. Location: Institut de Recherche en Informatique de Toulouse (IRIT: https://www.irit.fr/), in collaboration between the SAMoVA and MELODI teams. Contacts: Isabelle Ferrané (isabelle.ferrane@irit.fr), Philippe Muller (philippe.muller@irit.fr)
Applications should be sent before 15 February 2020.
(2020-01-19) Research engineer, Vocalid, Belmont, MA, USA
Speech Research Engineer @ Voice AI Startup!
Location: Belmont, MA
Available: Immediately
VocaliD is a voice technology company that creates high-quality synthetic vocal identities for applications that speak. Founded on the mission to bring personalized voices to those who rely on voice prostheses, our ground-breaking innovation is now fueling the development of unique vocal personas for brands and organizations. VocaliD's unique voices foster social connection, build trust and enhance customer experiences. Grounded in decades of research in speech signal processing and leveraging the latest advances in machine learning and big data, we have attracted ~$4M in funding and garnered numerous awards including 2015 SXSW Innovation, 2019 Voice AI Visionary, and 2020 Best Healthcare Voice User Experience. Learn more about our origins by watching our founder Rupal Patel's TED Talk.
We are seeking a speech research engineer with machine learning expertise to join our dynamic and ambitious team that is passionate about Voice!
Responsibilities:
- Algorithm design and implementation
- Research advances in sequence to sequence models of speech synthesis
- Research advances in generative models of synthesis (autoregression, GAN, etc.)
- Implement machine learning techniques for speech processing and synthesis
- Design and implement natural language processing techniques into synthesis flow
- Conduct systematic experiments to harden research methods for productization
- Work closely with engineers to implement and deliver the final product
- Present research findings in written publications and oral presentations
Required Qualifications:
- MS or PhD in Electrical or Computer Engineering, Computer Science or related field
- Experience programming in C/C++ and Python
- Experience with machine learning frameworks (Tensorflow, Keras, Pytorch, etc.)
- Experience with Windows and Linux
- Familiarity with cloud computing services (AWS, Google Cloud, Azure)
- Must communicate clearly & effectively
- Strong analytical and oral communication skills
- Excellent interpersonal and collaboration skills
Please submit a cover letter and resume to rupal@vocaliD.ai
Visit us at www.vocalid.ai for more information about VocaliD.
(2020-01-20) Senior and junior researchers, LumenAI, France
The company
LumenAI is a start-up founded four years ago by academics, specialising in sequential unsupervised learning. Building on its R&D activity, it offers its clients full scientific and technical support. Its current application domains are predictive maintenance, cyber-security, document indexing and social network analysis. In parallel, LumenAI markets its own online clustering and visualisation tools, gathered in the platform The Lady of the Lake.
Your tasks
The work consists of taking part in client missions and in the evolution of our clustering library 'The Lady of the Lake'. In addition, 20% of the working time is devoted to a personal research project related to LumenAI's activities.
A senior candidate will also be asked to take part in managing client missions and to supervise our data scientists on client sites; for example, a supervisor is currently needed for a CIFRE PhD thesis (Rennes).
Your profile
Ideally 5 years of engineering or research experience in machine learning, but very good junior profiles are also encouraged to apply.
LumenAI's activities make heavy use of Natural Language Processing (NLP); expertise in this area would be a big plus.
A taste for research: most missions fall within R&D, and we also wish to strengthen our links with academic research to design new algorithms.
A taste for excellence: many market players claim to do machine learning; LumenAI stands out through the quality of its engineers. Candidates can demonstrate their experience through professional accomplishments as an engineer and/or publications at rank-A machine learning conferences.
Initiative and participation: LumenAI favours a flat hierarchy and the personal development of its members, so that each 'lumen' can steer their professional activity towards what they enjoy. In this spirit, each member is expected to work out for themselves how to contribute to the company and integrate into the team.
Other information and references
The position is based in our Paris offices.
Salary: between €45k and €60k gross per year, depending on experience.
Demonstration site of the platform: The Lady of the Lake.
(2020-01-21) Engineer, 6-month contract, GIPSA-lab, Grenoble, France
General information
Reference: UMR5216-ALLBEL-017. Place of work: Saint-Martin-d'Hères. Publication date: Friday 31 January 2020. Contract type: fixed-term (technical/administrative). Contract duration: 6 months. Expected starting date: 1 March 2020. Working time: full time. Remuneration: between €2,076.88 and €2,184.44. Required education level: Master's (Bac+5). Required experience: any.
Context: The tasks associated with this position are part of the ANR project GEPETO (GEstures and PEdagogy of InTOnation), whose goal is to study the use of manual gestures, via human-machine interfaces, to design tools and methods for learning to control intonation (melody) in speech.
In particular, the position is set in the context of voice rehabilitation, for patients with laryngeal disorders whose vocal folds vibrate poorly or not at all. Current solutions replace this vibration by injecting an artificial sound source into the mouth with an electrolarynx, over which the user can articulate normally. However, these systems generate signals with a fairly constant melody, leading to very robotic-sounding voices.
The goal of the GEPETO project at GIPSA-lab is to propose an intra-oral electrolarynx whose intonation can be controlled by hand gestures, captured by various interfaces (tablet, accelerometer, etc.). To our knowledge, it will be the first electrolarynx controlled by spatial gestures. In the longer term, we will study its use with a view to future deployment in the medical field.
Objectives: This position covers the first stage of the project, i.e. developing the system. The work consists of adapting a commercial intra-oral electrolarynx, Ultra Voice (https://www.ultravoice.com), to add the ability to control speech melody through gestural interfaces.
The first task is to modify the electrolarynx's embedded software, whose source code (C/C++) will be provided, to control the sound source sent into the mouth. The engineer will analyse the code to identify the sound-source generation module, then modify or re-develop this module to control the characteristics of the sound source. In particular, this means adapting the waveform of the injected source to compensate for the transfer function of the vocal tract, and controlling the fundamental frequency of the source's vibration. Finally, the code will be recompiled and loaded back onto the device.
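As a rough illustration of what a controllable sound source looks like (outside the Ultra Voice firmware, which is in C/C++), a band-limited harmonic source with an adjustable fundamental frequency can be sketched as follows; the sample rate, harmonic count and 1/k amplitude roll-off are illustrative assumptions:

```python
import numpy as np

def harmonic_source(f0_hz, duration_s, sr=16000, n_harmonics=20):
    """Generate a simple harmonic source signal at fundamental frequency f0.

    Summing harmonics explicitly keeps the signal band-limited, avoiding
    the aliasing that a naive impulse train would introduce.
    """
    t = np.arange(int(duration_s * sr)) / sr
    sig = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        if k * f0_hz < sr / 2:                       # keep below Nyquist
            sig += np.sin(2 * np.pi * k * f0_hz * t) / k
    return sig / np.max(np.abs(sig))                 # normalise to [-1, 1]

src = harmonic_source(120.0, 0.5)                    # 0.5 s of a 120 Hz source
```

Shaping the relative harmonic amplitudes here is the toy analogue of adapting the source waveform to compensate for the device's transfer function.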
The second task is to interface Ultra Voice with various gesture controllers available at the laboratory. Ultra Voice accepts as input an analogue signal carrying the fundamental frequency to be controlled. The work therefore consists of developing external software that acquires and processes the gesture-controller data, converts it into fundamental-frequency parameters, and generates the control signal expected by Ultra Voice. The software will be developed on the real-time programming platform Max/MSP, which allows immediate acquisition of controller data and simple generation of the analogue signal required by Ultra Voice.
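Although the project targets Max/MSP for the real-time implementation, the core gesture-to-pitch mapping can be illustrated in a few lines; the tilt range and frequency bounds below are illustrative assumptions, not project specifications:

```python
def tilt_to_f0(tilt_deg, f_min=80.0, f_max=300.0, tilt_min=-45.0, tilt_max=45.0):
    """Map a controller tilt angle (degrees) to a fundamental frequency (Hz).

    The mapping is linear in log-frequency (i.e. in semitones), which matches
    how pitch is perceived, rather than linear in Hz.
    """
    # Clamp the gesture to its range, then normalise to [0, 1]
    t = min(max(tilt_deg, tilt_min), tilt_max)
    x = (t - tilt_min) / (tilt_max - tilt_min)
    # Interpolate on a log-frequency scale between f_min and f_max
    return f_min * (f_max / f_min) ** x

print(round(tilt_to_f0(-45.0), 1))  # lowest pitch: 80.0 Hz
print(round(tilt_to_f0(45.0), 1))   # highest pitch: 300.0 Hz
```

In the Max/MSP patch, the same function would sit between the controller-acquisition object and the digital-to-analogue output that feeds Ultra Voice.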
Activities
Main activities: - Carry out the functional analysis of the subsystems and break them down into elementary functions - Develop the application software of the digital systems - Take part in integration tests and interpret the results - Manage the configuration of the development tools and of the developed subsystems - Draw up and write specifications and technical documents
Secondary activities: - Structure and document the developed software for later reuse - Take part in project progress meetings and interact with the various project partners
Skills
Knowledge: - C/C++ programming languages (in-depth knowledge) - Max/MSP programming environment (desirable) - Embedded and real-time systems (general knowledge) - Signal processing (general knowledge) - Engineering techniques and sciences (acoustics, mechanics, physics, etc.) - English: B1 to B2 (Common European Framework of Reference for Languages) - Ease in writing technical documentation. Candidates with skills in only some of the areas listed above are nevertheless encouraged to apply.
Operational skills: - Translate a request into technical specifications - Establish a diagnosis - Solve problems - Apply health and safety rules - Apply security procedures - Pass on knowledge - Keep up a technological watch - Ability to work in a team - Good listening to stakeholders and analysis of their needs
Work environment
GIPSA-lab is a joint research unit of CNRS, Grenoble INP and Université Grenoble Alpes, with agreements with Inria and the Observatoire des Sciences de l'Univers de Grenoble. With 350 people, including around 150 PhD students, GIPSA-lab is a multidisciplinary laboratory conducting fundamental and applied research on signals and complex systems. It is internationally recognised for its research in automatic control, signal and image processing, and speech and cognition, and runs projects in the strategic domains of energy, environment, communication, intelligent systems, life and health, and language engineering. Through the nature of its research, GIPSA-lab maintains a constant link with the economic world via strong industrial partnerships. Its faculty and researchers teach in the universities and engineering schools of the Grenoble site (Université Grenoble Alpes). GIPSA-lab carries out its research through 12 research teams organised in 3 departments: Automatic Control, Images and Signal, and Speech and Cognition. It has 150 permanent staff and around 250 non-permanent members (PhD students, post-docs, invited researchers, Master's interns, etc.).
The engineer will be attached to the Technical Unit of GIPSA-lab. They will work with the CRISSP team (Cognitive Robotics, Interactive Systems, Speech Processing) of the laboratory's Speech and Cognition department, and the MOVE team (analysis and modelling of humans in motion: biomechanics, cognition, vocology) of the Data Science department.