ISCA - International Speech Communication Association



ISCApad #222

Saturday, December 10, 2016 by Chris Wellekens

6 Jobs
6-1(2016-07-07) 1 PhD position in bio-inspired ASR, 1 PhD position in brain-signal based ASR at the Italian Institute of Technology

Title: 1 PhD position in bio-inspired ASR, 1 PhD position in brain-signal based ASR

 

The Center for Translational Neurophysiology of Speech and Communication (CTNSC) at the Italian Institute of Technology is seeking motivated PhD students to work on bio-inspired automatic speech recognition (ASR) and brain-signal based ASR.

The successful candidates will have a Master's degree in Computer Science, Engineering (or equivalent) and a background in machine learning and/or signal processing.

 

The candidates should apply by the 5th of August, 2016.

PhD courses will start on the 1st of November, 2016.

 

CTNSC Info: https://www.iit.it/centers/ctnsc-unife

 

PhD page [ITA]:

http://www.unife.it/studenti/dottorato/concorsi/ordinario

 

PhD course summary [ITA]:

http://www.unife.it/studenti/dottorato/modulistica/concorsi/32-ciclo/neuroscienze

 

How to apply for the selection [ITA/ENG]:

http://www.unife.it/studenti/dottorato/modulistica/Guidaconcorso.pdf


6-2(2016-07-13) Permanent research position in machine learning for natural language processing, Orange Labs, Lannion, France

Job offer: permanent research position in machine learning for natural language processing

 

Mission: Design algorithms and develop software for natural language processing, with applications to the automatic extraction of structured knowledge from documents

 

Position environment: Within Orange Labs, the « CONTENT » department prepares the future of content access services (music, video, press, radio, education, entertainment). The position is in a multidisciplinary team located in Lannion, France, which conducts collaborative studies with academic partners (collaborative projects, PhD supervision, scientific publications) as well as research and development projects in close interaction with Orange's operational units (marketing, customer experience, design, etc.).

 

Candidate profile: PhD or significant research experience in the field of machine learning, with a marked interest in natural language processing

 

To view the full offer and apply online: https://orange.jobs/jobs/offer.do?joid=54926&lang=FR

 

For more information, please contact:

Gilles Prigent (gilles.prigent@orange.com)

Delphine Charlet (delphine.charlet@orange.com)

Géraldine Damnati (geraldine.damnati@orange.com)


6-3(2016-07-15) ASR Research Scientist (Chinese) Oben Pasadena, California



  
Founded in 2014, ObEN is an artificial intelligence company based in one of the world’s most successful incubators: Idealab in Pasadena, CA. We have developed a sophisticated speech technology that allows any Internet of Things device to speak in any voice and in any language. 

 
JOB DESCRIPTION

As a Speech Recognition Research Scientist at ObEN, you’ll be working on several proprietary and stealthy projects in the audio space. Your work will encompass the whole scope of application development, including speech research, voice interface design, application prototyping, and development of our proprietary speech recognition engine. Currently, we are mainly interested in candidates with experience in ASR for Chinese. Experience in English and/or Spanish is a big plus.
You must have:

■ PhD in Computer Science, Electrical Engineering or Mathematics with specialization in speech recognition, natural language processing or machine learning.

■ Solid experience building ASR systems (Chinese) and/or a publication record in the area.

■ Solid experience with Kaldi.

■ Strong machine learning background and familiarity with standard statistical modeling techniques applied to speech.

■ Familiarity with linguistic phonetics.

■ Proficiency in programming languages such as C/C++, Python, Java or Perl.

■ Knowledge of basic digital signal processing techniques for audio.

■ Enjoyment of a highly collaborative environment.


To apply, please send the following to careers@oben.me:

● Detailed resume and/or LinkedIn profile.

● Links to prominent scientific/professional contributions.



Contact: Fernando Villavicencio <fernando@oben.com>


6-4(2016-07-20) PhD position at IDIAP, Martigny, Switzerland.

In the context of a Swiss NSF grant, we seek a PhD student to work on multilingual and
affective speech synthesis.

  http://www.idiap.ch/education-and-jobs/job-10193

The research will begin with the state of the art in speech synthesis, probably based
around deep neural networks (DNNs). Such networks have recently been shown to produce
synthetic speech of better overall quality, especially with respect to prosody, than other
techniques. The student will then focus on modelling speech with a view to the extraction,
modelling and adaptation of emotion (affect) in synthetic speech. To demonstrate the
language independence of the models, it will be necessary to work in two or three
languages.

Several research threads are possible; one thread will involve creating a neural model of
prosodic features. It will build upon recent work here at Idiap where we have created a
general physiologically plausible model for prosody.

Another research thread will involve analysis and modification of the phonetic quality of
the synthetic speech. This is in keeping with observations that formant positions can
change with affective indicators such as valence and arousal.

Depending on progress, there is also the possibility to address cross-lingual modelling
of affect. Here, the goal would be to detect emotion in one language and reproduce it in
another language.

Owing to the availability of data, the work will probably proceed using German data. However,
our location means that evaluation will be easier in French. Amongst the Swiss
languages, Italian is also appealing.

The ideal PhD student should have a master's (or equivalent) degree in engineering,
computer science, or applied mathematics. S/he should have a good background in
mathematics, statistics, and programming (C/C++, Python, scripting languages). Given the
multilingual nature of the project, the position would suit someone with some knowledge of
one or more of the languages described above. In order to balance the group, we
especially encourage female applicants. However, all applications will be judged on merit.

Applications will be considered as they are received; the position will remain open until
filled.

About Idiap:

Idiap is an independent, non-profit research institute recognized and supported by the
Swiss Government, and affiliated with the Ecole Polytechnique Fédérale de Lausanne
(EPFL). It is located in the town of Martigny in Valais, a scenic region in the south of
Switzerland, surrounded by the highest mountains of Europe, and offering exciting
recreational activities, including hiking, climbing and skiing, as well as varied
cultural activities. It is within close proximity to Geneva and Lausanne. Although Idiap
is located in the French part of Switzerland, English is the working language. Free
French lessons are provided.

Idiap offers competitive salaries and conditions at all levels in a young, dynamic, and
multicultural environment. Idiap is an equal opportunity employer and is actively
involved in the 'Advancement of Women in Science' European initiative. The Institute
seeks to maintain a principle of open competition (on the basis of merit) to appoint the
best candidate, provides equal opportunity for all candidates, and equally encourages both
genders to apply.

--
Phil Garner
http://www.idiap.ch/~pgarner


6-5(2016-07-21) Speech Analytics Engineer, SRI, Menlo Park, CA, USA

 

Speech Analytics Engineer

Job Description

SRI’s Speech Technology and Research (STAR) Lab seeks a self-motivated and team-oriented speech engineer to join a team working on modeling of speaker state (for example, emotion and health). The position offers both research and commercial opportunities, and a potential for growth and client interaction. The work will include advancing technology capabilities, understanding client data and conducting experiments involving model development and testing, under current or new projects in this area. This is an opportunity to have a significant impact in an emerging research area.

STAR Lab engages in leading-edge research in speech recognition, speaker characterization, speaker and language identification, machine translation, natural language processing and other areas of speech/language technology, offering opportunities from basic research to prototyping, productization, and widespread deployment. The multidisciplinary research team consists of excellent speech researchers, linguists and software engineers. Characteristics of STAR staff are enthusiasm, self-motivation, initiative, passion for learning, taking ownership, thriving in a flat organizational hierarchy, and a desire to control their own career paths.

SRI International is a premier nonprofit research organization based in Menlo Park, California with a mission to create world-changing solutions making people safer, healthier, and more productive.

Located in the heart of Silicon Valley, the world’s center for innovation and technology, SRI works on everything from basic research to delivered systems, on both government and commercial projects, and has spun off successful companies such as Siri. Flexible cross-laboratory teams often form dynamically to solve challenging problems spanning multiple disciplines.

Requirements

-PhD (preferred) or Master’s degree in electrical engineering or computer science, with experience in advanced speech signal processing and in state-of-the-art machine learning techniques for speech processing.

-Additional background in natural language processing is useful.

-Work experience outside the PhD institution, including internships, is preferred.

-Candidates should have an interest in the topic area, excellent programming skills, be a quick learner, be proactive and efficient, and work well both individually and in teams.

Apply online at www.sri.com/careers

Job ID: 1239BR

SRI is an Equal Employment Opportunity/Affirmative Action Employer


6-6(2016-07-28) Announcing 14 PhD positions on a new Marie Curie European Training Network, ENRICH

Announcing 14 PhD positions on a new Marie Curie European Training Network, ENRICH (Enriched Communication Across the Lifespan)

Speech is a hugely efficient means of communication: a reduced capacity in listening or speaking creates a significant barrier to social inclusion at all points through the lifespan, in education, work and at home. Hearing aids and speech synthesis can help address this reduced capacity but their use imposes greater listener effort. The fundamental objective of ENRICH is to modify or augment speech with additional information to make it easier to process. Enrichment aims to reduce the listening burden by minimising cognitive load, while maintaining or improving intelligibility. ENRICH will investigate the relationship between cognitive effort and different forms of natural and synthetic speech.

ENRICH is funded for 4 years from October 2016. We are now looking to recruit 14 generously funded early-stage researchers to undertake projects leading to a PhD within the network. Openings exist for candidates with a background in engineering, psychology, speech science or linguistics.

ENRICH partners are: University of Edinburgh, Radboud University, Tobii Technology, University College London, University of Crete, Hoerzentrum Oldenburg, University Medical Center Groningen and the University of the Basque Country (coordinator).

For more details of the network and early stage researcher positions, see http://www.enrich-etn.eu. The closing date for applications is 30 September 2016.



6-7(2016-08-01) Positions at the French 'Police Technique et Scientifique', Ecully (Lyon), France

The audio department of the Police Technique et Scientifique (Ecully, near Lyon) is looking for temporary contract workers (vacataires) to carry out segmentation work and/or speaker recordings.
The following profile is sought:

- an interest in linguistics or languages
- good computer skills and familiarity with new technologies
- knowledge of the Praat software will be appreciated
The assignments will run until the end of October 2016.
For more information, please contact laurianne.georgeton@gmail.com

or call 0472868638


6-8(2016-08-12) PhD offer at IRISA, Rennes, Brittany, France

We offer a 3-year doctoral contract on "Characterization and generation of expressivity according to speaking styles for the construction of audiobooks" within the ANR SynPaFlex project. Information about this work is available at https://www-expression.irisa.fr/files/2016/08/phd_caracterisation_generation_livre_audio.pdf and, for the English version, at https://www-expression.irisa.fr/files/2016/08/phd_audiobook_generation.pdf. This thesis will be co-supervised by IRISA and LLF.

Please forward this information to anyone who might be interested.

Candidates should have good computer science skills and be interested in collaborative work and in automatic speech processing, machine learning and artificial intelligence.

For further details or to apply, please contact:
- Elisabeth Delais-Roussarie: elisabeth.roussarie@wanadoo.fr
- Damien Lolive: damien.lolive@irisa.fr

Desired start date: October 2016.

6-9(2016-08-19) Thesis opportunity in linguistics / phonetics at Spring, France

Looking for an opportunity in linguistics / phonetics 

 
Spring France is currently recruiting several Junior Linguists on behalf of one of its clients, on a fixed-term contract (CDD) running from the beginning of September until the end of December 2016. 
 
Job description:
 
The role of the Junior Linguist is to annotate and review linguistic data in French.  The Junior Linguist will also contribute to a number of other tasks to improve natural language processing. The tasks include:
 
- Providing phonetic/phonemic transcription of lexicon entries
- Analyzing acoustic data to evaluate speech synthesis
- Annotating and reviewing linguistic data
- Labeling text for disambiguation, expansion, and text normalization
- Annotating lexicon entries according to guidelines
- Evaluating current system outputs
- Deriving NLP data for new and on-going projects
- Working independently, with confidence and little oversight
 
Minimum Requirements:
 
- Native speaker of French and fluent in English
- Extensive knowledge of phonetic/phonemic transcriptions
- Familiarity with TTS tools and techniques
- Experience in annotation work
- Knowledge of phonetics, phonology, semantics, syntax, morphology or lexicography
- Excellent oral and written communication skills
- Attention to detail and good organizational skills
 
Desired Skills:
 
- Degree in Linguistics or Computational Linguistics or Speech processing
- Ability to quickly grasp technical concepts; learn in-house tools
- Keen interest in technology and computer-literate
- Listening Skills
- Fast and Accurate Keyboard Typing Skills
- Familiarity with Transcription Software
- Editing, Grammar Check and Proofing Skills
- Research Skills
 
Applications should be sent to Dominique.auffroy@springfrance.com, quoting reference 6528162.

6-10(2016-08-20) Position in linguistic Code-Switching at Columbia University, NY, USA

 

 

We are asking for input on researchers’ interest and engagement in computational approaches to linguistic Code-Switching in any language pair, modality, or genre. A brief survey can be found at the link below.

 

https://docs.google.com/forms/u/0/d/1ARm04N_si_7VaMPjtbWOFUUjxJZm7TQmjgNSHcUNPcw

 

Mona Diab

Julia Hirschberg

Thamar Solorio

 


6-11(2016-08-29) Research Scientist positions at Nokia Bell Labs, Cambridge, UK

Applications are invited for Research Scientist positions at Nokia Bell Labs, Cambridge in the areas of Applied Machine Learning, Embedded Systems and Sensor engineering with a strong focus on Pervasive Sensing and Mobile Systems.

Nokia and Bell Labs

Nokia is a global leader in the technologies that connect people and things. Powered by the pioneering work of Bell Labs, our research and innovation division, and Nokia Technologies, we are at the forefront of creating and licensing the technologies that are increasingly at the heart of our connected lives. Nokia Bell Labs is internationally renowned as the birthplace of modern information theory, the transistor, the laser and the UNIX operating system.

Bell Labs Cambridge

Bell Labs' research facility in Cambridge is a leading lab working in the areas of Mobile Sensing and Systems, Applied Machine Learning, Social Computing and Internet of Things research.

We have openings for inspired innovators in our Pervasive Sensing and Systems Department of the Application Platforms and Software Systems (A&S) research program. The department's research agendas are:

  • Embedded OS-level software enabling resource-efficient high-precision sensor processing, modelling and analytics.
  • Multi-modal edge-device-based deep learning models of human behaviour and context reasoning in the wild.
  • Exploration and understanding of new sensor modalities.
  • Disruptive end-to-end applications in the areas of digital health, smart home and quantified enterprise.

Main duties and responsibilities

  • Carry out groundbreaking research in the areas of Mobile Systems, Ubiquitous Computing and Applied Machine Learning, with a focus on the above-mentioned research agendas, producing both theoretical innovations and novel practical implementations.
  • Contribute to the technical definition of research objectives and programs.
  • Propose research and publish it in the major research venues worldwide.
  • Create and maintain strong collaborative associations with university-based researchers, other leading research bodies, and product business units.
  • Keep an active and visible role in the research community through conference committees and reviewing panels.

Expected qualifications, skills and experience required for this job

A PhD in Computer Science or Electrical Engineering with a strong focus on UbiComp and Mobile Computing. Postdoctoral experience is a plus.

  • For positions with a focus on Embedded Systems
      • Deep understanding of Mobile and Embedded Operating Systems, with kernel-level coding experience.
      • Comprehensive understanding of Mobile and Embedded Operating Systems and Mobile Application Frameworks (e.g., Android / iOS).
  • For positions with focus on (Applied) Machine Learning
      • Deep expertise in Machine Learning and Data Mining techniques (especially deep learning methods) for processing sensory signals (vision, speech, motion, etc.)
      • Good understanding of Mobile and Embedded Operating Systems and Mobile Application Frameworks (e.g., Android / iOS).
  • For positions with a focus on Sensor Engineering
      • Hands-on architecture-level skills in hardware integration and sensor engineering
      • Strong expertise in the areas of digital signal processing and applied machine learning
  • A proven track record in research, with publications in prestigious journals and conferences.
  • Strong written and spoken communication skills.
  • Ability to conduct independent research while also contributing to a team-oriented project.

For informal queries or more information, please contact Dr. Fahim Kawsar (fahim.kawsar@nokia-bell-labs.com) or Dr. Nic Lane (nic.lane@nokia-bell-labs.com).


6-12(2016-09-04) postdoctoral position on the Evaluation of an Adaptive Training Virtual Environment, Heudiasyc Lab CNRS

We are inviting applications for one postdoctoral position on the Evaluation of an Adaptive Training Virtual Environment at Heudiasyc Lab CNRS (https://www.hds.utc.fr/).

 

Context

Feedback selection is a major issue for intelligent virtual environments for training. Expert knowledge provides useful insight, but it can be difficult to collect. Furthermore, the influence of a specific type of feedback may vary across trainees and over the course of training, which is rarely reflected in expert knowledge. As part of an automatic gesture training system, we modelled the co-evolution between a trainee and a training environment. We characterize trainees by their level in every dimension of the task, and the training environment by the relevance of every element in its set of feedback. Over time, this co-evolution is reflected in the trainee's successive performances and in changes in feedback selection. We model these interactions as a multi-armed bandit problem, each arm representing a type of feedback. This makes it possible to adapt feedback selection relying only on the interactions between the trainee and the training environment, without any prior knowledge. By combining the trainee and the set of feedback in a single representation space, we show how our model can provide useful indicators of trainee progression and feedback relevance.

 

Objective

The goal of this postdoc is to evaluate the impact of adaptation in virtual training environments. We believe that co-evolution based on enaction theory has a positive impact on training. The candidate will design an experiment to validate the model. Two platforms are available to conduct this experiment: a 2D platform (from the PhD work of Remy Frenoy) and a 3D platform (an extension of the model in a CAVE). One major issue is the role of feedback in controlling the interaction. A second issue is the impact of giving weight to errors in training, in order to offer flexible, personalized and easy-to-use training.

 

The position is for two years, starting before 15 December 2016. The successful candidate will join the Heudiasyc research team (https://www.hds.utc.fr/) at the CNRS premises on the UTC campus (http://www.utc.fr/the_university/index.php) in Compiègne, France. The research activity will be carried out under the supervision of Indira Thouvenin (http://www.hds.utc.fr/~ithouven).

 

 

Professor / Enseignant-Chercheur HDR (faculty member with habilitation) / http://www.hds.utc.fr/~ithouven

UMR CNRS 7253 Heudiasyc

Sorbonne Universités, Université de technologie de Compiègne

Computer Engineering (Génie Informatique) - office 141
57 avenue de Landshut 60203 Compiègne (France)


6-13(2016-10-03) PhD grant at the phonetics laboratory of the Université de Mons, Belgium

Doctoral grant offer

_____________________________________________________________________________________

Department of Metrology and Language Sciences (Service de Métrologie et Sciences du Langage), Phonetics Laboratory,

Université de Mons, Mons, Belgium

_____________________________________________________________________________________

The Department of Metrology and Language Sciences (phonetics laboratory of the Université de Mons, Belgium) is looking for a specialist (M/F) in the study of human speech who wishes to prepare a doctoral thesis within the department.

Candidate profile (M/F):

Initial field of training:

Language sciences (linguistics, speech therapy, psychology of language, etc.), either as the core degree or, at the very least, as in-depth complementary training.

Entry level:

At least master's level ("bac+5", 300 credits) as defined by the decree of the French Community of Belgium organizing higher education.

Transferable skills

Aptitude for teamwork, creativity, autonomy, scientific curiosity.

Good command of computer tools (spreadsheets, database management, word processing).

At least near-native command of written and spoken French.

Command of English at least at level C1 of the CEFR in all 4 skills.

Awareness of issues related to foreign languages, command of other languages, knowledge of phonetics (in particular objective speech analysis), and knowledge and skills in applied mathematics (in particular statistics) are additional assets.

Track record

The successful candidate holds a bachelor's and a master's degree, each awarded with honours of "grande distinction" or higher.

Position profile:

The successful candidate will be awarded a one-year doctoral grant.

During this period, they commit to trying to obtain longer-term funding by applying for positions such as FRIA researcher, FRESH researcher and/or FNRS research fellow (aspirant).

At the end of the first year, if the alternative funding sought has not been obtained, the candidate is evaluated by the supervisory committee. On this basis, they may or may not be granted an additional term of 2 years. During this new term, they commit to continuing to seek external funding for their doctorate.

Under no circumstances may the cumulative duration of doctoral research grants exceed 4 years.

Start date:

as soon as possible.

Interested persons are requested to send, by 30 October at the latest, an application file comprising:

a cover letter;

a curriculum vitae;

any other relevant document;

in PDF format (exclusively) to: bernard.harmegnies@umons.ac.be

Scientific project

Concepts such as fatigue or stress are frequently invoked in both the life sciences and the human sciences. They are characterized not only by the fact that their scope extends to human biochemistry as well as to the human psyche, but above all by the idea that an action on the mind can have repercussions on the body and vice versa. These notions nevertheless remain variably defined and diversely objectified, as does, in this context, the interaction between physiology and psyche.

The project within which this doctorate is offered aims to elucidate these complex relationships by studying the joint evolution, in a within-subject approach, of three types of variables: (i) situational variables (with both invoked, i.e. naturally occurring, and provoked, i.e. experimentally induced, variation); (ii) biological markers of the state of the human subject (metabonomic approach and specific biomarkers recorded in various biofluids); (iii) measures revealing language processing by the subject (speech production and reception).

The observations are collected in contexts where the human subject controls complex processes, particularly in aeronautics, a field highly prone to generating "problem situations" that may give rise to such phenomena.

The doctorate focuses more particularly on situational variables and specifically targets those related to the languages used by the subject. It is now known that the physical reality of speech sounds is influenced, among other things, by various related factors. These may be linked to the subject's community membership (for example diatopic, diastratic or diachronic variation), or even to exogenous actions deliberately aimed at modifying the phonic characteristics of speech sounds, for instance in contexts of teaching/learning (and/or using) non-native languages. Other determinants, endogenous to the subject, may also come into play, whether they originate in the cognitive sphere (language proficiency, multilingual expertise, etc.) or in the affective sphere (personal stance towards the languages used). These studies, which in various ways demonstrate the effect of such endogenous factors on vocal productions, pave the way for a less descriptive stance, centred instead on the indexical value of the observations made: since these factors affect the vocal signal, detecting their traces in the signal opens the way to characterizing the speaker's state through the analysis of phonic productions alone.

The transport sector, and aeronautics in particular, has become increasingly interested in these perspectives over recent decades. At a time when many incidents and accidents in this field are attributable to the human factor rather than to technical failures, developing research that can contribute to alert systems able to detect impairments in pilot functioning, based solely on variations in the vocal signal, is a strategic challenge.

Although several studies have demonstrated the value of these perspectives, their results are extremely diverse, and sometimes even contradictory. This is probably due, on the one hand, to the insufficient overall volume of data collected and, on the other hand, to the considerable methodological diversity that characterizes the field. From this point of view, three dimensions appear to require particular attention. First, these studies are usually restricted to English-language vocal productions, leave the other languages of aeronautical communication unexamined and, ipso facto, neglect possible interactions between the language factor and the various factors studied. Second, few of them take into account the frequently multilingual nature of aeronautical communications and the fact that agents are often required to speak a non-native language. Third, none of them addresses the problem of the differential loss of phonic competence in L2 and in L1 under adverse communication conditions.

The thesis will therefore focus both on the effects of situational variables linked to the control of complex processes on multilingual performance, and on the effects of various types of multilingualism on the efficiency of controlling complex processes managed in multilingual contexts.


6-14(2016-10-06) Two positions at DefinedCrowd Corporation at its R&D center in Lisbon, Portugal

We’re DefinedCrowd Corporation (http://www.definedcrowd.com), headquartered in Seattle, Washington, with an R&D center in Lisbon, Portugal. We provide machine learning training data for artificial intelligence products and applications internationally. Our clients range from Fortune 500 companies to cutting-edge AI and robotics companies, and we recently received investment from Amazon and Sony. We are a Microsoft Accelerator alumni company, and we are recognized as the fastest-growing company in the big data arena for AI. As we continue to expand our international footprint, we are looking for talented new members to join this energetic, hardworking and fun team at both our Seattle headquarters and our Lisbon R&D center. As we grow, we constantly challenge ourselves to find the most efficient way to manage business operations, to build the most suitable and sustainable models for our lines of business, and to create a highly recognized and trusted brand among both clients and crowd communities.

Current job offers:

 

Data Scientist (Seattle or Lisbon)

We’re looking for a Data Scientist with an NLP background reporting to the CTO to drive the data science strategy and quality control using machine learning for DefinedCrowd.

 

Main requirements

 

  • MSc/PhD in Computer Science or equivalent with experience in Machine Learning (including DNNs), Artificial Intelligence or Statistics.

  • Creative thinking with the ability to turn their ideas into technology.

  • NLP background or experience.

  • Proactivity, initiative, positive attitude, ability to solve problems creatively.

  • Excellent speaking and writing skills in English.

 

Nice to have

  • Enterprise experience in this area.

  • Experience with both supervised and unsupervised learning.

  • Data warehousing and ETL.

  • Knowledge of ML platforms such as Azure ML, Google TensorFlow, IBM Watson.

  • Experience in statistical analysis and respective tools (e.g. SPSS).

  • International publications in this area.


 

Technical Program Manager (Seattle or Lisbon)

We are looking for a Program Manager with an NLP background reporting to the Program Manager Lead in Lisbon. He/she will be responsible for helping manage customer deliverables, defining data quality, working with the Engineering team on platform features, validating quality against customer requirements, and data reporting and analytics. Scripting and basic programming skills will be required.

Required skills:

  • BA or Masters in Linguistics or equivalent. Masters in Computational Linguistics is a plus.

  • Experience in project management: gathering requirements; planning; resource allocation; communication with developers.

  • Technical Skills: Python; Perl; Regular Expressions; SQL; Excel; Microsoft Office Suite;

  • Proficiency in more than 2 languages.

  • Proactivity, initiative, positive attitude, ability to solve problems creatively.

  • Excellent speaking and writing skills in English.

  • Good communication skills and ability to work with ambiguity.

  • Able to work flexible hours when interaction with the Seattle headquarters or with overseas customers and partners is needed.

 

Please submit your resume to hr@definedcrowd.com. We are looking forward to hearing from you.

Positions open until filled. Date: 09/10/2016

 

Top

6-15(2016-10-20) Two Master research internships at LIMSI - CNRS, Orsay, France

Two Master research internships (with follow-up PhD scholarship) at LIMSI - CNRS, Orsay, France
Unsupervised Multimodal Character Identification in TV Series and Movies

Keywords : deep learning, speech processing, natural language processing, computer vision

Automatic character identification in multimedia videos is a broad and challenging problem. Person identities can serve as a foundation and building block for many higher-level video analysis tasks, for example semantic indexing, search and retrieval, interaction analysis and video summarization. The goal of this project is to exploit textual, audio and video information to automatically identify characters in TV series and movies without requiring any manual annotation for training character models. A fully automatic and unsupervised approach is especially appealing given the huge amount of available multimedia data (and its growth rate). Text, audio and video provide complementary cues to the identity of a person, and thus make it possible to identify a person better than from any single modality alone.

In this context, LIMSI (www.limsi.fr) proposes two projects, focusing on two different aspects of this multimodal problem. Depending on the outcome of the internship, both projects may lead to a PhD scholarship (funding for one is already secured).

Project 1: natural language processing + speech processing

speaker A: 'Nice to meet you, I am Leonard, and this is Sheldon. We live across the hall.'
speaker B: 'Oh. Hi. I'm Penny.'

speaker A: 'Sheldon, what the hell are you doing?'
speaker C: 'I am not quite sure yet. I think I am on to something...'

Just by looking at these two short conversations, a human can easily infer that 'speaker A' is actually Leonard, 'speaker B' is Penny and 'speaker C' is Sheldon. The objective of this project is to combine natural language processing and speech processing to do the same automatically. Building blocks include automatic speech transcription, named entity detection, classification of names (first, second or third person) and speaker diarization. Preliminary work in this direction has already been published in [Bredin 2014] and [Haurilet 2016].

[Bredin 2014] Hervé Bredin, Antoine Laurent, Achintya Sarkar, Viet-Bac Le, Sophie Rosset, Claude Barras. Person Instance Graphs for Named Speaker Identification in TV Broadcast. Odyssey 2014, The Speaker and Language Recognition Workshop.
[Haurilet 2016] Monica-Laura Haurilet, Makarand Tapaswi, Ziad Al-Halah, Rainer Stiefelhagen. Naming TV Characters by Watching and Analyzing Dialogs. WACV 2016. IEEE Winter Conference on Applications of Computer Vision.

Project 2: speech processing + computer vision

This project aims at improving (acoustic) speaker diarization using the visual modality. Indeed, a recent paper [Bredin 2016] showed that recent advances in deep learning for computer vision lead to very reliable face clustering, whereas speaker diarization performs very poorly on TV series and movies (mostly because current state-of-the-art systems were not designed to process this kind of content).

The first task is to design deep learning approaches (based on recurrent neural networks) to address talking-face detection (i.e. deciding, among all visible people, which one is currently speaking) by combining the audio and visual (e.g. lip motion) streams. The second task is to combine talking-face detection and face clustering to guide and improve speaker diarization (i.e. who speaks when?). Read [Bredin 2016] for more information on this kind of approach.

[Bredin 2016] Hervé Bredin, Grégory Gelly. Improving speaker diarization of TV series using talking-face detection and clustering. ACM Multimedia 2016, 24th ACM International Conference on Multimedia.

Profile: Master student in machine learning (experience in natural language processing, computer vision and/or speech processing is appreciated)
Location: LIMSI - CNRS, Orsay, France
Duration: 5 to 6 months
Salary: according to current regulations
Contact: Hervé Bredin (bredin@limsi.fr) with CV + cover letter + reference letter(s)

Top

6-16(2016-10-23) Vacancy for a one-year post-doctoral researcher position at the Radboud University, Nijmegen, The Netherlands

Vacancy for a one-year post-doctoral researcher position
           at the Radboud University, Nijmegen, The Netherlands
     *** Computational modelling of human spoken-word recognition  ***

As a post-doctoral researcher, you will join the NWO-Vidi funded project
'Ignoring the merry in marry: The effect of individual differences in
attention and proficiency on non-native spoken-word recognition in noise'
headed by Dr. Odette Scharenborg. This project investigates the effect of
noise on non-native spoken-word recognition using a range of tasks tapping
into different processes underlying human spoken-word recognition, and the
effect of individual differences in attention and proficiency on
non-native spoken-word recognition in noise.

You will conduct research on or related to the computational modelling of
human spoken-word recognition. Research will focus on determining the best
method for the automatic classification of speech segments, building a
computational model of non-native human spoken-word recognition using Deep
Neural Networks, running simulations and comparing the model's output with
existing human data. You will communicate your findings through papers in
peer-reviewed research journals and at international conferences.

What we expect:
- You hold a PhD in artificial intelligence, computer science,
computational (psycho)linguistics, or a related discipline
- You have a good knowledge of speech and of human and/or automatic speech
processing
- You have experience with computational modelling
- You have experience with Deep Neural Networks
- You have a good command of spoken and written English
- You have multiple journal publications, including two as first author
- You are a team player who enjoys working with people from different
backgrounds

More information:
Odette Scharenborg
O.Scharenborg@let.ru.nl
http://odettescharenborg.ruhosting.nl

How to apply:
Please note: Only job applications uploaded via the university website are
taken into consideration. The job vacancy will soon become available via
http://www.ru.nl/werken/alle-vacatures/

The application should consist of:
-Motivation letter
-CV
-List of publications
-Names of two referees

Closing date: Sunday 20 November 2016.

Starting date: The preferred starting date is 1 January 2017.

Top

6-17(2016-11-02) PhD stipends at The Centre for Acoustic Signal Processing Research, Aalborg University, Denmark


 

The Centre for Acoustic Signal Processing Research (CASPR) will have a number of fully funded PhD stipends available in 2017.

 

We are looking for highly motivated, independent and outstanding students who wish to complete a 3-year PhD programme at Aalborg University. Ideal candidates will have strong expertise in one or more of the following disciplines: statistical signal processing, auditory perception, machine learning, information theory, or estimation theory. Good verbal and written English skills are a must. Excellent undergraduate and master's degree grades are desired.

 

PhD positions in Denmark are fully funded, i.e. there are no tuition fees, and come with a salary. The salary is subject to a pay grade system based on prior working experience since completing your undergraduate degree. The yearly gross salary ranges from 41,500 to 50,100 euros.

 

You may obtain further information about the PhD stipends from Associate Professor Jan Østergaard (jo@es.aau.dk), Associate Professor Zheng-Hua Tan (zt@es.aau.dk), or Professor Jesper Jensen (jje@es.aau.dk), CASPR, Aalborg University, concerning the scientific aspects of the stipends.

 

Webpage for the positions: http://caspr.es.aau.dk/open-positions/

Top

6-18(2016-11-09) Internship on Search Engine Development at ELDA in Paris (France)

ELDA is opening an internship on Search Engine Development in Paris (France), starting in January 2017, for 6 months.

Profile

  • MSc. in Computer Science;
  • Basic knowledge of data structures and algorithms;
  • Basic knowledge of Web application architecture;
  • Python and/or JavaScript programming skills;
  • Technical English;
  • Hands-on knowledge of a database system (ideally PostgreSQL);
  • Knowledge of a search engine (Solr, Elasticsearch, Lucene) will be appreciated.

Duties

In the software development team at ELDA and under the supervision of an Engineer specialised in Natural Language Processing and Web Application Development, you will participate in the following tasks:

  • produce a state-of-the-art overview of the most powerful search engines currently available, such as Solr, Elasticsearch, or the full-text search features provided by current database systems, such as PostgreSQL;
  • help specify the full-text search needs for the LREC conference proceedings;
  • help choosing the technical solution that best fits the context;
  • participate in the design of a database structure (data schema) for the contents of the LREC proceedings web sites;
  • harvest the LREC proceedings sites and populate the aforementioned database with all the relevant information extracted from the contents of the LREC proceedings sites;
  • implement a search solution that is exhaustive, robust and works throughout all the LREC proceedings.

You will also participate in the regular meetings of the software development team at ELDA.

Application

This 6-month internship is based in Paris 13th district (Les Gobelins).
It should start in January 2017.

Applicants should email a cover letter addressing the points listed above together with a curriculum vitæ to Vladimir Popescu (vladimir@elda.org).

The internship is subject to a monthly allowance, commensurate with the candidate's educational qualifications and according to the French laws.

http://www.elra.info/en/opportunities/internships/

-*-*-*-*-*-*-*-

ELDA ('Evaluations and Language resources Distribution Agency') is a key player in the Human Language Technology domain. Operational body of ELRA, the European Language Resources Association, a European not-for-profit organisation promoting language resources in a European context, ELDA is in charge of executing a number of tasks on behalf of ELRA, including both the distribution and the production of Language Resources. Within the production projects, ELDA is often in the position of coordinating resource annotations, as well as performing quality control of these annotations.

ELDA also supports the organization of ELRA's biennial international scientific conference, LREC, the Language Resources and Evaluation Conference, which brings together an increasing number (1200+) of top-tier researchers from all over the world, who submit and present scientific research articles.

To ease navigation through this collection of scientific articles, ELDA has set up several Web sites gathering the articles themselves, as well as the corresponding metadata (authors, titles, article abstracts, etc.).

In this context, ELDA wants to consolidate these sites, allowing the users to rely on a robust and exhaustive search throughout the article collections for all the editions of the LREC conference.

Top

6-19(2016-11-10) PhD offer at LIMSI, Orsay, France

Subject: CIFRE PhD offer, conversational agents

Thesis topic: Generation for an adaptive conversational agent
Thesis supervisors: Sophie Rosset and Anne-Laure Ligozat

As part of the Nanolifi project (a conversational agent serving the
city), LIMSI is recruiting a PhD student in computer science.

The main objective of the thesis is to model dialogic activities in an
open-domain human-robot conversational setting. The approach will rely
on unsupervised learning methods, for example zero-shot learning or
reinforcement learning. The semantic and dialogic representation used
must integrate linguistic information such as that provided by
available natural language processing tools. The evaluation measure
must take into account the engagement of both the user and the system.
The knowledge base supporting the application will be built from
documents containing information about the city.

Full description
================
http://www.adum.fr/as/ed/voirpropositionT.pl?site=PSaclay&matricule_prop=13069

Prerequisites
=============
* Master 2 in Computer Science (or equivalent), with a specialisation in at least one of:
  * Machine learning
  * Natural language processing
  * Speech processing


Practical information
=====================

* Start of thesis: early 2017
* Enrolment: EDSTIC doctoral school, Univ. Paris Saclay
* Funding: CIFRE

Application
===========

Application file: cover letter, CV, Bachelor's (Licence) and Master's grades,
to be sent to Sophie Rosset (rosset@limsi.fr) and Anne-Laure Ligozat (annlor@limsi.fr)

Top

6-20(2016-11-19) Research Master's internship within the ANR Vocadom project, Grenoble, France

The GETALP team at LIG is offering a research Master's internship within the ANR Vocadom project.

-- 

2016-2017 M2R INFO GETALP

 
M2R Computer Science internship

Large vocabulary voice command recognition for home automation using deep learning and word embedding

Supervisor(s): Michel Vacher, François Portet, Benjamin Lecouteux
Keywords: automatic speech recognition, Deep Neural Network, Word Embedding, Home Automation
Internship duration: 5 months, with the possibility of continuing as a PhD (ANR funding)
Location: GETALP team, LIG, IMAG building, Grenoble university campus (Domaine universitaire)
Internship context
As part of the ANR-funded VOCADOM project, we aim to design a home automation system that can be controlled remotely by voice and used in real conditions (noise, presence of several people). Besides noise, one of the challenges is to adapt continuously to end users by taking into account all available information (audio sensors and home automation sensors). One such adaptation consists of understanding users' spoken utterances without constraining the vocabulary and syntax they use. For example, the standard command to turn on the light is "Nestor allume la lumière" ("Nestor, turn on the light"), but it could also be "Nestor, on n'y voit rien" ("Nestor, we can't see a thing") or "S'il te plaît Nestor allume la lumière" ("Please Nestor, turn on the light").
Internship topic
The aim of this internship is to enable this "spoken utterance" <-> "home automation command" mapping with as few lexical and syntactic constraints as possible. Previous studies used a phonetic Levenshtein distance, which works well when pronunciations are close. We therefore propose to operate not only at the phonetic level but also at the lexical level, by working on the output lattice of the speech recognition decoder.
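The phonetic Levenshtein matching used in the earlier studies can be sketched as follows: each known command is stored as a phoneme sequence, and a recognised utterance is mapped to the command with the smallest edit distance, normalised by command length. This is a minimal illustration under stated assumptions, not the project's code; the SAMPA-like phoneme strings, the command names `light_on`/`light_off` and the length normalisation are all hypothetical choices.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (here: lists of phoneme symbols)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def closest_command(utterance, commands):
    """Return (command_name, normalised_distance) for the known command whose
    phoneme sequence is closest to the recognised utterance."""
    best = None
    for name, phones in commands.items():
        dist = levenshtein(utterance, phones) / max(len(phones), 1)
        if best is None or dist < best[1]:
            best = (name, dist)
    return best


# Hypothetical, hand-written phoneme transcriptions of two known commands.
COMMANDS = {
    "light_on": "n E s t O R a l y m l a l y m j E R".split(),   # "Nestor allume la lumière"
    "light_off": "n E s t O R e t E~ l a l y m j E R".split(),   # "Nestor éteins la lumière"
}

# A recognised utterance whose final phoneme was missed by the recogniser.
heard = "n E s t O R a l y m l a l y m j E".split()
print(closest_command(heard, COMMANDS)[0])  # light_on
```

Such a purely phonetic distance only tolerates small pronunciation variations; it cannot relate "Nestor, on n'y voit rien" to the light command, which is why the internship proposes lexical-level, embedding-based proximity on the decoder lattice.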
The proposed work will begin with a literature review of the field. The second step will be to develop a method for exploring the lattice output by the KALDI ASR decoder. This method will use an external word model based on word embeddings (learned by deep learning) in order to assign a proximity score between a known utterance and a new one. The resulting system will then be evaluated on a synthetic corpus (built using speech synthesis) and on the corpus recorded in the laboratory's smart home.
The recruited student will benefit from previous studies, in which recordings were made in a real apartment with several rooms, each equipped with microphones. Participants performed realistic scenarios of activities of daily living (ADL) (sleeping, getting up, washing, preparing a meal, having lunch, relaxing, going out...). This allowed us to collect a realistic corpus containing voice commands recorded under various conditions.
Desired skills: C++ programming, good command of French

References
M. Vacher, S. Caffiau, F. Portet, B. Meillon, C. Roux, E. Elias, B. Lecouteux, P. Chahuara (2015). “Evaluation of a context-aware voice interface for Ambient Assisted Living: qualitative user study vs. quantitative system evaluation”, ACM – Transactions on Speech and Language Processing, Association for Computing Machinery, 2015, Special Issue on Speech and Language Processing for AT (Part 3), 7 (issue 2), pp. 5:1-5:36.
D. Povey, L. Burget, M. Agarwal, P. Akyazi, F. Kai, A. Ghoshal, O. Glembek, N. Goel, M. Karafiát, A. Rastrow, R. C. Rose, P. Schwarz, S. Thomas (2011). “The subspace Gaussian mixture model: a structured model for speech recognition”, Computer Speech & Language, vol. 25, no. 2, pp. 404-439.
Y. Bengio, R. Ducharme, P. Vincent, C. Jauvin (2003). “A neural probabilistic language model”, Journal of Machine Learning Research, vol. 3, pp. 1137-1155.

-------------------------------------------------------------------------

Michel VACHER, CNRS Research Engineer - HDR - Deputy Director, Institut Carnot LSI http://www.carnot-lsi.com - Laboratoire d'Informatique de Grenoble - LIG - UMR 5217 Groupe d'Etude en Traduction/Traitement Automatique des Langues et de la Parole, 700 avenue Centrale - Bâtiment IMAG - Office 330, Domaine Universitaire - 38401 St Martin d'Hères. Lab URL: http://www.liglab.fr/ Personal page: http://lig-membres.imag.fr/vacher/ Email: Michel.Vacher@imag.fr | Tel: 0033 (0)4 57 42 14 38 Postal address: LIG - Bâtiment IMAG - CS 40700 - 38058 GRENOBLE CEDEX 9

Top

6-21(2016-11-20) 3 research Master (M2) internship topics in NLP at Loria, Nancy, for 2016-2017




Subject 1

Title: Data selection for the training of deep neural networks in the framework of automatic speech recognition

Supervisor: Irina Illina

Team and lab: MultiSpeech

Contact: illina@loria.fr

Co-supervisor: Dominique Fohr

Team and lab: MultiSpeech

Contact : dominique.fohr@loria.fr

 

Motivations and context

More and more audio and video content appears on the Internet every day: about 300 hours of multimedia content are uploaded every minute. Nobody can go through this quantity of data. In these multimedia sources, audio data represents a very important part. The classical approach to spoken content retrieval from audio documents is automatic speech recognition followed by text retrieval. In this internship, we will focus on the speech recognition system.

One of the important modules of an automatic speech recognition system is the acoustic model: it models the sounds of speech, mainly phonemes. Currently, the best performing models are based on deep neural networks. These models are trained on a very large amount of audio data, because the models contain millions of parameters to estimate. For training acoustic models, it is necessary to have audio documents for which the exact text transcript is available (supervised training).

The Multi-Genre Broadcast (MGB) Challenge is an evaluation of speech recognition systems using TV recordings in English or Arabic. The speech data is broad and multi-genre, spanning the whole range of TV output, and represents a challenging task for speech technology. It covers the multiple genres of broadcast TV, categorized into 8 genres: advice, children's, comedy, competition, documentary, drama, events and news.

The problem with the MGB challenge data is that the exact transcription of the audio documents is not available. Only the subtitles of the TV recordings are given. These subtitles are sometimes far from what is actually pronounced: some words may be omitted, hesitations are rarely transcribed and some sentences are reformulated.

In this internship, we will focus on the problem of data selection for efficient acoustic model training.

Objectives

A subtitle consists of a text, a start time (timecode) at which it appears on screen and an end time at which it disappears. These start and end times are given relative to the beginning of the program. It is therefore easy to associate each subtitle with the corresponding audio segment.
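Because the timecodes are relative to the start of the program, mapping a subtitle to its audio segment reduces to converting timecodes into sample indices. A minimal sketch, assuming 'HH:MM:SS.mmm' timecodes (SRT-style commas also accepted) and 16 kHz audio whose first sample coincides with the start of the program; the function names and the sampling rate are hypothetical.

```python
def timecode_to_seconds(tc):
    """Parse an 'HH:MM:SS.mmm' timecode into seconds (SRT-style ',' also accepted)."""
    hours, minutes, seconds = tc.replace(",", ".").split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def subtitle_to_samples(start_tc, end_tc, sample_rate=16000):
    """(first, end-exclusive) sample indices of the audio covered by a subtitle,
    assuming sample 0 is the beginning of the program."""
    start = round(timecode_to_seconds(start_tc) * sample_rate)
    end = round(timecode_to_seconds(end_tc) * sample_rate)
    return start, end


print(subtitle_to_samples("00:01:02.500", "00:01:04,000"))  # (1000000, 1024000)
```

In practice, subtitle timings are often shifted relative to the speech, so a real pipeline would likely re-align segment boundaries rather than trust them exactly.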

We have at our disposal a very large audio corpus with the corresponding subtitles, and we want to develop data selection methods that yield high-performance acoustic models, i.e. acoustic models with the lowest possible word error rate. If we use all the training data, the errors in the subtitles will lead to poor-quality acoustic models and therefore to a high word error rate.
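For comparison with the DNN classifier proposed in this internship, a simpler heuristic often used in lightly supervised training is to decode each segment with an existing recogniser and keep only the segments whose subtitle closely matches the recognition hypothesis, measured by word error rate, WER = (S + D + I) / N. A minimal sketch, assuming the ASR hypotheses are already available; the segment IDs and the 0.3 threshold are hypothetical.

```python
def edit_distance(ref, hyp):
    """Minimum number of word substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate (S + D + I) / N, with N the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / max(len(ref), 1)


def select_segments(segments, max_wer=0.3):
    """Keep segments whose subtitle agrees with the ASR hypothesis.
    segments: iterable of (segment_id, subtitle_text, asr_hypothesis)."""
    return [seg_id for seg_id, subtitle, hyp in segments if wer(subtitle, hyp) <= max_wer]


segments = [
    ("seg1", "we live across the hall", "we live across the hall"),
    ("seg2", "oh hi i am penny", "hello there"),
]
print(select_segments(segments))  # ['seg1']
```

A fixed threshold like this discards every segment with reformulated subtitles, even when most of its audio is usable; learning the selection decision from acoustic and linguistic features is the motivation for the DNN approach.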

We propose to use a deep neural network (DNN) to classify the segments into two categories: audio segments corresponding to subtitles and audio segments not corresponding to subtitles. The student will analyze what information, acoustic and/or linguistic, is relevant to this selection task and can be used as input of the DNN.

The student will validate the proposed approaches using the automatic transcription system of TV broadcast developed in our team.

Required skills: background in statistics and natural language processing, and programming skills (Perl, Python).

Localization and contacts: Loria laboratory, Speech team, Nancy, France

irina.illina@loria.fr dominique.fohr@loria.fr

Candidates should email a detailed CV together with their diplomas.

===========================================================================

Subject 2

Title:  Domain adaptation of neural network language model for speech recognition

Supervisor: Irina Illina

Team and lab: MultiSpeech

Contact: illina@loria.fr

Co-supervisor: Dominique Fohr

Team and lab: MultiSpeech

Contact : dominique.fohr@loria.fr

Motivation and Context

Language models (LMs) play a key role in modern automatic speech recognition systems and ensure that the output respects the patterns of the language. In state-of-the-art systems, the language model is a combination of n-gram LMs and neural network LMs, because they are complementary. These LMs are trained on huge text corpora.

The language models are trained on a corpus of varied texts, which provides average performance on all types of data. However, document content is generally heavily influenced by the domain, which can include topic, genre (documentary, news, etc.) and speaking style. It has been shown that adapting LMs to small amounts of matched in-domain text data provides significant improvements in both perplexity and word error rate. The objective of the internship is to adapt a neural network-based language model to the domain of the audio document to be recognized. For this, we will use a small domain-specific text corpus.
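How domain adaptation shows up in perplexity, PP = exp(-(1/N) Σ log P(w_i)), can be illustrated with a deliberately simplified sketch. It swaps in add-one smoothed unigram models and linear interpolation (a classical n-gram adaptation technique) instead of the neural network LMs this internship targets. The toy corpora and the interpolation weight 0.5 are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab):
    """Add-one (Laplace) smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_general, p_domain, lam):
    """Adapted model: lam * P_domain(w) + (1 - lam) * P_general(w)."""
    return {w: lam * p_domain[w] + (1 - lam) * p_general[w] for w in p_general}


def perplexity(model, test_tokens):
    """PP = exp(-(1/N) * sum_i log P(w_i)) over the N test tokens."""
    log_prob = sum(math.log(model[w]) for w in test_tokens)
    return math.exp(-log_prob / len(test_tokens))


general = "the news today the weather the match".split()  # generic corpus (toy)
domain = "goal match team goal".split()                    # small in-domain corpus (toy)
test = "team goal match".split()                           # in-domain test text (toy)
vocab = set(general) | set(domain) | set(test)

p_general = unigram_model(general, vocab)
p_adapted = interpolate(p_general, unigram_model(domain, vocab), lam=0.5)
print(perplexity(p_general, test), perplexity(p_adapted, test))
```

The adapted model assigns more probability mass to in-domain words, so its perplexity on the in-domain test text is lower than that of the generic model.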

The Multi-Genre Broadcast (MGB) Challenge is an evaluation campaign of speech recognition systems using TV recordings in English or Arabic. The speech data is broad and multi-genre, spanning the whole range of TV output, and represents a challenging task for speech technology. It covers the multiple genres of broadcast TV, categorized into 8 genres: advice, children's, comedy, competition, documentary, drama, events and news.

During the internship, the student will develop LM adaptation methods in the context of the MGB data.

Goals and Objectives

Neural network LM adaptation can be categorized as either feature-based or model-based.

In feature-based adaptation, the input of the neural network is augmented with auxiliary features that model the domain, topic information, etc. However, these auxiliary features must be learned during the training of the LM and thus require retraining the whole model.

Model-based adaptation consists of adding complementary layers and training these layers with domain-specific adaptation data. An advantage of this method is that full retraining is not necessary. Another model-based adaptation method is fine-tuning: after the model has been trained on the whole training data, it is tuned on the target-domain data. The downside of this approach is the lack of an optimization objective.

During the internship, the student will perform a bibliographic study on model adaptation approaches. Depending on the pros and cons of these approaches, we will propose a method specific to MGB data. This method may include changing the architecture of the neural network.

The student will validate the proposed approaches using the automatic transcription system of radio broadcast developed in our team.

Required skills: background in statistics and natural language processing, and programming skills (Perl, Python).

Localization and contacts: Loria laboratory, Speech team, Nancy, France

irina.illina@loria.fr dominique.fohr@loria.fr

Candidates should email a detailed CV together with their diplomas.

===========================================================================

Subject 3

Title: Using Wikipedia to search for proper names relevant to the audio document transcription

Supervisor: Irina Illina

Team and lab: MultiSpeech

Contact: illina@loria.fr

Co-supervisor: Dominique Fohr

Team and lab: MultiSpeech

Contact : dominique.fohr@loria.fr

Motivation and Context

More and more audio and video content appears on the Internet every day: about 300 hours of multimedia content are uploaded every minute. Nobody can go through this quantity of data. In these multimedia sources, audio data represents a very important part. The classical approach to spoken content retrieval from audio documents is automatic speech recognition followed by text retrieval.

An automatic speech recognition system uses a lexicon containing the most frequent words of the language, and only the words of the lexicon can be recognized by the system. New proper names (PNs) appear constantly, requiring dynamic updates of the lexicons used by the speech recognition system. These PNs evolve over time and no vocabulary will ever contain all existing PNs. These missing proper names can be very important for understanding the test document.

In this study, we will focus on the problem of proper names in automatic speech recognition systems. The problem is to find relevant proper names for the audio document we want to transcribe. For this task, we will use a remarkable source of information: Wikipedia, the free online encyclopedia and the largest and most popular reference work on the Internet. Wikipedia contains a huge number of proper names.

Goals and Objectives

We assume that the audio document to be transcribed contains missing proper names, i.e. proper names that are pronounced in the audio document but are not in the lexicon of the automatic speech recognition system; these proper names cannot be recognized (out-of-vocabulary proper names, OOV PNs).

The goal of this internship is to find a list of relevant OOV PNs that correspond to an audio document. We will use Wikipedia as a source of potential proper names.

Assuming that we have an approximate transcription of the audio document and a Wikipedia dump, two main points will be addressed:

  • How to represent a Wikipedia page and in which space? One possibility is to use word embeddings (for instance Mikolov's word2vec).
  • Using the previous representation, how to select relevant pages according to the approximate transcription of the audio document? The automatic speech recognition systems of broadcast news have a 10-20% word error rate. If we project documents in a continuous space, different distances can be studied.
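The two points above can be sketched under strong simplifying assumptions: the approximate transcription and each page are represented by the average of their word vectors, pages are ranked by cosine similarity to the transcription, and proper names from the top-ranked pages that are missing from the ASR lexicon become OOV candidates. The tiny two-dimensional embeddings, page titles and names below are hypothetical stand-ins for word2vec vectors and a Wikipedia dump; cosine similarity is only one of the distances that could be studied.

```python
import math


def average_embedding(tokens, emb):
    """Mean of the vectors of the tokens that have an embedding (None if none do)."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(len(vecs[0]))]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_pages(transcript_tokens, pages, emb, top_k=2):
    """Titles of the top_k pages most similar to the (approximate) transcript."""
    query = average_embedding(transcript_tokens, emb)
    scored = []
    for title, tokens in pages.items():
        vec = average_embedding(tokens, emb)
        if query is not None and vec is not None:
            scored.append((cosine(query, vec), title))
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


def candidate_oov_names(titles, pages, proper_names, lexicon):
    """Proper names found in the selected pages that are missing from the lexicon."""
    found = []
    for title in titles:
        for tok in pages[title]:
            if tok in proper_names and tok not in lexicon and tok not in found:
                found.append(tok)
    return found


EMB = {  # hypothetical 2-D vectors; real ones would come from word2vec
    "election": [1.0, 0.0], "vote": [0.9, 0.1], "minister": [0.8, 0.2],
    "football": [0.0, 1.0], "goal": [0.1, 0.9],
}
PAGES = {
    "Election_page": ["election", "vote", "Dupont"],
    "Football_page": ["football", "goal", "Martin"],
}
transcript = ["vote", "minister", "election"]  # approximate ASR output
top = rank_pages(transcript, PAGES, EMB, top_k=1)
print(top, candidate_oov_names(top, PAGES, {"Dupont", "Martin"}, {"election", "vote"}))
```

Averaging word vectors ignores word order and page structure, which is one reason the second step of the internship turns to links, tables, headings and infoboxes.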

In a first step, each Wikipedia page can be considered as plain text. In a second step, the student should exploit the structure of Wikipedia pages (page links, tables, headings, infoboxes).

During the internship, the student will investigate methodologies based on deep neural networks.

The student will validate the proposed approaches using the automatic transcription system of radio broadcast developed in our team.

Required skills: background in statistics and natural language processing, and programming skills (Perl, Python).

Localization and contacts: Loria laboratory, Speech team, Nancy, France

irina.illina@loria.fr dominique.fohr@loria.fr

Candidates should email a detailed CV together with their diplomas.

===========================================================================

 
--
Dominique Fohr
Research scientist CNRS
LORIA-INRIA MultiSpeech Group
Address: Bâtiment LORIA, BP 239, F54506 Vandoeuvre, France
Email: dominique.fohr@loria.fr    Tel: (+33) 3.83.59.20.27
Top

6-22(2016-12-02) Tenure senior research position at INRIA Bordeaux France
Flowers Lab at Inria (Bordeaux, France)
Deadline: 16 December
 
We are seeking highly qualified candidates for a tenured senior research position (full-time research, no mandatory teaching).
Candidates should have an outstanding academic track record in one or two of the following domains:
 
1) Computational modelling of cognitive development, including the following research topics and methods:
   - Models of exploration and active learning in humans and animals
   - Models of human reinforcement learning and decision making
   - Bayesian or neuronal models of development
   - Models of autonomous lifelong learning
   - Models of tool learning 
   - Sensorimotor, language and social development
   - Strong experience in collaborating with developmental psychologists or neuroscientists
 
2) Lifelong autonomous machine learning and artificial intelligence, including:
   - Unsupervised deep reinforcement learning
   - Intrinsic motivation 
   - Developmental learning, curriculum learning
   - Contextual bandit algorithms
   - Multitask and transfer learning
   - Hierarchical learning
   - Strong experience in benchmarking with robotic or virtual world setups
 
As the Flowers Lab's application domains are robotics/HRI/HCI and educational technologies, experience in one of these two
domains would be a clear asset. 
 
Experience in writing successful grant proposals will also be considered positively.
 
Candidates should have strong postdoctoral experience after their PhD, or may already hold a research position
at a university or research organization.  
 
The Flowers Lab: developmental robotics and lifelong multitask machine learning
===================================================================
 
The Flowers Lab, headed by Pierre-Yves Oudeyer, gathers a team of ~20 members and has been one of the pioneers of developmental robotics and lifelong machine learning
and artificial intelligence over the last decade, in particular by developing models of intrinsically motivated learning of skill repertoires. These models have contributed
both to advancing the understanding of human curiosity and development, and to advancing incremental online multitask machine learning techniques in difficult high-dimensional
robotic spaces.
 
Work in the Flowers Lab is conducted in the context of large international projects (e.g. an ERC grant, the European projects 3rdHand and DREAM, and the HFSP project Neurocuriosity),
with interdisciplinary collaborations with other labs in neuroscience, psychology, machine learning and robotics. The successful candidate would be directly
involved in these international collaborations. 
 
The Flowers Lab is also developing applications of these concepts and techniques in the domain of educational technologies, including adaptive
intelligent tutoring systems (using bandit algorithms), educational robotics, and software that stimulates curiosity and learning in humans.
 
The Flowers Lab recently spun off the startup company Pollen Robotics, and is involved in multiple collaborations with industry, backed by Inria's strong
commitment to impact on both science and industry. 
 
Inria and Ensta ParisTech
=========================
 
The lab is part of Inria, a prestigious public research institution and the largest in Europe focused on computer science, mathematics and their applications.
Inria's teams and researchers (more than 2,800 employees) have received prestigious awards, coordinate many international projects, and have created major innovations now used in many
parts of industry. The Inria research center in Bordeaux gathers around 300 researchers. 
The Flowers Lab is also associated with Ensta ParisTech, a prestigious French engineering school (university).
 
Bordeaux 
========
 
The Flowers Lab in Bordeaux is located in a great building on the edge of one of the world's most famous vineyards, 10 minutes by tram from Bordeaux town center
(and 2 hours from Paris by high-speed train): https://www.inria.fr/en/centre/bordeaux
 
Web
===
 
Flowers web site: https://flowers.inria.fr
Lifelong intrinsically motivated learning in robots: http://www.pyoudeyer.com/active-learning-and-artificial-curiosity-in-robots/
 
How to apply
===========
 
CVs and letters of motivation should be sent to Pierre-Yves Oudeyer (pierre-yves.oudeyer@inria.fr) before 16 December.
A pre-selection will then be made, and a full application with a detailed research statement and program will have to be submitted in early February.
The successful candidate would start work between September and December 2017.
 
Pierre-Yves Oudeyer 
Research director, Inria
Head of Flowers Lab
Inria and Ensta ParisTech




© Copyright 2024 - ISCA International Speech Communication Association - All rights reserved.
