ISCA - International Speech Communication Association



ISCApad #193

Friday, July 11, 2014 by Chris Wellekens

6 Jobs
6-1(2014-02-18) Speech Data Evaluator for French at Google, Dublin

Speech Data Evaluator for French

Job title:

Speech Data Evaluator for French (Multiple positions)

 

Linguistic Field(s):

  • Phonetics/Phonology

  • Semantics

  • Discourse Analysis

 

Job description:

As a Speech Data Evaluator and a native-level speaker of French, you will be part of a team based in Dublin, processing large amounts of linguistic data and carrying out a number of tasks to improve the quality of Google’s speech synthesis and speech recognition in your own language.

 

This includes:

  • classifying and annotating linguistic data

  • transcription

  • labeling text for disambiguation, expansion, and text normalization

  • providing phonetic transcription of lexicon entries according to given standards and using in-house tools

 

Job requirements:

  • native-level speaker of French (with good command of the standard dialect) and fluent in English

  • passion for language with good knowledge of orthography and grammar in the target language

  • a degree in a language-related field such as linguistics, language teaching, translation, editing, writing, proofreading, or similar

  • keen interest in technology and computer-literate (should feel comfortable using in-house tools and should have an interest in current speech, mobile and online technology)

  • attention to detail and good organizational skills

 

Project duration: 6-11 months (with potential for extension)

**This is not a permanent position but a contract position.**

 

For immediate consideration, please email your CV and cover letter in English (PDF format preferred) with 'Speech Data Evaluator [French]' in the subject line.

 

Email Address for applications: DataOpsMan@gmail.com

Contact information: Ara Kim

Closing date: open until filled


6-2(2014-02-20) PhD scholarship on deep neural networks, INRIA, Nancy, F

We are pleased to offer a PhD scholarship on deep neural networks for source separation and noise-robust ASR
http://www.inria.fr/en/institute/recruitment/offers/phd/campaign-2014/%28view%29/details.html?nPostingTargetID=14062 
 


We are looking forward to receiving applications by April 18 (please do not wait until the later deadline indicated on the website).


6-3(2014-02-20) 3 positions of Maitre de conférence (lecturer) at Avignon, F

Three Maître de Conférences (lecturer) positions in computer science (section 27) are open in the 2014 recruitment campaign at the Centre d'Enseignement et de Recherche en Informatique (CERI) of the Université d'Avignon (ceri.univ-avignon.fr).

The recruited candidates will carry out their research at the LIA (lia.univ-avignon.fr).

The profiles of the 3 positions, available at lia.univ-avignon.fr under the "Emplois" section, cover the laboratory's research areas:

- Computer science - Specialty: language / information retrieval and extraction (Position 0284 / Galaxie 4043)

- Computer science - Specialty: networks (Position 0334 / Galaxie 4045)

- Computer science - Specialty: operations research and optimization (Position 0324 / Galaxie 4044)

Contacts:

Teaching: Corinne Fredouille corinne.fredouille@univ-avignon.fr - Director of Studies of the CERI, and Fabrice Lefèvre fabrice.lefevre@univ-avignon.fr - Director of the CERI

Research: Georges Linares georges.linares@univ-avignon.fr - Director of the LIA, and Fabrice Lefèvre fabrice.lefevre@univ-avignon.fr - Director of the CERI


6-4(2014-03-04) Speech Linguistic Project Manager - Google Speech Research, London, UK

Speech Linguistic Project Manager - Google Speech Research


Job title:

Speech Linguistic Project Manager


Location:

London, UK


Linguistic Field(s):

  • Phonetics/Phonology

  • Semantics

  • Discourse Analysis


Job description:

The Speech Team at Google builds technologies to turn spoken input into machine-understandable format, or written text into spoken output. Our technology is used to enable voice-based interactions in products like Google Now, Voice Search, Dictation, Speech-to-Speech Translation, Google Glass, and more.


As a Linguistic Project Manager and a native-level speaker of French, you will oversee and manage all work related to achieving high data quality for speech projects in your own language.


You will be part of a team based in London, managing a team of Data Evaluators and working on a number of projects towards Speech research: ASR, TTS, and NLP.


This includes:

  • training, managing and overseeing the work of your team

  • creating verbalisation rules, such as expanding URLs, email addresses, numbers

  • proposing new projects to fulfill research needs for spoken languages around the world

  • creating annotation conventions

  • evaluating data quality

  • providing expertise on pronunciation and phonotactics

  • working with QA tools according to given guidelines and using in-house tools
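
To illustrate the verbalisation rules mentioned in the list above, here is a minimal sketch of a rule that expands an email address into a spoken form. The function name, the symbol table and the English spoken forms are assumptions made for this example only; they are not taken from any Google tool, and a French rule set would use French wordings.

```python
import re

# Hypothetical spoken forms for symbols; a real rule set is language-specific.
SYMBOL_WORDS = {"@": "at", ".": "dot", "-": "dash", "_": "underscore"}
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def verbalise_email(address: str) -> str:
    """Expand an email address into a space-separated spoken form."""
    tokens = []
    # Split into runs of letters, single digits, and single other characters.
    for piece in re.findall(r"[A-Za-z]+|\d|[^A-Za-z\d]", address):
        if piece.isdigit():
            tokens.append(DIGIT_WORDS[int(piece)])   # read digits one by one
        elif piece in SYMBOL_WORDS:
            tokens.append(SYMBOL_WORDS[piece])       # read known symbols aloud
        else:
            tokens.append(piece.lower())             # keep letter runs as words
    return " ".join(tokens)


print(verbalise_email("jane.doe42@example.org"))
# jane dot doe four two at example dot org
```

Reading digits one at a time is only one possible convention; a production rule would also have to decide when "42" should be read as "forty-two".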


Job requirements:

  • native-level speaker of French (with good command of the standard dialect) and fluent in English

  • must have attended elementary school in the country where the language is spoken

  • keen ear for phonetic nuances and attention to detail; knowledge of the language’s phonology

  • ability to quickly grasp technical concepts; should have an interest in current speech, mobile and online technology

  • excellent oral and written communication skills

  • good organizational skills

  • previous project management and people management experience

  • previous experience with speech/NLP-related projects a plus

  • advanced degree in Linguistics preferred; experience with Computational Linguistics a plus

  • also a plus: proficiency with HTML, XML, and some programming language; previous experience working in a Linux environment


Project duration: 6-11 months (with potential for extension)

**This is not a permanent position but a contract position.**


For immediate consideration, please email your CV and cover letter in English (PDF format preferred) with 'Speech Linguistic Project Manager [French]' in the subject line.


Email Address for applications: DataOpsMan@gmail.com

Contact information: Ara Kim

Closing date: open until filled

6-5(2014-03-04) Doctoral student position, Aalto University, Finland

Doctoral student position in an interdisciplinary speech technology project “Computational Modeling of Language Acquisition”

Job description:
The Speech Technology Team of Aalto University offers a full-time doctoral student position on computational modeling of language acquisition. Those who wish to do their master’s thesis on the topic and are willing to pursue post-graduate studies after their graduation are also welcome to apply.
The high-level goal of the research project is to understand how human children learn to understand and produce speech, and how computers could be programmed with similar learning skills in order to outperform classical, manually engineered speech recognition systems. In the short term, the focus is on developing ecologically plausible computational models of different aspects of human language processing (perception, production, multimodal interactions) and evaluating these models against data from behavioural studies. The research is concerned both with the development and application of low-level machine learning and signal processing algorithms and with the development of high-level models of language learning, making the work highly interdisciplinary, diverse and challenging.
The more specific content of the current position is not fixed in advance, but will be planned together with a successful applicant. The focus can be more on the technical side (machine learning/signal processing algorithm development), on the behavioural side (collecting and modelling behavioural data), or both.

Requirements:
Due to the interdisciplinary nature of the work, we are looking for an applicant with a strong background, skills and a burning interest in one or more of the following fields: speech/signal processing, auditory modeling, speech acoustics, machine learning, cognitive science, cognitive psychology, computational linguistics, artificial intelligence, or mathematics.
A successful applicant will have an M.Sc. (tech.) or an equivalent degree in a relevant field (or will have nearly completed the degree), good programming skills (e.g., MATLAB, Java or C), and fluent English (both written and spoken). The capability for self-driven, independent and productive work is a must. Basic or advanced knowledge of signal processing, machine learning, and/or statistics is a significant advantage.

Job details:
The current work will take place in the project Computational Modeling of Language Acquisition, funded by the Academy of Finland. The first contract will be a 6-month trial period for doctoral students or 6 months for the M.Sc. thesis work. An extension up to a total of 2.5 years is envisioned upon successful performance. The work will take place in the Department of Signal Processing and Acoustics of Aalto ELEC.

Salary:
Salary level will be based on the local-agreement salary system of Aalto University.

How to apply:
Please send your application to D.Sc. Okko Räsänen (okko.rasanen@aalto.fi) by email. The deadline for applications is 31.4.2014. Please include at least the following documents in your application: a cover letter describing your background and motivation for the position, CV, a transcript of bachelor and master’s level study records, and possible letters of recommendation.

For more information, please contact:
D.Sc. Okko Räsänen, okko.rasanen@aalto.fi, +358-50-441 9511
or
Prof. Unto K. Laine, unto.laine@aalto.fi, +358-50-593 0251


6-6(2014-03-04) Postdoctoral fellow in the area of phonetics, Max Planck Institute, Leipzig, Germany

Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany

 

Announcement of Vacancy

 

The Department of Linguistics at the Max Planck Institute in Leipzig has a vacancy for a postdoctoral fellow in the area of phonetics.

 

In order to understand the abstract structure underlying the movements of articulators involved in speech sound production, the department’s phonetics lab is developing methods aimed at detecting the coupling relations between the dynamical systems involved. This work is based on a variant of joint recurrence analysis which is compatible with the uncontrolled manifold theory of synergies in intentional dynamics. The methods will be applied to explore the abstract coupling links underlying the coordination between the behavior of the larynx and the behavior of oral articulators.

 

The successful candidate will be involved in the conceptual design and in the development of analytical tools. This includes the areas of joint recurrence analysis, recurrence network approaches, and procedures for building surrogate databases for testing the significance of the observed relations.
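
As a rough illustration of what joint recurrence analysis computes, the sketch below builds the textbook joint recurrence matrix of two scalar time series: two time points are jointly recurrent when both systems return close to their earlier states at the same pair of times. This is the standard definition, not the department's variant, and it assumes scalar observations without phase-space embedding; the function names and thresholds are chosen for this example only.

```python
def recurrence_matrix(series, eps):
    """R[i][j] = 1 when the states at times i and j are within eps of each other."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


def joint_recurrence_matrix(x, y, eps_x, eps_y):
    """Element-wise product of the two recurrence matrices."""
    rx = recurrence_matrix(x, eps_x)
    ry = recurrence_matrix(y, eps_y)
    n = len(x)
    return [[rx[i][j] * ry[i][j] for j in range(n)] for i in range(n)]


def joint_recurrence_rate(jr):
    """Fraction of time pairs at which both systems recur simultaneously."""
    n = len(jr)
    return sum(map(sum, jr)) / (n * n)


# Toy signals standing in for, e.g., a laryngeal and an oral articulator trace.
x = [0.0, 0.1, 1.0]
y = [0.0, 0.05, 0.02]
jr = joint_recurrence_matrix(x, y, eps_x=0.2, eps_y=0.2)
print(jr)                         # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
print(joint_recurrence_rate(jr))
```

Significance testing would then compare such a statistic against the same statistic computed on surrogate data.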

 

The one-year non-renewable position ends no later than May 31, 2015.

 

Prerequisites for an application are a PhD and knowledge of dynamical systems theory as well as of information theory (as shown by relevant publications or by his/her academic curriculum), with an interest in their application to real datasets. Programming skills (preferably Matlab or C++) are required. Knowledge of recurrence analysis methods and of Bayesian statistics is welcome.

 

The salary is according to the German public service pay scale (TVöD). The Max Planck Society is concerned to employ more disabled people; applications from disabled people are explicitly sought. The Max Planck Society wishes to increase the proportion of women in areas in which they are underrepresented; women are therefore explicitly encouraged to apply.

 

Applicants are requested to send their complete dossier (including curriculum vitae, description of research interests, names and contact details of two referees, and a piece of written work on one of the relevant topics) to:

 

Max Planck Institute for Evolutionary Anthropology

Personnel Department

Prof. Dr. Bernard Comrie

Code word: Postdoc Phonetics

Deutscher Platz 6

D-04103 Leipzig, Germany

 

Applications will be accepted until the position is filled.

 

Please address questions to Dr. Sven Grawunder <grawunder@eva.mpg.de>.

Information on the institute is available at http://www.eva.mpg.de/.

 


6-7(2014-03-05) PhD research fellowship at Université de Sherbrooke, Canada

 

Ver. 2014-03-03

Ph.D. project offer

Title

Development of speech coding algorithms inspired by the central auditory system

Description

The central auditory system has many characteristics that could foster the development of improved speech coders. The objective of this research project is to develop speech coding algorithms that are inspired by the central auditory system. This work will make use of the current knowledge in neural signal processing by the central auditory system as well as of several metrics developed for the objective evaluation of speech processing algorithms. It will allow the study of new paradigms that will lead to the development of novel speech coding algorithms. The research will be conducted jointly in the Speech and Audio Processing Laboratory and the Computational Neuroscience and Intelligent Signal Processing Research Group at the Université de Sherbrooke, Sherbrooke, Canada. Financial support is expected.

Requirements

Excellent academic record.

 Strong background in speech processing.

 Excellent ability to communicate in English. Knowledge of French is considered an advantage.

 Interest in neuroscience.

The Université de Sherbrooke is a French-speaking institution located in the province of Quebec, Canada, and the heart of an international research pole. It is host to more than 40 000 students from over 120 different countries worldwide. (Source: UdeS)

For more information:

Eric Plourde, Eng., Ph.D.
Department of Electrical and Computer Engineering
Faculty of Engineering
Université de Sherbrooke
Sherbrooke, Quebec Canada
+819-821-8000, # 63255
eric.plourde@usherbrooke.ca
www.gel.usherbrooke.ca/plourde/english.html

Roch Lefebvre, Eng., Ph.D.
Department of Electrical and Computer Engineering
Faculty of Engineering
Université de Sherbrooke
Sherbrooke, Quebec Canada
+819-821-8000, # 62134
roch.lefebvre@usherbrooke.ca
www.usherbrooke.ca/gelecinfo/fr/departement/profs/lefr-en/


6-8(2014-03-06) 1 postdoc and 1 PhD at INRIA, Nancy, F
 
 
*** Postdoc #1
Post-doctoral Fellow - INRIA - Nancy, France
 
Title : Accurate 3D Lip modeling and control in the context of animating a 3D talking head
 
The goal of this work is to develop an accurate 3D lip model that can be integrated within a talking head. A control model will also be developed. The lip model should be as dynamically accurate as possible, so the design will focus on dynamics. One can start from a static 3D lip mesh, using a generic 3D lip model, and then use MRI images or 3D scans to obtain a more realistic lip shape. To take into account the dynamic aspect of lip deformation, we will use an articulograph (EMA) and motion capture techniques to track sensors or markers on the lips. The mesh will be adapted to this data.
To control the lips, we will consider a skeletal animation driven by the EMA sensors or motion capture markers, using inverse kinematics, a technique widely used in 3D modeling. In line with conventional skeletal animation, an articulated armature rigged inside the mesh is mapped to vertex groups on the lip mesh by a weight map, which can be defined automatically from the envelope of the armature’s shape and manually adjusted if required. Manipulating the armature’s components then deforms the surrounding mesh accordingly.
The main challenge is to find the best topology of the sensors or markers on the lips in order to capture their dynamics accurately. The main outcome is to accurately model and animate the lips based on articulatory data. It is very important to have readable lips that can be lip-read by hard-of-hearing people.
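
The weight-map idea described above can be sketched with linear blend skinning, the conventional way a skeleton deforms a mesh: each vertex moves by a weighted average of the motions of the bones that influence it. The sketch below is a simplification that assumes bones only translate (no rotation); the function name, the two "bones" and the weights are hypothetical and are not taken from the project.

```python
def skin_vertex(vertex, bone_offsets, weights):
    """Move one vertex by the weight-averaged translation of its bones.

    vertex       -- (x, y, z) rest position of the mesh vertex
    bone_offsets -- list of (dx, dy, dz) translations, one per bone
    weights      -- influence of each bone on this vertex (normalised here)
    """
    total = sum(weights)
    moved = []
    for axis in range(3):
        blended = sum(w * off[axis] for w, off in zip(weights, bone_offsets)) / total
        moved.append(vertex[axis] + blended)
    return tuple(moved)


# Hypothetical example: an upper-lip bone moves up, a lower-lip bone moves down.
bone_offsets = [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
# A vertex near the upper lip is influenced mostly by the upper-lip bone.
print(skin_vertex((0.0, 0.0, 0.0), bone_offsets, weights=[0.75, 0.25]))
# (0.0, 0.5, 0.0)
```

In such a setup, the tracked sensor or marker positions would supply the bone motions, and the weight map decides how far each nearby vertex follows them.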
 
 
For full description and to apply, please visit this website:
 
The applications will be considered as soon as received.
 
 
Slim Ouni
University of Lorraine
Parole - Inria Nancy -Grand Est
 
************************************************************
PhD #1
Position type: PhD Student - INRIA - Nancy, France

Title: Emotion modeling during expressive audiovisual speech
 
The goal of this thesis is to study the expressivity from articulatory and visual points of view. The articulatory and facial gestures will be characterized for the different sounds of speech (called phonemes) in the different expressive contexts. The goal is to determine how facial expressions interact with speech gesture (lips, tongue, face and acoustics), and how this is embedded within the phoneme articulation and their acoustic consequences. The quantification of the intensity of an expression during a given phoneme (or sequence of phonemes) articulation needs to be determined. One important objective of this work is to develop an expressive control model, which describes the interaction of the facial expressions with audiovisual speech.
 
To achieve these goals, two corpora will be acquired using electromagnetic articulography (EMA) and motion capture techniques, synchronously with acoustics. EMA uses electromagnetic sensors, glued on the tongue, teeth, lips and possibly the face, that provide 3D positions and two orientation angles. We will use marker-less motion capture, which allows retrieving the dynamics of the face due to speech and expressivity. The corpora will cover sentences pronounced in several emotional contexts. These corpora provide the articulatory trajectories of the tongue and the lips in addition to the acoustic signal. The acquired data will be processed and analyzed, and the control model will be developed based on the results of this analysis.
 
For full description and to apply, please visit this website:
 
The applications will be considered as soon as received.
 
Slim Ouni
University of Lorraine
Parole - Inria Nancy -Grand Est

6-9(2014-03-07) Post-doc position at LORIA (Nancy, France)

Post-doc position at LORIA (Nancy, France)

Framework of ANR project ContNomina

The technologies involved in information retrieval in large audio/video databases are often based on the analysis of large, but closed, corpora, and on machine learning techniques and statistical modeling of written and spoken language. The effectiveness of these approaches is now widely acknowledged, but they nevertheless have major flaws, particularly concerning proper names, which are crucial for the interpretation of the content.

In the context of diachronic data (data which change over time), new proper names appear constantly, requiring dynamic updates of the lexicons and language models used by the speech recognition system.

As a result, the ANR project ContNomina (2013-2017) focuses on the problem of proper names in automatic audio processing systems by exploiting the context of the processed documents as efficiently as possible. To do this, the postdoc will address the contextualization of the recognition module through the dynamic adjustment of the language model in order to make it more accurate.

Post-doc subject

The language model of the recognition system (an n-gram model learned from a large text corpus) is available. The problem is to estimate the probability of a new proper name depending on its context. Several tracks will be explored: adapting the language model, using a class model or studying the notion of analogy.

Our team has developed a fully automatic system for speech recognition to transcribe a radio broadcast from the corresponding audio file. The postdoc will develop a new module whose function is to integrate new proper names in the language model.
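
One of the tracks mentioned above, a class model, can be sketched as follows: proper names in the training text are replaced by a class token, so that a new name inherits the contexts of the whole class, with P(name | previous word) approximated by P(class | previous word) × P(name | class). This is a minimal unsmoothed bigram sketch under that assumption; the `<PN>` token, the function names and the toy data are hypothetical and do not describe the project's actual system.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context words and bigrams over tokenised sentences."""
    context_counts, bigram_counts = Counter(), Counter()
    for sent in sentences:
        for prev, word in zip(sent, sent[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts


def class_bigram_prob(prev, name, context_counts, bigram_counts,
                      name_given_class, class_token="<PN>"):
    """P(name | prev) ~= P(class | prev) * P(name | class), without smoothing."""
    if context_counts[prev] == 0:
        return 0.0
    p_class = bigram_counts[(prev, class_token)] / context_counts[prev]
    return p_class * name_given_class.get(name, 0.0)


# Toy corpus in which proper names were already replaced by the class token.
corpus = [["president", "<PN>", "said"],
          ["president", "<PN>", "visited"],
          ["president", "of", "france"]]
context_counts, bigram_counts = train_bigram(corpus)

# Hypothetical in-class distribution, e.g. estimated from recent documents.
name_given_class = {"hollande": 0.5, "newname": 0.5}

print(class_bigram_prob("president", "newname",
                        context_counts, bigram_counts, name_given_class))
```

Adding a new proper name then only requires updating the in-class distribution, not re-estimating the n-gram counts.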

Required skills

A PhD in NLP (Natural Language Processing), familiarity with tools for automatic speech recognition, a background in statistics, and programming skills (C and Perl).

Post-doc duration

12 months from June 2014 (there is some flexibility)

Location

Loria laboratory, Speech team, Nancy, France

Contacts

Irina.illina@loria.fr, dominique.fohr@loria.fr

Candidates should email a letter of application, a detailed CV with a list of publications, and their diploma.


6-10(2014-03-07) MCF (lecturer) in computer science for the humanities, Université Paris Sorbonne

A Maître de Conférences (MCF, lecturer) position in computer science for the humanities is open at the Université Paris Sorbonne. The successful candidate will teach computer science in the bachelor's (licence) and master's programmes of the department of Informatique, Mathématiques et Linguistique appliquées (computer science, mathematics and applied linguistics). Their research should fit within one or more of the research areas of the computational linguistics team (www.stih.paris-sorbonne.fr/): semantics and knowledge; paralinguistics of speech and text; evaluative judgments, opinions and sentiments.

Contact person: Claude.Montacie@Paris-Sorbonne.fr


6-11(2014-03-08) Senior Technical Engineer/Scientist (Software team Manager) at ELDA

The European Language resources Distribution Agency (ELDA), a company specialized in Human Language Technologies within an international context, acting as the distribution agency of the European Language Resources Association (ELRA), is currently seeking to fill an immediate vacancy for a Senior Technical Engineer/Scientist (Software Team Manager) position.

Under the supervision of the CEO, the responsibilities of the Senior Technical Engineer/Scientist include managing a small development team, designing and specifying tools and software components for Language Resources, production frameworks and platforms, and carrying out quality control and assessment. He/she will be in charge of renovating the current language resources production workflows. This offers excellent opportunities for young, creative, and motivated candidates wishing to participate actively in the Language Engineering field. He/she will be in charge of conducting the activities related to language resources and Natural Language Processing technologies. The task will mostly consist of managing language resources production projects and co-ordinating ELDA’s participation in R&D projects, while also being hands-on whenever required by the development team.

Profile :

    Good knowledge of Linux and open source software
    Proficiency in Python, Django, PhP, Perl, CSS
    Proficiency in Django-CMS is a plus
    Good knowledge of e-commerce development (Python/Django-oriented)
    Proficiency in French and English
    Dynamic and communicative, flexible to combine and work on different tasks
    Experience with technology transfer projects, industrial projects,collaborative projects within the European Commission or other international frameworks
    Good knowledge of the Language Technology area is a plus
    Ability to work independently and as part of a team, in particular the ability to supervise members of a multidisciplinary team
    Citizenship of (or residency papers for) a European Union country

Applications will be considered until the position is filled. The position is based in Paris.

Salary : Commensurate with qualifications and experience.

Applicants should email a cover letter addressing the points listed above together with a curriculum vitae to :

Khalid Choukri
ELRA / ELDA
9, rue des Cordelières
75013 Paris
FRANCE
Fax : 01 43 13 33 30
Mail : job@elda.org


Please check out our other vacant positions:


6-12(2014-03-08) PhD position in spoken dialogue systems research, Charles University, Prague, CZ
PhD position in spoken dialogue systems research
 
Applications are invited for a PhD fellowship in the area of statistical spoken dialogue systems funded by the Czech Government. The student will join the Institute of Formal and Applied Linguistics, Charles University in Prague, Czech Republic with an anticipated start date of October 1st, 2014.
 
Topic description: In recent years, it has been suggested that statistical approaches to spoken dialogue systems offer a framework to naturally handle the inherent uncertainty in human speech. The two main advantages of statistical methods are increased robustness in noisy conditions and more natural behaviour learnt from data. However, the current methods need large corpora, which effectively prevents them from being used for the complex dialogue systems that occur in real life. The successful PhD candidate will investigate and implement statistical models and methods with the aim of increasing the efficiency of the learning process and reducing the need for large corpora. The conducted research will cover areas of spoken language understanding, dialogue management, and natural language processing.
 
Skills: The candidate should hold a master's degree in a relevant area, such as computer science, mathematics, engineering or linguistics. A strong mathematical background, excellent programming skills (e.g. C/C++, Java, MATLAB, and various scripting languages in a Linux environment), aptitude for creative research and autonomy are expected. Experience in machine learning, Bayesian methods, and natural language processing is a plus.
 
The Institute of Formal and Applied Linguistics is a top-level research group working in the area of computational linguistics and natural language processing. During the fellowship, there will be good opportunities to attend international conferences and workshops. The formal applications should be submitted before June 1. Prospective candidates are strongly encouraged to contact Dr Filip Jurcicek (jurcicek@ufal.mff.cuni.cz) as soon as possible to obtain details about the application process, the institute, and the research opportunities. 
 
Additional information is available at https://ufal.mff.cuni.cz/filip-jurcicek/jobs.

6-13(2014-03-12) Researchers in Speech and dialog at IKERBASQUE, Bilbao, Spain
Ikerbasque would like to inform you that we have launched a new international call to reinforce research and scientific careers in the Basque Country. We offer:

  • 25 positions for Promising Researchers: Ikerbasque Research Fellows
      - 5-year contracts
      - PhD degree obtained between Jan 2004 and Dec 2011
      - Support letter from the host group is mandatory
      - Deadline: March 31st at 13:00 CET
 
For further information, use this link: www.ikerbasque.net
 
We would appreciate your help in disseminating this information, in case you know about any colleague that could be interested and meets the requirements of the call.
 
 
 
IKERBASQUE
Basque Foundation for Science

Professor M. Inés Torres
PR&Speech Technology
Dpto. Electricidad y Electrónica
Fac. Ciencia y Tecnología - UPV/EHU
Apdo. 644 - 48080 Bilbao (Spain)
Phone: +34 94 601 2715
e_mail: manes.torres@ehu.es



6-14(2014-03-17) Post Doctoral Position Iceland

Post Doctoral Position in Speech Processing

Applications are invited for a postdoctoral position in the area of speech processing for cognitive workload monitoring. We are looking for an engineer/computer scientist with experience in machine learning, statistics, and speech signal processing or in closely related fields. The main aim of the project is to develop pattern recognition software that detects and monitors the cognitive workload of air traffic controllers from their speech commands, video recordings and other observations. This is achieved with recordings of participants during cognitive workload experiments and by transferring the technology to a simulated air traffic control environment. A successful applicant will be expected to lead this work and assist in the supervision of doctoral and master students working in this area.

Qualifications:
Applicants should have received a doctoral degree in engineering, computer science or a related discipline, or completion of all requirements for the degree should be expected prior to the starting date. A successful applicant will have experience in machine learning, statistics, speech processing or closely related fields and be able to implement signal processing and machine learning procedures in an appropriate environment.

About Reykjavik University:
Reykjavik University is an international university located at the heart of Reykjavik, the capital of Iceland. It was established in 1998 and it is the largest technical university in Iceland. RU places a strong emphasis on research and publishes more scientific articles in its fields of expertise than any other university in the country. The university has developed a productive and appealing research environment for Icelandic and foreign scientists. Approximately 140 academic employees (assistant professors, associate professors, professors, adjuncts, postdoctoral fellows, guest professors, and other specialists) currently work at RU on various research projects in collaboration with both domestic and international research institutions and companies.

Appointment:
The appointment is available immediately and is for two years. The participant will be selected based on academic records, recommendations, research interests and compatibility of background. The salary is determined from the guidelines of the Icelandic Research Fund (http://www.rannis.is) for postdoctoral research associates.

How to apply:
Electronic applications, including a cover letter, a CV and the details of two referees, should be sent to Dr. Jon Gudnason (email: jg@ru.is) and Dr. Kamilla Johannsdottir (kamilla@ru.is).


6-15(2014-03-23) Post-doc position at LIMSI-CNRS in the Spoken Language Processing group

Post-doc position at LIMSI-CNRS in the Spoken Language Processing group

A post-doc position is proposed at LIMSI-CNRS (Orsay, France - http://www.limsi.fr), in the context of the ANR-funded CHIST-ERA CAMOMILE Project (Collaborative Annotation of multi-MOdal, MultI-Lingual and multi-mEdia documents - http://www.chistera.eu/projects/camomile).

Context of the project

Human activity is constantly generating large volumes of heterogeneous data, in particular via the Web. These data can be collected and explored to gain new insights in social sciences, linguistics, economics, behavioural studies as well as artificial intelligence and computer sciences.
In this regard, 3M (multimodal, multimedia, multilingual) data can be seen as a paradigm for sharing an object of study, human data, between many scientific domains. But to be really useful, these data should be annotated and available in very large amounts. Annotated data is useful for computer science, which processes human data with statistical machine learning methods, but also for the social sciences, which increasingly use the large corpora available to support new insights, in a way that was not imaginable a few years ago. However, annotating data is costly as it involves a large amount of manual work, and 3M data, for which different modalities must be annotated at different levels of abstraction, is especially costly. Current annotation frameworks involve local manual annotation, sometimes with the help of automatic tools (mainly pre-segmentation).
The project aims at developing a first prototype of a collaborative annotation framework for 3M data, in which manual annotation is done remotely at many sites while the final annotation is located at the main site. Following the same principle, systems devoted to automatic processing of the modalities (speech, vision) present in the multimedia data will help the transcription by producing automatic annotations. These automatic annotations are produced remotely at each site of expertise and then combined locally to provide meaningful help to the annotators.
In order to develop this new annotation concept, we will test it on a practical case study: the problem of person annotation (who is speaking? who is seen?) in video, which requires the collaboration of high-level automatic systems dealing with different media (video, speech, audio tracks, OCR, ...). The quality of the annotated data will be evaluated through the task of person retrieval.
This new way of envisioning the annotation process should lead to methodologies, tools, instruments and data that are useful for the whole scientific community interested in 3M annotated data.

Requirements and objectives

A PhD in a field related to the project (speech processing, computer vision or machine learning) is required. The candidate will perform research on multimodal person recognition in videos (speaker recognition, face recognition or multimodal fusion) and will also be involved with the partners in the development of the distributed annotation framework. Knowledge of JavaScript and Python programming languages is needed for working on this framework. Salary will follow CNRS standard rules for contractual researchers, according to the experience of the candidate.

Contacts

  • Claude Barras (Claude.Barras [at] limsi.fr)
  • Hervé Bredin (Herve.Bredin [at] limsi.fr)
  • Gilles Adda (Gilles.Adda [at] limsi.fr)

Agenda

  • Opening date: May 2014
  • Duration: 18 months

   


6-16(2014-03-25) PhD position in France

 


Sound semiology for spatial analysis and
cartographic representation

Thesis supervisors: JOSSELIN Didier & ALTMAN Eitan

External members of the thesis committee: BONIN Olivier,
BRESSON Jean

1. Laboratoire d'Informatique d'Avignon (LIA), Avignon
2. IFSTTAR – AME-LVMT Champs sur Marne, Marne la Vallée
3. UMR ESPACE, Avignon
didier.josselin@univ-avignon.fr
4. IRCAM, Paris
5. INRIA, Sophia-Antipolis

KEYWORDS: CARTOGRAPHY, MUSIC, GEOMATICS, SPATIAL ANALYSIS, COMPOSITIONS,
SCORES


Abstract:
The proposed research aims to lay the foundations of a semiology for
cartographic representation that is no longer based exclusively on graphics,
but on sound, or even on music in the broad sense.
This research is fundamental, since it must address the dialectic between
cartographic methodology and music theory, by examining what the map brings
to music and vice versa, and by identifying the dimensions shared by the
two approaches.
It also has potential practical scope, since it can improve the
understanding of map reading by adding hearing to sight in the spatial
analysis of geographic data. In the longer term, it can also lead to
applications for dynamic recognition of the environment by visually impaired
people or people with reduced mobility, via mental maps.
This research topic is open to several disciplines, such as geography
(geomatics in particular), information and communication sciences,
musicology and cognitive sciences.

1. Context and rationale

   Since the pioneering work of J. Bertin (1975) on graphic semiology,
cartography has evolved considerably (Escobar et al., 2008), in two respects
in particular: interactivity and online accessibility. For some years now
there has been talk of cartography in motion (Mac Eachren, 1995; Josselin &
Fabrikant, 2003), that is, animated, multimedia (notably online) or
interactive cartography (Cartwright et al., 2007).

   Cartography can be descriptive, in the sense that it serves to observe
phenomena unfolding over time. But it quickly becomes exploratory, and even
acquires explanatory power, through the ability of its tools to bring
different dimensions or points of view into interaction. In this respect,
exploratory spatial analysis (Andrienko, 2006) has developed considerably and
opens up vast scientific, technological and practical horizons. In direct
connection with maps and geography, graphs and statistical indices provide
summaries of the data, taken as a whole or via appropriate selections.
These methods of analysis are powerful tools for investigating or mining
spatial data.

   In cartographic tools, reliance on the visual dimension of statistical
representations remains pervasive. However, other complementary channels
exist, such as sound or music. For example, the "soundscape" movement uses
sound ambiences to give meaning to places and environments (Murray Schafer
1969). Other authors propose simple maps of sounds (Schiewe & Kornfeld 2009).
In this case, "sound signatures" mark and characterize places explicitly.
However, they are not used to highlight discontinuities, gradients or
structures in space (of mobility, for example). Yet combining visual and
auditory cognitive abilities can only improve our capacity to analyze
geographic data, all the more so when the data are complex.
   The use of spatial representations and spatial computations for musical
structures is thus a rapidly growing field. One could study the inverse
problem: starting from a spatial structure extracted from a map or an image,
reconstruct the associated musical constraints (harmonic, melodic, rhythmic).
This would then be an approach to assisting musical composition (Adhitya and
Kuuskankare, 2012). By means of predetermined rules or with the assistance of
a user, one could go as far as the sonification of images or maps. Beyond
this objective, the rapprochement between musical analysis and spatial
analysis often favors one direction: musical representations are enriched by
spatial representations. Enriching the field of spatial analysis with
techniques from musical analysis is an interesting and complementary
perspective. This thesis belongs to this second approach.

2. Orientation of the thesis


2.1. Objectives

   To our knowledge, there is today no proven and consensual sound semiology
that makes it possible to establish an unequivocal symbolic link between
sound representation and cartographic representation. Does one ultimately
exist? What are its dimensions? How can it be built, and with which methods?
This is what is at stake in this thesis topic, at the interface between
communication sciences, geography and geomatics, musicology and cognitive
sciences. Following the example of Bertin's work on graphic semiology and all
the developments that followed, this thesis should make it possible to lay
the foundations of a sound semiology dedicated to spatial representation and
analysis.

   The general objectives of this research are:

    •   to evaluate in what respect and in what way sound can improve
         spatial analysis methods;
    •   to look, in music theory, for what can be used to represent typical
         and structured portions (patterns) of geographic spaces;
    •   to grasp the numerical and mathematical aspect of music in order to
         try to lay the foundations of a meaningful sound semiology (for a
         given musical culture);
    •   to reflect on the computing methodologies allowing users to upload
         or explore maps and geographic images.


2.2. Dimensions shared by the map and music

   The doctoral student will have to approach music through various
components of interest for spatial analysis (non-exclusive):

    •   Sound components: intensity (power, velocity, latency),
         pitch/degree (frequency, wavelength, period), timbre (perception,
         emission, grain);

    •   Score components: duration, tempo, rhythmic sequence, constrained
         graphic score (staff, bars, systems) or unconstrained (continuities
         of representation);

    •   Harmonic components: chords, scales, keys, modes;

    •   Free components: interpretation, orchestration (size, instruments),
         improvisation, "sound-painting", dynamic recording and learning,
         adaptation of the musician to the context.

   She/he will have to set these components against the map, which can thus
resonate with these different dimensions:

    •   Sound components: the description of spatial observations and their
         attributes;

    •   Score components: temporal aspects of spatial analysis making it
         possible to manage sequences of exploration of maps or of phenomena
         over time;

    •   Harmonic components: spatial patterns (spatial structures and
         associations) can correspond to a compositional dimension for
         identifying and building composite objects in geographic space;

    •   Free components: a path (festive or strolling) taken by the explorer
         through the space of representation or the geographic space can
         constitute a space of freedom and interpretation.


3.  Course of the thesis

   On the one hand, the methodological approaches that can be used in this
project are wide open. They will depend on the profile of the candidate, who
may come from any of the following disciplines: geography, geomatics,
information and communication sciences, musicology, culture and
communication, cognitive sciences.

   On the other hand, the thesis will proceed in three phases:

   A first part, fairly theoretical (state of the art), consists in listing
the various characteristics of music and extracting those that can serve as
sound or musical semiological references. Here one can draw on theories of
(and around) music and on the dimensions of spatial analysis.

   A second, methodological part makes it possible to build robust
experimental protocols to evaluate/validate these references, in order to
reveal what the sound or music produced evokes in terms of geographic objects
in a given cultural context (symbol grounding, cognitive aspects).

   A third, more technical part aims to implement sound cartography in a
standard cartographic environment (a Geographic Information System interfaced
with musical analysis tools), building on the conclusions of the first two
parts.

    In all cases, the candidate must be very interested in the field of
music, be able to get to grips with simple software applications (musical and
cartographic) and to make music and mapping software work together in full
interoperability. Practice or in-depth knowledge of music would be a plus.

 

4.  Practical arrangements

   Candidates should promptly contact Didier Josselin to present their
profile, their motivation and how they intend to approach this thesis topic:
didier.josselin@univ-avignon.fr (04 90 84 35 74 or 06 07 40 69 38). Didier
Josselin is a research director in Geomatics at UMR ESPACE and an associate
researcher at LIA. Eitan Altman is a research director in Networks at
INRIA/LIA.

   Applications from graduates with physical impairments (visual impairment
or physical disability) will be given priority on the project and are
exclusively eligible for the Ministry's specific call for PhD grants
earmarked "disability". These candidates must contact Didier Josselin before
the end of March without fail.

   Other candidates may apply for the Agorantic PhD grant after having also
contacted D. Josselin: http://ed537.univ-avignon.fr/.

   The thesis will be carried out within ED 537 Culture & Patrimoine, in
Geography (SHS). Two laboratories of the Université d'Avignon will be
involved: UMR ESPACE and LIA. The IFSTTAR and IRCAM laboratories are partners
in the overarching research project (CartoMuse). A cotutelle (joint
supervision) may be considered depending on candidates' profiles and wishes.


5. Indicative bibliography

Alayrangues  Sylvie,  Daragon  Xavier,  Lachaud  Jacques-Olivier,  Lienhardt  Pascal  (2008).
   Equivalence  between  Closed  Connected  n-G-Maps  without  Multi-Incidence  and  n-
   Surfaces,  Journal of Mathematical Imaging and Vision, Volume  32 Issue  1, pp. 1 – 22,
   Kluwer

Adhitya Sara and Kuuskankare Mika (2012). SUM: from Image-based Sonification to
   Computer-aided Composition, 8 pages, CMMR, 2012, London

Andrienko N. & Andrienko G. (2006), Exploratory Analysis of Spatial and Temporal Data: A
   Systematic Approach, Springer.

Bertin  J.  (1975),    La  Graphique  et  le  traitement  graphique  de  l'information,  Paris,
   Flammarion.

Cartwright W., Peterson M.P., Gartner G. (Eds) (2007), Multimedia cartography, Springer.

Escobar  Francisco,  Cauvin  Colette,  Serradj  Aziz,  (2008),  Cartographie  thématique  en  5
   volumes  (volume  1.  Une  nouvelle  démarche.  la  cartographie,  discipline  scientifique  en
   évolution;  volume  2.  Des  transformations  incontournables.  une  permanence  :  la
   transformation  sémiotique;  volume  3.  Méthodes  quantitatives  et  transformations
   attributaires. de la description à la généralisation d'une variable attributaire z; volume 4.
   Des transformations renouvelées. transformations cartographiques de position; volume 5.
   Des  voies  nouvelles  à  explorer.  les  révolutions  technologiques  et  leurs  conséquences
   conceptuelles et pratiques). Hermès-Lavoisier, Traité IGAT Série Aspects fondamentaux
   de l'analyse spatiale


Josselin D. & Fabrikant S. (Eds)  (2003), N° spécial « cartographie animée et interactive »,
   vol. 13, n°1/2003, Revue Internationale de Géomatique, Hermès, Lavoisier, Paris.

Josselin  D.   (2011),  Spatial  analysis  using  sonorous  cartography.  Some  propositions,
   ICC'2011, Paris, 3-8 July 2011.

Josselin  D.  (2005),  Interactive  Geographical  Information  System  using  LISPSTAT  :
   prototypes and applications. Journal of Statistical Software. Volume 13, Issue 6

Mac  Eachren  A.  (1995),  How  Maps  Work:  Representation,  Visualization,  and  Design,
   Guilford Press, NY.

Murray Schafer R. (1969),  The New Soundscape, Don Mills, Toronto.

Schiewe  J.,   Kornfeld  A.-L.  (2009),  Framework  and  Potential  Implementations  of  Urban
   Sound  Cartography,  12th  AGILE  International  Conference  on  Geographic  Information
   Science 2009, 8 pages.

Tymoczko Dmitri (2012). The Generalized Tonnetz, Journal of Music Theory, 56:1, pp. 1-52.


6-17(2014-03-27) Post doc in Grenoble (F)

Speech Unit(e)s

The multisensory-motor unity of speech

Understanding how speech unites the sensory and motor streams,

And how speech units emerge from perceptuo-motor interactions

 

ERC Advanced Grant, Jean-Luc Schwartz, GIPSA-Lab, Grenoble, France

 

 

Proposal for a post-doc position beginning in September 2014: “The informational structure of audio-visuo-motor speech”

 

Context

The Speech Unit(e)s project is focused on the speech unification process associating the auditory, visual and motor streams in the human brain, in an interdisciplinary approach combining cognitive psychology, neurosciences, phonetics (both descriptive and developmental) and computational models. The framework is provided by the “Perception-for-Action-Control Theory (PACT)” developed by the PI (Schwartz et al., 2012).

PACT is a perceptuo-motor theory of speech communication connecting, in a principled way, perceptual shaping and motor procedural knowledge in speech multisensory processing. The communication unit in PACT is neither a sound nor a gesture but a perceptually shaped gesture, that is a perceptuo-motor unit. It is characterized by both articulatory coherence – provided by its gestural nature – and perceptual value – necessary for being functional. PACT considers two roles for the perceptuo-motor link in speech perception: online unification of the sensory and motor streams through audio-visuo-motor binding, and offline joint emergence of the perceptual and motor repertoires in speech development.



Objectives of the post-doc position

In a multisensory-motor context such as the present one, a major requirement concerns a better knowledge of the structure of the information. Speech scientists have acquired a very good knowledge of the structure of speech acoustics, capitalizing on large audio corpora and up-to-date statistical techniques (mostly based on sophisticated implementations of Hidden Markov Models, e.g. Schutz & Waibel, 2001, Yu & Deng, 2011). Data on speech rhythms, in relation with the syllabic structure, have been analyzed clearly in a number of works (Greenberg, 1999; Grant & Greenberg, 2003).

In spite of strong efforts in the field of audiovisual automatic speech recognition (Potamianos et al., 2003), characterization of the structure of audiovisual information is scarce. While an increasing number of papers on audiovisual speech perception presenting cognitive and neurophysiological data cite the “advance of the visual stream on the audio stream”, few papers provide quantitative evidence (see Chandrasekaran et al., 2009), and when they do, the evidence is sometimes mistaken or oversimplified. In fact, the temporal relationship between visual and auditory information is far from constant from one situation to another (Troille et al., 2010). Concerning the perceptuo-motor link, the situation is even worse. Few systematic quantitative studies are available because of the difficulty of acquiring articulatory data, and in these studies the focus is generally on the so-called “inversion problem” (e.g. Ananthakrishnan & Engwall, 2011; Hueber et al., 2012) rather than on a systematic characterization of the structure of the perceptuo-motor relationship. Finally, there is a marked lack of systematic investigations of the relationship between orofacial and brachio-manual gestures in face-to-face communication.

We shall gather a large corpus of audio-visuo-articulatory-gestural speech. Articulatory data will be acquired through ultrasound, electromagnetic articulography (EMA) and video imaging. Labial configurations and estimates of jaw movements will be automatically extracted and processed using a wide range of facial video processing techniques. We also plan to record information about accompanying coverbal gestures of the hand and arm, using an optotrack system that tracks brachio-manual gestures. A complete set of equipment for audio-video-ultrasound-EMA-optotrack acquisition and automatic processing, named ultraspeech-tools, is available in Grenoble (more information on www.ultraspeech.com). The corpus will consist of isolated syllables, chained syllables, simple sentences and read material. Elicitation paradigms will combine reading tasks (with material presented on a computer screen) and dialogic situations between the subject and an experimenter, to evoke coverbal gestures as ecologically as possible.

The corpus will be analyzed through extensive use, and possibly development, of original techniques based on data-driven statistical models and machine learning algorithms, in the search for three major types of characteristics:

  1. Quantification of auditory, visual and motor rhythms, from kinematic data – e.g. acoustic envelope, lip height and/or width, variations in time of the principal components of the tongue deformations, analysis of the arm/hand/finger dynamics;

  2. Quantification of delays between sound, lips, tongue and hand in various kinds of configurations associated with coarticulation processes (e.g. vowel to vowel anticipatory and perseverative phenomena, consonant-vowel coproduction, vocal tract preparatory movement in silence) and oral/gestural coordination;

  3. Quantification of the amount of predictability between lips, tongue, hand and sound, through various techniques allowing the quantitative estimation of joint information (e.g. mutual information, entropy, co-inertia) and statistical inference between modalities (e.g. graphical models such as the dynamic Bayesian framework, multi-stream HMM, etc.).
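
As a rough illustration of the kind of measurement named in items 2 and 3 above, the sketch below estimates a delay between two tracks by lagged correlation and their shared information by a simple histogram estimate of mutual information. It is only a minimal sketch under stated assumptions: the function names (best_lag, mutual_information), the bin count and the synthetic "lips" and "sound" signals are invented for this example and are not taken from the project or from ultraspeech-tools.

```python
import math
import random
from collections import Counter


def best_lag(x, y, max_lag):
    """Return the lag (in samples) at which y best matches x.

    A positive result means y follows x by that many samples.
    """
    def corr_at(lag):
        # Pair x[t] with y[t + lag] over the overlapping part only.
        pairs = [(x[t], y[t + lag]) for t in range(len(x)) if 0 <= t + lag < len(y)]
        n = len(pairs)
        mx = sum(a for a, _ in pairs) / n
        my = sum(b for _, b in pairs) / n
        cov = sum((a - mx) * (b - my) for a, b in pairs)
        vx = sum((a - mx) ** 2 for a, _ in pairs)
        vy = sum((b - my) ** 2 for _, b in pairs)
        return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

    return max(range(-max_lag, max_lag + 1), key=corr_at)


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of mutual information, in bits."""
    def discretize(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0  # guard against a constant signal
        return [min(int((s - lo) / width), bins - 1) for s in v]

    bx, by = discretize(x), discretize(y)
    n = len(bx)
    pxy = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log2((c / n) / ((px[i] / n) * (py[j] / n)))
        for (i, j), c in pxy.items()
    )


# Synthetic stand-ins for two kinematic tracks (hypothetical, not real data):
# "sound" is a copy of "lips" delayed by 5 samples; "noise" is unrelated.
rng = random.Random(0)
lips = [math.sin(2 * math.pi * t / 40) + 0.1 * rng.gauss(0, 1) for t in range(400)]
sound = [0.0] * 5 + lips[:-5]
noise = [rng.gauss(0, 1) for _ in range(400)]

lag = best_lag(lips, sound, max_lag=20)
mi_related = mutual_information(lips[:-5], sound[5:])
mi_unrelated = mutual_information(lips, noise)
```

A plug-in histogram estimate is biased upwards for short recordings, so in practice the value for a pair of tracks would be compared against a baseline such as the unrelated pair above rather than read as an absolute quantity.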

The work will be performed within a multidisciplinary group in GIPSA-Lab Grenoble, associating specialists in speech and gesture communication, cognitive processes, signal processing and machine learning (partners of the project: Jean-Luc Schwartz, Marion Dohen from the “Speech, Brain, Multimodality Development” team; Thomas Hueber, Laurent Girin from the “Talking Machines and Face to Face Interaction” team; Pierre-Olivier Amblard and Olivier Michel from the “Information in Complex Systems” team).



Practical information

The post-doc position is open for a two-year period, with a possible third-year prolongation. The position is open from September 2014, or slightly later if necessary.

Candidates should have a background in speech and signal processing, face-to-face communication, and machine learning, or at least two of these three domains.

Candidates should send as soon as possible a short email to Jean-Luc Schwartz (Jean-Luc.Schwartz@gipsa-lab.grenoble-inp.fr) to declare their intention to submit a full proposal.

Then they must send a full application file in the following weeks. This application file will include an extended CV and a list of publications, together with a letter explaining why they are interested in the project, what their specific interests could be, possibly suggesting other experiments related to the general question of the informational structure of audio-visuo-motor speech, and also how this position would fit into their future plans for the development of their own career. They should also provide two names (with email addresses) for recommendations about their applications. Preselected candidates will be interviewed.

 

Final selection should occur before mid-June.




6-18(2014-03-30) Another Post-doc position in Grenoble (F)

Speech Unit(e)s

The multisensory-motor unity of speech

Understanding how speech unites the sensory and motor streams,

And how speech units emerge from perceptuo-motor interactions

 

ERC Advanced Grant, Jean-Luc Schwartz, GIPSA-Lab, Grenoble, France

 

 

Proposal for a post-doc position beginning in September 2014: “Joint development of perception, action and phonology”

 

Context

The Speech Unit(e)s project is focused on the speech unification process associating the auditory, visual and motor streams in the human brain, in an interdisciplinary approach combining cognitive psychology, neurosciences, phonetics (both descriptive and developmental) and computational models. The framework is provided by the “Perception-for-Action-Control Theory (PACT)” developed by the PI (Schwartz et al., 2012).

PACT is a perceptuo-motor theory of speech communication, which connects in a principled way perceptual shaping and motor procedural knowledge in speech multisensory processing. The communication unit in PACT is neither a sound nor a gesture but a perceptually shaped gesture, that is a perceptuo-motor unit. It is characterized by both articulatory coherence – provided by its gestural nature – and perceptual value – necessary for being functional. PACT considers two roles for the perceptuo-motor link in speech perception: online unification of the sensory and motor streams through audio-visuo-motor binding, and offline joint emergence of the perceptual and motor repertoires in speech development.



Objectives of the post-doc position

The general objective here is to gather phonetic data on the joint development of perception, action and phonology, to assess how perceptuo-motor speech units emerge and evolve in the course of acquisition, reacquisition, evolution or learning of a given phonological system.

Among the considerable number of studies on the development of speech production and speech perception in the first years of life, very few consider the link between perception and action. Apart from historical disciplinary reasons – perceptual development is mainly studied by psycholinguists and production development by phoneticians – there is also a theoretical cause for this puzzling separation. Since it is now well known that phonetic perceptual abilities in infants are well in advance of production skills, most researchers have implicitly taken it as obvious that perception is largely independent of production. The advance of perceptual over production skills has actually been considered a major weakness of motor theories, and generally a decisive argument in favor of auditory theories of speech perception.

The view in PACT is different. While it is known that the construction of perceptual prototypes begins earlier than the development of motor prototypes, we assume in PACT that the later development of speech production intervenes in modifying the perceptual repertoire. This will be studied in this post-doc position through two series of experiments, aiming at illustrating two possible mechanisms for motor-driven evolution of perceptual phonetic categories.

  1. Reconfiguration of perceptual categories: coupled auditory and motor idiosyncrasies – The development of motor prototypes during acquisition should induce reconfiguration of perceptual categories. Therefore, motor idiosyncrasies – specific choices of individual ways to produce a target inside the space defined by native phonology – should result in perceptual idiosyncrasies: if a subject chooses to produce a specific phonological contrast in her language in a specific way, this should be mirrored in the way the subject perceives the same contrast. This hypothesis will be studied in various phonetic dimensions.

  2. Creation of new categories by perceptuo-motor coupling – According to PACT, the acquisition of new motor knowledge could enable the infant to create a new category. A typical example concerns plosive place of articulation, for which acoustic characterization remains controversial (e.g. Stevens & Blumstein, 1981; Sussman et al., 1998), while articulatory characterization is straightforward. The relationship between the emergence of such categories in perception and in production is seldom considered. We assume that once an infant knows how to produce syllables with different places of articulation, she can realize that they are associated with specific articulatory gestures and hence form a new perceptual category related to articulatory properties. We will test this assumption in infants before and after the onset of canonical babbling (3 to 12 months), to possibly relate the emergence of perceptual categories to the onset of articulatory abilities. The same kind of study may also be undertaken with pre-lingual children after cochlear implantation.

The work will be carried out within a speech team in GIPSA-Lab Grenoble (partners in the project: Jean-Luc Schwartz, Anne Vilain), in close collaboration with a developmental and cognitive psychology team in LPNC Grenoble (partners in the project: Hélène Loevenbruck, Olivier Pascalis, David Meary).



Practical information

The post-doc position is open for a two-year period, with a possible third-year prolongation. The position is open from September 2014, or slightly later if necessary.

Candidates should have experience in phonetics, cognitive and neurocognitive psychology, and developmental psychology, or at least two of these three domains.

Candidates should send as soon as possible a short email to Jean-Luc Schwartz (Jean-Luc.Schwartz@gipsa-lab.grenoble-inp.fr) to declare their intention to submit a full proposal.

Then they must send a full application file in the following weeks. This application file will include an extended CV and a list of publications, together with a letter explaining why they are interested in the project, what their specific interests could be, possibly proposing other experiments related to the general question of the joint development of perception, action and phonology, and how this position would fit into their future plans for the development of their own career. They should also provide two names (with email addresses) for recommendations about their applications. Preselected candidates will be interviewed.

 

Final selection should occur before mid-June.


6-19(2014-03-31) POSTDOCTORAL FELLOWSHIP in speech communication with robots for people with Alzheimer’s disease


 

Employer: Toronto Rehabilitation Institute and the University of Toronto

Title: PostDoc

Specialty: Machine learning, natural language processing, human-computer interaction

Location: Toronto Ontario Canada

Deadline: Until filled

Date Posted: 31 March, 2014

 

We are seeking a skilled postdoctoral fellow (PDF) whose expertise intersects automatic speech recognition (ASR) and human-computer interaction (HCI). The PDF will work with a team of internationally recognized researchers on software for two-way speech-based dialogue between individuals with Alzheimer’s disease (AD) and robot ‘caregivers’. This software will automatically adapt the vocabularies, language models, and acoustic models of the component ASR to data collected from individuals with AD. The type of speech produced by the robot in response to human activity is vital, and several statistical models of dialogue will be pursued, including partially-observable Markov decision processes.

 

Work will involve software development, data analysis, dissemination of results (e.g., papers and conferences), and partial supervision of graduate and undergraduate students. Some data collection may be involved. Although primarily a technological intervention, this work is highly multidisciplinary, with a strong connection to the field of speech-language pathology and clinical practice.

 

The successful applicant will have:

1)      A doctoral degree in a relevant field of computer science, electrical engineering, biomedical engineering, or a related discipline;

2)      Evidence of impact in research through a strong publication record in relevant venues;

3)      Evidence of strong collaborative skills, including possible supervision of junior researchers, students, or equivalent industrial experience;

4)      Excellent interpersonal, written, and oral communication skills;

5)      A strong technical background in machine learning, natural language processing, and human-computer interaction. Experience with clinical populations, especially those with dementia or Alzheimer’s disease, is preferred.

This work will be conducted at the Toronto Rehabilitation Institute and at the University of Toronto. Toronto Rehab has a diverse workforce and is an equal opportunity employer. Work can commence as soon as June 2014. The initial contract is for 1 year although extension is possible; the project itself will last 3 years.

 

Please contact Dr. Frank Rudzicz by email at frank@cs.toronto.edu with any questions. To apply for the position, send 1) your up-to-date CV, 2) a cover letter, and 3) a short 1-page statement of purpose.

 


6-20(2014-04-01) Applied Mathematics Engineer, Kware, Aix-en-Provence (F)
Kware is a young, innovative company specializing in the analysis and extraction of marketing information. As we finalize our product, we are recruiting a recent PhD graduate in computer science.

Position: permanent contract (CDI), applied research engineer (M/F)

Degree: PhD in computer science (CNU section 27)

Field: information retrieval, information extraction, machine learning

Experience: industrial experience (e.g., a CIFRE thesis) is desirable but not required; solid scientific experience (national and international publications).

 

Duties:

The engineer will join a team of 10 people. In this context, the engineer will be assigned the following duties:

  • industrialization of algorithms

  • testing and evaluation

  • participation in integration

Profile: You hold a doctorate in computer science (Bac + 8) and specialize in language processing, information retrieval, or information extraction.

You are motivated by applied research, the industrialization of algorithms, and the creation of an innovative industrial product.

Ideally, you also have complementary knowledge of machine learning methods.

Your programming skills (Java) and your writing skills will be assets for succeeding in your duties.

Location: Aix-en-Provence

 

Application: send your application by email, with your CV attached, to frederic.duvert@kware.fr with a copy to michel.benoit@kware.fr, using the following (mandatory) subject line:
[poste doc Kware] your name


6-21(2014-04-08) New call for applicant/ research fellow, University of Macedonia, Greece

New call for applicant/ research fellow

The research group on hearing impairment and cochlear implants is located at the Dept. of Educational and Social Policy of the University of Macedonia and works in collaboration with the AHEPA Cochlear Implant Center as well as other local educational, speech-pathology and parental organizations in Greece. It focuses on assessing and optimizing communication skills and on fostering educational success for children with hearing loss and/or cochlear implants. Researchers have access to state-of-the-art audiological and speech equipment (nasometer, palatometer, OAE systems, etc.) and work in a scientifically focused and welcoming multidisciplinary environment. The doctoral students are enrolled in the Graduate Program of the Dept. of Educational and Social Policy. A vacancy is available for an:

Early Stage Researcher (PhD)

to investigate “Speech processing cues in children with hearing loss (CI)”

This position has been opened in the framework of ‘iCARE’, a European research and training network (Marie Curie ITN) on ‘improving Children’s Auditory REhabilitation’. iCARE is an international and interdisciplinary consortium from academia, industry and socio-economic agencies. The position is available for 3 years, starting in June 2014, and aims at the completion of a Ph.D. within 4 years. Limited funding may be available in the 4th year.

The aim of the ESR is to investigate the phono-prosodic cues that underlie the processing of spoken words by profoundly hearing-impaired children with cochlear implants (CI).

Objectives:

  • Assess the role of speech signal cues on lexical processing by CI children

  • Investigate the acquisition of phonological grammar by CI children in two languages, Greek and another language

  • Develop hierarchies of phono-prosodic cues that facilitate lexical processing

  • Develop e-learning/remote rehabilitation tools for clinical practice and/or parent guidance

Tasks and methodology:

  • Psycholinguistic experiments that involve presentation of controlled speech stimuli, computerized forced-choice tasks, reaction times and elicited productions

  • Obtaining accuracy ratings and performing acoustic analysis of speech via spectrography

  • Conducting speech and language assessments in young children with CI, working with them in sound-attenuated rooms, making audio recordings

  • Programming and carrying out computerized tasks

Your profile:

  • A university Master's degree (or equivalent) in Communication Disorders, Hearing Impairment, Speech Science or Hearing Science

  • Scientific background in the study of speech, acoustics and phonology

  • A strong interest in hearing-impaired children and multidisciplinary international teamwork, demonstrated by willingness to move to Salonika, Greece

  • Familiarity with research tools such as PRAAT, E-Prime or other software for designing psycholinguistic tasks

  • Technical competence in data analysis (Excel, SPSS, Statistica, etc.)

  • Excellent command of English, both in academic writing and in verbal communication. Knowledge of Greek is recommended.

  • Research experience in any of the above fields and clinical or other experience in working with children are desirable.

How to apply? Please send your CV, full contact information for two references and a letter of intent to Prof. Areti Okalidou (okalidou@uom.gr). Make sure that you receive confirmation of receipt. Other documents, such as proof of English language proficiency and recommendation letters, may be requested later.

Mailing address:

Areti Okalidou,
Associate Professor,
Dept. of Educational and Social Policy,
University of Macedonia,
Egnatia 156, P.O. Box 1591,
Salonika 540 06
Greece.
Tel.: +30-2310-891358.

General Information:

Improving Children’s Auditory Rehabilitation (iCARE)

FP7‐Marie Curie Initial Training Network

Objectives

The objectives of improving Children’s Auditory REhabilitation (iCARE) are

1) to provide training to create a new generation of researchers capable of exploiting the synergies between different disciplines to optimize spoken communication in children with hearing impairment, and

2) to combine research across disciplines to develop novel methods, training skills and procedures for improving auditory rehabilitation.

iCARE is an international and interdisciplinary consortium from academia, industry and socio-economic agencies and offers a choice of 11 PhD and 3 postdoc positions, starting June or July 2014. Each project is supervised by a multidisciplinary team of experts and will benefit from extensive training. Please contact one or more partners for project-specific educational prerequisites. In December 2013 a website will be available with more details and a procedure for applying.

Partners and topic

  • KU Leuven (Leuven, Belgium): Prof. Dr. Astrid van Wieringen (astrid.vanwieringen@med.kuleuven.be). Temporal processing in children with unilateral HI.

  • KU Leuven (Leuven, Belgium): Prof. dr. Wim Van Petegem (wim.vanpetegem@kuleuven.be). Factors influencing e-learning.

  • RWTH (Aachen, Germany): Prof. Dr.-Ing. Janina Fels (Janina.Fels@akustik.rwth-aachen.de). Acoustic Virtual Reality for HI and Development of ‘realistic’ test procedures for children with HI.

  • LiU (Linköping, Sweden): Prof. dr. Björn Lyxell (bjorn.lyxell@liu.se). Higher-order (auditory-cognitive) remediation.

  • RUN (Nijmegen, the Netherlands): Prof. dr. Ad Snik (A.Snik@kno.umcn.nl). Optimizing auditory scene analysis for the hearing impaired.

  • UCL (London, UK): dr. Lorna Halliday (l.halliday@ucl.ac.uk). Auditory processing in children with HI.

  • UOM (Thessaloniki, Greece): Prof. dr. Areti Okalidou (okalidou@uom.gr). Speech processing cues in children with HI.

  • GAVLE (Gävle, Sweden): Prof. dr. Staffan Hygge (Staffan.Hygge@hig.se). Learning in different acoustic scenes.

  • COCHLEAR UK (Mechelen, Belgium*): dr. Filiep Vanpoucke (fvanpoucke@cochlear.com). Investigating listening situations by means of scene classifiers. Music remediation.

  • NOLDUS (Wageningen, the Netherlands): dr. Nico van der Aa (n.vanderaa@noldus.nl). Development of a new system to determine quality of communication.

Eligibility

Marie Curie funding is intended to promote mobility of early career researchers within the research community. Candidates must:

a) have received a degree (Bachelor's or Master's) that qualifies them for PhD training,

b) not have undertaken more than 4 years of full-time research subsequent to that degree, and

c) not have been resident within the ‘country of interest’ (see individual projects) for more than 12 months within the 3 years prior to 1 June 2014.

For a full description of the eligibility conditions see:

http://ec.europa.eu/research/mariecurieactions/

An excellent 1st degree, good verbal and written communication skills in English, and an interest in multidisciplinary research are essential.

At this stage applicants can express their interest and/or ask for additional information by contacting the individual partner(s). Mobility expenses are provided in addition to a salary.

Prof. dr. Astrid van Wieringen
ExpORL, Dept Neurosciences
Herestraat 49, 3000 Leuven, Belgium
astrid.vanwieringen@med.kuleuven.be
tel 00 32 16 330478

 


6-22(2014-04-10) PhD thesis: Smart dialogue based on human-human conversation, Orange, Issy-les-Moulineaux, F

PhD thesis: Smart dialogue based on human-human conversation

 

Supervisor: Romain Laroche (romain.laroche@orange.com)

Company: Orange Labs (www.orange.com/en/innovation)

Place: Issy-les-Moulineaux (Paris, France)

To start around October 2014

 

  • Context

Automatic Speech Recognition is a module that transcribes speech to text. Its robustness, accuracy, and generality have greatly improved in recent years, so that it is now possible to use it with a generic language model, even in a noisy environment. This technological advance drives numerous innovations, including the one tackled by this PhD thesis: actively listening to a human-human conversation to extract meaning and deliver a contextual service related to this meaning.

 

Within Orange Labs, we have already worked on a service of this kind: the contextualisation service integrated in the work environment of call agents. This environment embeds several professional services: chat, e-mail, call management, appointment scheduling, location, Customer Relationship Management, scripting, statistics, etc. The contextualisation tool guides the agent by launching applications semi-automatically or fully automatically according to conversational patterns detected during the call.

 

In addition, a personal assistant named MindMeld (http://www.expectlabs.com/mindmeld/) has been commercialised. It listens to phone calls and proposes, in real time, content related to the subjects discussed.

 

The PhD thesis will focus on the dialogue aspects: on the one hand, not only recognizing keywords or even analyzing sentences, but placing the analysis within a human-human dialogue model; on the other hand, not only displaying information or proposing an action, but starting a dialogue with the user. To our knowledge, neither field has been investigated so far.

 

The scientific domain of the PhD thesis will be split between semantic analysis of human-human dialogue and human-machine dialogue systems. The latter include interactive voice response systems, but also chatbots and the interactions that some video games offer between the player and some characters. Spoken dialogue systems are expanding and commercially handle several billion calls a year.

 

  • NADIA team

We are a small unit of 8 permanent staff, 3 PhD students, and a postdoctoral researcher. The team is in charge of Orange's dialogue technology, the Disserto suite. Disserto has been used to develop dozens of interactive voice response systems, which receive more than 100M calls a year. Some Disserto-based multimodal prototypes have been released on the research side as well. In addition to its strong industrial involvement, NADIA is also active in academic research, with 3 conference articles published in 2013 and 4 already accepted in 2014.

 

  • Scientific objectives

The scientific themes of the PhD thesis revolve around three main, distinct questions:

  • How to model semantic analysis of a human-human conversation and extract a meaning further used in an application?

  • How to initiate, lead and conclude a dialogue with a user who is already engaged in a conversation with someone else?

  • How to make this kind of listening and dialoguing application easily extensible, and even customizable?

The PhD student will be asked to develop several prototype demonstrators and to organise experimentation campaigns to evaluate the results of her/his work.

 

On the other hand, the following subjects are beyond the scope of the PhD: automatic speech recognition (we will use a plug-and-play product) and theoretical studies of multimodality. Both are very large fields of research that must be studied separately.

  • Specialities

  • Spoken Dialogue Systems

  • Human-human dialogue theory

  • Natural Language Processing

  • Machine Learning

  • Required educational level

  • Research Master of Science or engineering school (« école d’ingénieur ») degree

  • At least one of the cited specialities included in the university curriculum

  • Required experience

At least one internship in one of the cited specialities, or more broadly in Artificial Intelligence

 

 

  • Attractiveness of the position

This PhD thesis offers the opportunity to contribute to the future of dialogue applications and to help put them within everyone’s reach in everyday life. Those two domains are currently fast-growing and their emergence seems inevitable in the next ten years. According to Gartner, personal assistants are among the top 10 technological trends for 2014, within the “smart machines” theme.

 

The PhD student will be integrated in a small team comprising three other PhD students and a postdoctoral researcher. The whole team works in the dialogue systems field. The other Orange Labs teams bring together all kinds of skills in artificial intelligence and, more broadly, in computer science. It is the perfect environment to carry out significant research and get quick, high-level advice from experts.

 

Orange ensures that the PhD student will work in the best conditions by making several commitments:

  • Provision of state-of-the-art industrial software: our long-standing tool, Disserto, will be used by the PhD student. It will enable her/him to tackle the PhD issues without spending too much effort on peripheral development.

  • Provision of our previous research results: research and market watch, theoretical studies and code will be made available. Orange Labs gathers 3700 researchers. Whatever the research topic is, it is likely that somebody has prepared the ground.

  • Provision of necessary equipment: to study dialogue in the home environment, we have already invested in equipment to build a showroom.

  • Experimentation programs: at Orange Labs we are used to launching user test campaigns on our internal developments. Each campaign is specific, but to give an idea, my first PhD student launched three experiments with 500 unique testers (and almost 2000 dialogues).

  • Provision of data: Orange has extensive experience in spoken dialogue systems and has a large amount of data available (Disserto-based systems receive more than 150M calls a year).


6-23(2014-04-08) PhD in video/text analysis at Aix-Marseille (F)

As part of a co-funding arrangement between the DGA and Aix-Marseille University, we are seeking candidates for a 3-year doctoral contract, starting in September 2014, on the automatic analysis of video documents, and more specifically on multimodal approaches to processing visual, audio and text information.

We are looking for candidates who hold a Master's degree in Computer Science (or equivalent), or who are currently completing one (and are currently doing their internship).

The application deadline is April 25, 2014. The application consists of a detailed CV, a copy of the most recent diploma or certificate of completion with a transcript of grades, and any other material you consider useful to support your application.

The title of the thesis proposal is: 'Compréhension Multimodale – vers des traitements joints audio/image pour la compréhension multimodale de documents vidéo' (Multimodal understanding: towards joint audio/image processing for the multimodal understanding of video documents)
Keywords: automatic speech processing, natural language processing, image processing, machine learning, information retrieval.
A description of the topic is available at: http://pageperso.lif.univ-mrs.fr/~frederic.bechet/prop_these_court_DGA2014_LIF_TALEP.pdf

For any enquiries: frederic.bechet@univ-amu.fr


6-24(2014-04-13) Research Assistant/Associate in Statistical Spoken Dialogue Systems (Fixed term)

Research Assistant/Associate in Statistical Spoken Dialogue Systems (Fixed Term)

Applications are invited for a research position in statistical spoken dialogue systems in the Dialogue Systems Group at the Cambridge University Engineering Department. The position is sponsored by Toshiba Cambridge Research Laboratory.

The main focus of the work will be on the development of techniques and algorithms for implementing robust statistical dialogue systems which can support conversations ranging over very wide, potentially open, domains. The work will extend existing techniques for belief tracking and decision making by distributing classifiers and policies over the corresponding ontology.

The successful candidate will have good mathematical skills and be familiar with machine learning. S/he will have the ability to design tests and experiments and to design and develop new methods and algorithms to address research objectives and find solutions. Preference will be given to candidates with specific understanding of Bayesian methods and reinforcement learning, experience in spoken dialogue systems, and strong software engineering skills. Candidates with good communication and writing skills and knowledge of semantic web, OWL and ontologies will be at an advantage. Candidates should have, or will shortly have, a PhD in an area related to speech technology. However, candidates with comparable research experience are also encouraged to apply.

This is an exciting opportunity to join one of the leading groups in statistical speech and language processing. Cambridge provides excellent research facilities and there are extensive opportunities for collaboration, visits and attending conferences.

Salary Ranges: Research Assistant: £24,289-£27,318; Research Associate: £28,132-£36,661

Fixed-term: The funds for this post are available for 24 months in the first instance.

The post is based in Central Cambridge, Cambridge, UK.

Once an offer of employment has been accepted, the successful candidate will be required to undergo a health assessment.

To apply online for this vacancy, please use the 'Apply online' link below. This will route you to the University's Web Recruitment System, where you will need to register an account (if you have not already) and log in before completing the online application form.

Please ensure that you upload your Curriculum Vitae (CV), a statement of research interests and a covering letter in the Upload section of the online application. If you upload any additional documents which have not been requested, we will not be able to consider these as part of your application. Please submit your application by midnight on the closing date.

If you have any questions about this vacancy or the application process, please contact: Elisabeth Barlow, email Elisabeth.Barlow@admin.cam.ac.uk (Tel +44 01223 765692).

Please quote reference NM03182 on your application and in any correspondence about this vacancy.

The University values diversity and is committed to equality of opportunity.

The University has a responsibility to ensure that all employees are eligible to live and work in the UK.

Apply online link: http://hrsystems.admin.cam.ac.uk/recruitui/apply/NM03182


6-25(2014-04-16) Post-doctoral position in Cognitive Neuroscience (MRG Group) , Barcelona, Spain

 

Multisensory Research Group

Center for Brain and Cognition

 

Post-doctoral position in Cognitive Neuroscience (MRG Group)

Applications are invited for a full-time post-doctoral research position in the MULTISENSORY RESEARCH GROUP, led by Salvador Soto-Faraco, at the Pompeu Fabra University (Barcelona).

The group addresses the behavioral and neural expression of multisensory integration processes and their underlying mechanisms in the brain. This research spans several perceptual domains (speech, temporal and spatial processing, in vision, audition and touch, and body representation) and research approaches (psychophysics, neuroimaging, and brain stimulation with TMS).

We seek a person with solid, demonstrable experience who is able to lead a research line complementary to the group’s activities. Involvement in some organizational and management aspects is expected. The candidate must have (1) a PhD degree at the time of application or soon thereafter (Sept 2014), (2) solid, demonstrable experience, with publications, in the field of Cognitive or Computational Neuroscience, and (3) a strong interest in multisensory integration, perception and attention. Please bear in mind that applications that do not meet these standards will not be considered. Candidates from outside the EU are welcome to apply, but they might have to clear work visa requirements on their own.

We offer onsite ERP/EEG, TMS, and psychophysical testing facilities, a neuronavigation system, a wide range of visual, auditory and somatosensory stimulation equipment, and external access to fMRI and MEG recording facilities. The position will be funded for up to three years.

Starting date: Sept 2014 or before.

Salary: Commensurate with experience (up to 47,300Eur/Year gross)

How to apply

Applications should include:

- a full CV

- a short cover letter including a statement of research interests

- the names of three scholars who are willing to serve as references for the applicant

Check out www.mrg.upf.edu for info on the group. For informal enquiries about the position and applications, please contact Salvador Soto-Faraco at applications.MRGLab@gmail.com.

Deadline: May 7th, 2014

Please mention that you are applying for the POSTDOCTORAL position in the email subject.


6-26(2014-04-20) Jointly supervised PhD position, LIG (Grenoble) / DDL (Lyon)
Jointly supervised PhD position, LIG (Grenoble) / DDL (Lyon) - starting October 2014


Automatic speech processing to support the description of African languages.

 

This thesis is funded by the ANR project ALFFA (African Languages in the Field, Speech Fundamentals and Automation) and jointly supervised by a computer science laboratory and a linguistics laboratory. It consists of proposing and evaluating the contribution of automatic speech processing tools to help field linguists in their language description work (corpus recording, phonetic analysis, etc.).

In addition to operational work on the ALFFA project (taking part in the life of the project, building automatic speech recognition systems for various African languages), the exploratory part of this thesis will be devoted to proposing tools and methods (preferably on mobile devices) for machine-assisted field analysis (speech signal segmentation, prosodic analyses, automatic labelling by forced alignment, etc.) and to evaluating them on concrete use cases (large-scale analysis of phonological particularities of endangered languages, etc.).

Travel to West Africa is to be expected as part of the thesis.

 

Desired profile:

- Essential: Master's or engineering degree in computer science; experience in speech processing

- A plus: interest in languages (phonetics, linguistics); experience in mobile application development

- Other qualities: writing skills (for publications in conferences such as Interspeech, LabPhon, etc.) and communication skills; teamwork

 

Summary of the ALFFA project:

The number of languages spoken in Africa ranges from 1,000 to 2,500, depending on estimates and definitions. Monolingual states do not really exist on this continent, because languages cross borders. The number of languages per country ranges from 2 or 3, in Burundi and Rwanda, to more than 400 in Nigeria. Multilingualism is indeed omnipresent in sub-Saharan African societies.

Today, conditions are very favourable for the development of a market for speech processing for African languages. People access ICT mainly via mobile phones (and keypads), and the need for voice services can be seen in every sector, from the highest-priority ones (health, food) to the most recreational (games, social networks).

To achieve this, the language barrier must be overcome, and this is what we propose in this project, which addresses two main aspects: the fundamentals of spoken language analysis (language description, phonology, dialectology) and speech technologies (recognition and synthesis) for African languages. The ALFFA project is interdisciplinary, since it brings together not only technology experts (LIA, LIG, VOXYGEN) but also field linguists and phoneticians (DDL). In the project, the technologies developed would be used to create voice micro-services for mobile phones in Africa (for example, a telephone service for checking food prices or providing local information).

Project website: http://alffa.imag.fr
 
 

6-27(2014-04-21) PhD OFFER, ASLAN 2014-2017: Babbling and oral feeding, Lyon, France

PhD OFFER, ASLAN 2014-2017

BABBLING AND ORAL FEEDING (BABILLAGE ET ORALITE ALIMENTAIRE)

Applications should be sent by email to Sophie Kern (Sophie.Kern@univ-lyon2.fr) and Mélanie Canault (melanie.canault@univ-lyon1.fr).

Thesis framework

This thesis is funded by the ASLAN laboratory of excellence (Advanced studies on language complexity). The funding is approximately €1,350 net per month for a period of 3 years.

 Scientific supervisors:

Sophie KERN and Mélanie CANAULT

 Host laboratory: DDL, Dynamique Du Langage (UMR5596 CNRS – Université Lumière Lyon 2, Lyon, France)

 Start date: September 2014 - October 2014

 Application deadline: May 31, 2014

 Required documents: a detailed CV accompanied by a cover letter. The CV must clearly set out the candidate's academic background and acquired skills. Transcripts of grades for Master 1 and Master 2 are also required.

Candidate profile

This thesis proposal is aimed primarily at students holding a Master's degree in language sciences or cognitive sciences. The selected candidate must be interested in language acquisition in very young children. Applications from abroad are admissible provided the candidate has an excellent command of French, both spoken and written.

Ideally, the candidate should have knowledge in one or more of the following areas:

- Developmental psycholinguistics.

- Acoustic phonetics; use of signal processing software: Praat®.

- Oral feeding (oralité alimentaire)

The candidate should also have experience in conducting experiments with young children.

Project description

Oro-motor development during the first year of life is an extremely rich and complex process that leads the young child along the path of linguistic development.

The babbling period (6-12 months) is often described as a crucial stage in the language acquisition process, during which the baby's articulatory potential progresses considerably. Parents identify this period very easily because it corresponds to the emergence of the first syllables. These are thought to result from the superposition of the vertical movement of the mandible on phonation (MacNeilage 1998).

At the babbling stage, the mandible is therefore a dominant articulator. This is explained by the anatomical, cerebral and motor links that exist between speech activity and feeding activity (Luschei & Goldberg 1981, Lund & Enomoto 1988, Rizzolatti et al. 1996, Fogassi & Ferrari 2005), and it is partly for these reasons that language professionals establish a close link between the development of oral feeding and that of language (Rééducation Orthophonique 2004).

The mandible is thus directly involved in the development of orality. Nevertheless, the baby's motor control is immature and its mandibular movements are slower than those of adults. Indeed, adult speech is built on a rhythm of 5-6 Hz (Jürgens 1998, Lindblom 1983), whereas early productions are around 2.5-3 Hz (Dolata 2008). The timing of mandibular movements must therefore be reorganized during development. We hypothesize that major changes occur during the first year, on the one hand because studies have shown that the kinematic patterns of the mandible approach those of adults from the age of one year (Green et al. 2000, 2002), and on the other hand because preliminary work (Canault & Laboissière 2011, Fouache & Malcor 2013) has allowed us to show, by observing syllable duration, that an acceleration of mandibular oscillation begins between 8 and 12 months of age. However, the trends identified still need to be confirmed.
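
As a rough illustration of the rhythm figures quoted above, a syllable duration can be converted to an oscillation rate by taking its reciprocal, assuming that one syllable corresponds to one open-close cycle of the mandible. The sketch below is not the authors' analysis method; the function names and the sample durations are hypothetical and serve only to show the arithmetic.

```python
from typing import Iterable


def oscillation_hz(syllable_duration_s: float) -> float:
    """Rate in Hz for one syllable, assuming one syllable = one jaw cycle."""
    if syllable_duration_s <= 0:
        raise ValueError("syllable duration must be positive")
    return 1.0 / syllable_duration_s


def mean_oscillation_hz(durations_s: Iterable[float]) -> float:
    """Mean rate over a sequence: number of cycles divided by total duration."""
    durations = list(durations_s)
    if not durations:
        raise ValueError("at least one syllable duration is required")
    if any(d <= 0 for d in durations):
        raise ValueError("syllable durations must be positive")
    return len(durations) / sum(durations)


# Hypothetical syllable durations in seconds (illustrative, not measured data).
infant_like = [0.40, 0.35, 0.36]
adult_like = [0.20, 0.18, 0.17]

infant_rate = mean_oscillation_hz(infant_like)
adult_rate = mean_oscillation_hz(adult_like)
```

With these made-up values, the infant-like sequence falls in the 2.5-3 Hz range and the adult-like sequence in the 5-6 Hz range, which corresponds to syllables of roughly 330-400 ms versus 170-200 ms.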

The stakes appear high given the predictive nature of babbling. Studies have already shown that babbling and first-word productions can reflect later articulatory and communicative potential (Stoel-Gammon 1988, Stark et al. 1988, Oller et al. 1999, Levin 1999, Otapowicz et al. 2007, Nip et al. 2010), but none relies on the temporal parameter or on the segmental and structural characteristics of the productions.

Objectives

1. Define the pivotal stages of oral feeding during the first two years of life.

2. Establish the link between the stages of oral feeding and the stages of babbling in terms of oscillation frequencies and structural characteristics of utterances (e.g., frequency of reduplications, segmental content, etc.).

3. Determine the links between the characteristics of babbling and lexical development.

Canault, M. & Laboissière, R. (2011). Le babillage et le développement des compétences articulatoires : indices temporels et moteurs. Faits de langues, 37, 173-188.

Dolata J.K., Davis, B.L. & MacNeilage, P.F. (2008). Characteristics of the rhythmic organization of vocal babbling: implications for an amodal linguistic rhythm. Infant behavior & development, 31 (3), 422-431.

Fogassi L. & Ferrari P.F. (2005). Mirror neurons, gestures and language evolution. Interaction Studies, 5 (3), 345-363.

Fouache, M. & Malcor M. (2013). Evolution de la fréquence d’oscillation mandibulaire du babillage canonique aux premiers mots. Mémoire d’orthophonie (speech therapy dissertation), Université Lyon 1.

Green, J.R., Moore, C.A., Higashikawa, M. & Steeve, R.W. (2000). The physiologic development of speech motor control: lip and jaw coordination. Journal of Speech, Language, and Hearing Research, 43, 239-255.

Green, J.R., Moore, C.A. & Reilly, K.J. (2002). The sequential development of jaw and lip control for speech. Journal of Speech, Language, and Hearing Research, 45, 66-79.

Jürgens U. (1998). Speech evolved from vocalization, not mastication. Commentary on MacNeilage P.F. (1998). The Frame/Content theory of evolution of speech production. Behavioral and Brain Sciences, 21, 519-520.

Kern, S. & Gayraud, F (2010). IFDC. Les éditions la cigale, Grenoble.

Levin K. (1999). Babbling in infants with cerebral palsy. Clinical Linguistics & Phonetics, 13 (4), 249-267.

Lindblom B. (1983). Economy of speech gestures. In The Production of Speech. MacNeilage P.F. (Ed.). New York, Springer, 217-245.

Lund J.P. & Enomoto S. (1988). The generation of mastication by the central nervous system. In Neural control of rhythmic movement. Cohen A., Rossignol S. & Grillner S. (Eds.). New York, Wiley, 41-72.

Luschei E.S. & Goldberg L.J. (1981). Mastication and voluntary biting. In Handbook of physiology: the nervous system, vol.2. Brooks V.B. (Ed.). Bethesda, American Physiological Society, 1237-1274.

MacNeilage P.F. (1998). The Frame/Content theory of evolution of speech production. Behavioral and Brain Sciences, 21, 499-546.

Nip I.S.B., Green J.R. & Marx D.B. (2010). The co-emergence of cognition, language, and speech motor control in early development: A longitudinal correlation study. Journal of Communication Disorders, 44 (2), 149-160.

Oller D.K., Eilers R.E., Neal A.R. & Schwartz H.K. (1999). Precursors to speech in infancy: the prediction of speech and language disorders. Journal of Communication Disorders, 32, 223-245.

Otapowicz D., Sobaniec W., Kutak W., Sendrowski K. (2007). Severity of dysarthric speech in children with infantile cerebral palsy in correlation with the brain CT and MRI. Advances in Medical Sciences, 52, 188-223.

Les troubles de l’oralité alimentaire, Rééducation Orthophonique, 220, 2004.

Rizzolatti G., Fadiga L., Gallese V. & Fogassi L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131-141.

Stark R.E., Ansel B.M. & Bond J. (1988). Are prelinguistic abilities predictive of learning disability? A follow-up study. In Preschool prevention of reading failure. Masland R.L. & Masland M. (Eds.). Parkton, York Press.

Stoel-Gammon C. (1988). Prelinguistic vocalisations of hearing-impaired and normally hearing subjects: a comparison of consonantal inventories. Journal of Speech and Hearing Disorders, 53, 302-315.


6-28(2014-04-30) Scientific Associate (Doctoral Student) at University of Hamburg, Germany

Human-Computer Interaction

Prof. Dr. Frank Steinicke

* The temporary deployment is based on § 2 Wissenschaftszeitvertragsgesetz.


Faculty: Mathematics, Computer Science and Natural Science

Department: Computer Science

Area: Human-Computer Interaction

The Human-Computer Interaction group at the University of Hamburg is seeking talent!

We invite applications for the position of a

Scientific Associate (Doctoral Student)

to be filled for a period of three years as soon as possible.* Salary for the position is according to E13 TV-L (100% position, which corresponds to 39 hours of work per week).

The University of Hamburg seeks to increase the number of female researchers, and we particularly invite women to apply. Women will be given preference if they are equally qualified. Part-time positions are also possible.

Responsibilities:

The responsibilities include academic services, in particular teaching and research. According to § 28 Abs. 1 S. 3 Hamburgisches Hochschulgesetz (HmbHG), the candidate will get the opportunity for scientific qualification, in particular to pursue a doctoral degree.

Job Description:

We are looking for highly motivated candidates who are eager to get involved in cutting-edge research as well as creative teaching approaches in human-computer interaction. Candidates are required to hold a (natural science) university degree (diploma or master), preferably in the field of human-computer interaction or computer science (in media). Applicants should have a solid background in programming and an interest in interdisciplinary teamwork.

Skills and knowledge in the field of input/output devices (e.g., stereoscopic 3D displays, tracking systems, multi-touch devices) and a good command of English are preferred, as well as an interest in at least one of the following research fields:

• Human-Computer Interaction,

• Virtual, Augmented or Mixed Reality,

• Interactive 3D Computer Graphics or Computer Vision,

• Interaction Design,

• Perceptual Psychology, and/or

• Cognitive Sciences.

Condition of Employment:

It is required that candidates hold a university degree in areas according to the job description.

Disabled applicants will be given preference over other applicants if they are equally qualified.

Applications with the usual information (cover letter, CV, certificates) should be sent as a PDF document no later than June 1st, 2014 to Prof. Dr. Frank Steinicke, Human-Computer Interaction, Department of Computer Science, Vogt-Kölln-Str. 30, 22527 Hamburg, Germany.

For further information, please contact Prof. Dr. Frank Steinicke (steinicke@informatik.uni-hamburg.de).


6-29(2014-05-12) Proposal for an INRIA PhD fellowship (Cordi-S)

Proposal for an INRIA PhD fellowship (Cordi-S)

Title of the proposal:

Nonlinear speech analysis for differential diagnosis between Parkinson's disease and Multiple-System Atrophy

Project Team INRIA: GeoStat (http://geostat.bordeaux.inria.fr/)

Supervisor: Khalid Daoudi (khalid.daoudi@inria.fr)

Scientific context:

Parkinson's disease (PD) is the most common neurodegenerative disorder after Alzheimer's disease. Its prevalence is 1.5% of the population over age 65, and it affects about 143,000 people in France. Given the aging of the population, the prevalence is likely to increase over the next decade.

Multiple-System Atrophy (MSA) is a rare, sporadic, adult-onset neurodegenerative disorder, progressive in its evolution and of unknown etiology. MSA has a prevalence of 2 to 5 per 100,000 and has no effective treatment. It usually starts in the 6th decade, and there is a slight male predominance. It takes 3 years on average from the first signs of the disease for a patient to require a walking aid, 4-6 years to be in a wheelchair and about 8 years to be bedridden.

PD and MSA require different treatment and support. However, the differential diagnosis between PD and MSA is very difficult because, in the early stages of the diseases, patients look alike as long as signs such as dysautonomia are not yet clearly established in MSA patients. There is currently no valid clinical or biological marker that clearly distinguishes the two diseases at an early stage.

Goal:

Voice and speech disorders in Parkinson's disease are a clinical marker that coincides with motor disability and the onset of cognitive impairment. The term commonly used to describe these disorders is dysarthria [1,2].

Like PD patients, and depending on the areas of the brain that are damaged, people with MSA may also have speech disorders: articulation difficulties, staccato rhythm, squeaky or muted voice. Dysarthria in MSA is more severe and appears earlier, in the sense that it requires earlier rehabilitation than in PD.

Since dysarthria is an early symptom of both diseases but has a different origin in each, the purpose of this thesis is to use dysarthria, through digital processing of patients' voice recordings, as a means of objective discrimination between PD and MSA. The ultimate goal is to develop a numerical dysarthria measure, based on analysis of the patients' speech signal, that allows objective discrimination between PD and MSA and would thus complement the tools currently available to neurologists for the differential diagnosis of the two diseases.

Project:

Pathological voices, such as in PD and MSA, generally present high non-linearity and turbulence [3]. Nonlinear/turbulent phenomena are not naturally suited to linear signal processing, which nevertheless dominates current speech technology. Thus, from a methodological point of view, the goal of this thesis is to investigate the framework of nonlinear and turbulent signals and systems, which is better suited to analyzing the range of nonlinear and turbulent phenomena observed in pathological voices in general, and in PD and MSA voices in particular. We will adopt an approach based on novel nonlinear speech analysis algorithms recently developed in the GeoStat team [4], which led, in particular, to new and promising techniques for pathological voice analysis. The goal will be to extract relevant speech features in order to design new dysarthria measures that enable accurate discrimination between PD and MSA voices. This will also require investigating machine learning theory in order to develop robust classifiers (to discriminate between PD and MSA voices) and to establish a correspondence (regression) between speech measures and standard clinical ratings.

The PhD candidate will actively participate, in coordination with neurologists from the Parkinson's Center of Pellegrin Hospital in Bordeaux, in setting up the experimental protocol and in data collection. The latter will consist of recording patients' voices using a digital recorder and the DIANA/EVA2 workstations (http://www.sqlab.fr/).

References:

[1] Pinto, S et al. Treatments for dysarthria in Parkinson's disease. The Lancet Neurology. Vol 3, Issue 9, 2004.

[2] Auzou, P.; Rolland, V.; Pinto, S., Ozsancak C. (eds.). Les dysarthries. Editions Solal. 2007.

[3] Tsanas A et al. Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease. IEEE Transactions on Biomedical Engineering, 2012; 59 (5):1264-1271.

[4] PhD thesis of Vahid Khanagha. GeoStat team, INRIA Bordeaux-Sud Ouest. January 2013.

http://geostat.bordeaux.inria.fr/images/vahid%20khanagha%204737.pdf

Duration: 3 years (starting fall 2014)

Net Salary: ~1700 / month (including health care insurance)

Prerequisites: A good level in signal/speech processing is necessary, as well as Matlab and C/C++ programming. Knowledge of machine learning would be a strong advantage.

Candidates should send a CV to khalid.daoudi@inria.fr and also apply via the Inria website:

http://www.inria.fr/en/institute/recruitment/offers/phd/campaign-2014/(view)/details.html?id=PNGFK026203F3VBQB6G68LOE1&LOV5=4509&LG=EN&Resultsperpage=20&nPostingID=8324&nPostingTargetID=14059&option=52&sort=DESC&nDepartmentID=28


6-30(2014-05-12) PhD thesis at Orange Labs

PhD thesis at Orange Labs: speaker analysis of a collection of multimedia documents

 

The people present in multimedia content are key metadata for searching and navigating that content.

 

Person-level analysis of a multimedia document, on its audio component, first involves segmenting the audio into speaker turns, then clustering the turns that come from the same speaker. After that, characteristics of the speaker (such as their role) can be extracted and the speaker can be identified. Speaker identification can be carried out either with biometric features, which presupposes an existing biometric model of the speaker's voice, or with an identity-inference model, based on information that allows speakers to be named unambiguously (for example, using the contexts of names detected in on-screen text overlays, in the speech, or in the subtitles).

 

While the vast majority of person-analysis processing of multimedia content has so far focused on audio documents taken in isolation, recent studies in speaker segmentation and clustering address the "cross-content" aspect (appearing in the literature as "cross-show speaker diarization", "speaker linking" or "speaker attribution"), in order to associate the speech turns of the same speaker across different contents. The approach proposed in this thesis is to explore this cross-content aspect in greater depth by tackling speaker analysis from the perspective of collections, where a collection is defined as a set of audiovisual documents sharing common characteristics (e.g., programme name, broadcast date, topic, etc.).

 

This collection-based approach should, on the one hand, improve robustness and performance and, on the other hand, offer a synthetic representation of the collection in terms of people, as well as new ways of exploring the collection through the analysis of relationships between the people present in it.

For example, if the collection consists of several episodes of the same programme, the objective could be to infer the structure of the programme (presenter, columnist, guests) and in particular to identify the guests. If the collection covers news-related documents over a short period of time, speaker analysis of the collection would make it possible to study an event through all of its actors, and could usefully complement news-tracking technologies.

 

The thesis will take place at Orange Labs in Lannion, as a 36-month fixed-term contract (CDD), with attractive remuneration.

It is intended for a graduate student (Master 2 or engineering degree) with skills in automatic speech processing and/or data mining and machine learning.

For more information: http://orange.jobs/jobs/offer.do?joid=38569&lang=fr&wmode=light

Or contact directly: delphine.charlet@orange.com

 


6-31(2014-05-15) Post-Doc Position 'Visual analysis and synthesis of affects', Univ.Bordeaux, F
Post-Doc Position 'Visual analysis and synthesis of affects'

Description : 
In communication between humans or between humans and machines, the expression of the speaker's mental and emotional states, feelings, intentions or attitudes is a source of information that plays an important role in understanding the context of speech. The study of psychological and emotional states provides growing evidence of the fundamental role of affects in understanding communication. Affects are expressed through different levels of audio and visual cognitive processes: some expressions cannot be voluntarily controlled (emotions) while others are intentional (attitudes and expressive language, choice of vocabulary and grammatical paraphrases). Affects depend on culture, which may lead to misinterpretations in communication. They should therefore be taken into account when learning foreign languages. Similarly, affects should be integrated into automatic translators. Affects are present in both speech and body language. Here we are particularly interested in finding the visual characteristics of attitudinal expressions (seduction, irony, irritation, admiration, etc.) in three cultures (US, Japan and France).
The postdoctoral fellow will first extract different spatio-temporal descriptors for classifying gestural and facial expressions. Existing studies in cognitive sciences will be used to select the appropriate characteristics for each part of the face and body. If needed, new descriptors will be designed. The fellow will also take part in validation through perception tests with a panel of native speakers. In a second part, we are interested in generating attitudes by morphing from a neutral expression to a given attitude. This second phase will also include improving our voice transformation technique.

Profile of applicant:
PhD in computer science, cognitive science or applied mathematics. Solid experience with image processing is required, along with good programming skills in Matlab and/or C/C++. Knowledge of speech processing techniques is definitely a plus.

Duration: 16 months
Job status: PostDoctoral researcher, full time
LOCATION : LaBRI, UMR 5800, Université de Bordeaux, Talence, France
DATE: As soon as possible

-details of announcement: http://cpu.labex.u-bordeaux.fr/Jobs/Post-doc_Visual-analysis-and-synthesis-of-affects,Job-140.html

Supervisors/Contact:
Aurélie Bugeau, LaBRI : aurelie.bugeau@labri.fr
Takaaki Shochi, CLLE-ERSSàB, LaBRI : Takaaki.Shochi@u-bordeaux3.fr
Jean-Luc Rouas, LaBRI : jean-luc.rouas@labri.fr
Application:

A CV and a cover letter should be sent to Aurélie Bugeau, Takaaki Shochi and Jean-Luc Rouas.
Application deadline: July 2014


6-32(2014-05-16) PhD position co-supervised by LIG (Grenoble) / DDL (Lyon)
PhD position co-supervised by LIG (Grenoble) / DDL (Lyon) - Starting October 2014


Automatic speech processing to support the description of African languages.

 

This thesis, funded by the ANR project ALFFA (African Languages in the Field, Speech Fundamentals and Automation) and co-supervised by a computer science laboratory and a linguistics laboratory, aims to propose and evaluate the contribution of automatic speech processing tools to help field linguists in their language description work (corpus recording, phonetic analysis, etc.).

In addition to operational work on the ALFFA project (taking part in the life of the project, building automatic speech recognition systems for various African languages), the exploratory part of this thesis will be devoted to proposing tools and methods (preferably on mobile devices) for machine-assisted field analysis (speech signal segmentation, prosodic analysis, automatic labeling by forced alignment, etc.), and to evaluating them on concrete use cases (large-scale analysis of phonological features of endangered languages, etc.).

Travel to West Africa is to be expected as part of the thesis.

 

Candidate profile:

 

- Essential: Master's or engineering degree in computer science; experience in speech processing

- Desirable: interest in languages (phonetics, linguistics); experience in mobile application development

- Other qualities: writing skills (for publications at conferences such as Interspeech, LabPhon, etc.), communication skills, teamwork

 

Summary of the ALFFA project:

The number of languages spoken in Africa ranges from 1,000 to 2,500, depending on estimates and definitions. Monolingual states do not really exist on this continent, because languages cross borders. The number of languages per country ranges from 2 or 3, in Burundi and Rwanda, to more than 400 in Nigeria. Multilingualism is indeed omnipresent in sub-Saharan African societies.

Today, conditions are very favourable for the development of a market for speech processing for African languages. People access ICT mainly through mobile phones (and keypads), and the need for voice services can be seen in every sector, from the highest-priority ones (health, food) to the most recreational (games, social networks).

To that end, overcoming the language barrier is necessary, and this is what we propose in this project, which covers two main aspects: the fundamental aspects of spoken language analysis (language description, phonology, dialectology) and speech technologies (recognition and synthesis) for African languages. The ALFFA project is interdisciplinary, since it brings together not only technology experts (LIA, LIG, VOXYGEN) but also field linguists and phoneticians (DDL). In the project, the technologies developed would be used to build voice micro-services for mobile phones in Africa (for example, a telephone service to check food prices or to provide local information, etc.).

 
Project website: http://alffa.imag.fr
 
 

6-33(2014-05-17) Research Assistant position, Laboratoire Parole et Langage, Aix-en-Provence, France
Call for applications

A five-month Research Assistant position is open for application at the Laboratoire Parole et Langage, Aix-en-Provence, France. The successful candidate will participate in the SPIC (Speaking in Concert: Cerebral, articulatory and acoustic convergence between speakers in conversational interaction) research project, jointly conducted by Luciano Fadiga, Leonardo Badino and Alessandro D'Ausilio (Italian Institute of Technology, Genova, Italy) and Noël Nguyen, Simone Falk and Thierry Legou (Laboratoire Parole et Langage). The goal of this project is to better understand what makes speakers sound more like each other in a conversational interaction, by means of simultaneously recorded cerebral, articulatory and acoustic data. The position is funded by the Brain and Language Research Institute at Aix-Marseille University (blri.fr).

At the end of the contract, the candidate will be offered the possibility of starting a PhD with a joint affiliation to the IIT and the LPL, in the framework of the same project.

Qualifications: Master's degree in any of the following disciplines: Speech and Language Sciences, Speech and Language Technology, Neurosciences, Cognitive Sciences. A strong background in computer science and mathematics is necessary. Experience with EEG or articulatory measures in speech production will be appreciated.

Salary: 1760 euros / month.

Starting date: as early as June 2014.

How to apply: Applicants should send a cover letter, a CV and the names and contact information of two references to Noel Nguyen (noel.nguyen@lpl-aix.fr).

Links:
. Laboratoire Parole et Langage: http://www.lpl-aix.fr
. Robotics, Brain and Cognitive Sciences Department, Italian Institute of Technology: 
http://www.iit.it/en/research/departments/robotics-brain-and-cognitive-sciences.html
. Brain and Language Research Institute: http://blri.fr

6-34(2014-05-19) PhD THESIS–ASLAN 2014-2017, Univ. Lyon 1-2, F

 

PhD THESIS OFFER–ASLAN 2014-2017

BABBLING AND ORAL FEEDING

Applications should be sent electronically to Sophie Kern (Sophie.Kern@univ-lyon2.fr) and Mélanie Canault (melanie.canault@univ-lyon1.fr).

Thesis framework

This thesis is funded by the ASLAN (Advanced Studies on Language Complexity) laboratory of excellence. Funding is about €1,350 net per month for a period of 3 years.

Scientific supervisors:

Sophie KERN and Mélanie CANAULT

Host laboratory: DDL (Dynamique Du Langage) laboratory (UMR5596 CNRS – Université Lumière Lyon 2, Lyon, France)

Recruitment date: September 2014 - October 2014

Application deadline: 31 May 2014

Required documents: a detailed CV accompanied by a cover letter. The CV must clearly set out the candidate's academic record and acquired skills. Transcripts for Master 1 and Master 2 are also required.

Candidate profile

This thesis proposal is aimed primarily at students holding a Master's degree in language sciences or cognitive sciences. The selected candidate should be interested in language acquisition in very young children. Applications from abroad are admissible provided that the candidate has an excellent command of French, both spoken and written.

Ideally, the candidate will have knowledge in one or more of the following areas:

- Developmental psycholinguistics.

- Acoustic phonetics, including the use of signal processing software: Praat®.

 

- Oral feeding (oralité alimentaire)

And have experience in experimentation with young children.

Project description

Oro-motor development during the first year of life is an extremely rich and complex process that leads the young child onto the path of linguistic development.

The babbling period (6-12 months) is often described as a crucial stage in the language acquisition process, during which the infant's articulatory potential progresses considerably. Parents identify this period very easily because it corresponds to the emergence of the first syllables. These are thought to result from the superimposition of the vertical movement of the mandible on phonation (MacNeilage 1998).

At the babbling stage, the mandible is therefore a dominant articulator. This is explained by the anatomical, cerebral and motor links between speech activity and feeding activity (Luschei & Goldberg 1981, Lund & Enomoto 1988, Rizzolatti et al. 1996, Fogassi & Ferrari 2005), and it is partly for these reasons that language professionals draw a close link between the development of oral feeding and that of language (Rééducation Orthophonique 2004).

The mandible is thus directly involved in the development of orality. However, the infant's motor control is immature and its mandibular movements are slower than the adult's. Adult speech runs at a rhythm of 5-6 Hz (Jürgens 1998, Lindblom 1983), whereas early productions are around 2.5-3 Hz (Dolata 2008). The timing of mandibular movements must therefore be reorganized during development. We hypothesize that major changes occur during the first year: on the one hand, studies have shown that the kinematic patterns of the mandible approach those of the adult as early as one year of age (Green et al. 2000, 2002); on the other hand, preliminary work (Canault & Laboissière 2011, Fouache & Malcor 2013) has allowed us to show, by observing syllable duration, that mandibular oscillation begins to accelerate between 8 and 12 months of age. These trends nevertheless remain to be confirmed.
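As a rough illustration of how an oscillation rate can be read off syllable duration, the sketch below assumes that each syllable corresponds to one open-close cycle of the mandible, so that the frequency is the reciprocal of the mean syllable duration. That assumption, the function name and the sample durations are illustrative only and are not taken from the cited studies.

```python
from statistics import mean


def oscillation_frequency_hz(syllable_durations_s):
    """Estimate mandibular oscillation frequency (Hz) from syllable durations (s).

    Assumes one syllable = one open-close cycle of the mandible, so the
    frequency is the reciprocal of the mean syllable duration.
    """
    durations = list(syllable_durations_s)
    if not durations or any(d <= 0 for d in durations):
        raise ValueError("durations must be a non-empty list of positive values")
    return 1.0 / mean(durations)


# Hypothetical syllable durations, chosen only to fall in the ranges quoted
# in the text (about 2.5-3 Hz for babbling, 5-6 Hz for adult speech).
babbling = [0.38, 0.35, 0.36]  # seconds per syllable
adult = [0.18, 0.17, 0.19]     # seconds per syllable

print(f"babbling: {oscillation_frequency_hz(babbling):.2f} Hz")
print(f"adult:    {oscillation_frequency_hz(adult):.2f} Hz")
```

Averaging durations before taking the reciprocal keeps a single very short syllable from inflating the estimate; other choices (for example, averaging per-syllable rates) would give slightly different values.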

The stakes appear high given the predictive character of babbling. Studies have already shown that babbling and first-word productions can reflect later articulatory and communicative potential (Stoel-Gammon 1988, Stark et al. 1988, Oller et al. 1999, Levin 1999, Otapowicz et al. 2007, Nip et al. 2010), but none of them relies on the temporal parameter or on the segmental and structural characteristics of the productions.

Objectives

1. Define the pivotal stages of oral feeding during the first two years of life.

 

2. Establish the link between the stages of oral feeding and the stages of babbling in terms of oscillation frequencies and structural characteristics of utterances (e.g., frequency of reduplications, segmental content, etc.).

3. Determine the links between babbling characteristics and lexical development.

Canault, M. & Laboissière, R. (2011). Le babillage et le développement des compétences articulatoires : indices temporels et moteurs. Faits de langues, 37, 173-188.

Dolata J.K., Davis, B.L. & MacNeilage, P.F. (2008). Characteristics of the rhythmic organization of vocal babbling: implications for an amodal linguistic rhythm. Infant behavior & development, 31 (3), 422-431.

Fogassi L. & Ferrari P.F. (2005). Mirror neurons, gestures and language evolution. Interaction Studies, 5 (3), 345-363.

Fouache, M. & Malcor M. (2013). Evolution de la fréquence d’oscillation mandibulaire du babillage canonique aux premiers mots. Mémoire d’orthophonie, Université Lyon1.

Green, J.R., Moore, C.A., Higashikawa, M. & Steeve, R.W. (2000). The physiologic development of speech motor control: lip and jaw coordination. Journal of Speech, Language, and Hearing Research, 43, 239-255.

Green, J.R., Moore, C.A. & Reilly, K.J. (2002). The sequential development of jaw and lip control for speech. Journal of Speech, Language, and Hearing Research, 45, 66-79.

Jürgens U. (1998). Speech evolved from vocalization, not mastication. Commentary on MacNeilage P.F. (1998). The Frame/Content theory of evolution of speech production. Behavioral and Brain Sciences, 21, 519-520.

Kern, S. & Gayraud, F (2010). IFDC. Les éditions la cigale, Grenoble.

Levin K. (1999). Babbling in infants with cerebral palsy. Clinical Linguistics & Phonetics, 13 (4), 249-267.

Lindblom B. (1983). Economy of speech gestures. In The Production of Speech. MacNeilage P.F. (Ed.). New York, Springer, 217-245.

Lund J.P. & Enomoto S. (1988). The generation of mastication by the central nervous system. In Neural control of rhythmic movement. Cohen A., Rossignol S. & Grillner S. (Eds.). New York, Wiley, 41-72.

Luschei E.S. & Goldberg L.J. (1981). Mastication and voluntary biting. In Handbook of physiology: the nervous system, vol.2. Brooks V.B. (Ed.). Bethesda, American Physiological Society, 1237-1274.

MacNeilage P.F. (1998). The Frame/Content theory of evolution of speech production. Behavioral and Brain Sciences, 21, 499-546.

Nip I.S.B., Green J.R. & Marx D.B. (2010). The co-emergence of cognition, language, and speech motor control in early development: A longitudinal correlation study. Journal of Communication Disorders, 44 (2), 149-160.

Oller D.K., Eilers R.E., Neal A.R. & Schwartz H.K. (1999). Precursors to speech in infancy: the prediction of speech and language disorders. Journal of Communication Disorders, 32, 223-245.

Otapowicz D., Sobaniec W., Kutak W., Sendrowski K. (2007). Severity of dysarthric speech in children with infantile cerebral palsy in correlation with the brain CT and MRI. Advances in Medical Sciences, 52, 188-223.

Les troubles de l’oralité alimentaire, Rééducation Orthophonique, 220, 2004.

Rizzolatti G., Fadiga L., Gallese V. & Fogassi L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131-141.

Stark R.E., Ansel B.M. & Bond J. (1988). Are prelinguistic abilities predictive of learning disability? A follow-up study. In Preschool prevention of reading failure. Masland R.L. & Masland M. (Eds.). Parkton, York Press.

Stoel-Gammon C. (1988). Prelinguistic vocalisations of hearing-impaired and normally hearing subjects: a comparison of consonantal inventories. Journal of Speech and Hearing Disorders, 53, 302-315.


6-35(2014-05-26) Multiple postdoc positions in Speech Technology at the Department of Signal Processing and Acoustics, Aalto University, Finland

Department of Signal Processing and Acoustics, Aalto University (formerly known as the Helsinki University of Technology), is looking for outstanding candidates for:

 

Postdoc position in Speech Recognition, Adaptation and Synthesis

 

The speech recognition group (led by Prof. Mikko Kurimo) at Aalto University works on machine learning and probabilistic modeling in speech recognition and model adaptation. The group also belongs to the national Center of Excellence in Computational Inference Research (by Prof. Oja, 2012-2017).

We are looking for a postdoc to join our research group to work on any of our research themes, for example:

  • large-vocabulary speech recognition and text-to-speech synthesis

  • acoustic and language model adaptation

  • speech recognition in noisy environments

  • large-vocabulary language modeling based on unsupervised morpheme models

  • speech recognition for indexing streaming audio and video

  • speech recognition for L2 pronunciation training

Postdoc: 1-2 years. Starting date: As soon as possible.

 

Send your application, CV and references directly by email to

Prof. Mikko Kurimo, mikko.kurimo at aalto.fi

 

 

Postdoc position in Computational Modeling of Language Acquisition

 

The speech technology group (led by Prof. Unto K. Laine) at Aalto University works on computational modeling of language acquisition, perception and production. The overall goal is to understand how spoken language skills can be acquired by humans or machines through communicative interaction with and without supervision. The research in our topic involves cross-disciplinary effort across fields such as machine learning, signal processing, speech processing, linguistics, and cognitive science. The research is funded by the Academy of Finland.

We are currently looking for a postdoc to join our research team to work on our research themes, including:

 

  • pattern discovery from speech

  • articulatory modeling and inversion

  • modeling and methods for autonomous acquisition of lexical, phonetic, and grammatical structure from speech input

  • multimodal statistical learning (associative learning between multiple input domains such as speech, articulation, and vision).

  • context-aware computational systems

 

Postdoc: 2-2.5 years. Starting date: August/September 2014.

 

Send your application, CV and references directly by email to

D.Sc. (Tech.) Okko Räsänen, okko.rasanen at aalto.fi

 

Postdoc position in Speech Production and Synthesis



The speech communication technology research group (led by Prof. Paavo Alku) at Aalto University works on interdisciplinary topics aiming at describing, explaining and reproducing communication by speech. The main topics of our research are: analysis and parameterization of speech production, statistical parametric speech synthesis, enhancement of speech quality and intelligibility in mobile phones, robust feature extraction in speech and speaker recognition, occupational voice care and brain functions in speech perception.

We are currently looking for a postdoc to join our research team to work on any of the team’s research themes, for example:



  • statistical speech synthesis: vocoding, adaptation etc.

  • mathematical inversion in speech production

  • robust feature extraction, e.g., in speaker verification

  • speech intelligibility

 

Postdoc: 1-2 years. Starting date: September/October 2014

 

Send your application, CV and references directly by email to

Prof. Paavo Alku, paavo.alku at aalto.fi

 

General information

 

All positions require a relevant doctoral degree in CS or EE, skills for doing excellent research in a group, and outstanding research experience in any of the research themes mentioned above. The candidate is expected to perform high-quality research and assist in supervising PhD students. Please send your application email with the subject line “Aalto post-doc recruitment, spring 2014”.

In Helsinki you will join the innovative international computational data analysis and ICT community. Among European cities, Helsinki is special in being clean, safe, liberal, Scandinavian, and close to nature, in short, having a high standard of living. English is spoken everywhere. See, e.g. www.visitfinland.com.

 

 

 

 

 

6-36(2014-05-26) PhD Fellowships at FONDAZIONE Bruno Kessler, Trento, Italy

PhD Fellowships to start in 2014

The HLT unit of FBK will sponsor three project specific grants for doctoral students starting with the A.Y. 2014-2015. Doctorate fellowships will be formally pursued at the ICT International Doctorate School of the University of Trento. The program starts between September and November 2014, has a minimum length of three years and includes attendance of courses during the first two years, while the third is fully dedicated to research work.

The call is expected to open at the beginning of May 2014. Potential candidates are invited to contact us in advance for preliminary interviews. It is also possible to begin with an internship in our group during the summer before the official PhD program starts. The call closes on 16 June 2014.

Topic: Machine Translation

Title: Human in the loop for advanced machine translation

Nowadays, human translation and machine translation are no longer antithetical opposites. Rather, the two worlds are getting closer and have started to complement each other. On one side, the translation industry is showing a clear trend towards the adoption of Machine Translation (MT) as a primary support for professional translators. On the other side, the variety of data that can be collected from human feedback provides MT research with an unprecedented wealth of knowledge about the dynamics (practical and cognitive) of the translation process. The future is a symbiotic scenario where humans are assisted by reliable MT technology that, at the same time, continuously evolves by learning from translators' activity. This grant aims to transform this vision into reality. The candidate will join a world-class research effort developing new MT technology capable of integrating information obtained unobtrusively from real professional translation workflows. Relevant topics include: i) the extraction and generalization of knowledge (e.g. translation and correction strategies) from different types of human feedback, ii) projecting the acquired knowledge onto the core MT components, iii) modeling cognitive aspects of the translation process, iv) evaluating the effect of machine translation on human translation.

Contacts: Marco Turchi and Matteo Negri

 

 

Topic: Automatic Speech Recognition

Title:  Acoustic Modeling for Speech Recognition 

FBK has been pursuing research in automatic speech recognition (ASR) for two decades with the goal of developing state-of-the-art technology for interactive- and found-speech recognition, and of addressing applications ranging from speech analytics over the phone line to transcription of speech as found in any audio/visual document. The languages we work on include Italian, English, Spanish, German, Dutch, Arabic, Turkish, Russian, Portuguese and French.

Although FBK is interested in applicants in all areas of automatic transcription technology, the most relevant topics for this call are in the areas of acoustic modeling for large-vocabulary ASR (which includes, for example, neural networks in ASR, building ASR systems for under-resourced languages, speaker adaptive training, methods for fast and efficient adaptation to changing application domains, and data selection methods for acoustic model training), speaker diarization and spoken language detection. The candidate will join a world-class research effort developing new ASR technology and advancing beyond the state of the art, taking advantage of the extensive experience gained by FBK during the last 20 years.

Contacts: Diego Giuliani

 

 

Topic: Content Processing

Title: Building the Web of Data exploiting Natural Language Processing

The Web of Data is about making information available on the Web accessible to machines and hence transforming how information can be found and manipulated. Of all recent initiatives aimed at creating the Web of Data, Wikidata is the most relevant. According to its promoters, “the project aims to build a free knowledge base about the world that can be read and edited by humans and machines alike.” In this PhD the candidate is asked to investigate natural language processing and machine learning techniques that can be used to automatically contribute to Wikidata. Specifically, the candidate will investigate semi-supervised approaches that can bootstrap from the data already available in Wikidata and other resources such as DBpedia and Freebase. Furthermore, careful consideration will have to be given to developing approaches applicable to different languages. Finally, as Wikidata will be edited by both humans and machines, active learning could play a crucial role and open new research challenges due to the crowd-sourcing approach: will the automatic approach be able to interact with the other users during the discussion necessary to collect/approve/filter the data to publish?

Contacts: Claudio Giuliano

6-37 Revue TAL: special issue on spoken language processing

 

Special issue on spoken language processing

Guest editors: Laurent Besacier, Wolfgang Minker

 

Speech is the most natural way to communicate and interact (with a machine or with another person). Spoken language processing and dialogue now have many direct applications in various areas such as (but not limited to) information retrieval, natural language interaction with mobile devices, social robotics, assistive technologies, technologies for language learning, etc. However, spoken language processing poses specific problems related to the nature of the speech material itself. Indeed, spontaneous speech utterances have to be processed and they contain many paralinguistic features. For instance, disfluencies (repetitions, false starts, etc.) reduce the syntactic regularity of utterances. Moreover, spontaneous utterances convey rich information related to emotions, etc. Furthermore, the automatic speech recognition (ASR) step, often required before the application of higher-level processing (understanding, translation, analysis, etc.), produces noisy outputs (with errors) which require robust and tight coupling between modules.

We invite contributions on any aspect (theoretical, methodological and practical) of spoken language processing and oral communication; in particular (non-exclusive list):

-Automatic speech recognition

-Spoken language understanding

-Speech translation

-Text-to-Speech synthesis

-Man-machine dialogue

-Robust analysis of spoken language

-Analysis of social affects or emotions in spontaneous speech

-Mining spoken language documents

-Spoken language applications (mobile interaction, robotics, etc.)

-Technologies for language learning

-Multilingual aspects of spoken language processing

-Evaluation for spoken language processing

-Corpora and resources for spoken language

-(Spoken) discourse analysis

-Adaptive dialogue (context, user profile)

-Analysis of paralinguistic features in spoken language

 

IMPORTANT DATES

-call: March 2014
-submission of contributions: 30 June 2014
-first notification to authors: 15 September 2014
-publication: end of 2014 / beginning of 2015

Submission format

LANGUAGE
Manuscripts may be submitted in English or French. French-speaking authors are requested to submit their contributions in French.

PAPER SUBMISSION
Papers must describe original, completed, and unpublished work. Each submission will be reviewed by two programme committee members.
Papers must be submitted on the Sciencesconf platform: http://tal-55-2.sciencesconf.org/

Accepted papers may be at most 25 pages long, in PDF format. Style sheets are available for download on the website of the TAL journal.

6-38(2014-05-18) (W/M) developer position at IRCAM for Large-Scale Audio Indexing

Position:         1 (W/M) developer position at IRCAM for Large-Scale Audio Indexing

Starting:         September 1st, 2014

Duration:       15 months

Deadline for application:    July 1st, 2014

 

The BeeMusic project aims at providing the description of music for large-scale collections (several millions of music titles). In this project IRCAM is in charge of the development of music content description technologies (automatic recognition of genre or mood, audio fingerprint …) for large-scale music collections.

 

Position description 201406BMDEV:

-------------------------

For this project IRCAM is looking for a good developer for:

- the C++ development of the audio content analysis technologies

- the development of the management system for the storage, access and search over distributed data (audio and meta-data).

He/she will also be in charge of the development of scalable search algorithms.

 

Required profile:

-------------------------

*High skill in C++ development (including template-based meta-programming)

*High skill in scalable indexing technologies and distributed framework (hash-table, Hadoop, SOLR)

*High skill in database management systems

*Good skills in Matlab, Python and Java

*Good knowledge of Linux, Mac OSX and Windows development environments (gcc, Intel and MSVC, svn)

*High productivity, methodical work, excellent programming style.

 

The developer will collaborate with the project team and participate in the project activities (evaluation of technologies, meetings, specifications, reports).  

 

Introduction to IRCAM:

-------------------------

IRCAM is a leading non-profit organization associated with the Centre Pompidou, dedicated to music production, R&D and education in sound and music technologies. It hosts composers, researchers and students from many countries cooperating in contemporary music production, scientific and applied research. The main topics addressed in its R&D department include acoustics, audio signal processing, computer music, interaction technologies, and musicology. IRCAM is located in the center of Paris near the Centre Pompidou, at 1, Place Igor Stravinsky 75004 Paris.

 

Salary:

-------------------------

According to background and experience

 

Applications:

-------------------------

Please send an application letter with the reference 201406BMDEV together with your resume and any suitable information addressing the above issues preferably by email to: peeters_at_ircam_dot_fr with cc to vinet_at_ircam_dot_fr, roebel_at_ircam_dot_fr.

 

 

6-39(2014-05-29) (W/M) researcher positions SKAT-VG project at IRCAM, Paris

ENGLISH VERSION:

 

 

Positions: 2 (W/M) researcher positions at IRCAM for Large-Scale Audio Indexing

Starting:     August 18, 2014

Duration:     12 months

Deadline for application:   July 1st, 2014

 

 

The BeeMusic project aims at providing the description of music for large-scale collections (several millions of music titles). In this project IRCAM is in charge of the development of music content description technologies (automatic genre or mood recognition, audio fingerprint …) for large-scale music collections.

 

Position description 201406BMRESA:

 

For this project IRCAM is looking for a researcher to develop technologies for automatic genre and mood recognition.

 

The hired researcher will be in charge of the research and development of scalable technologies for supervised learning (i.e. scaling GMM, PCA or SVM algorithms) so that they can be applied to millions of annotated items.

He/she will then be in charge of the application of the developed technologies for the training of large-scale music genre and music mood models and their application to large-scale music catalogues.

 

Required profile:

* High skill in audio indexing and data mining (the candidate must hold a PhD in one of these fields)

* Previous experience with scalable machine-learning models

* High skill in Matlab programming, skills in C/C++ programming

* Skill in audio signal processing (spectral analysis, audio-feature extraction, parameter estimation)

* Good knowledge of Linux, Windows, MacOS environments

* High productivity, methodical work, excellent programming style.

 

The hired researcher will also collaborate with the development team and participate in the project activities (evaluation of technologies, meetings, specifications, reports).

 

Position description 201406BMRESB:

 

For this project IRCAM is looking for a researcher to develop audio fingerprinting technologies.

 

The hired researcher will be in charge of the research and development of audio fingerprint technologies that are robust to audio degradations (sound capture through mobile phones in noisy environments) and of fingerprint search algorithms for large-scale databases (millions of music titles).

 

Required profile:

* High skill in audio signal processing and audio fingerprint design (the candidate must hold a PhD in one of these fields)

* High skill in indexing technologies and distributed computing (hash-table, Hadoop, SOLR)

* High skill in Matlab programming, skills in Python and Java programming

* Good knowledge of Linux, Windows, MacOS environments

* High productivity, methodical work, excellent programming style.

 

The hired researcher will also collaborate with the development team and participate in the project activities (evaluation of technologies, meetings, specifications, reports).

 

Introduction to IRCAM:

 

IRCAM is a leading non-profit organization associated with the Centre Pompidou, dedicated to music production, R&D and education in sound and music technologies. It hosts composers, researchers and students from many countries cooperating in contemporary music production, scientific and applied research. The main topics addressed in its R&D department include acoustics, audio signal processing, computer music, interaction technologies and musicology. IRCAM is located in the centre of Paris near the Centre Pompidou, at 1, Place Igor Stravinsky 75004 Paris.

 

 

Salary:

According to background and experience

 

 

Applications:

Please send an application letter with the reference 201406BMRESA or 201406BMRESB together with your resume and any suitable information addressing the above issues preferably by email to: peeters_a_t_ircam dot fr with cc to vinet_a_t_ircam dot fr, roebel_at_ircam_dot_fr

 

VERSION FRANCAISE:

 

 

Offre d’emploi : 2 postes de chercheur (H/F) à l’IRCAM pour technologies d’indexation audio à grande échelle

Démarrage : 18 Aout 2014

Durée : 12 mois

Date limite pour candidature: 1er juillet 2014

 

 

Le projet BeeMusic a pour objectif de décrire la musique à grande échelle (plusieurs millions de titres musicaux). Dans ce projet, IRCAM est en charge du développement des technologies de description du contenu audio (reconnaissance automatique du genre et de l’humeur musicale, identification audio par fingerprint …) pour des grands catalogues musicaux.

 

Description du poste 201406BMRESA:

 

Pour ce projet, l’IRCAM recherche un/une chercheur(se) pour le développement des technologies de reconnaissance automatique de genre et humeur.

 

Le/la chercheur(se) sera en charge de la recherche et des développements concernant la mise à l’échelle des technologies d’apprentissage supervisée (passage à l’échelle des algorithmes GMM, PCA ou SVM), afin de permettre leur application à des millions de données. Il/elle sera en charge de l’application de ces technologies pour l’entrainement de modèles de genre et humeur musicale ainsi que de leur application à des grands catalogues.

 

Profil requis:      

* Très grande expérience en algorithmes d’apprentissage automatique et en techniques d’indexation (le candidat doit avoir un PHD dans un de ces domaines)

* Expérience de passage à l’échelle des ces algorithmes

* Très bonne connaissance de la programmation Matlab, connaissance de la programmation C/C++

* Bonne connaissance du traitement du signal (analyse spectrale, extraction de descripteurs audio, estimation de paramètres) 

* Bonne Connaissance des environnements Linux, Windows et Mac OS-X.

* Haute productivité, travail méthodique, excellent style de programmation, bonne communication rigueur

 

Le/la chercheur(se)  collaborera également avec l’équipe de développement et participera aux activités du projet (évaluation des technologies, réunion, spécifications, rapports).

 

Description du poste 201406BMRESB:

 

Pour ce projet, l’IRCAM recherche un/une chercheur(se) pour le développement des technologies d’identification audio par fingerprint.

 

Le/la chercheur(se) sera en charge de la recherche et du développement de la technologie d’identification audio par fingerprint robuste aux dégradations sonores (capture du son à travers un téléphone mobile en environnement bruité) et des algorithmes de recherche des fingerprints dans une très grande base de données (plusieurs millions de titres musicaux).

 

Profil requis:      

*Très bonne connaissance en traitement du signal et en conception d’audio fingerprint (le candidat doit avoir un PHD dans un de ces domaines)

*Très bonne connaissance en techniques d’indexation et de systèmes distribués (hash-table, Hadoop, SOLR)

* Très bonne connaissance de la programmation Matlab, connaissance de la programmation Python et Java

*Bonne connaissance des environnements Linux, Windows et Mac OS-X.

*Haute productivité, travail méthodique, excellent style de programmation, bonne communication rigueur

 

Le/la chercheur(se)  collaborera également avec l’équipe de développement et participeront aux activités du projet (évaluation des technologies, réunion, spécifications, rapports).

 

Présentation de l’Ircam:

 

L'Ircam est une association à but non lucratif, associée au Centre National d'Art et de Culture Georges Pompidou, dont les missions comprennent des activités de recherche, de création et de pédagogie autour de la musique du XXème siècle et de ses relations avec les sciences et technologies. Au sein de son département R&D, des équipes spécialisées mènent des travaux de recherche et de développement informatique dans les domaines de l'acoustique, du traitement des signaux sonores, des technologies d’interaction, de l’informatique musicale et de la musicologie. L'Ircam est situé au centre de Paris à proximité du Centre Georges Pompidou au 1, Place Stravinsky 75004 Paris.

 

 

Salaire:

Selon formation et expérience professionnelle

 

 

Candidatures:

Prière d'envoyer une lettre de motivation avec la référence 201406BMRESA ou 201406BMRESB  et un CV détaillant le niveau d'expérience/expertise dans les domaines mentionnés ci-dessus (ainsi que tout autre information pertinente) à peeters_a_t_ircam dot fr avec copie à

vinet_a_t_ircam dot fr, roebel_at_ircam_dot_fr

 

6-41(2014-05-30) Research Assistant/Associate in Statistical Spoken Dialogue Systems

Research Assistant/Associate in Statistical Spoken Dialogue Systems (Fixed Term)


Applications are invited for a research position in statistical spoken dialogue systems in the Dialogue Systems Group at the Cambridge University Engineering Department. The position is sponsored by Toshiba Cambridge Research Laboratory.

The main focus of the work will be on the development of techniques and algorithms for implementing robust statistical dialogue systems which can support conversations ranging over very wide, potentially open, domains. The work will extend existing techniques for belief tracking and decision making by distributing classifiers and policies over the corresponding ontology.

The successful candidate will have good mathematical skills and be familiar with machine learning. S/he will have the ability to design tests and experiments and to design and develop new methods and algorithms to address research objectives and find solutions. Preference will be given to candidates with specific understanding of Bayesian methods and reinforcement learning, experience in spoken dialogue systems, and strong software engineering skills. Candidates with good communication and writing skills and knowledge of semantic web, OWL and ontologies will be at an advantage. Candidates should have, or will shortly have, a PhD in an area related to speech technology. However, candidates with comparable research experience are also encouraged to apply.

This is an exciting opportunity to join one of the leading groups in statistical speech and language processing. Cambridge provides excellent research facilities and there are extensive opportunities for collaboration, visits and attending conferences.

Salary Ranges: Research Assistant: £24,289 - £27,318; Research Associate: £28,132 - £36,661

Fixed-term: The funds for this post are available for 24 months in the first instance.

The post is based in Central Cambridge, Cambridge, UK.

Once an offer of employment has been accepted, the successful candidate will be required to undergo a health assessment.

To apply online for this vacancy, please click on the 'Apply' link below. This will route you to the University's Web Recruitment System, where you will need to register an account (if you have not already) and log in before completing the online application form.

Please ensure that you upload your Curriculum Vitae (CV), a statement of research interests and a covering letter in the Upload section of the online application. If you upload any additional documents which have not been requested, we will not be able to consider these as part of your application. Please submit your application by midnight on the closing date.

If you have any questions about this vacancy or the application process, please contact: Elisabeth Barlow, email Elisabeth.Barlow@admin.cam.ac.uk. (Tel +44 01223 765692)

Please quote reference NM03182 on your application and in any correspondence about this vacancy.

The University values diversity and is committed to equality of opportunity.

The University has a responsibility to ensure that all employees are eligible to live and work in the UK.

Apply online link:

http://hrsystems.admin.cam.ac.uk/recruit-ui/apply/NM03182

 


6-42(2014-06-05) Thesis grant in Neurophysiological Investigation of prosodic cues.... Univ. Toulouse II -III, F

Subject : « NEUROPROS- Neurophysiological Investigation of prosodic cues processing by monolingual French and Spanish speakers, and bilingual speakers (French-Occitan and French-Spanish) »

Supervisors: Barbara Köpke, Denis Fize, Corine Astésano, Radouane El Yagoubi

Host Laboratories:

U.R.I Octogone-Lordat (EA 4156), Université de Toulouse II

CERCO (UMR 5549), Université Paul Sabatier - Toulouse III

Discipline: Linguistics

Doctoral School: Comportement, Langage, Education, Socialisation, Cognition (CLESCO)

Scientific description of the research project:

The project takes an eminently interdisciplinary approach (linguistics, cognitive neuropsychology and neurosciences) to studying the processing of prosodic cues by monolingual and bilingual French speakers. French is a language with so-called post-lexical, non-distinctive accentuation, unlike languages such as Spanish, Catalan or Occitan, where accentual patterns are represented in the lexical entry. These prosodic characteristics have led French to be considered a ‘language without accent’ (Rossi, 1980), which makes it difficult to integrate into models of speech processing (Cutler et al., 1997), since these are mostly based on the metrical and accentual characteristics of languages (Cutler & Norris, 1988). The same characteristics are also said to be responsible for some degree of ‘stress deafness’ among French listeners in foreign languages (Dupoux et al., 1997, inter alia). However, if one considers the French accentual system in all its complexity, taking into account the interaction between the primary final accent and the secondary initial accent in the marking of prosodic constituents (Di Cristo, 2000), it becomes possible to postulate a role for French accentuation in speech segmentation and lexical access strategies (Bagou & Frauenfelder, 2006). More particularly, the Initial Accent seems to play a predominant role in the marking of prosodic constituents in French (Astésano et al., 2007) and it is clearly perceived by naïve listeners (Astésano et al., 2012). Recent neuroimaging studies (EEG) indicate that metric incongruity slows lexical access in French (Magne et al., 2007). More recently, we showed in a Mismatch Negativity paradigm that French listeners can readily discriminate stress patterns in French and that the Initial Accent is encoded in long-term memory at the level of the lexical word in French (Aguilera et al., 2014).

It is now necessary to consolidate these results by extending our investigations to other EEG paradigms and by adapting the protocols to fMRI, in order to describe more precisely the neural substrates and the temporal dynamics of prosodic cue processing in French. Furthermore, these processing strategies have so far been observed in monolingual speakers only. Comparing the linguistic strategies of monolingual and bilingual speakers (French, Spanish and/or Catalan monolinguals; French/Occitan, French/Spanish or French/Catalan bilinguals) will not only considerably enrich our understanding of lexical access mechanisms in these languages with different prosodic systems, but also allow us to observe how using several languages with different stress patterns influences the perception and processing of prosodic cues.

The selected candidate will benefit from a stimulating scientific environment: (s)he will join the Interdisciplinary Research Unit Octogone-Lordat (Toulouse II: http://octogone.univ-tlse2.fr/) and will be co-supervised by Prof. Barbara Köpke, a specialist in bilingualism, and by Dr. Denis Fize, a neuroscience researcher and neuroimaging specialist at the Research Centre on Brain and Cognition (CERCO, Toulouse III). The research will take place within a research group managed by Dr. Corine Astésano, a specialist in prosody, together with Dr. Radouane El Yagoubi, a specialist in cognitive neurosciences and psychology. The project is also connected to the French ANR research project PhonIACog (http://aune.lpl-aix.fr/~phoniacog/) managed by Dr. Corine Astésano.

Bibliography

Aguilera, M.; El Yagoubi, R.; Espesser, R.; Astésano, C. (2014). Event Related Potential investigation of Initial Accent processing in French. Speech Prosody 2014, Dublin, Ireland, May 20-23 2014: 383-387.

Astésano, C.; Bard, E.; Turk, A. (2007). Structural influences on Initial Accent placement in French. Language and Speech, 50(3), 423-446.

Astésano, C.; Bertrand, R.; Espesser, R.; Nguyen, N. (2012). Perception des frontières et des proéminences en français. JEPTALN-RECITAL 2012, Grenoble, 4-8 June 2012: 353-360.

Bagou, O.; Frauenfelder, U. H. (2006). Stratégie de segmentation prosodique: rôle des proéminences initiales et finales dans l'acquisition d'une langue artificielle. Proceedings of the XXVIèmes Journées d'Etude sur la Parole, 571-574.

Cutler, A.; Norris, D. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14(1), 113.

Cutler, A.; Dahan, D.; Van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40(2), 141-201.

Di Cristo, A. (2000). Vers une modélisation de l'accentuation du français (seconde partie). Journal of French Language Studies, 10(01), 27-44.

Dupoux, E.; Pallier, C.; Sebastian, N.; Mehler, J. (1997). A destressing “deafness” in French? Journal of Memory and Language, 36(3), 406-421.

Magne, C.; Astésano, C.; Aramaki, M.; Ystad, S.; Kronland-Martinet, R.; Besson, M. (2007). Influence of Syllabic Lengthening on Semantic Processing in Spoken French: Behavioral and Electrophysiological Evidence. Cerebral Cortex, 17(11), 2659-2668. doi: 10.1093/cercor/bhl174.

Rossi, M. (1980). Le français, langue sans accent? Studia Phonetica Montréal, 15, 13-51.

Required skills:

- Master's degree in Linguistics, cognitive sciences, neuropsychology or equivalent

- Experience in experimental phonetics and/or linguistics, psycholinguistics, neurolinguistics

- Skills in signal processing (speech, EEG, fMRI) are required, and dedication to developing these skills is essential

- Experimental skills are desirable, as well as enthusiasm for working with participants and motivation for recruiting them

- Autonomy and motivation for learning new skills

- Good knowledge of French and English; knowledge of Spanish, Catalan or Occitan is an asset.

Salary:

- 1 684.93 € monthly gross (1 368 € net), 3-year contract

Calendar:

- Submission of applications: 27th June 2014

- Interviews of selected candidates: 3rd July 2014

- Start of contract: 1st October 2014

Applications must be sent to Corine Astésano (corine.astesano at univ-tlse2.fr) and must include:

- A detailed CV, with list of publications if applicable

- A copy of grades for the Master’s degree

- A summary of the Master’s dissertation and a PDF file of the Master’s dissertation

- A cover letter / letter of interest and/or scientific project (1 page max.)

- The names and email addresses of 2 academic referees / supervisors.


6-43(2014-06-012) 2 PhD scholarships at Italian Institute of Technology, Genova, Italy

 

1. Acoustic-articulatory modeling for automatic speech recognition

Tutors: Leonardo Badino, Lorenzo Rosasco, Luciano Fadiga

Department: Robotics Brain and Cognitive Sciences (Italian Institute of Technology), Genova, Italy

http://www.iit.it/rbcs

Description: State-of-the-art Automatic Speech Recognition (ASR) systems produce remarkable results in some scenarios but still lag behind human-level performance in several real usage scenarios, and often perform poorly whenever the type of acoustic noise, the speaker’s accent and speaking style are 'unknown' to the system, i.e., are not sufficiently covered in the data used to train the ASR system.

The goal of the present theme is to improve ASR accuracy by learning representations of speech that combine the acoustic and the (vocal tract) articulatory domain, as opposed to purely acoustic representations, which only consider the surface level of speech (i.e., speech acoustics) and ignore its causes (the vocal tract movements). Although in real usage settings the vocal tract cannot be observed during recognition, it is still possible to exploit articulatory representations of speech, in which phonetic targets (i.e., the articulatory targets necessary to produce a given sound) are largely invariant (e.g., to speaker variability) and speech phenomena that are complex in the acoustic domain have simple descriptions.

Joint acoustic-articulatory modeling will be applied in two different ASR training settings: a typical supervised machine learning setting where phonetic transcriptions of the training utterances are provided by human experts, and a weakly supervised machine learning setting where much sparser and less informative labels (e.g., word-level rather than phone-level labels) are available.

Requirements: The successful candidate will have a degree in computer science, bioengineering, physics or related disciplines, and a background in machine learning. Interest in neuroscience.

Reference: King, S., Frankel, J., Livescu, K., McDermott, E., Richmond, K., Wester, M. (2007). 'Speech production knowledge in automatic speech recognition'. Journal of the Acoustical Society of America, vol. 121(2), pp. 723-742.

Contacts: leonardo.badino@iit.it, lorosasco@mit.edu, luciano.fadiga@iit.it

2. Speech production for automatic speech recognition in human–robot verbal interaction

Tutors: Giorgio Metta, Leonardo Badino, Luciano Fadiga

Department: iCub Facility (Istituto Italiano di Tecnologia), Genova, Italy

http://www.iit.it/iCub

Description: State-of-the-art Automatic Speech Recognition (ASR) systems produce remarkable results in partially controlled scenarios but still lag behind human-level performance in unconstrained real usage situations, and perform poorly whenever the type of acoustic noise, the speaker’s accent and speaking style are 'unknown' to the system, i.e., are not sufficiently covered in the data used to train the ASR system. The goal of this PhD theme is to tackle the problem of ASR in human-to-robot conversation. To this end, we will create a robust Key Phrases Recognition system in which commands delivered by the user to the robot (i.e., the key phrases) have to be recognized in unconstrained utterances (i.e., utterances with hesitations, disfluencies, additional out-of-task words, etc.), under the challenging conditions of human-robot verbal interaction, where speech is typically distant (from the robot) and noisy. To increase the robustness of the ASR, articulatory information will be integrated into a Deep Neural Network – Hidden Markov Model system.

This work will be carried out and tested on the iCub platform.

Requirements: background in computer science, bioengineering, computer engineering, physics or related disciplines. Solid programming skills in C++, Matlab and GPU (CUDA) are a plus. Aptitude for problem solving. Interest in understanding/learning basic biology.

Reference: Barker, J., Vincent, E., Ma, N., Christensen, H., Green, P., (2013) 'The PASCAL CHiME Speech Separation and Recognition Challenge'. Computer Speech and Language, vol. 27(3), pp. 621-633.

Contacts: leonardo.badino@iit.it, giorgio.metta@iit.it, luciano.fadiga@iit.it

Additional information

Starting date: November 2014.

PhD scholarship: the scholarship will cover all fees with a gross salary of 16500 euros/year (≈1250 euros/month after taxes)


6-44(2014-06-11) Post-doc position at IMMI-CNRS

Post-doc position at IMMI-CNRS

A post-doctoral position is proposed at IMMI-CNRS (Orsay, France - http://www.immi-labs.org/). IMMI is an International Joint Research CNRS Unit (UMI) in the field of Multimedia and Multilingual Document Processing. It gathers three contributing partners: LIMSI-CNRS, RWTH Aachen and KIT (Karlsruhe Institute of Technology).

Context of the project

The project relies on an experimental platform for online monitoring of social media and information streams, with self-adaptive properties, in order to detect, collect, process, categorize, and analyze multilingual streams. The platform includes advanced linguistic analysis, discourse analysis, extraction of entities and terminology, topic detection, and translation; the project also includes studies on unsupervised and cross-lingual adaptation.

Requirements and objectives

A PhD in a field related to the project (translation, natural language processing or machine learning) is required. The candidate will perform research in the framework mentioned above, and will supervise the collection and annotation of the data. Salary will follow CNRS standard rules for contractual researchers, according to the experience of the candidate.

Contacts

  • Gilles Adda (adda [at] immi-labs.org)

Agenda

  • Opening date: August 2014
  • Application deadline: Open until filled
  • Duration: 24 months

   


6-45(2014-06-12) Postdoc position on conversation summarization, Univ.Aix-Marseille France
Postdoc position on conversation summarization
(Full time, one year - Closing date for applications 2014-07-01)

We are looking for an outstanding research scientist to join the
European project 'SENSEI' (http://www.sensei-conversation.eu/). You
will contribute to research on conversation analysis and summarization
to allow the exploitation of large quantities of comments in social
media and spoken conversations.

Job description:
You will contribute to the design and development of speech and text
summarization technologies for conversational data such as social
media comments and tweets. There will be three components to the
system: linguistic analysis of the conversations, content selection
and aggregation, and generation of the summaries (text or other
media). The approach is expected to make use of recent machine
learning advances such as deep learning, and focus on limiting the
quantity of supervision needed. The prototype will be evaluated by
end-user professionals in ecological conditions.

Profile:
The applicant must hold a PhD degree, preferably in the field of
natural language processing or machine learning. He/she should:
- Be proficient in Java or C++ programming and in Python or PHP scripting
- Have experience with developing efficient NLP / machine learning systems.
- Be keen on researching the literature and writing papers
- Enjoy team work and be autonomous

Location:
You will work at the LIF computer science lab at Aix-Marseille
University in France, at the Luminy campus next to the
calanques.

Dates:
Interviews will be held in July 2014; the postdoc will start in
September/October 2014 and last one year.

Contact:
Enquiries and applications should be sent to Benoit Favre:
benoit.favre@lif.univ-mrs.fr

SENSEI project page: http://www.sensei-conversation.eu/

6-46(2014-06-18) Two positions at the University of Cambridge, UK
 

6-47(2014-06-21) 3 PhD Positions in Speech Processing at LIG/Grenoble (France)

3 PhD Positions in Speech Processing at LIG/Grenoble (France)

 
The Study Group for Machine Translation and Automated Processing of Languages and Speech (GETALP) of LIG (Laboratory of Informatics of Grenoble) offers 3 PhD Positions in Speech Processing. We are looking for outstanding young research scientists to join the group on several projects involving speech processing.
 
Open Positions
 
  1. PhD / Automatic speech recognition and machine-assisted speech annotation for African Languages
You will work in the context of the ALFFA project, which is truly interdisciplinary: it brings together not only technology experts (LIG, LIA, VOXYGEN) but also fieldwork linguists/phoneticians (DDL). The PhD will focus on analysing the capabilities of existing automatic speech processing systems to investigate phonetic characteristics of languages or annotate speech (especially on mobile devices: tablets, glasses, etc.), in order to provide an innovative digital assistant to the fieldwork linguist.
Start : Fall 2014
Duration : 36 months
Particular aspect : co-supervision with DDL lab in Lyon
Project Web Site : http://alffa.imag.fr
 

  2. PhD / Speech interaction for socio-affective ubiquitous agents and robots in ambient assisted living environments

You will work on a research and development project (CASSIE) involving academic and industrial stakeholders in spoken dialog, assistive technologies, affective sciences and social robotics. The PhD objective is to design a spoken dialogue system that will interact with a user in her/his home through a ubiquitous (physical and/or virtual) and personalized agent. This dialogue system will be corpus-based, using an iterative machine learning approach hybridized with bootstrapped expert knowledge (observed from “intelligent” annotations) drawn from spontaneous, ecological data collected in a real or quasi-real environment (Smart Home) and situation (real scenario). The system will focus on the socio-affective dimensions of the interaction (socio-affective prosody, paralinguistic events, imitation, synchrony, etc.), especially the dynamics (timing) of the dialog. One aspect of this PhD will also focus on comparing the same character implemented as a robot versus as a virtual agent for interaction (empathy aspects, etc.).

 

Start : Fall 2014

Duration : 36 months

Contact : Veronique.Auberge@imag.fr & Benjamin.Lecouteux@imag.fr (+Laurent.Besacier@imag.fr)


3. PhD / Context-aware spoken dialogue in ambient assisted living environments

You will work on a research and development project (CASSIE) involving academic and industrial stakeholders in spoken dialog, assistive technologies and social robotics. The PhD objective is to make a social cyber-physical agent 'aware' of its environment through sensors and/or connected objects. This contextual information will drive the system interaction (natural language understanding and dialog). The heart of the research will be to build probabilistic and logical models for multimodal situation analysis and understanding in a domestic and multilingual context. For the experimental development and validation, the research will benefit from the fully-equipped LIG smart home (DOMUS).
Start : Fall 2014
Duration : 36 months (PhD)
 
Profiles: Applicants must hold a Master's degree in Computational Linguistics, Computer Science or Cognitive Sciences, preferably with experience in the fields of speech processing and/or natural language processing and/or machine learning. A good background in programming will also be required.
The successful applicants will also be involved in testing the technology with human participants who are either French or English speakers. For this reason, a good level of English is required, as well as a good command of French. Finally, effective communication skills in English, both written and verbal, are mandatory.
 
Location: Grenoble is a high-tech city with 4 universities. It is located at the heart of the Alps, in outstanding scientific and natural surroundings. It is 3h by train from Paris, 2h from Geneva, 1h from Lyon and 2h from Torino, and is less than 1h from Lyon international airport.
 
Research Group Website : http://getalp.imag.fr 
 
Dates: Interviews will be held in July 2014 (until September 2014 if needed). Meetings during Interspeech 2014 in Singapore can also be arranged.

6-48(2014-06-22) Two PhD student positions in phonetics or speech science, Saarland University, Saarbrücken, Germany

Two PhD student positions in phonetics or speech science, Saarland
University, Saarbrücken, Germany

Closing date 5 July 2014 (open until filled), positions starting 1
October 2014

http://www.coli.uni-saarland.de/~moebius/page.php?id=jobs


6-49(2014-06-25) RESEARCH FACILITATOR IN SPEECH TECHNOLOGY - CLOUDCAST NETWORK, Univ. Sheffield, UK

RESEARCH FACILITATOR IN SPEECH TECHNOLOGY - CLOUDCAST NETWORK

Applications are invited for a position as Research Facilitator in the Speech and Hearing (SPandH) research group and the Centre for Assistive Technology and Connected Healthcare at Sheffield University to work on CloudCAST, a recently-awarded international network funded by the Leverhulme Trust and coordinated by Professor Phil Green. The vision of CloudCAST is

'.. to provide a way in which rapid developments in machine learning and speech technology can be placed in the hands of professionals who deal with speech problems: therapists, pathologists, teachers, assistive technology experts.. We intend to do this by creating a free-of-charge, remotely-located, internet-based resource 'in the cloud' which will provide a set of software tools for personalised speech recognition, diagnosis, interactive spoken language learning and the like. We will provide interfaces which make the tools easy to use for people who are not speech technology experts and create a self-sustaining CloudCAST community to manage future development.'

CloudCAST involves collaboration with


The Facilitator will be responsible for the software engineering required to build the CloudCAST resource. This involves taking algorithms and data which have been developed for research and knitting them into a form that

  •     is accessible to people who are not experts in speech technology,
  •     has a uniform look-and-feel,
  •     allows for amendments and additions,
  •     encourages others to contribute
  •     is available over the internet ('resides in the cloud').


The Facilitator may also become involved with pilot research studies using the resource, and will be responsible for organising and participating in an extensive series of visits between the 4 sites involved.

A good degree in Computer Science, Software Engineering, Mathematics or a closely related subject is required for all applicants. An appointment at Grade 7 will require a PhD in speech technology or equivalent industrial experience. Applicants should have knowledge of speech technology and software engineering skills. The supporting documentation gives details. You can view the documentation by clicking on About the Job and About the University located near the top of your screen.

This is a full-time post, available now.

For supporting documentation and details of how to apply, visit

http://www.jobs.ac.uk/job/AIW424/research-facilitator-in-speech-technology-cloudcast-network/

Informal enquiries to Professor Phil Green, p.green@shef.ac.uk

Closing Date: 30th June 2014.

 


6-50(2014-06-27) Research professorship at KU Leuven ESAT/PSI – Audio and/or Speech Processing
Research professorship at KU Leuven ESAT/PSI – Audio and/or Speech Processing

The division ESAT/PSI (Processing Speech & Images, http://www.esat.kuleuven.be/psi) performs 
fundamental and applied research in the broad field of audio-visual information processing. The 
research is multidisciplinary and integrates expertise from engineering, physics, mathematics, 
medicine, linguistics, machine learning and computational science. New methods are developed and 
validated in computer vision, medical imaging, speech and audio processing and other application 
fields. PSI is one of the leading labs in its areas of research. The division is part of the EE 
department (ESAT) of the University of Leuven, the largest and highest-ranked university in 
Belgium. Leuven lies about 25 km east of Brussels and 15 minutes from Brussels airport by train.

To strengthen and widen its research domain, the PSI division is looking for a research 
professor in the area of audio and/or speech processing. The focus is on the interpretation of 
large amounts of these data, possibly in combination with other sensorial data (e.g. images). We 
live in an environment where sound is ubiquitous and the interpretation of speech and other 
sounds is crucial for safety, for communication, for understanding of our environment, ... In 
many applications the ability of a computer to achieve human-like performance in this respect is 
highly desired and worldwide a lot of research effort is spent to achieve this goal. We want to 
expand our own research lines by hiring a new professor to enlarge the existing group with new 
projects and researchers exploring new ideas and paradigms that advance the state-of-the-art in 
this area.

The candidate must be an internationally recognized researcher, with a strong publication 
record. At the start of the mandate he/she must have at least 3 years of experience in 
scientific research as a postdoc, with hands-on experience in supervising PhD students. 
Experience with successful project grant writing is a definite plus. He/she also needs to 
possess didactic qualities. The position is primarily research-oriented, but applicants are 
also expected to undertake limited teaching assignments. Applicants should 
be prepared to learn Dutch.

Entering research professors are appointed with a rank depending on their qualifications. Young 
researchers with at least 3 years and less than 7 full years of postdoctoral experience at the 
time of the appointment are typically offered a Tenure Track position, without excluding a 
higher academic position. Advanced researchers with at least 7 years of postdoctoral experience 
at the time of appointment are typically hired as a full professor, without excluding a Tenure 
Track position.

Applications should include a CV (incl. a complete publication list) and an abstract (1-2 pages) 
of a research proposal for the coming five years. They should be submitted by e-mail as soon as 
possible, and in any case before August 31st, 2014, to

Katholieke Universiteit Leuven
Department of Electrical Engineering - ESAT
Center for Processing Speech and Images - PSI
Kasteelpark Arenberg 10 bus 2441
3001 Heverlee, Belgium
E-Mail: patrick dot wambacq at esat dot kuleuven dot be

Applicants may be invited to give a seminar to the staff of the research division ESAT/PSI. 
Subsequently, promising candidates will be asked to participate in the university-wide selection 
procedure for research professorships. Each year the KU Leuven appoints a number of research 
professors. These positions are financed by a university fund called 'BOF' (Bijzonder 
Onderzoeksfonds) that is funded by the Flemish Government.

6-51(2014-06-26) Two 2-year post-doctoral position, Univ. Aix-Marseille, FR

Call for a 2-year post-doctoral position

Laboratoire Parole et Langage (UMR 7309 Aix-Marseille Université / CNRS)

Aix-en-Provence, France

Principal investigator: Serge Pinto, Ph.D

 

Dysarthria in Parkinson’s disease: Lusophony vs. Francophony comparison (FraLusoPark)

Parkinson’s disease (PD) is classically characterized by a symptomatic triad that includes rest tremor, akinesia and hypertonia. Although the motor expression of the symptoms mainly involves the limbs, the muscles implicated in speech production are also subject to specific dysfunctions. PD patients can thus develop motor speech disorders, known as dysarthria. The main objective of our project is to evaluate the physiological parameters (acoustics), perceptual markers (intelligibility) and psychosocial impact of dysarthric speech in PD, as modulated by language (French vs. Portuguese). PD patients will be enrolled in the study in Aix-en-Provence, France and Lisbon, Portugal. The proposed position concerns data acquisition and analysis at the French site (Aix-en-Provence).

In order to achieve the goals of this project, one post-doctoral position is offered to a young and dynamic researcher. The candidate, who should have experience in speech sciences research (acoustics, perception, prosody), will participate in the acquisition and analysis of speech data.

This project benefits from a bilateral ANR/FCT financial support (for the French side: project n° ANR-13-ISH2-0001-01).

 

Interested candidates should contact the principal investigator by sending:

- a detailed CV

- a letter of motivation

- letters of recommendation (optional)

 

Duration of the position: 2 years (full-time)

Monthly salary: 2 000 € net

Application deadline: September 30th, 2014

Starting date: November 1st, 2014

For supplementary information and applications: serge.pinto@lpl-aix.fr




6-52(2014-07-01) Postdoc position in Speech Synthesis, Saarland University, Saarbrücken, Germany

*Postdoc position in Speech Synthesis* (full-time, 2 years
from October 2014, extendable) at Saarland University, Saarbrücken, Germany

Please see
http://www.coli.uni-saarland.de/~steiner/job_advertisement.pdf



