ISCA - International Speech Communication Association



ISCApad #168

Sunday, June 10, 2012 by Chris Wellekens

6 Jobs
6-1(2011-12-01) Ph.D. Program Carnegie Mellon | PORTUGAL
Ph.D. Program Carnegie Mellon | PORTUGAL in the area of Language and Information Technologies

Deadline: December 15, 2011

The Language Technologies Institute (LTI) of the School of Computer Science at Carnegie Mellon University offers a dual-degree Ph.D. Program in Language and Information Technologies in cooperation with Portuguese universities. This Ph.D. program is part of the Carnegie Mellon | Portugal Partnership. The Language Technologies Institute, a world leader in the areas of speech processing, language processing, information retrieval, machine translation, machine learning, and bio-informatics, was formed 20 years ago. The breadth of language technologies expertise at LTI enables new research in combinations of the core subjects, for example in speech-to-speech translation, spoken dialog systems, language-based tutoring systems, and question answering systems.

The Portuguese consortium of universities includes (but is not limited to) the Spoken Language Systems Lab (L2F) of INESC-ID Lisbon/IST, the University of Lisbon (FLUL), the University of Beira Interior (UBI) and the University of Algarve (UALG). These universities share expertise in the same language technologies as LTI, although with a strong focus on processing the Portuguese language.

The LT program admits 1 or 2 new Ph.D. students every year. Each Ph.D. student will receive a dual degree from LTI and the selected Portuguese university, will be co-supervised by one advisor from each institute, and will spend approximately half of the 5-year doctoral program at each institute. The academic part will be completed during the first 2 years and includes a maximum of 8 courses, with a proper balance of focus areas (Linguistics, Computer Science, Statistics/Learning, Task Orientation). The remaining 3 years of the doctoral program will be dedicated to research. The thesis topic will be in one of the research areas of the cooperation program, defined by the two advisors. Two multilingual topics have been identified as primary research areas (although other areas of human language technologies may also be considered): computer-assisted language learning (CALL) and speech-to-speech machine translation (S2SMT). The doctoral students will be involved in one of the collaborative projects between LTI and the Portuguese universities aimed at building real HLT systems. The scholarship will be funded by the Foundation for Science and Technology (FCT), Portugal.

How to Apply

The application deadline for the LT Ph.D. program within the scope of the CMU-Portugal partnership is December 15. Students interested in the dual doctoral program must apply by filling in the corresponding form on the LTI webpage. For more information about the joint-degree doctoral program in LT, send email to the coordinators of the program:
• Isabel.Trancoso at inesc-id dot pt
• LTI_Portugal_Admissions at cs dot cmu dot edu

The applications will be screened by a joint committee formed by representatives of LTI and of the Portuguese universities. Candidates should indicate their GRE and TOEFL scores. Despite the particular focus on the Portuguese language, applications are open to both native and non-native speakers of Portuguese.

Program Highlights
REAP.PT project: http://call.l2f.inesc-id.pt/reap.public/
PT-STAR project: http://pt-star.l2f.inesc-id.pt/ptstar/
See also the Priberam Machine Learning Lunch Seminars: http://www.priberam.pt/Empresa/Inovacao/Seminarios.aspx
Lisbon Machine Learning Summer School: http://lxmls.it.pt/
Back  Top

6-2(2011-12-01) Post-doctoral position in Cognitive Neuroscience

Post-doctoral position in Cognitive Neuroscience
Applications are invited for a full-time post-doctoral research position in the
MULTISENSORY RESEARCH GROUP at the Pompeu Fabra University (Barcelona).
The post is part of the BRAINGLOT project, a Research Network on Bilingualism and
Cognitive Neuroscience (Consolider-Ingenio 2010 Scheme, Spanish Ministry of
Science and Education).
Project
The project brings together the efforts of several research groups spanning different
scientific disciplines with the common purpose of addressing the phenomenon of
bilingualism from an open and multidisciplinary perspective. The MRG attempts to
understand the use of multisensory (audiovisual) speech cues in the
context of learning and using a second language. The project includes behavioral,
neuroimaging (fMRI, ERP) and neurostimulation (TMS) approaches.
Job description
We seek a person to lead the electrophysiological aspects of the project
(ERP/EEG), including the development of independent scientific studies, as well as
participation in (i.e., supervision of) others. Involvement in some organizational and
management aspects is also expected.
Candidate requirements
- Previous experience in ERP/EEG recording and analysis is *indispensable*
- PhD
- Motivation about the question of multisensory integration and/or speech perception
- Background in cognitive neuroscience, neuroscience, and/or cognitive psychology
- Programming skills
* Applicants from outside the EU are welcome to apply but must qualify for a valid visa.
Conditions
- Duration: The position will be funded and renewable for up to two years
- Starting date: As soon as possible
- Salary: 28000EUR/year
- Travel: The project may require some travel to conferences / meetings
How to apply
Applications should include:
- a C.V. including a list of publications
- the names of two referees who would be willing to write letters of recommendation
- a brief cover letter describing research interests
For informal enquiries about the position and applications, please contact Salvador
Soto-Faraco. salvador.soto@icrea.es (http://www.mrg.upf.edu). Applications will be
accepted until the position is filled.
Please, mention that you are applying to the POSTDOCTORAL position in the email
subject

Back  Top

6-3(2011-12-23) Post-doctoral position in speech processing, LIA, Avignon, France

Post-doctoral position in speech processing.

LIA, Avignon.

Application deadline: 15 February 2012

Within the framework of the ANR-funded 'DECODA' project (http://decoda.univ-avignon.fr), LIA is recruiting a post-doctoral researcher in the field of automatic speech processing. The DECODA project addresses the mining of audio data from large call centres. It is led by LIA, together with LIF, RATP and the company Sonear.

The post-doc will work on the development and evaluation of global methods that integrate speech recognizers and document-categorization modules for the structuring of very large collections of spoken documents.

The candidate must hold a PhD in computer science or signal processing, with solid experience in speech processing and/or statistical language modelling. He or she should have theoretical and practical knowledge of pattern recognition, classification and machine learning.

LIA (http://lia.univ-avignon.fr) is a research unit (Equipe d'Accueil EA 4128) that brings together the faculty members of the University of Avignon and Pays de Vaucluse (UAPV) belonging to the 27th CNU section, as well as engineers, PhD students and Master's interns during their research period. The post-doc will join the laboratory's language group and will work with the members of that group involved in the DECODA project.

The position, for a duration of 12 months (with possible extensions), is to be filled from 15 January 2012. The net monthly salary is between 2000 and 2400 euros, depending on the candidate's experience.


Contact: Georges Linarès, LIA (georges.linares@univ-avignon.fr)

Back  Top

6-4(2011-12-23) Researcher and Research Software Engineer Positions , AT&T Labs - Research

AT&T Labs - Research

Researcher and Research Software Engineer Positions

AT&T Research, one of the premier industrial research laboratories in
the world, is looking for talented individuals to make a difference in
the world of communications.  Our researchers and research software
engineers are dedicated to solving real problems in speech and language
processing, and are involved in inventing, creating and deploying
innovative services. We also explore fundamental research problems in
these areas. Outstanding Ph.D.-level candidates at all levels of
experience are encouraged to apply.  Candidates must demonstrate
excellence in research, a collaborative spirit and strong communication
and software skills.

Areas of particular interest are

     * Large-vocabulary automatic speech recognition
     * Acoustic and language modeling
     * Robust speech recognition
     * Signal processing
     * Adaptive learning
     * Pronunciation modeling
     * Natural language understanding and dialog
     * Speaker biometrics
     * Voice and multimodal search
     * Software engineering for speech and language processing

Positions will be based in New Jersey, New York, or California,
depending on area of focus.

For more information, visit www.research.att.com and click on 'Working
with us'.

Back  Top

6-5(2011-12-24) Final-year pre-hire internship: Development Engineer

Final-year pre-hire internship: Development Engineer

Pre-sales: strong authentication through voice

The company:

GEOLSemantics is a French company based in Paris with about fifteen employees.
Its mission is to provide software solutions that help detect and track
activities hostile to states, companies and individuals. These solutions
apply to the information contained in various media: text, audio and video.
They use sophisticated multilingual semantic-analysis algorithms to give
operational staff synthetic information representing the knowledge
extracted from the various information sources.
They also use state-of-the-art speaker-identification technologies for
audio and video media.

Internship topic

GEOLSemantics is looking for a Master's-level (bac+5) computer-science
intern to build a 'strong speaker authentication' prototype using our
voice-biometrics tools.
Working within a human-sized team, you will have the opportunity to
develop your skills in the world of enhanced security and the fight
against identity theft.

Internship content:

The goal of this internship is to build a prototype to support the sale of
our voice-biometrics solutions. The objective is to improve application
access security through voice-based speaker identification. This strong
authentication complements classical login/password methods.

Internship outline:
-    Choice and set-up of the architecture,
-    Elaboration of scenarios,
-    Development of the prototype infrastructure,
-    Development of an Android application to make the prototype
        available on mobile devices,
-    Testing and presentation to customer pre-sales projects.

Your autonomy and your oral and written communication skills will quickly
allow you to take responsibility for the complete realization and
follow-up of your work under the guidance of an experienced engineer.

Intern profile:

Required technical skills:

-    Master 2 or engineering degree in computer science
-    good knowledge of Java web development (notions of HTML/CSS)
-    knowledge of Android (or the ability to acquire it quickly)
-    ability to get up to speed in a new domain (voice biometrics)
-    command of XML and RDF

Required qualities: good writing skills, strong functional curiosity,
strong autonomy.

The internship is supervised by the head of pre-sales.

-    Duration: 6 months minimum
-    Start date: as soon as possible
-    Location: Paris, 15th arrondissement
-    Compensation: to be discussed

Applications (CV, cover letter, photo) should be sent to:
Emmanuel Dupont, Directeur des Opérations
Société GEOLSemantics
32,  rue Brancion
75015 Paris
emmanuel.dupont@geolsemantics.com

Back  Top

6-6(2012-01-04) Audio Indexing Researcher W/M position at IRCAM – 3DTV project

Audio Indexing Researcher W/M position at IRCAM – 3DTV project

Starting :  January - February , 2012

Duration : 18 months

 

Introduction to IRCAM

IRCAM is a leading non-profit organization associated with the Centre Pompidou, dedicated to music production, R&D and education in acoustics and music. It hosts composers, researchers and students from many countries cooperating in contemporary music production and in scientific and applied research. The main topics addressed in its R&D department include acoustics, audio signal processing, computer music, interaction technologies and musicology. IRCAM is located in the centre of Paris near the Centre Pompidou, at 1, Place Igor Stravinsky, 75004 Paris.

 

Introduction to 3DTVs project

The goal of the 3DTVS project is to devise scalable 3DTV AV content description, indexing, search and browsing methods across open platforms, using mobile and desktop user interfaces, and to incorporate such functionalities into 3D audiovisual content archives. 3D multichannel audio analysis targets audio event detection based on fusion techniques that combine the feature analysis performed on the individual channels, as well as source localization and separation algorithms for the detection of moving audio sources. The results will be used in 3D audio/cross-modal indexing and retrieval. Multimodal 3D audiovisual content analysis will build on the results of 3D video and audio analysis. 3DTV content description and search mechanisms will be developed to enable fast replies to semantic queries.
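As an illustration of the late-fusion idea mentioned above (combining per-channel detector scores rather than raw features), here is a minimal sketch in Python; the scores, weights and function name are hypothetical and are not part of the project description:

import numpy as np

def late_fusion(channel_scores, weights=None, threshold=0.5):
    # channel_scores: array of shape (n_channels, n_frames), detection scores in [0, 1]
    # returns one boolean event decision per frame after weighted score averaging
    scores = np.asarray(channel_scores, dtype=float)
    if weights is None:
        weights = np.ones(scores.shape[0]) / scores.shape[0]
    fused = np.average(scores, axis=0, weights=weights)
    return fused >= threshold

# three hypothetical channels, five frames
scores = [[0.9, 0.2, 0.1, 0.7, 0.4],
          [0.8, 0.3, 0.2, 0.6, 0.5],
          [0.7, 0.1, 0.3, 0.9, 0.4]]
print(late_fusion(scores))   # -> [ True False False  True False]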

 

Role of IRCAM in the 3DTV Project

In the 3DTVs project, IRCAM is in charge of the research and development of technologies related to

-        Audio event detection using multi-channel audio scenes

-        Speaker diarization

-        Segmentation of movies into scenes based on the audio signal

-        Sound source separation, localization and identification

 

Position description

The hired researcher will be in charge of the development of technologies related to:

  • Audio event detection using multi-channel audio scenes
  • Speaker diarization
  • Segmentation of movies into scenes based on the audio signal

 

The researcher will also collaborate with the development team and participate in the project activities (evaluation, meetings, specifications).

 

Required profiles

  • Strong skills in audio indexing and data mining (statistical modelling, automatic feature selection algorithms, …), especially late-fusion algorithms
  • Strong skills in audio signal processing (spectral analysis, audio-feature extraction, parameter estimation)
  • Strong Matlab programming skills; skills in C/C++ programming

 

  • Good knowledge of Linux, Windows, MAC-OS environments
  • High productivity, methodical work, excellent programming style.

 

 

Salary

According to background and experience

 

Applications

Please send an application letter together with your resume and any suitable information addressing the above issues preferably by email to: peeters_a_t_ircam dot fr with cc to vinet_a_t_ircam dot fr, roebel_at_ircam_dot_fr

 

 


IRCAM is recruiting a Researcher M/F on an 18-month, full-time, fixed-term contract – 3DTVS project

Position available from 1 January 2012

Presentation of IRCAM

IRCAM is a non-profit organization, associated with the Centre National d'Art et de Culture Georges Pompidou, whose missions include research, creation and education related to 20th-century music and its relations with science and technology. Within its R&D department, specialized teams conduct research and software development in the fields of acoustics, audio signal processing, interaction technologies, computer music and musicology. IRCAM is located in the centre of Paris near the Centre Georges Pompidou, at 1, Place Stravinsky, 75004 Paris.

Introduction to the 3DTVS project

The goal of the 3DTVS project is to devise scalable descriptions of 3DTV content, along with its indexing and search, and to design browsing methods across open platforms, using mobile and fixed user interfaces, and to integrate such 3D functionalities into audiovisual content archives. 3D multichannel audio analysis targets audio event detection based on fusion techniques that combine the audio analysis performed on the individual channels, as well as source localization and separation algorithms for the detection of moving audio sources. The results will be used for 3D audio and cross-modal indexing and retrieval. Multimodal 3D audio/video indexing of audiovisual content will build on the results of 3D video and 3D audio indexing. Content description and search methods will be developed to enable fast replies to semantic queries.

Role of IRCAM in the 3DTVS project

In the 3DTVS project, IRCAM is in charge of the research and development of technologies related to:

-        Audio event detection using multichannel audio scenes

-        Speaker diarization

-        Segmentation of movies into scenes based on the audio

-        Separation, localization and identification of sound sources

Missions

The researcher will be in charge of the development of technologies related to:

-        Audio event detection using multichannel audio scenes

The researcher will also collaborate with the development team and participate in the project activities (evaluation, meetings, specifications).

Required profile

  • Extensive research experience in audio indexing (statistical modelling, automatic feature selection, …); strong knowledge of late-fusion techniques
  • Extensive research experience in signal processing (spectral analysis, audio-feature extraction, parameter estimation)
  • Very good knowledge of Matlab
  • Knowledge of the Linux, Windows and Mac OS X environments
  • Knowledge of C and C++
  • High productivity, methodical work, excellent programming style, good communication, rigour

Salary

According to education and professional experience

Applications

Please send a cover letter and a CV detailing your level of experience/expertise in the fields mentioned above (as well as any other relevant information) to peeters_a_t_ircam dot fr with a copy to vinet_a_t_ircam dot fr, roebel_at_ircam_dot_fr

Back  Top

6-7(2012-01-10) TENURE TRACK OR TENURED POSITION IN NATURAL LANGUAGE PROCESSING, INSTITUTE OF INFORMATION SCIENCE, ACADEMIA SINICA (TAIWAN)

TENURE TRACK OR TENURED POSITION IN NATURAL LANGUAGE PROCESSING, INSTITUTE OF INFORMATION SCIENCE, ACADEMIA SINICA (TAIWAN)

 

The Institute of Information Science (http://www.iis.sinica.edu.tw), Academia Sinica (http://www.sinica.edu.tw), Taiwan, R.O.C., is recruiting junior and senior research fellows (equivalent to the ranks of assistant professor, associate professor and full professor) in the areas of natural language processing (NLP), in particular Chinese language processing. All candidates for the position should have a Ph.D. degree in computer science with specialties in NLP, machine learning or semantic processing, and a good research background as well as a publication record.


Academia Sinica is the highest national academic research institution in Taiwan. Led by Dr. Chi-Huey Wong, Academia Sinica conducts research on a broad spectrum of subjects in science and humanities. The Institute of Information Science (IIS) is committed to high quality basic research in computer and information science and engineering. We have a strong track record of Chinese NLP, and are considered by many to be a major base for research on traditional Chinese. In addition to research funding supported by Academia Sinica, external funding through government agencies and industry-sponsored institutions is also available.


Full-time research fellows in the Institute have no teaching obligations, and are free to pursue their own research directions. The areas of our current research include Algorithms and Computation Theory, Parallel Computing, Language and Knowledge Processing, Computer Vision and Multimedia Technologies, Bioinformatics, Ubiquitous Computing, Operating Systems and Networking, and other Internet-related research. We will also launch a major collaborative research project on Web intelligence and social networks.


Salary is based on individual qualifications. Additional salary support from the Advancement of Outstanding Scholarship is available for applicants with exceptional qualifications. Senior candidates must demonstrate strong leadership, and have an international reputation evidenced by publications, patents, industrial practical experience, or other academic and scholarly achievements.


Application instruction is available at http://www.iis.sinica.edu.tw/page/recruitment/ResearchFellows.html.

 

For additional information, please contact Deputy Director of the Institute, Dr. Hsin-Min Wang (whm@iis.sinica.edu.tw).

Back  Top

6-8(2012-01-10) Research Assistant or Postdoctoral Position-Machine Translation at KIT Karlsruhe RFA

Karlsruhe Institute of Technology (KIT) is the result of the merger of the University of Karlsruhe and the Research Center Karlsruhe. It is a unique institution in Germany, which combines the mission of a university with that of a large-scale research center of the Helmholtz Association. With 8000 employees and an annual budget of EUR 650 million, KIT is one of the largest research and education institutions worldwide.

At the Institute for Anthropomatics, Lehrstuhl Prof. Waibel, several positions are to be filled:

Research Assistant or Postdoctoral Position in the field of Machine Translation, with a salary according to TV-L E13.

The responsibilities include basic research in the area of statistical machine translation, as well as participation in application-targeted research projects in the area of multimodal, dialog-centered human-machine interaction. The candidate is expected to participate in the design, development, and exploration of innovative methods, algorithms and techniques to be successfully integrated in humanoid robots.

Within the framework of the international center for advanced communication technology (interACT), our center operates in three locations: Karlsruhe Institute of Technology, Germany, and Carnegie Mellon University in Pittsburgh and Silicon Valley, USA. This offers a first-class research environment, working on advanced research projects at two of the top computer science research universities in the US and Europe. International joint and collaborative research at and between our centers is common and encouraged, and offers great international exposure and activity.

The focus of our research is to develop better communication and computing services that take advantage of an understanding of the human context and activities. One of our current research interests is the field of simultaneous translation of lectures, video conferencing, and monologues in general.

We seek qualified candidates with an M.S. or Ph.D. degree in Computer Science, Electrical Engineering, or related fields. The position offers the opportunity to work toward a Ph.D. degree or toward the advancement of an academic career. A record of academic achievements, relevant experience and knowledge in relevant areas, and excellent programming skills are expected.

KIT is pursuing a gender equality policy. Women are therefore particularly encouraged to apply. If equally qualified, handicapped applicants will be preferred.

Questions may be directed to sebastian.stueker@kit.edu; see also http://isl.anthropomatik.kit.edu.

Applications should be sent to Professor Alex Waibel, Institut für Anthropomatik, Karlsruhe Institute of Technology, Postfach 6980, 76049 Karlsruhe, Germany.

KIT – University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association.

Back  Top

6-9(2012-01-10) Research Assistant or Postdoctoral Position-Automatic Speech Recognition at KIT Karlsruhe, RFA

Karlsruhe Institute of Technology (KIT) is the result of the merger of the University of Karlsruhe and the Research Center Karlsruhe. It is a unique institution in Germany, which combines the mission of a university with that of a large-scale research center of the Helmholtz Association. With 8000 employees and an annual budget of EUR 650 million, KIT is one of the largest research and education institutions worldwide.

At the Institute for Anthropomatics, Lehrstuhl Prof. Waibel, several positions are to be filled:

Research Assistant or Postdoctoral Position in the field of Automatic Speech Recognition, with a salary according to TV-L E13.

The responsibilities include basic research in the area of automatic speech recognition, as well as participation in application-targeted research projects in the area of multimodal, dialog-centered human-machine interaction. The candidate is expected to contribute to the state of the art of modern recognition systems. He/she will participate in the design, development, and exploration of innovative methods, algorithms and techniques for acoustic and language modeling, leading to improvements in recognizer performance.

Within the framework of the international center for advanced communication technology (interACT), our center operates in three locations: Karlsruhe Institute of Technology, Germany, and Carnegie Mellon University in Pittsburgh and Silicon Valley, USA. This offers a first-class research environment, working on advanced research projects at two of the top computer science research universities in the US and Europe. International joint and collaborative research at and between our centers is common and encouraged, and offers great international exposure and activity.

Our current research interests include the simultaneous translation of lectures, video conferencing and general monologues, keyword spotting in a multitude of languages, multilingual speech recognition, and speech recognition for under-resourced languages.

We seek qualified candidates with an M.S. or Ph.D. degree in Computer Science, Electrical Engineering, or related fields. The position offers the opportunity to work toward a Ph.D. degree or toward the advancement of an academic career. A record of academic achievements, relevant experience and knowledge in relevant areas, and excellent programming skills are expected.

KIT is pursuing a gender equality policy. Women are therefore particularly encouraged to apply. If equally qualified, handicapped applicants will be preferred.

Questions may be directed to sebastian.stueker@kit.edu; see also http://isl.anthropomatik.kit.edu.

Applications should be sent to Professor Alex Waibel, Institut für Anthropomatik, Karlsruhe Institute of Technology, Postfach 6980, 76049 Karlsruhe, Germany.

KIT – University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association.

Back  Top

6-10(2012-01-10) Research Assistant or Postdoctoral Position- Dialog Modeling at KIT Karlsruhe, RFA

Karlsruhe Institute of Technology (KIT) is the result of the merger of the University of Karlsruhe and the Research Center Karlsruhe. It is a unique institution in Germany, which combines the mission of a university with that of a large-scale research center of the Helmholtz Association. With 8000 employees and an annual budget of EUR 650 million, KIT is one of the largest research and education institutions worldwide.

At the Institute for Anthropomatics, Lehrstuhl Prof. Waibel, several positions are to be filled:

Research Assistant or Postdoctoral Position in the field of Dialog Modeling, with a salary according to TV-L E13.

The responsibilities include basic research in the area of dialog systems, as well as participation in application-targeted research projects in the area of multimodal, dialog-centered human-machine interaction. The candidate is expected to participate in the design, development, and exploration of innovative methods, algorithms and techniques to be successfully integrated in humanoid robots.

Within the framework of the international center for advanced communication technology (interACT), our center operates in three locations: Karlsruhe Institute of Technology, Germany, and Carnegie Mellon University in Pittsburgh and Silicon Valley, USA. This offers a first-class research environment, working on advanced research projects at two of the top computer science research universities in the US and Europe. International joint and collaborative research at and between our centers is common and encouraged, and offers great international exposure and activity.

The focus of our research is to develop better communication and computing services that take advantage of an understanding of the human context and activities. One of our current research interests is the field of multimodal perception and dialog-centered human-machine interaction.

We seek qualified candidates with an M.S. or Ph.D. degree in Computer Science, Electrical Engineering, or related fields. The position offers the opportunity to work toward a Ph.D. degree or toward the advancement of an academic career. A record of academic achievements, relevant experience and knowledge in relevant areas, and excellent programming skills are expected.

KIT is pursuing a gender equality policy. Women are therefore particularly encouraged to apply. If equally qualified, handicapped applicants will be preferred.

Questions may be directed to sebastian.stueker@kit.edu; see also http://isl.anthropomatik.kit.edu.

Applications should be sent to Professor Alex Waibel, Institut für Anthropomatik, Karlsruhe Institute of Technology, Postfach 6980, 76049 Karlsruhe, Germany.

KIT – University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association.

Back  Top

6-11(2012-01-10) Immediate postdoctoral position openings USC Signal Analysis and Interpretation Lab (SAIL)

Immediate postdoctoral position openings
USC Signal Analysis and Interpretation Lab (SAIL)
http://sail.usc.edu
University of Southern California
The USC Signal Analysis and Interpretation Lab has new
postdoctoral scholar (and PhD student) positions available.
SAIL conducts fundamental and applied research that bridges human
communication science and engineering in a collaborative, richly diverse
interdisciplinary environment, and has several projects underway both on
core technologies and in applications targeting education and human health and well-being.

Requirements include strong fundamentals and interests in one or more of the following: signal processing, computing/machine learning, speech science, human communication, natural language processing, image/vision, and affective computing, as well as programming skills.
Also required are a desire to work on real problems and data, the willingness to take leadership and be proactive, the ability to work in a collaborative manner and contribute to the lab's mentoring culture, and a thirst for learning.

Back  Top

6-12(2012-01-12) Engineer: website for the teaching and remediation of French pronunciation by learners

Engineer position: website for the teaching and remediation of French pronunciation by learners, for the LabEx EFL project (PRES Sorbonne Paris Cité), operation PPC10 (valorisation) of the LabEx EFL.

The laboratory of excellence 'Empirical Foundations of Linguistics: data, methods, models' is looking for a (web) developer/programmer for a development mission.

Context. The EFL project is a 'laboratoire d'excellence' created in 2011 for 10 years (2011-2021) by the Ministry of Research. It brings together 13 research teams from 5 institutions (Paris 3, Paris 5, Paris 7, Paris 13 and INALCO) belonging to the PRES Sorbonne Paris-Cité, around linguistics and related disciplines whose object of study is language. It comprises 150 researchers and about one hundred PhD students. It is organized around 7 scientific strands (each managed by a partner institution) plus training and valorisation components.

The (web) developer/programmer will be in charge of creating a website/software within a project devoted to the design and development of web tools for distance learning. The position is available for 6 months, full time. Place of work: Laboratoire de Phonétique et Phonologie, ILPGA, 19 rue des Bernardins, 75005 Paris. Salary: engineer level, according to experience.

The position. The fixed-term contract is aimed at a specialist in distance learning, if possible in the field of language teaching, who will study the feasibility of the project and build a prototype. All members of the LabEx will be invited to test this prototype. The engineer will have two main tasks:
- Evaluate existing solutions, test the feasibility of the project, and report on the available solutions.
- Build a test website for setting up these online courses (partly with direct contact over the internet with the future teachers, all qualified phoneticians or speech therapists), with remote use of certain specialized sensors (detection of nasality, aspiration, voice quality, etc.) and both online and offline diagnosis of the problems to be solved.

In addition to files in common formats, the engineer will handle files in various formats: video, acoustic, textual. He or she will be asked to convert these files into generic formats allowing durable storage via the internet. The person recruited must have good knowledge of web programming (MySQL, PHP), including interface (HMI) development, and good knowledge of e-learning platforms.

Desired profile:
- 1 to 5 years of experience in website design
- A first experience in building a site for languages or linguistics is recommended
- Knowledge of the file formats used in speech would be a plus

Skills:
- Fluent English and French (spoken and written)
- Speed, efficiency, adaptation to the world of research and the university

To apply, send:
- a CV
- a cover letter
- letters of recommendation are desirable

The application must be sent as a single file in .pdf or .doc format, named firstnamelastname_cv.

Application deadline: 1 March 2012. Start of contract: as soon as available, from 1 March 2012. To: jacqueline.vaissiere@univ-paris3.fr

Back  Top

6-13(2012-01-20) Scientist in Learning, Data Mining and Interaction at the University of Grenoble
Profile: Machine Learning, Data Mining, Interaction
Competition: 46-1
Status of the position: likely to become vacant (vacancy date: 01/09/2012)

The full profile is available on the LIG website: http://liglab.imag.fr/

Research description:

The laboratory encourages applications of the highest quality, strengthening its research supervision capacity and contributing to the laboratory's dynamics in the field of information access. The candidate will be expected to develop a scientific project on one of the following themes:

Learning: knowledge engineering, machine learning, human learning
Data mining: mining of massive and complex data, text mining, multimedia information retrieval
Interaction: methods, models and tools for the design of innovative human-computer interfaces, interaction techniques, human-machine dialogue, multimodality, multilingualism, perceptive and adaptive systems, virtual worlds

The candidate will join one of the LIG teams and develop a research project on one of the aspects of the position profile.

 
 
Back  Top

6-14(2012-01-20) M2R internship proposal (data mining and speech) at LIMSI
M2R internship proposal (data mining and speech)

Contact: Sophie Rosset (rosset@limsi.fr)

Location: LIMSI - CNRS, bat 508, BP 133, 91403 Orsay Cedex, Spoken Language Processing group

Title: Data mining applied to audio data: errors and named entities

Context
This M2 internship falls within the domains of Natural Language
Processing (NLP) and Speech Processing, as well as data mining. We are
particularly interested in characterizing the errors of a speech
transcription system whose outputs are used by a Named Entity
recognition system. The aim is to set up a method for classifying and
characterizing the errors of several speech transcription systems while
quantifying their impact on one (or several) Named Entity recognition
systems. This method should be generalizable to other types of
applications, such as machine translation or human/machine dialogue systems.

Topic
Speech recognition systems are evaluated using the word error rate
(WER), which considers each word to be equally important. However, this
evaluation metric does not measure the difficulty that an information
extraction system will face. In other words, if the same named entity
detection system is applied to the outputs of two recognition systems
with the same WER, the error rate of the named entity detection system
will nevertheless differ.

The objective of this internship is therefore to characterize the errors
of a speech recognition system with respect to a named entity detection
task, and the impact these errors have.

During this internship we will focus on broadcast news speech, using the
data of a recent evaluation campaign. This campaign highlighted a very
large drop in the performance of named entity recognition systems when
applied to the outputs of automatic speech recognition systems (a 30%
loss) [1].

The outputs of three transcription systems will be studied. Their impact
will be studied on at least one Named Entity identification system, also
provided by LIMSI. These systems are state of the art and can therefore
serve as a first reference.

[1]  Olivier Galibert; Sophie Rosset; Cyril Grouin; Pierre Zweigenbaum; Ludovic Quintard. Structured and Extended Named Entity Evaluation in Automatic Speech Transcriptions. IJCNLP 2011 (http://aclweb.org/anthology-new/I/I11/I11-1058.pdf)

Practical information

The internship, lasting 5 months, will take place at LIMSI in the Spoken
Language Processing group, and the intern will receive a stipend
(around 480 euros/month).
Back  Top

6-15(2012-01-20) Professor in general linguistics and French linguistics; field linguistics, University of Lyon, France

As part of the 2012 faculty recruitment campaign, a professor position in the 7th CNU section will be opened at Université Lumière Lyon 2, assigned to the Department of Language Sciences and attached to the Dynamique Du Langage laboratory.
The title of this position is 'General linguistics and French linguistics. Field linguistics'.
It corresponds to a hybrid profile in terms of teaching and research (see below). Applications open on the Galaxie portal on 23 February 2012, and the application deadline is 27 March 2012 (4 p.m. for the online application, midnight for sending the paper documents).

Do not hesitate to contact me for any further information: Francois.Pellegrino@univ-lyon2.fr

RESEARCH profile:
The recruited professor will be attached to the Dynamique Du Langage laboratory (UMR 5596). He/she must have demonstrated the ability to conduct high-level research in an international scientific context. His/her activities will fit into the 'Description, Typology and Variation' strand and possibly the cross-cutting theme 'Endangered Languages: Fieldwork, Description, Valorisation'. In line with the unit's main scientific themes, particular though not exclusive attention will be paid to candidates conducting fieldwork aimed at producing language descriptions and grammars.

TEACHING profile:
The position is open to any candidate with a sustained scientific activity in linguistic analysis who, if possible, combines competence in general linguistics and French linguistics. Versatility is favoured, both in terms of research domains and of methodologies (field and/or corpus linguistics, in particular).

Thank you in advance for circulating this announcement to potentially interested colleagues.
 François Pellegrino

-- 
 Director of the Laboratoire Dynamique Du Langage
 UMR 5596 CNRS - Université Lumière Lyon 2
 Tel. (+33/0)4 72 72 64 94
 DDL - ISH
 14, av. Berthelot
 69363 Lyon Cedex 7 / France
Back  Top

6-16(2012-02-02) Research and Development Opportunities for Next Generation Technology at Microsoft

 

Research and Development Opportunities for Next Generation Technology at Microsoft

Do you want to impact billions of people all over the world with speech technology that you create?

 

We are looking for PhD level scientists and senior scientists, who will work on research problems in spoken language understanding, statistical dialog modeling, natural language generation, machine learning, statistical language modeling, and acoustic modeling.

 

Microsoft is all-in on the Natural User Interface to bring computing to larger audiences in more applications. To drive this mission we are bringing together scientists and engineers in the areas of speech recognition, natural language understanding, dialog modeling, machine learning and synthesis to develop and deliver robust, natural and scalable solutions across a rich set of scenarios and languages.

 

Join the excitement to be part of the newly formed team of scientists within Microsoft and to impact the lives of billions of people all over the world. We’re talking about Bing, Windows, XBOX, Mobile, Exchange Server and Tellme, just to name a few. Microsoft is dedicated to improving everyday life using speech. And not just in a few countries - but around the world.

 

How to apply:

MICROSOFT CORPORATION

Attention: Recruiting,

One Microsoft Way, STE 303, Redmond WA 98052-8303

 

Or email resume to: Tom Swanson toswanso@microsoft.com

Please reference 'Speech' in the subject line.

Back  Top

6-17(2012-02-05) NSF-Supported Summer Research for Undergraduates

NSF-Supported Summer Research for Undergraduates
   
The Center for Language and Speech Processing at the Johns Hopkins University is seeking outstanding members of the current junior class to join a summer research workshop on language engineering from June 11 to August 7, 2012.

The 8-week workshop provides an intense intellectual environment.  Undergraduates work closely alongside more senior researchers as part of a multi-university research team, which has been assembled for the summer to attack some problem of current interest.  The teams and topics for summer 2012 are described here:

 http://www.clsp.jhu.edu/internships/

We hope that this stimulating and selective experience will encourage students to pursue graduate study in human language technology, as it has been doing for many years.
 
The summer workshop provides:

* An opportunity to explore an exciting new area of research
* A two-week tutorial on current speech and language technology
* Mentoring by an experienced researcher
* Participation in project planning activities
* Use of a computing cluster and personal workstation
* A $5,000 stipend and $2,520 towards per diem expenses
* Private furnished accommodation for the duration of the workshop
* Travel expenses to and from the workshop venue

Applications should be received by WEDNESDAY, FEBRUARY 29, 2012, INCLUDING one letter from a faculty nominator.  Apply online here:

 http://www.clsp.jhu.edu/internships/
 
Applicants are evaluated only on relevant skills, employment experience, past academic record, and the strength of letters of recommendation.  No limitation is placed on the undergraduate major.  Women and minorities are encouraged to apply.

Back  Top


6-19(2012-02-15) Fixed-term Maître de Conférences (assistant professor) position, ESPCI ParisTech
A fixed-term Maître de Conférences position (1 year, renewable) is available in the SIGMA laboratory (SIGnals, Models, statistical leArning) of ESPCI ParisTech, starting in April 2012.
Teaching: electronics and control theory (first-year level).
Research: statistical machine learning applied to the design of a silent speech interface or to the prediction of molecular properties/activities.
Salary: 33800€ per year plus overtime.

Detailed description of the position: http://www.neurones.espci.fr
Back  Top

6-20(2012-02-10) Maître de Conférences position in Computer Science [27MCF519 Paris-Sorbonne]

The position requires dual competence: a high level of scientific excellence in computer science and in applications of computer science to the humanities and social sciences (in particular paralinguistic speech and language processing, computational sociology, etc.). This interest in applying computer-science theory to the humanities and social sciences is one of the specificities of computer-science teaching at Université Paris-Sorbonne. The candidate will teach computer science in various bachelor's (LFTI) and master's (ILGII, IILGI) programmes. He or she will also be involved in the supervision of new dual-curriculum bachelor's degrees (Science/Language Sciences degree, etc.) planned within the PRES Sorbonne Universités.

The successful candidate will join the Department of Mathematics and Computer Science Applied to the Human Sciences (currently 6 MCF and 2 professors) of the Institut des Sciences Humaines Appliquées (ISHA) of Université Paris-Sorbonne. This department takes part in multidisciplinary bachelor's and master's programmes and is also in charge of computer-science teaching for students in the humanities. The candidate will be attached to one of the ISHA research teams and must present a research programme that fits the perspectives of one of these teams.

Contact: Claude.Montacie@paris-sorbonne.fr
Host laboratories: UMR 8598 (GEMASS), EA 4509 (STIH)

Back  Top

6-21(2012-02-15)PhDs at Tilburg Center for Cognition and Communication (TiCC) research program 'Language, Communication and Cognition' (LCC), The Netherlands

For the Tilburg Center for Cognition and Communication (TiCC) research program 'Language, Communication and Cognition' (LCC), we are looking for two new, enthusiastic and competent PhD colleagues.

 

If you are interested in one of these positions, you will need to identify a potential research topic related to one of the research themes of the LCC program. Current themes include:

 


- Social media and interpersonal communication.


- Professional communication (medical, business, etc.).


- Alignment and adaptation in communication.


- Social exclusion and other social aspects of interaction.


- Emotion and speech.


- Language acquisition and learning.


- Multimodality and communication.


- Language and speech production.


- Visual communication (diagrams, metaphors, etc.).


- Gesture and other forms of non-verbal behavior

 


For the positions we seek candidates with a background in a relevant discipline, including Psycholinguistics, Communication & Information Sciences, Linguistics, Cognitive Science, Psychology or some related area, with experience in doing experiments and analyzing data.

 

The PhD candidates should have a good (research) master's degree in one of the aforementioned areas, a strong interest in doing research, excellent writing skills and a good command of English. Developing and defending a research plan is part of the procedure.

 

Tilburg University is rated among the top Dutch employers, offering excellent terms of employment. The collective labour agreement of Tilburg University applies. The selected candidates will start with a contract for one year, concluded by an evaluation. Upon a positive outcome of the first-year evaluation, the candidate will be offered an employment contract for the remaining years. Candidates with a Research Master (MPhil) will be offered a 1+2 years-contract. Master students might be offered a 1+3 years-contract. It is also possible to work 80% instead of fulltime. The PhD candidates will be ranked in the Dutch university job ranking system (UFO) as a PhD-student (promovendus) with a starting salary of € 2.042,-- gross per month in the first year, up to € 2.612,-- in the fourth year (amounts fulltime). The selected candidate is expected to have written a PhD thesis by the end of the contract (which may be based on articles).

 

Research in the Department of Communication and Information Sciences is located in the Tilburg Center for Cognition and Communication (TiCC). TiCC consists of two research programs: Language, Communication and Cognition (LCC) and Creative Computing (CC). There is a strong emphasis on experimental research and interdisciplinary cooperation. More information about the research programs can be found at http://www.tilburguniversity.edu/research/institutes-and-research-groups/ticc/. The department DCI is responsible for a flourishing academic programme Communication and Information Sciences (CIW), which annually attracts about 120 Bachelor students, 130 Pre-master and 200 Master students. The department is also co-responsible for the Research Master Language and Communication. More information about the DCI department can be found at www.tilburguniversity.nl/faculties/humanities/dci/.

 


For more information on the positions, please contact one of the LCC program leaders, prof.dr. Emiel Krahmer (E.J.Krahmer@uvt.nl, +311346630700) or prof.dr. Marc Swerts (M.G.J.Swerts@uvt.nl, +31134662922).

 

Applications should include:

 


- a cover letter.


- a Curriculum Vitae.


- a 2-page research proposal on a selected theme, plus names of potential supervisor and promotor.


- names of two references.

 

The only way to apply is via the online link at the bottom of this vacancy: 'apply direct'. If you received this vacancy via e.g. e-mail, please see the vacancy at: http://www.tilburguniversity.edu/about-tilburg-university/working-at/wp/. Applications should be sent before the application deadline of March 24, 2012. Interviews are expected to be held in April 2012. Starting dates are flexible, so applicants who expect to graduate in the summer of 2012 are also invited to apply.

Back  Top

6-22(2012-02-15) Maître de Conférences position, Université Sorbonne Nouvelle, Paris

A Maître de Conférences (assistant professor) position is open for recruitment for the start of the 2012 academic year at Université Sorbonne Nouvelle Paris 3. The description is given below (more details at http://lpp.univ-paris3.fr/postes/offres.htm).


Teaching:

The person recruited for this position will contribute to training in the phonetic sciences and introduce students to both traditional and more modern areas within the Bachelor's programme (http://www.ilpga.univ-paris3.fr/documents/descriptifs-licence.pdf) and the Master's programme (http://www.ilpga.univ-paris3.fr/masters.html) in Language Sciences.
The candidate will teach in the following areas: acoustic and articulatory phonetics; perceptual/cognitive and communicative aspects of speech; didactics of phonetics; prosody; comparative phonetics of languages; clinical phonetics, etc. The candidate is also expected to be able to teach in other areas of linguistics.

Department / Faculty / Place of work:
Université Sorbonne Nouvelle Paris 3
Institut de Linguistique et Phonétique Générales Appliquées
19, rue des Bernardins, 75005 Paris.


Research:

The candidate will be hosted by the Laboratoire de Phonétique et Phonologie (LPP, Unité Mixte de Recherche 7018), whose role is to ensure mutual enrichment between the phonetic sciences and phonology, and the application of this enrichment in a large number of domains, such as learning a new language, language acquisition, remediation, language typology, etc. The candidate's research will be carried out in synergy with that of the LPP members and will thus contribute to the development of the laboratory's research themes. The candidate is also expected to take part in the administrative life of the laboratory.

Laboratory: Laboratoire de Phonétique et Phonologie (LPP), UMR 7018 (http://lpp.univ-paris3.fr/)

Contacts: Jacqueline Vaissière; Cédric Gendrot
Telephone: 01 44 32 05 74
E-mail: jacqueline.vaissiere@univ-paris3.fr; cedric.gendrot@univ-paris3.fr

Back  Top

6-23(2012-02-15) Postdoc at University of Trento, Italy - Machine Translation/Social Computing

Postdoc at University of Trento, Italy - Machine Translation/Social Computing 

The Signals and Interactive Systems Lab is looking for top candidates to fill a postdoc position.
Candidates with significant research experience in Statistical Machine Translation and at least
one of the following topics are invited to apply:

- Spoken Dialog Systems
- Machine Learning
- Social Computing

The candidate will work with SIS lab members and European partners on
an upcoming research project addressing multilingual portability of spoken
conversational systems via machine translation and social computing.

The SIS Lab's research is driven by an interdisciplinary approach to the
analysis and interpretation of diverse signals (e.g. speech, text,
biosignals, multimodal data) to support human-human and human-machine
interactive systems. The SIS Lab has a state-of-the-art technology
infrastructure and collaborations with premier national and international
research centers and industry research labs.

For more info on the SIS Lab's research projects,
visit the lab website: http://sisl.disi.unitn.it.

LANGUAGE

The official language of the Department and the lab is English.

SALARY

Postdoc salaries are in the range of 30K-50K/year (gross) depending on
background and experience. Research fellowships may benefit from tax exemptions.

DEADLINE:

May 1, 2012

HOW TO APPLY:

All applicants should have very good programming and math skills and be used to working in a project team. Interested applicants should:

1) send their CV, along with
2) their transcripts, and
3) have three reference letters sent to:

Prof. Dr.-Ing. Giuseppe Riccardi Email: sisl-jobs@disi.unitn.it 
For more info:
Lab web page: http://sisl.disi.unitn.it/
Department : http://disi.unitn.it/
Local Information: http://international.unitn.it/welcome-services/cost-living


University of Trento
The University of Trento (www.unitn.it) is consistently ranked as a premier Italian graduate and undergraduate university.
University of Trento is an equal opportunity employer.


6-24(2012-02-21) Two Postdoctoral Research Associates in Speech Technology,University of Edinburgh

Two Postdoctoral Research Associates in Speech Technology
Centre for Speech Technology Research
University of Edinburgh

The School of Informatics, University of Edinburgh invites applications for two Postdoctoral Research Associates in Speech Technology supported by the EPSRC Programme Grant Natural Speech Technology (NST, http://www.natural-speech-technology.org) and the EU Integrated Project EU-Bridge (http://www.eu-bridge.eu). NST is a collaboration between the Universities of Edinburgh, Cambridge, and Sheffield, whose objective is to significantly advance the state-of-the-art in speech technology by making it more natural, approaching human levels of reliability, adaptability and conversational richness. EU-Bridge is a large scale European collaboration which aims to develop automatic transcription and translation technology to permit the development of innovative multimedia captioning and translation services of audiovisual documents between European and non-European languages.  The researchers will be part of the Centre for Speech Technology Research (CSTR, http://www.cstr.ed.ac.uk).

The successful candidate should have, or be close to completing, a PhD in speech processing, computer science, linguistics, engineering, mathematics, or a related discipline. They must have a background in statistical modelling and machine learning, research experience in speech recognition and/or speech synthesis, excellent programming skills, and research publications in international journals or conferences.

Experience in acoustic modelling or language modelling for speech recognition or speech synthesis is essential. A background in one or more of the following areas is also desirable: multilingual speech recognition; subspace Gaussian mixture models; adaptation techniques for acoustic or language modelling; experience of the design, construction and evaluation of speech recognition or speech synthesis systems; distant speech recognition; deep neural networks; and familiarity with software tools including HTK, Kaldi, HTS, or Festival.

Fixed Term: 30 months
Closing date for applications: 10 April 2012

For further details and to apply: http://www.jobs.ed.ac.uk/vacancies/index.cfm?fuseaction=vacancies.detail&vacancy_ref=3015397



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


6-25(2012-02-21) Senior Researcher in Speech Technology, University of Edinburgh

Senior Researcher in Speech Technology
Centre for Speech Technology Research
University of Edinburgh

The School of Informatics, University of Edinburgh invites applications for the post of Senior Researcher in Speech Technology on the EPSRC programme grant Natural Speech Technology (NST, http://www.natural-speech-technology.org/). NST is a collaboration between the Universities of Edinburgh, Cambridge, and Sheffield, whose objective is to significantly advance the state-of-the-art in speech technology by making it more natural, approaching human levels of reliability, adaptability and conversational richness.

You should have a PhD (or equivalent experience) in speech processing, computer science, linguistics, engineering, mathematics, or a related discipline. You must have a background in statistical modelling and machine learning, research experience in speech recognition and/or speech synthesis, excellent programming skills, and a strong publications record in international journals and conferences. In addition, experience of project development and project leadership in a research context, together with excellent communication, presentation, and organisational skills are highly desirable.

You will be part of the Centre for Speech Technology Research (CSTR, http://www.cstr.ed.ac.uk), leading work on novel statistical modelling and machine learning for speech technology. We are interested in research on either large vocabulary conversational speech recognition, or on natural expressive speech synthesis. This will include design, implementation and evaluation of novel algorithms and models for speech recognition or speech synthesis, and testing of algorithms on 'real-world' data and tasks obtained from the NST user group. The work will involve close collaboration with other researchers across the three NST partners.

Fixed Term: 3 years
Closing date for applications: 10 April 2012

For further details and to apply: http://www.jobs.ed.ac.uk/vacancies/index.cfm?fuseaction=vacancies.detail&vacancy_ref=3015400




-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

6-26(2012-02-16) 4 PhD positions in spoken dialogue systems research / Charles University in Prague

4 PhD positions in spoken dialogue systems research

Applications are invited for 4 PhD fellowships in the area of
statistical spoken dialogue systems funded by the Czech Government.
The students will join the Institute of Formal and Applied
Linguistics, Charles University in Prague, Czech Republic with an
anticipated start date of October 1st, 2012.

Topic description: In recent years, it has been suggested that statistical approaches to spoken dialogue systems offer a framework to naturally handle the inherent uncertainty in human speech. The two main advantages of statistical methods are increased robustness in noisy conditions and more natural behaviour learned from data. However, current methods need large corpora, which effectively prevents them from being used for the complex dialogue systems occurring in real life. The successful PhD candidates will investigate and implement statistical models and methods with the aim of increasing the efficiency of the learning process and reducing the need for large corpora. The research conducted will cover the areas of spoken language understanding, dialogue management, and natural language processing.

Skills: Candidates should hold a master's degree in a relevant area, such as computer science, mathematics, engineering or linguistics. A strong mathematical background, excellent programming skills (e.g. C/C++, Java, MATLAB, and various scripting languages in a Linux environment), an aptitude for creative research, and autonomy are expected. Experience in machine learning, Bayesian methods, and natural language processing is a plus.

The Institute of Formal and Applied Linguistics is a top-level
research group working in the area of computational linguistics and
natural language processing. During the fellowship, there will be good
opportunities to attend international conferences and workshops. The
formal applications should be submitted before June 1. Prospective
candidates are strongly encouraged to contact Dr Filip Jurcicek
(jurcicek@ufal.mff.cuni.cz) as soon as possible to obtain details
about the application process, the institute, and the research
opportunities.
Additional information is available here:
http://ufal.mff.cuni.cz/~jurcicek/jobs.


6-27(2012-02-27) PhD Student 'Increasing Robustness of Speech Recognition' Radbout University, Nijmegen, The Netherlands

 

PhD Student 'Increasing Robustness of Speech Recognition' (1,0 fte)

Faculty of Arts
Vacancy number: 23.02.12
Closing date: 1 April 2012

Responsibilities

As a PhD student you will participate in the FP7 Marie Curie Initial Training Network Investigating Speech Processing In Realistic Environments (INSPIRE). This network provides research opportunities for 13 PhD students and 3 postdocs. You will become a member of an international team of researchers whose aim is to gain a better understanding of how listeners recognize speech, even under non-ideal circumstances. You will contribute to urgently needed solutions that help alleviate the serious communication problems that arise, especially for older and hearing-impaired persons, when different combinations of 'adverse' conditions affect the speech processing system. As a PhD student you will conduct research as part of a project called 'Increasing robustness of speech recognition by using multiple signal representations'. Speech processing in the human brain presumably involves competition between multiple, intermediate signal representations. The redundancy of these different representations is assumed to help improve the robustness of recognition. In some cases, however, they may lead to conflicting interpretations resulting in intelligibility problems. The goal of this PhD project is to investigate to what extent human recognition errors with regard to speech in 'adverse' conditions can be replicated by machines that were trained on multiple input representations which are partially redundant.

Work environment

The project will be carried out at the Centre for Language and Speech Technology (CLST). CLST is a research unit within the Faculty of Arts of Radboud University Nijmegen and hosts a large international group of senior researchers and PhD students who conduct research at the frontier of science and develop innovative applications.

What we expect from you

You should:
- hold a Master's degree in engineering or science;
- have a strong background in machine learning (experience with dynamic Bayesian networks would be an advantage), mathematical and/or statistical modelling, and signal processing;
- have excellent programming skills;
- be willing to spend several months at the Technical University of Denmark.
Prior exposure to courses in linguistics or speech- or hearing-related fields would be an advantage. Furthermore, you should comply with the rules set forward by the FP7 Marie Curie ITNs, i.e. you should:
- not have resided or performed your main research activity in the Netherlands for more than 12 months in the last three years;
- be willing to work in at least one other country in the INSPIRE network;
- have less than 4 years of research experience since you obtained your Master's degree, and not hold a PhD.

What we have to offer

We offer you:
- employment: 1.0 fte;
- in addition to the salary: an 8% holiday allowance and an 8.3% end-of-year bonus;
- the starting salary is €2,042 per month on a full-time basis; the salary will increase to €2,492 per month in the third year;
- in addition to the salary, you will receive travel and training allowances on the basis of generous Marie Curie ITN provisions;
- duration of the contract: 18 months with the possibility of extension by another 18 months.

Are you interested in our excellent employment conditions

(http://www.ru.nl/newstaff/working_at_radboud/conditions_of/)?

The Radboud University is an equal opportunity employer. Female researchers are strongly encouraged to apply for this vacancy.

Would you like to know more?

Further information on: Investigating Speech Processing In Realistic Environments (http://www.ru.nl/clst/projects/speech/inspire/)
Dr. Bert Cranen, assistant professor, Speech science
Telephone: +31 24 3612904
E-mail: B.Cranen@let.ru.nl

Applications

Are you interested? Please include with your application:
- a CV;
- a 2-page description of your research interests explaining why the INSPIRE goals appeal to you, how the INSPIRE team may benefit from your participation, and your career perspectives as expected from INSPIRE;
- university transcripts;
- names and email addresses of two potential referees (or alternatively letters of recommendation).
It is Radboud University Nijmegen's policy to only accept applications by e-mail. Please send your application, stating vacancy number 23.02.12, to vacatures@let.ru.nl, for the attention of drs. M.J.M. van Nijnatten, before 1 April 2012.


6-28(2012-03-03) 3 Post-doctoral positions at the Bruno Kessler Foundation, Center for Information Technology, Trento Italy

 

3 Post-doctoral positions available in the 'Human Language Technologies - HLT' Research Unit at the Bruno Kessler Foundation, Center for Information Technology.

Workplace description:

The Human Language Technologies (HLT) unit is a multi-disciplinary research unit that addresses the automatic processing of human language for a range of tasks. In particular, the research unit focuses on automatic speech recognition, machine translation and content processing.

The HLT unit has been developing state-of-the-art technology in all the main research areas it operates in. The group has consistently performed well in several international evaluations, and is currently engaged in international projects for open source software development (e.g. the Moses platform for statistical machine translation). The unit also provides technological support and high-level services in order to optimize the internal research activities, namely a shared and efficient computing environment, software tools, up to the creation and management of large scale linguistic resources.

The HLT group is part of the larger network of research labs focusing on Human Language Technologies and related domains in the Trento region, that is quickly becoming one of the areas with the highest concentration of researchers in HLT and related fields anywhere in Europe.

More information about the HLT Unit is available at http://hlt.fbk.eu

The HLT Research Unit is looking for 3 candidates to carry out research activities in the fields of Textual Inferences, Machine Translation and Speech Recognition. Each research position will be funded through one of the following European research projects:

MateCat: http://www.matecat.com

EU-Bridge: http://www.eu-bridge.eu

EXCITEMENT: website in progress

Open positions:

A Postdoctoral position in Textual Inferences

(Ref.Code HLT_PostDoc2012_TI)

The candidate is expected to carry out research activities in the context of the EU-funded project EXCITEMENT on multilingual semantic processing. The goal of the EXCITEMENT project is to develop generic semantic 'engines' or platforms for robust textual inference that are applicable across languages and linguistic frameworks. These inference platforms will be leveraged for unsupervised text exploration on customer interaction data. Concrete systems will be developed for English, German, and Italian. Project partners are Bar-Ilan University, DFKI Saarbrücken, University of Heidelberg, Almawave S.r.l, NICE Systems, and OMQ GmbH.

The selected candidate will join the FBK research group with the aim of advancing the state of the art on component-based textual entailment.

A Postdoctoral position in Machine Translation

(Ref.Code HLT_PostDoc2012_MT)

The candidate is expected to contribute original research results inside leading edge international projects. The aim is to advance the state of the art in the integration of statistical MT in computer assisted translation and in adaptive MT, by drawing ideas and contributions from different areas, such as machine learning, statistical language processing, high performance computing, etc.

A Postdoctoral position in Speech Recognition

(Ref.Code HLT_PostDoc2012_SR)

The candidate is expected to contribute original research results inside a leading edge international project. The aim is to advance the state-of-the-art in multilingual speech processing by improving acoustic modelling, language modelling, and adaptation to different domains, conditions and genres. The contribution will be evaluated on application scenarios that include both efficient annotation of audiovisual archives and live processing of audio streams.

Job requirements:

Applicants should have:

- a PhD degree related to any of the specific research areas mentioned (computational linguistics, speech processing or related fields)

- experience in statistical modelling, speech processing or machine learning (preferably approaches applied to NLP tasks)

- experience in distributed software development (open source)

- skills in experimental work and the development of algorithms

- the ability to work and deliver in funded research projects

- oral and written proficiency in English

In adherence to FBK's policy to promote equal opportunity and gender balance, in case of equal applications, female candidates will be given preference.

Employment:

Contract type: Full time, 30-month contract (may be extended up to 6 months).

Number of positions: 3

Gross salary: from €37,500 to €41,500 per year (depending on the candidate's experience)

Benefits: 28 vacation days per year, flexi-time, company subsidized cafeteria or meal vouchers, internal car park, welcome office support for visa formalities, accommodation, social security, etc., reductions on bank accounts, public transportation, sport, accommodation and language courses fees.

Start date: Spring 2012

Location: Povo, Trento (Italy)

Application process:

To apply online, please send your detailed CV (.pdf format) including a list of publications, a statement of research interests, and contact information for at least 2 references. Please include in your CV your authorization for the handling of your personal information, as per the Personal Data Protection Code, Legislative Decree no. 196/2003 of June 2003.

Applications must be sent to jobs@fbk.eu

Emails should have the reference code related to the position of interest (HLT_PostDoc2012_TI, HLT_PostDoc2012_MT or HLT_PostDoc2012_SR).

Application deadline: 9 April 2012

Short-listed candidates will be contacted for an interview. Non-selected applicants will be notified of their exclusion at the end of the selection process.

Please note that FBK may contact short-listed candidates who were not selected for the current openings within a period of 6 months for any selection process for similar positions.

For transparency purposes, the name of the selected candidate, upon his/her acceptance of the position, will be published on the FBK website at the bottom of the selection notice.


6-29(2012-03-08) Post-docs at the Speech Processing and Transmission Lab ,Universidad de Chile,Santiago,Chile

The Speech Processing and Transmission Lab (LPTV, Laboratorio de Procesamiento y Transmisión de Voz) at Universidad de Chile, Santiago, Chile, is looking for post-doc researchers in the following fields:

 

Robust speech recognition

Robust speaker verification

Second language learning assessment

 

 

The grants are funded by Conicyt (Chilean funding Agency):  http://www.conicyt.cl

 

Applicants are required to present a brief research proposal prepared in collaboration with the director of the LPTV. For further information, contact:

 

 

Néstor Becerra Yoma, Ph.D.

Professor

Speech Processing and Transmission Laboratory

Department of  Electrical  Engineering

Universidad de Chile

Av. Tupper 2007, P.O. Box 412-3

Santiago,Chile

 

Tel. +56 2 978 4205

Fax. +56 2 695 3881

E-mail: nbecerra@ing.uchile.cl

http://www.cec.uchile.cl/~labptvoz/


6-30(2012-03-10) Senior Researcher/Research Associate in Statistical Dialogue Systems at Cambridge UK
Senior Researcher/Research Associate in Statistical Dialogue Systems
 
Applications are invited at either the Senior Research Associate or Research Associate
level to work on an EU-funded project called Parlance which aims to build mobile voice-
driven systems for interactive hyper-local search.
 
Candidates should have a PhD or comparable research experience in spoken dialogue 
systems and noise robust automatic speech recognition and understanding. Good 
programming skills are essential and familiarity with HTK would be an advantage. 
Appointment at the senior level will require at least 3 years of post-doctoral experience 
and evidence of independent standing. The salary range is from £27,578 to £46,846.
 
This is an exciting opportunity to join one of the leading groups in statistical speech and 
language processing. Cambridge provides excellent research facilities and there are
extensive opportunities for collaboration, visits and attending conferences.
 
Contact Prof Steve Young (sjy@eng.cam.ac.uk) for further information.
 
Application details can be found at: http://www.jobs.cam.ac.uk/job/-14472
 
 
 
 

6-31(2012-03-12) Postdoc position: Acoustic to articulatory mapping of fricative sounds LORIA Nancy France

Postdoc position:     Acoustic to articulatory mapping of fricative sounds

 

 

15 months, start between September and December 2012 at LORIA (Nancy, France).

 

Contact : Yves.Laprie@loria.fr

 

Context

This subject deals with acoustic-to-articulatory mapping [Maeda et al. 2006], i.e. the recovery of the vocal tract shape from the speech signal, possibly supplemented by images of the speaker's face. This is one of the great challenges in the domain of automatic speech processing and has not yet received a satisfactory answer. The development of efficient algorithms would open new directions of research in the domains of second language learning, language acquisition and automatic speech recognition.

 

The objective is to develop inversion algorithms for fricative sounds. Numerical simulation models now exist for fricatives; their acoustics and dynamics are better understood than those of stops, and they will therefore be the first category of sounds to be inverted after vowels, for which the Speech group has already developed efficient algorithms.

The production of fricatives differs from that of vowels in two respects:

  • The vocal tract is not excited by the vibration of the vocal folds located at the larynx, but by a noise source. This noise originates in the turbulent air flow downstream of the constriction formed by the tongue and the palate.
  • Only the cavity downstream of the constriction is excited by the source.

 

The proposed approach is analysis-by-synthesis. This means that the signal, or the speech spectrum, is compared to a signal or a spectrum synthesized by means of a speech production model which incorporates two components: an articulatory model intended to approximate the geometry of the vocal tract, and an acoustic simulation intended to generate a spectrum or a signal from the vocal tract geometry and the noise source. The articulatory model is geometrically adapted to a speaker from MRI images and is used to build a table made up of pairs associating an articulatory vector with the corresponding acoustic image vector. During inversion, all the articulatory shapes whose acoustic parameters are close to those observed in the speech signal are recovered. Inversion is thus an advanced table lookup method which we have used successfully for vowels [Ouni & Laprie 2005] [Potard et al. 2008].
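For illustration only, the table-lookup step can be sketched as follows in Python; the codebook contents, the dimensionalities and the plain Euclidean MFCC distance are assumptions made for this example and are not the actual LORIA implementation.

# Minimal sketch of codebook-based acoustic-to-articulatory lookup (illustrative only).
import numpy as np

# Hypothetical codebook: each entry pairs an articulatory parameter vector
# (here 7 articulatory model parameters) with its acoustic image
# (here a 12-dimensional MFCC vector produced by the acoustic simulation).
rng = np.random.default_rng(0)
articulatory_codebook = rng.uniform(-3.0, 3.0, size=(10000, 7))  # articulatory vectors
acoustic_codebook = rng.normal(0.0, 1.0, size=(10000, 12))       # corresponding MFCC vectors

def invert_frame(observed_mfcc, tolerance=3.0):
    """Return every articulatory vector whose acoustic image lies within
    `tolerance` (Euclidean MFCC distance) of the observed spectrum."""
    distances = np.linalg.norm(acoustic_codebook - observed_mfcc, axis=1)
    return articulatory_codebook[distances < tolerance]

# Example: invert one observed MFCC frame and count the candidate shapes.
observed = rng.normal(0.0, 1.0, size=12)
print(len(invert_frame(observed)), "candidate articulatory shapes retained")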

 

Activities

The success of an analysis-by-synthesis method relies on the implicit assumption that the synthesis can correctly approximate the speech production process of the speaker whose speech is inverted. Fairly realistic acoustic simulations of fricative sounds exist, but they strongly depend on the precision of the geometrical approximation of the vocal tract used as input. Articulatory models of the vocal tract also exist and yield very good results for vowels. On the other hand, these models are inadequate for consonants, which often require a very accurate articulation in the front part of the vocal tract. The first part of the work will be the elaboration of articulatory models adapted to the production of both consonants and vowels. The validation will consist of driving the acoustic simulation from the geometry and assessing the quality of the synthetic speech signal with respect to the natural one. This work will be carried out on a few X-ray films for which the acoustic signal recorded during acquisition is of sufficiently good quality.

 

The second part of the work will address several aspects of the inversion strategy. Firstly, it is now accepted that spectral parameters involving fairly marked smoothing and frequency integration have to be used, which is the case for MFCC (Mel Frequency Cepstral Coefficient) vectors. However, the most appropriate spectral distance for comparing natural and synthetic spectra still has to be investigated. Another solution consists in modeling the source so as to limit its impact on the computation of the spectral distance.

 

The second point is the construction of the articulatory table, which has to be revisited for two reasons: (i) only the cavity downstream of the constriction plays an acoustic role, and (ii) the location of the noise source is an additional parameter, but one that depends on the other articulatory parameters. The third point concerns the way of taking the vocalic context into account. Indeed, the context is likely to provide important information about the vocal tract deformations before and after the fricative sound, and thus constraints for inversion.

 

A very complete software environment already exists in the Speech group for acoustic-to-articulatory inversion, which can be exploited by the post-doctoral student.

 

References

-           [S. Ouni  and Y. Laprie 2005] Modeling the articulatory space using a hypercube codebook for acoustic-to-articulatory inversion, Journal of the acoustical Society of America, Vol. 118, pp. 444-460

-          [B. Potard, Y. Laprie and S. Ouni], Incorporation of phonetic constraints in acoustic-to-articulatory inversion, JASA, 123(4), 2008 (pp.2310-2323).

-          [Maeda et al. 2006] Technology inventory of audiovisual-to-articulatory inversion  http://aspi.loria.fr/Save/survey-1.pdf

 

Expected skills

Knowledge of speech processing and articulatory modeling, acoustics, computer science, applied mathematics.

 

 

 

 

 

 


6-32(2012-03-15) Research Engineer at IRISA, Lannion, France

A research engineer position (24-month fixed-term contract) is open in the Cordial research team of IRISA in Lannion. The research profile is in speech processing and speech signal processing, and the recruitment takes place within the framework of the ANR Phorevox project. The position is to be filled as soon as possible. The detailed profile is available at:
http://www.irisa.fr/doc/emploi/emploi_cordial-12_03.pdf


6-33(2012-03-15) R&D Engineer in Natural Language Processing, Laboratoire National de Métrologie et d'Essais (LNE), Trappes, France

TESTING DIVISION (Direction des Essais)

Environmental Testing Unit (Pôle Essais en Environnement)

 

R&D ENGINEER

IN NATURAL LANGUAGE PROCESSING (M/F)

Ref.: CL/TAL/DE

 

Context:

 

The Laboratoire National de Métrologie et d'Essais (LNE) offers services for evaluating the performance of natural language and speech processing systems on a given task (transcription, translation, information extraction, etc.).

Within the EMC, Electrical Safety and Information Technologies Department, the multimedia information processing team works on the various steps that define an evaluation. Its main missions are:

 

- to define relevant tasks to evaluate according to application and/or theoretical needs;

- to determine the characteristics of the data to be used for the task under consideration;

- to establish metrics that account for the relevance of a system for a given task.

 

Missions:

 

As part of research and development programmes, your mission will be to contribute to the development of this activity, in particular through the following:

 

- setting up and managing research and development projects in the multimedia domain;

 

- designing protocols to address the issues of evaluation in Natural Language Processing:

  • definition of the metrics
  • implementation of the software tools
  • organisation and facilitation of results workshops
  • participation in knowledge dissemination (website maintenance, scientific publications, etc.)

 

- developing partnerships at the international level in order to strengthen LNE's position in the field.

 

Profile:

 

PhD in Computer Science, specialised in Natural Language Processing (NLP).

You have initial professional experience (3 to 5 years in addition to the PhD), during which you worked on the evaluation of automatic systems.

You are proficient in project management and are comfortable with customer relations and with organising and leading meetings and/or seminars.

You have solid skills in programming and data analysis (data mining).

Rigorous, dynamic, determined and a good communicator, you will quickly integrate into the teams and demonstrate the leadership and expertise needed for the success of your mission.

Fluent English is essential.

Travel is to be expected (about ten trips per year, of 1 to 3 days, mostly in France).

Permanent position (CDI) based in Trappes (78), France.

 

Contact:

 

Apply quoting reference CL/TAL/DE, to the attention of Ms Christelle LEBRAULT, by e-mail: recrut@lne.fr


6-34(2012-03-25) Audio Indexing Researcher W/M position at IRCAM – 3DTV project

If you have already applied for this position, please just send us a quick email telling us you are still interested and we will get back to you.

Audio Indexing Researcher W/M position at IRCAM – 3DTV project

Starting: April 2012 (as soon as possible)

Duration: 18 months

Introduction to IRCAM

IRCAM is a leading non-profit organization associated with the Centre Pompidou, dedicated to music production, R&D and education in acoustics and music. It hosts composers, researchers and students from many countries cooperating in contemporary music production and in scientific and applied research. The main topics addressed in its R&D department include acoustics, audio signal processing, computer music, interaction technologies, and musicology. IRCAM is located in the centre of Paris near the Centre Pompidou, at 1, Place Igor Stravinsky, 75004 Paris.

Introduction to 3DTVs project

The goal of the 3DTVS project is to devise scalable 3DTV AV content description, indexing, search and browsing methods across open platforms, using mobile and desktop user interfaces, and to incorporate such functionalities in 3D audiovisual content archives. 3D multichannel audio analysis targets audio event detection based on fusion techniques that combine the feature analysis performed in the individual channels, as well as source localization and separation algorithms for the detection of moving audio sources. The results will be used in 3D audio/cross-modal indexing and retrieval. Multimodal 3D audiovisual content analysis will build on the results of 3D video and audio analysis. 3DTV content description and search mechanisms will be developed to enable fast replies to semantic queries.
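Purely as an illustration of the late-fusion idea mentioned above (and not the 3DTVS project's actual algorithms), the following Python sketch averages per-channel event scores, produced by hypothetical single-channel detectors, into one multi-channel decision per frame.

# Minimal late-fusion sketch: combine per-channel audio event scores (illustrative only).
import numpy as np

def late_fusion(channel_scores, weights=None, threshold=0.5):
    """channel_scores: array of shape (n_channels, n_frames) holding per-channel
    event probabilities from independent single-channel detectors.
    Returns one boolean event decision per frame after weighted score averaging."""
    scores = np.asarray(channel_scores, dtype=float)
    if weights is None:
        weights = np.ones(scores.shape[0]) / scores.shape[0]
    fused = np.average(scores, axis=0, weights=weights)
    return fused > threshold

# Example with 5 audio channels and 4 frames of hypothetical detector outputs.
scores = np.array([[0.9, 0.2, 0.6, 0.1],
                   [0.8, 0.3, 0.7, 0.2],
                   [0.7, 0.1, 0.5, 0.3],
                   [0.9, 0.2, 0.8, 0.1],
                   [0.6, 0.4, 0.7, 0.2]])
print(late_fusion(scores))   # -> [ True False  True False]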

Role of IRCAM in the 3DTV Project

In the 3DTVs project, IRCAM is in charge of the research and development of technologies related to

-        Audio event detection using multi-channel audio scenes

-        Speaker diarization

-        Segmentation into Movie scene from the audio signal

-        Sound source separation, localization and identification

Position description

The hired researcher will be in charge of the development of technologies related to:

  • Audio event detection using multi-channel audio scenes
  • Speaker diarization

The researcher will also collaborate with the development team and participate in the project activities (evaluation, meetings, specifications).

Required profiles

  • High skill in audio indexing and data mining (statistical modelling, automatic feature selection algorithm …); especially late-fusion algorithms
  • High skill in audio signal processing (spectral analysis, audio-feature extraction, parameter estimation)
  • High-skill in Matlab programming, skills in C/C++ programming

 

  • Good knowledge of Linux, Windows, MAC-OS environments
  • High productivity, methodical work, excellent programming style.

Salary

According to background and experience

Applications

Please send an application letter together with your resume and any suitable information addressing the above issues preferably by email to: peeters_a_t_ircam dot fr with cc to vinet_a_t_ircam dot fr, roebel_at_ircam_dot_fr

 

 



 

 


6-35(2012-03-27) Post Doctoral Fellow or Research Associate, Toronto, Canada

Position:         Post Doctoral Fellow or Research Associate (scientific)

 Site:               Toronto Rehabilitation Institute, University Centre, Toronto, Canada
Start Date:      Immediately
Status:           1 year


 We are searching for a motivated and skilled individual to take part in signal and audio processing of biomedical data for 1 year at the Toronto Rehabilitation Institute, University Centre. He/she will work on analysis and processing of audio signals to detect disease-specific patterns. This position will offer an opportunity to work in a resourceful and multi-disciplinary environment and publish findings in scholarly journals.

 

KEY RESPONSIBILITIES:

  • Problem analysis and conceptual design of solutions
  • Literature review
  • Study and analysis of biomedical data
  • Developing code for biomedical signal and audio processing
  • Feature extraction and pattern classification
  • Evaluation of results and tuning of the model parameters to optimize the performance.
  • Documentation of procedures and outcomes

 

KEY REQUIREMENTS:

  • At least 3 years of practical experience either in graduate studies or in industry in audio or speech processing
  • Proficiency in speech/audio processing tools using matlab or stand alone toolkits such as HTK
  • Theoretical Knowledge of pattern classification algorithms such as HMM, support vector machines, and neural networks
  • Good technical writing skills and sound publication record
  • Excellent interpersonal, organizational, and multi-tasking skills

ASSET REQUIREMENTS:

  • Have good programming skills (Python, C, Matlab, ...)
  • Experience in processing of physiological signals
  • Preference will be given to PhD graduates

 

 


Please quote job reference “Sleep Apnea DSP” and send résumé to:

Alshaer dot Hisham at torontorehab dot on dot ca

 


6-36(2012-04-02) Research position in Spoken Language Dialogue Systems Development for Serious Games ; University of Ulm Germany

Research Position with perspective of a PhD degree in Spoken Language
Dialogue Systems Development for Serious Games

The Dialogue Systems Group (www.dialogue-systems.org) in the Faculty of
Engineering and Computer Sciences, University of Ulm is seeking a
researcher at MSc level to work in the area of Spoken Dialogue
Management for Serious Games. The research topic will fit into the
scientific context of the group (including Intelligent, Adaptive and
Proactive Spoken Language Dialogue Interaction, Semantic Analysis, and
Dialogue Modelling) but will be adapted to the expertise of the candidate.

The dialogue management system will manage the communication between mobile nodes that are connected via a mobile ad-hoc network (MANET). Due to the mobility of the nodes and the limited range of wireless transmissions, the underlying network topology and link quality change frequently. In order to build an adaptive dialogue management system, the network can provide the system with information about the available resources (such as segmentation and link quality). In turn, the dialogue manager can request a different Quality of Service for individual communications (such as low latency, delay tolerance or high reliability).

Perspective: PhD Thesis.

Requirements: Good programming skills in C, C++, Perl, VoiceXML, Java,
JavaScript and experience with Unix/Linux are highly desirable;
expertise in speech and dialogue technologies would also be appreciated.

The appointment (0,5 TVL) has a fixed duration of 36 months.

Candidates should send their application electronically to
wolfgang.minker@uni-ulm.de. The application should include a short
resume, the names of two referees and a transcript of records with the
results of exams relevant to the MSc Degree. A pdf-version of the MSc
Thesis may also be included.

Dialogue Systems Group
Institute of Communications Engineering
Faculty of Engineering and Computer Sciences
University of Ulm, Germany

-- 
Wolfgang Minker
Ulm University
Communications Engineering - Dialogue Systems
Albert-Einstein-Allee 43
D-89081 Ulm
Phone: +49 731 502 6254/-6251
Fax:   +49 731 501 226254
http://dialogue-systems.org


6-37(2012-04-04) PhD fellowship- Fondazione Bruno Kessler (FBK), Trento, Italy



A PhD fellowship is available for conducting research studies in the field of Automatic Speech Recognition at the Human Language Technology Research Unit (http://hlt.fbk.eu/en/openpositions/phd-ict) of Fondazione Bruno Kessler (FBK), Trento, Italy. Research work will be carried out at FBK as part of the PhD Program of the International Doctorate School in Information and Communication Technologies (http://www.ict.unitn.it) of the University of Trento, Italy. Interested candidates need to specify in the application form that they intend to apply for the project-specific grant offered by FBK for the Automatic Speech Recognition (ASR) project. FBK has a long tradition in developing automatic transcription systems for several languages (information about the research group and ongoing activities can be found at http://hlt.fbk.eu); the aim of the project will be to advance the existing FBK automatic transcription technology beyond the state of the art. Possible research topics include, but are not limited to, improving acoustic modeling for large vocabulary ASR (e.g. discriminative training algorithms, speaker adaptive acoustic modeling, methods for fast and efficient adaptation to changing domains, data selection methods for AM training, bootstrap methods for under-resourced languages) and improving language modeling for large vocabulary ASR (data selection for LM training, domain adaptation). Details about the requirements for candidates can be found at http://hlt.fbk.eu/en/openpositions/phd-ict.

Contact: Daniele Falavigna (falavi@fbk.eu)


6-38(2012-04-04) Post-Doctoral Research Position, Aalto University

Post-Doctoral Research Position, Aalto University

 

Title: Statistical speech synthesis

Department: Department of Signal Processing and Acoustics

 

URL: http://spa.aalto.fi/en/

Start date: August-October 2012

Duration: 12-18 months contract

 

Department of Signal Processing and Acoustics, Aalto University (Espoo, Finland), invites applications for a post-doctoral researcher position in speech technology. The position is funded by the Simple4all project (http://simple4all.org/), which is a collaboration between Aalto University, University of Edinburgh (coordinator), University of Helsinki, Universidad Politécnica de Madrid, and Universitatea Tehnica Cluj-Napoca. Simple4All is a 3 year project, funded by EC’s FP7 ICT Programme, whose general aim is to create speech synthesis technology that learns from data with little or no expert supervision and continually improves itself, simply by being used.

 

The work at the Department of Signal Processing and Acoustics focuses on novel vocoding technologies in statistical parametric speech synthesis. More specifically, we are interested in utilizing speech models in statistical speech synthesis that are closer to the human speech production mechanism and are inherently able to produce many voice qualities. Applicants for the post-doctoral researcher position must have a PhD (or equivalent experience) in speech processing, digital signal processing or computer science. They must have a background in statistical speech synthesis; experience in the development of vocoders is particularly appreciated. In addition, experience of project development and project leadership in a research context, together with excellent communication, presentation, and organisational skills, is highly desirable.

 

To apply, please send your CV (.pdf format) including a list of publications and your contact information, a statement of research interests and contact information for at least 2 references. Applications must be sent to paavo.alku@aalto.fi using the subject line: Post-doc position in statistical speech synthesis

Application deadline: 30 June 2012

 


6-39(2012-04-15) Full Time Research Programmer, Dialog Research Center, CMU Pittsburgh

Full Time Research Programmer, Dialog Research Center
Language Technologies Institute, CMU Pittsburgh

Minimum Education Level: Bachelor's Degree

The Dialog Research Center (dialrc.org) provides infrastructure support for spoken dialog systems, including distribution of data and software. DIALRC is funded by the US National Science Foundation, and is hosted at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University's main campus in Pittsburgh PA, US. In addition to distributing open source software and dialog data, DIALRC offers live dialog platforms for researchers to evaluate their techniques with real users in live situations. The person filling the position will be responsible for developing and supporting existing dialog software, managing distributions, supporting other researchers using the system and other programming and support tasks as necessary for the center.

Primary Tasks

Work with graduate student researchers to maintain and further develop existing dialog systems based on the open-source Olympus Spoken Dialog System framework.  Maintain and distribute data corpus.  General programming and research support.

Preferred Skills

Two or more years experience in research programming; some experience in supporting research software systems; experience with the CMU Olympus Spoken Dialog System; other spoken dialog systems; and/or speech recognition and speech synthesis.

Experience in Speech Processing/Natural Language Dialog: Python, Perl, C/C++

More information is available from Maxine Eskenazi (max@cs.cmu.edu) and Alan W Black (awb@cs.cmu.edu).

Or go to http://www.cmu.edu/jobs/postings/index.html and search for Job Number 9039.


6-40(2012-04-20) PhD grant: Prosodic markers at IRIT Toulouse

Modelling trajectories of prosodic and linguistic markers; application to the characterization of speakers' intentions in audiovisual discourse

 

Contact

Jérôme Farinas, jfarinas@irit.fr, SAMOVA team, http://www.irit.fr/recherches/SAMOVA/

 

Topic description

In the field of automatic audio processing, current systems have reached a fairly high level of maturity and can rather reliably extract information about the speakers present, the language used, and the transcription of the speech. One of the objectives of current research is to use this information to structure speakers' interventions and, more broadly, radio and television content.

 

In this context, the SAMOVA team at IRIT has in recent years developed strong expertise in speaker modelling and automatic speaker segmentation [Louradour 2007, El Khoury 2010], automatic language identification [Pellegrino 1998, Farinas 2002, Rouas 2005], speech/music/singing segmentation [Pinquier 2004, Lachambre 2009], jingle extraction [Pinquier 2004], speech transcription [Campagne ESTER 2004], detection of conversational speech regions [Projet EPAC 2010] and keyword spotting [Le Blouch 2009]. Building on this work, the team is working on the structuring of broadcasts based on speakers' interventions and their interactions [Bigot 2011] as well as on video [Ercolessi 2011].

 

Starting from a characterization of speakers' roles (presenter, dominant speaker, etc.), our objective is to study more precisely the interactions between speakers in order to distinguish, within the message, what relates to the interaction itself (opening, closing, introducing a guest, managing speaking turns) from exchanges of opinion. More broadly, the proposed thesis topic aims to study the intention behind people's audiovisual interventions.

Intention modelling is mainly based on modelling prosody, which, through intonation and rhythm, shapes the form of the discourse. This modelling will have to take short- and long-term prosody into account [Farinas 2002, Rouas 2004]. Two levels of modelling will therefore be implemented in order to characterize sentence modality and the modification of word prosody. This will involve choosing appropriate prosodic parameters (F0, energy) and modelling them statistically. Their temporal evolution may be taken into account using stochastic models or trajectory models.
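As a purely illustrative starting point (the autocorrelation-based pitch estimate and all parameter values below are assumptions, not the method prescribed for the thesis), frame-level F0 and energy trajectories of the kind mentioned above could be extracted along these lines:

# Minimal sketch: frame-level F0 (autocorrelation) and log-energy trajectories (illustrative only).
import numpy as np

def prosodic_trajectories(signal, sr=16000, frame_len=0.032, hop=0.010,
                          f0_min=60.0, f0_max=400.0):
    """Return (f0_hz, log_energy) arrays, one value per analysis frame."""
    n, h = int(frame_len * sr), int(hop * sr)
    f0s, energies = [], []
    for start in range(0, len(signal) - n, h):
        frame = signal[start:start + n] * np.hanning(n)
        energies.append(np.log(np.sum(frame ** 2) + 1e-10))
        ac = np.correlate(frame, frame, mode="full")[n - 1:]     # autocorrelation, non-negative lags
        lo, hi = int(sr / f0_max), int(sr / f0_min)
        lag = lo + np.argmax(ac[lo:hi])
        f0s.append(sr / lag if ac[lag] > 0.3 * ac[0] else 0.0)   # 0.0 marks unvoiced frames
    return np.array(f0s), np.array(energies)

# Example on a synthetic 200 Hz tone: the F0 track should stay near 200 Hz.
t = np.arange(0, 1.0, 1 / 16000)
f0, energy = prosodic_trajectories(np.sin(2 * np.pi * 200 * t))
print(round(float(np.median(f0[f0 > 0])), 1), "Hz")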

This study will be carried out in two phases:

  • in a first phase, it will focus on determining linguistic markers (via the detection of key expressions) and prosodic markers (emphasis, sentence modality, local intonation) that are characteristic of certain communicative functions present in interactions between people. These indicators will make it possible to locate the regions of the document in which information about the speaker (name, status) is potentially present, and will provide details about the context in which the person intervenes (interview, debate, etc.). This information can, on the one hand, help to better describe the content and, on the other hand, reinforce the results of speech recognition, which is particularly difficult in debates and spontaneous speech.

  • in a second phase, based on the information available about the speakers, the study will focus on analysing their intentions. For example, for a presenter the aim will be to identify the regions that correspond to introducing the guests, managing speaking turns, and opening or closing the debates, while for a guest the aim will rather be to qualify their speaking turns in order to characterize the purpose of their intervention (giving an opinion, raising an objection, etc.) through, among other things, the message, the tone, the behaviour, the mode of expression and the local prosody, but also cues from the video (overlaid text, etc.).

 

The applications of this research concern the structuring of audiovisual content to support documentary archiving and information retrieval within such content. This structuring and characterization of interaction regions is also of interest for producing audiovisual summaries.

 

The candidate must hold a Master's degree with strong skills in computer science. Knowledge of signal processing and speech recognition (speech recognition and prosody) would be desirable.

 

References

[Louradour 2007] Noyaux de séquences pour la vérification du locuteur par Machines à Vecteurs de Support. Thèse de doctorat, Université Paul Sabatier, janvier 2007

[El Khoury 2010] Unsupervised Video Indexing based on Audiovisual Characterization of Persons. Thèse de doctorat, Université de Toulouse, juin 2010

[Pellegrino 1998] Une approche phonétique en identification automatique des langues : la modélisation acoustique des systèmes vocaliques. Thèse de doctorat, Université Paul Sabatier, décembre / december 1998.

[Farinas 2002] Une modélisation automatique du rythme pour l'identification des langues. Thèse de doctorat, Université Paul Sabatier, novembre 2002.

[Rouas 2005] Caractérisation et identification automatique des langues. Thèse de doctorat, Université Paul Sabatier, mars 2005.

[Pinquier 2004] Indexation sonore : recherche de composantes primaires pour une structuration audiovisuelle. Thèse de doctorat, Université Paul Sabatier, décembre 2004.

[Lachambre 2009] Caractérisation de l'environnement musical dans les documents audiovisuels. Thèse de doctorat, Université de Toulouse, décembre 2009.

[Campagne ESTER 2004] G. Gravier, J.F. Bonastre, S. Galliano, E. Geoffrois, K. Mc Tait and K. Choukri. ESTER, une campagne d'évaluation des systèmes d'indexation d'émissions radiophoniques, Proc. Journées d'Etude sur la Parole, Avril 2004.

[projet EPAC 2010] Yannick Estève, Thierry Bazillon, Jean-Yves Antoine, Frédéric Béchet, Jérôme Farinas. The EPAC corpus: manual and automatic annotations of conversational speech in French broadcast news (regular paper). Dans : Language Resources and Evaluation Conference (LREC 2010), Valletta, Malte, 19/05/2010-21/05/2010, Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk (Eds.), European Language Resources Association (ELRA), p. 1686-1689, 2011.

[Le Blouch 2009] Décodage acoustico-phonétique et applications à l'indexation audio automatique. Thèse de doctorat, Université Paul Sabatier, juin 2009.

[Bigot 2011] Benjamin Bigot, Isabelle Ferrané, Julien Pinquier, Régine André-Obrecht. Speaker Role Recognition to help Spontaneous Conversational Speech Detection (regular paper). Dans : International workshop on Searching Spontaneous Conversational Speech SCSS (SCSS 2010), Firenze, Italy, 25/10/2010-29/10/2010, ACM, p. 5-10, octobre 2010.

[Ercolessi 2011] Philippe Ercolessi, Hervé Bredin, Christine Sénac and Philippe Joly, Segmenting TV series into scenes using speaker diarization, WIAMIS 12th International Workshop on Image Analysis for Multimedia Interactive Services, Delft, Pays-Bas,13-15 avril 2011.

 

Keywords

Automatic Speech Processing, Phonetic Decoding, Keyword Spotting, Prosody, Acoustics, Structuring Programs, Video


6-41(2012-04-20) Engineer position at INRIA, France

Inria is seeking a recently graduated engineer to develop its audio source separation toolbox FASST (http://bass-db.gforge.inria.fr/fasst/) and to carry out research on noise-robust speech recognition.

Qualification: engineering degree or Master's (M2), obtained in 2011 or 2012
Duration: 2 years
Workplace: Nancy, France
Expected starting date: 01/12/2012
Salary: €2,527 gross per month

Applications will be reviewed on a rolling basis. Further information and the application form are available at
http://www.inria.fr/institut/recrutement-metiers/offres/ingenieurs-jeunes-diplomes/%28view%29/details.html?nPostingTargetID=11988


6-42(2012-05-01) PhD: Automatic recognition of continuous speech: spontaneous speech, LORIA, Nancy, France

PhD topic: Automatic recognition of continuous speech: spontaneous speech

 

Supervisors for this topic:
Irina Illina, Associate Professor (Maître de conférences), HDR, Université de Lorraine, office C147, tel. 03 83 59 84 90, e-mail: illina@loria.fr
– Denis Jouvet, INRIA Research Director, HDR, office C147, tel. 03 54 95 86 26, e-mail: denis.jouvet@inria.fr
Funding: doctoral contract (contrat doctoral)

Location: Inria-LORIA, Nancy

 

The topic is posted on the website of the IAEM doctoral school, http://www.iaem.uhp-nancy.fr/, under 'propositions contrats doctoraux'.

Application deadline: June 1

 


 

Conetxte : La reconnaissance de la parole est un processus par lequel un ordinateur transforme le signal acoustique de la parole prononcée en texte. Pendant ce processus, le système de reconnaissance utilise des modèles acoustiques, des modèles de langage et un lexique de prononciations.
La parole spontanée est définie comme un énoncé perçu et conçu au fil de son élocution. Par rapport à la parole préparée, la parole spontanée se caractérise par :
– des insertions (hésitations, répétitions, pauses, reprises, faux départs) ;
– des variations de prononciations (contraction de mots ou de phonèmes) ;
– des environnements difficiles (rires, parole superposée) ;
– des phrases agrammaticales.
La parole spontanée est présente sous plusieurs formes : interviews, débats, dialogues. Ces spécificités sont peu ou pas prises en compte dans les systèmes de reconnaissance de la parole.
Afin d’améliorer la performance de systèmes de reconnaissance il est nécessaire de s’attaquer à deux problèmes ouverts :
– d’un part, détecter automatiquement ces événements de la parole spontanée ;
– et d’autre part, les prendre en compte dans le système de reconnaissance au niveau acoustique ainsi qu’au niveau linguistique.
To characterize and detect spontaneous speech, [Dufour et al. 2009] propose a set of acoustic features (duration and phonetic rate) and linguistic features (specific morphemes, repetitions and false starts). Regarding the handling of spontaneous speech, several research directions have proven interesting, such as latent pronunciation analysis with prior knowledge [Lin 2007], the use of dictionaries with multiple pronunciations derived from spontaneous speech, and the study of different acoustic contexts of phonemes [Dupont et al. 2005].
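To make the nature of such features concrete, here is a minimal sketch, assuming a word-level transcript aligned with start and end times; the Word structure, window size and list of French fillers are illustrative choices, not part of the posting:

from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float

# Illustrative list of French hesitation fillers (an assumption, not from the posting).
FILLERS = {"euh", "hum", "ben", "hein"}

def speech_rate(words, win=2.0):
    """Words per second in a sliding window centred on each word."""
    rates = []
    for w in words:
        centre = (w.start + w.end) / 2.0
        in_win = [x for x in words
                  if centre - win / 2 <= (x.start + x.end) / 2 <= centre + win / 2]
        rates.append(len(in_win) / win)
    return rates

def spontaneous_markers(words):
    """Flag simple lexical cues of spontaneous speech: fillers and immediate repetitions."""
    flags = []
    for i, w in enumerate(words):
        is_filler = w.text.lower() in FILLERS
        is_repeat = i > 0 and w.text.lower() == words[i - 1].text.lower()
        flags.append({"word": w.text, "filler": is_filler, "repetition": is_repeat})
    return flags

if __name__ == "__main__":
    demo = [Word("je", 0.0, 0.2), Word("je", 0.25, 0.45), Word("euh", 0.5, 0.9),
            Word("voudrais", 1.0, 1.5), Word("partir", 1.6, 2.0)]
    print(speech_rate(demo))
    print(spontaneous_markers(demo))

A real detector would of course combine many more such cues within a trained classifier.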

 

 

The purpose of this thesis is to contribute solutions to this problem by proposing new methods that better take the characteristics of spontaneous pronunciation into account in automatic speech recognition.
The first objective of the thesis is to increase our knowledge of the variability of spontaneous speech across different types of speech (interviews, dialogues, etc.). We will focus mainly on the segmental and acoustic aspects of the problem; prosodic aspects could also be considered.
The second objective concerns the detection and localization of these spontaneous speech phenomena, and above all their exploitation to improve speech recognition. This will rely on enriching the models to account for the acquired knowledge, as well as on implementing techniques for detecting these phenomena. The work will be carried out in the PAROLE team at LORIA, using the ANTS system [Brun et al. 2005]. After a literature review, the student will analyze speech corpora, develop modules for processing spontaneous speech, integrate them into our speech recognition system, and then evaluate the improvements on various speech corpora. Our team already has a corpus rich in spontaneous speech: the corpus of radio and television broadcasts from the ESTER and ETAPE evaluation campaigns.
Possible validation on a corpus of speech from elderly people (with a view to home-care assistance) would probably allow us to identify and study further spontaneous speech phenomena.
The areas covered by this topic are: automatic speech recognition, probabilistic modeling, spontaneous speech, acoustic modeling, and language modeling.

 

 

References:
[Brun et al. 2005] A. Brun, C. Cerisara, D. Fohr and I. Illina. ANTS : le système de transcription automatique du LORIA. ESTER Workshop, 2005.
[Dufour et al. 2009] R. Dufour, V. Jousse, Y. Estève, F. Bechet and G. Linares. Spontaneous speech characterization and detection in large audio database. SpeCom, 2009.
[Dupont et al. 2005] S. Dupont, C. Ris, L. Couvreur and J.-M. Boite. A study of implicit and explicit modeling of coarticulation and pronunciation variation. Interspeech, 2005.
[Lin 2007] L.-S. Lin, C.-K. Lee. Pronunciation modeling for spontaneous speech recognition using latent analysis (LPA) and prior knowledge. ICASSP, 2007.


6-43(2012-05-13) PhD position: Characterization of the sound environment in ethnomusicological recordings, IRIT, Toulouse, France

Title: Characterization of the sound environment in ethnomusicological recordings

 

Supervisors: Régine André-Obrecht and Julien Pinquier (IRIT, SAMoVA team), obrecht@irit.fr and pinquier@irit.fr

 

This thesis concerns the processing of ethnomusicological data from the archives of the CNRS-Musée de l'Homme, managed by the Centre de Recherche en EthnoMusicologie (CREM) of the Laboratoire d'Ethnologie et de Sociologie Comparative (LESC). These documents are currently being digitized and catalogued (3,500 hours of unpublished recordings, from 1900 to the present day, of traditional music and ethnographic field surveys from all over the world, plus 3,500 hours of old and rare documents). This collection is of great historical importance and is unique in the world. In this application context, a set of automatic audio processing tools (speech, music, singing, noise, etc.) must be developed in order to produce (semi-)automatic indexing for intelligent access to the collection of sound recordings. This work is mainly intended for researchers (experts) in ethnomusicology.

 

The planned study aims at a finer characterization of the Speech, Music, Singing and Noise components in order to define the generic sound environment. Moreover, introducing a semi-supervised approach (taking available metadata or input from the user into account) should make it possible to characterize specific sound environments.

 

After becoming familiar with the various systems previously developed at IRIT for speech and music detection, the doctoral student will be in charge of adapting them to the project corpus. The analysis of the detected speech and singing regions should lead to a segmentation into speaking turns and singing turns, followed by a clustering of these segments based on voice similarity. Since the recordings are made in natural conditions, once the speech, music and singing regions have been identified, there remain sound regions of interest to an ethnomusicologist, because listening to them helps clarify the sound context of the recording session, what is called the 'sound ambience'. We propose to locate these noise regions of interest and to define a labelling for them. To this end, two strategies are envisaged (a minimal sketch of the first one follows the list):

- a supervised mode using classical acoustic features (generic approach),

- an unsupervised mode introducing knowledge from the ethnomusicologists (specific approach) via the Telemeta platform (http://crem.telemeta.org/).
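As a rough illustration of the supervised, feature-based strategy, here is a minimal sketch assuming labelled training excerpts are available on disk; librosa and scikit-learn are used here as convenient stand-ins, since the posting does not name the IRIT tools:

import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    """Classical frame-level acoustic features: MFCCs."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # shape (frames, n_mfcc)

def train_class_model(paths, n_components=16):
    """One diagonal-covariance GMM per sound class (speech, music, singing, noise)."""
    X = np.vstack([mfcc_frames(p) for p in paths])
    return GaussianMixture(n_components=n_components, covariance_type="diag").fit(X)

def classify(path, models):
    """Label an excerpt with the class whose GMM gives the highest average log-likelihood."""
    X = mfcc_frames(path)
    scores = {label: gmm.score(X) for label, gmm in models.items()}
    return max(scores, key=scores.get)

# Hypothetical usage; the file lists are placeholders, not part of the posting.
# models = {"speech": train_class_model(["sp1.wav", "sp2.wav"]),
#           "music":  train_class_model(["mu1.wav", "mu2.wav"])}
# print(classify("unknown_excerpt.wav", models))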

 

This PhD will be funded by the ANR DIADEMS project, which will start in October 2012. Knowledge of pattern recognition and of speech and music processing would be appreciated.


Application deadline: 15 June 2012


6-44(2012-06-01) Two positions at Nuance Belgium

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world. Our technologies, applications and services make the user experience more compelling by transforming the way people interact with information and how they create, share and use documents. Every day, millions of users and thousands of businesses, experience Nuance by calling directory assistance, getting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched. Making each of those experiences productive and compelling is what Nuance is all about.

Speech Recognition Specialist

Merelbeke, Belgium. Permanent role. Responses to Craig.Robertson@Nuance.com.

Nuance Mobile builds innovative, intelligent and intuitive touch and speech interfaces to simplify and enhance the way people interact with mobile devices, applications, and services.  Nuance Mobile solutions make mobile devices and in-car systems easier to use, automate customer self-service, and optimize the access and discovery of even the most advanced mobile applications and content - regardless of technical know-how, location, environment, or physical and literacy capabilities.

As a contributing member of Nuance, you will work within a dynamic team environment to develop, support, market and sell our award-winning software applications. We offer competitive compensation packages and a challenging technical yet casual work environment. Join our dynamic, entrepreneurial team that operates worldwide (Europe, US, APAC) and be part of our fast-growing track record of continuing success.

For more information, please see www.nuance.com.

Nuance is an equal opportunity employer.

Responsibilities

As a Speech Recognition Specialist at Nuance you will work with peers from other teams around the world to investigate the newest and best uses of speech recognition for the music and/or POI vertical domains. You will work closely with our R&D department to understand what is and is not feasible given the current limitations of the technology, and help customers' and Nuance's internal integration teams include Nuance technologies in successful products in an efficient way.

Representative tasks will include:

  • Investigate new & best usage of our speech recognition technologies for entering a POI (Point of Interest) by voice, considering the platform & technology constraints
  • Investigate new & best usage of our speech recognition technologies for accessing music by voice, considering the platform & technology constraints
  • Crunch navigation POI data from map providers to build proofs of concept and experiments (a minimal data-preparation sketch follows this list)
  • Contribute to research and technology agendas by providing input and improvement requests to our R&D department
  • Support customer projects integrating ASR technologies for POI and/or music
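As a rough illustration of the kind of POI data crunching mentioned above, here is a minimal sketch assuming a hypothetical pois.csv export with name, lat, lon and category columns; the file name and schema are placeholders, not Nuance formats:

import csv
import sqlite3

def load_pois(csv_path, db_path="pois.db"):
    """Load a hypothetical POI export into SQLite for quick experiments."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS poi (
                       name TEXT, lat REAL, lon REAL, category TEXT)""")
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [(r["name"], float(r["lat"]), float(r["lon"]), r["category"])
                for r in csv.DictReader(f)]
    con.executemany("INSERT INTO poi VALUES (?, ?, ?, ?)", rows)
    con.commit()
    return con

def pois_by_prefix(con, prefix):
    """Retrieve candidate POI names sharing a prefix, e.g. to seed a proof-of-concept grammar."""
    cur = con.execute("SELECT name FROM poi WHERE name LIKE ? ORDER BY name",
                      (prefix + "%",))
    return [row[0] for row in cur.fetchall()]

# Hypothetical usage:
# con = load_pois("pois.csv")
# print(pois_by_prefix(con, "Gent"))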

 Qualifications

  • Bachelors or Graduate University degree in Electrical Engineering, Computer Engineering, Computer Science or equivalent / related Technical Degree
  • Some initial professional experience
  • Strong C/C++ programming skills; proven software/system problem-solving skills.
  • Excellent oral and written communication skills in English is a must
  • Good listener and communicator, who can represent Nuance professional services at the customer’s premises or in written and oral communications with customers.
  • Positive 'can-do' attitude, well organized, focusing on achieving results cost-effectively
  • Ability and willingness to travel 
  • Ability to work independently, including at customer premises, but always as part of the embedded team. 
  • Self-learner, with a sense of initiative and the perseverance to deliver high-quality work.

 

Preferred:

•  Experience with embedded hardware platforms, embedded operating systems, and embedded software development is desirable

•  Experience with Python and SQLite is highly desirable

•  Experience with Windows CE, Linux or QNX

 


NLP Processing Engineer

Merelbeke, Belgium. Permanent role. Responses to Craig.Robertson@Nuance.com.

Qualifications

• Excellent background in statistics, pattern recognition, and/or signal processing

• Expertise in natural language processing, computational linguistics, statistical language modeling, search, and/or machine translation

• Strong programming skills, ideally in Python, Java, and/or C.

• Skills related to text processing, scripting languages, regular expressions

• Excellent oral and written communications skills in English.

• Ability to carry out focused and goal-oriented research and development, ability to assume responsibility for one’s work

• Ability to work in an international team as well as independently in fast-paced environment

• Ability to creatively solve problems while leveraging existing technology with an eye for efficiency.

 

 

A PhD or equivalent research experience is a strong asset

 

• Good knowledge of speech recognition theory, acoustics, and/or psychoacoustics

 

• User interface, human-machine interaction, and dialogue system development experience

 

• Operational knowledge of languages other than English

 

MSc, ideally PhD in computer science, engineering, physics, mathematics, or other technical field

 

Craig Robertson

Recruitment Manager EMEA

 



