ISCA - International Speech Communication Association



ISCApad #175

Thursday, January 10, 2013 by Chris Wellekens

6 Jobs
6-1(2012-07-05) PhD at LIG (Grenoble-France)

PhD proposal : Collaborative Annotation of multi-modal, multi-lingual and multimedia documents
Project objective
This PhD will be proposed and funded in the context of the CHIST-ERA / ANR Camomile Project (Collaborative Annotation of multi-MOdal, MultI-Lingual and multi-mEdia documents).
Human activity is constantly generating large volumes of heterogeneous data, in particular via the Web. These data can be collected and explored to gain new insights in social sciences, linguistics, economics, behavioral studies as well as artificial intelligence and computer sciences. In this regard, 3M (multimodal, multimedia, multilingual) data could be seen as a paradigm of sharing an object of study, human data, between many scientific domains. But, to be really useful, these data should be annotated and available in very large amounts. Annotated data is useful for computer sciences, which process human data with statistical machine learning methods, but also for social sciences, which increasingly use the large corpora available to support new insights, in a way that was not imaginable a few years ago. However, annotating data is costly as it involves a large amount of manual work, and 3M data, for which different modalities must be annotated at different levels of abstraction, is especially costly. Current annotation frameworks involve some local manual annotation, sometimes with the help of automatic tools.
The Camomile Project aims at developing a first prototype of a collaborative annotation framework for 3M data, in which the manual annotation will be done remotely on many sites, while the final annotation will be localized on the main site. Furthermore, following the same principle, systems devoted to automatic processing of the modalities (speech, vision) present in the multimedia data will help the transcription by producing automatic pre-annotations.
PhD proposal
This PhD is dedicated to proposing semi-supervised and unsupervised methods for the annotation of 3M data. Different semi-supervised annotation scenarios will be tested on different types of videos. More precisely, we shall study:
  • innovative retraining / adaptation strategies to update the different systems using new annotations. Since we consider a real scenario where new annotations are produced continuously, we will focus especially on iterative learning techniques where models are updated instead of being fully retrained;
  • new data selection methods for active learning strategies; we will focus on active learning for multimodal and heterogeneous systems, which makes the data selection task much more difficult.
As a case study, we shall focus on developing technologies to answer the questions "who is seen?" and "who is speaking?" in videos. Depending on the type of video and the feedback from the supervision group, we may extend our work to the automatic annotation of objects ("what is seen?") or activities ("what is going on?").
Required Skills
The applicant must have a master's degree in either computer science or computer engineering and some knowledge of speech, image or video processing and of machine learning. We are also looking for a candidate with very good programming skills.
LIG GETALP and MRIM collaboration
PhD work is to be carried out between the GETALP and MRIM teams of LIG.
LIG / GETALP website: http://getalp.imag.fr
LIG / MRIM website: http://mrim.imag.fr
Contacts:
Laurent Besacier, Laurent.Besacier@imag.fr
Georges Quénot, Georges.Quenot@imag.fr
Targeted starting date: fall 2012


6-2(2012-07-08) Faculty position in Phonetic Science and Speech Technology at Nanjing Normal University, China

Faculty position in Phonetic Science and Speech Technology at Nanjing Normal University, China

(Urgent job announcement)

The Institute of Linguistic Science and Technology at Nanjing Normal University, China, invites applications for a faculty position in the area of Phonetic Science and Speech Technology. The position can be Lecturer, Associate Professor, or Professor, depending on the qualifications and experience of the applicant.

Nanjing Normal University (NNU) is situated in Nanjing, a city in China famous not only for its great history and culture but also for its excellence in education and academia. With Chinese-style buildings and a garden-like environment, the Suiyuan Campus of NNU is often called the “Most Beautiful Campus in the Orient.”

Nanjing Normal University is among the top 5 universities of China in the area of Linguistics. Placing strong emphasis on interdisciplinary research, the Institute of Linguistic Science and Technology at NNU is unique in that it bridges the studies of theoretical and applied linguistics, phonetics, cognitive sciences, neural sciences, and information technologies. The phonetic laboratory is very well equipped, with a sound-proof recording studio, professional audio facilities, physiological instruments (e.g., WAVE system, PowerLab, EGG, EPG, airflow and pressure module, and nasality sensor), EEG for ERP studies, an eye tracker, etc. The laboratory successfully organized the international symposium TAL 2012 (www.TAL2012.org) at the end of May.

We welcome interested colleagues to join us. The research can cover any area in phonetic sciences and speech technologies, including but not limited to speech production, speech perception, prosodic modeling, speech synthesis, automatic speech recognition and understanding, spoken language acquisition, computer-aided language learning, and ERP study for spoken languages. Outstanding research support will be offered.

Requirements:

* A PhD degree (or an expected one) in related disciplines (e.g., linguistics, psychology, physics, applied mathematics, computer sciences, and electronic engineering);

* Good publication/patent record in phonetic sciences or speech technologies;

* Good oral and written communication skills in both Chinese and English;

* Team work spirit in a multidisciplinary group.

Interested candidates should submit a CV, a detailed list of publications, copies of their best two or three publications, and the contact information of two references to:

Prof. Wentao GU

Email: wtgu@njnu.edu.cn; wentaogu@gmail.com

Phone: (office) +86-25-8359-8624, (mobile) +86-189-3687-2840

The position will remain open until it is filled. Early application is strongly recommended.


6-3(2012-07-26) PhD position in spelling correction using statistical machine translation, Univ. Le Mans, France
Funded PhD position at the Laboratoire d'Informatique de l'Université du Maine (LIUM) in the area of automatic spelling correction using statistical machine translation methods.
Location: LIUM (Le Mans)
Date: 1 October 2012
Duration: 3 years
This PhD is part of the 'investissement d'avenir' project PACTE, led by the company Diadeis, whose partners also include the Alpage team (INRIA and Paris 7) and the companies A2ia and Isako. PACTE aims to improve the spelling quality of texts produced by various text capture methods. The emphasis is on OCR output (optical character recognition on scanned printed texts), but the project also covers data obtained by handwriting recognition, manual keying, and direct writing. The techniques used will be both statistical and hybrid, drawing on computational linguistics tools and resources. The project's main application domain is the digitization of written heritage in a multilingual context. A second PhD will start at Alpage, with an emphasis on using linguistic knowledge to help optimize the spelling quality of texts automatically or almost automatically. Within the PACTE project, there will be close collaboration between LIUM, Alpage and Diadeis.
In this context, the goal of the PhD at LIUM is to analyze how statistical machine translation techniques can be used for error correction. Error correction can indeed be seen as a process of translating from an erroneous language into a correct one. A similar approach, known as 'statistical post-editing (SPE)', has already been used successfully to correct the output of rule-based translation systems. The PhD will therefore study how a similar approach can be used for spelling correction. An important aspect concerns the development of efficient language models that give good results with a small memory footprint. Back-off n-gram models will be favored, but other methods will also be explored, in particular continuous space language models. We will also look at integrating morphosyntactic knowledge, in collaboration with the Alpage team. The languages studied will primarily be French and English, as well as German. An application to Spanish, Italian, or other European languages is possible.
Profile sought:
- good computing skills (proficiency in Linux is essential; C++ programming; use of scripts, Perl, etc.);
- knowledge of statistical machine translation is desirable or, failing that, of machine learning;
- experience with the Moses toolkit is a plus.
The PhD will take place in the LST team at LIUM. LIUM is known internationally for its research in statistical machine translation, and we have numerous collaborations with universities and companies in Europe and the United States.
Contact: Holger Schwenk, Holger.Schwenk@lium.univ-lemans.fr

6-4(2012-08-03) PhD Studentship in Speaker Diarization at EURECOM, Sophia Antipolis, Alpes Maritimes, France

PhD Studentship in Speaker Diarization at EURECOM

Department: Multimedia Communications
URL:        http://www.eurecom.fr/mm
Start date: 01/10/12
Duration:   Duration of the thesis

Description:

EURECOM’s Multimedia Communications Department invites applications for a PhD studentship in speaker diarization within its Speech and Audio Processing Research Group. Speaker diarization is commonly referred to as the task of detecting ‘who spoke when’ in a multiple-speaker audio signal. In its most general form it is performed without any prior knowledge regarding the number of speakers or speaker identities. Applications include speech recognition, speaker recognition (biometrics), multimedia indexing, content structuring and general multimedia document processing.

As with any modelling or statistical pattern recognition task, performance is affected by unwanted nuisance variation and by the amount of data available for any given class. In the case of speaker diarization, performance can be affected by background noise, varying linguistic content and differences in speaker floor times. Our recent work has developed new normalization approaches to marginalise linguistic variation in order to increase speaker discrimination and improve speaker diarization performance.

This fully-funded PhD position aims to extend this work to further improve the robustness of speaker diarization in the case of linguistic  variation and varying speaker floor times.  The work will develop a novel phone adaptive training algorithm and investigate other, new normalisation and marginalization approaches to improve speaker modelling.  The position is an opportunity to make a contribution in an increasingly important field of speech and audio processing.  You will join a small, but dynamic research group which participates in a growing number of European, national and industrially-funded research projects and will have the opportunity for international travel and participation in competitive evaluations.


Requirements:

The successful candidate will have a Master’s degree in engineering, mathematics, computing or a related, relevant discipline. You will be highly motivated to undertake challenging research, have strong expertise in mathematics and programming and have excellent communication skills. Knowledge of C/C++ and Matlab is strongly desirable. Good English language speaking and writing skills are essential. Knowledge of French is a bonus.

Application:

Screening of applications will begin immediately, and the search will continue until the position is filled. Applicants should send to the address below (i) a one-page statement of research interests and motivation, (ii) a CV and (iii) contact details for three referees.

Applications should be submitted by e-mail to secretariat@eurecom.fr

Contact:         Dr. Nicholas Evans
Postal address:  2229 route des Crêtes
                 B.P. 193
                 06904 Sophia Antipolis
                 France
Email:           evans@eurecom.fr
Web page:        http://www.eurecom.fr/mm
Phone number:    +33 4 93 00 81 14
Fax number:      +33 4 93 00 82 00

EURECOM is a graduate school and a Research Centre in Communication Systems, located in the Sophia Antipolis technology park, in close proximity to a large number of research units of leading multinational corporations in the telecommunications, semiconductor and biotechnology sectors, as well as other outstanding research and teaching institutions. EURECOM was founded in 1991 by TELECOM ParisTech (Ecole Nationale Supérieure des Télécommunications) and EPFL (Swiss Federal Institute of Technology in Lausanne) in a consortium form, combining academic and industrial partners.

EURECOM deploys its expertise around three major fields: Networking and security, Multimedia Communications and Mobile Communications and has a strong international scope and strategy. EURECOM is particularly active in research in its areas of excellence while also training a large number of doctoral candidates. Its contractual research is recognized across Europe and contributes largely to its budget.


6-5(2012-08-08) Junior researcher position in language sciences, Univ. Mons-Hainaut, Belgium

Job offer: "Researcher in speech sciences"

Service de Métrologie et Sciences du langage (Metrology and Language Sciences Department), Laboratoire de phonétique (Phonetics Laboratory), Université de Mons, Mons, Belgium

The Metrology and Language Sciences Department is building an open recruitment pool with a view to possible appointments to grant-funded junior researcher positions (M/F) in speech sciences.

Candidate profile (M/F):

  • Entry level: at least "bac +5" (300-credit master's degree), or equivalent, by 1 October 2012.
  • Initial training demonstrating interest and competence in language sciences, and more specifically in speech sciences.
  • Additional training in speech sciences, either completed or undertaken concurrently with the appointment.
  • Ability to work in a team, creativity, autonomy, scientific curiosity.
  • Skills in statistical data processing, a command of scientific English, and knowledge of foreign languages are additional assets.

Job profile:

The post holder (M/F) contributes to the Department's research efforts within one of the themes belonging to one of the two projects described in the annexes. He or she prepares a doctoral thesis linked to one of these projects, and may be asked to take a minor part in the Department's teaching support activities.

Some positions are already available in the Department; others may be allocated to it through an application to a competition (internal and/or external to UMONS), with the Department's support.

Research grant lasting four years, in renewable one-year instalments, starting no earlier than 1 October 2012.

Interested persons are asked to send, as soon as possible, an application comprising:

  • a cover letter,
  • a curriculum vitae,
  • their (reasoned) preference for one or the other of the two projects described in the annexes,
  • any document deemed useful,

in PDF format (exclusively) to: bernard.harmegnies@umons.ac.be

Annex 1: Project PAROLPATHOS

Project title

Acoustic and auditory evaluation of the speech signal of French-speaking speakers with disabilities. Contributions of clinical phonetics to the development of procedures for the holistic evaluation of the communicating subject situated in his or her ecosystem.

Summary and objectives of the project

The acoustic phenomena that speakers produce in realizing their communicative intention do not consist solely of the material manifestation of the linguistic forms prescribed by the systems of the language. The speech signal also carries many elements unrelated to linguistic signs but causally linked to various aspects of the speaker's functioning or state, and which may therefore serve as indices.

When the subject has a disability, elements linked to a pathological context may thus surface in the speech signal. The speaker's vocal productions may therefore bear the mark both of problems located in one or another of the language processing procedures and of non-linguistic difficulties that have more or less direct repercussions on language.

Clinical phonetics, a discipline that has been emerging for two decades in the English-speaking world and for only a few years in the French-speaking world, focuses on these phenomena, with the aim of putting its laboratory methods and techniques at the service of understanding how the speaker functions when a pathology turns the communication situation into a situation of disability. Today, although this work shows the approaches of clinical phonetics to be of very great interest, it must be acknowledged that scientific knowledge in the field remains embryonic and uneven. First, research most often focuses on phenomena observed in English speech samples; from this point of view, French is a poor relation. Second, some pathological areas have received far less attention than others (if any). In addition, different teams often use different methodologies even when working on identical pathologies, which makes it difficult to assess the reliability and validity of the various measurement approaches. Moreover, in many cases, acoustics-based quantitative approaches, however sophisticated, do not achieve a clinical acuity comparable to that of the human expert. Finally, most of the tools available today come with technical and methodological constraints that make them difficult to use in ecological communication contexts.

Our project therefore aims to produce a general synthesis of the available evaluation methods, valid for the productions of French-speaking speakers in situations of disability with a pathological dimension. It draws on a wide variety of clinical pictures (articulation disorders, fluency disorders, acquired laryngeal pathologies, carcinomas of the upper aerodigestive tract, neurological pathologies not specifically linked to language, physiological vs. pathological language ageing) and on a diversified panel of methodological approaches, in order to study the crossing of clinical pictures and methodologies over a broad conceptual territory. It compares these acoustics-based measurement approaches with the evaluations made by human listeners with varying types and levels of expertise. In doing so, it aims not only at cross-validation but also examines the cognitive processes that allow the observer to build and exercise expertise. It further studies how far measurements and evaluations carried out in artificial contexts (laboratory, hospital) generalize to ecological life contexts (family, services, institutions), and proposes an integrated approach to evaluating the contribution of communication quality to quality of life.


 

 

Annex 2: Project COGNIPHON

Project title

Cognitive control of speech sound production during L2 acquisition

Summary and objectives of the project

An individual who has acquired language solely through his or her mother tongue and wishes to learn another is faced with the need to process in L2 (in perception as in production) sounds similar to those of L1 in a different way (for example, some realizations of Spanish /e/ can be acoustically very similar to those of Portuguese /ɛ/), or even to perceive and produce in L2 sounds that do not exist in L1 (for example, the production of /ʔ/ in Arabic does not correspond to the realization of any English phoneme). The idea that, in this learning process, the subject inappropriately resorts to strategies habitual in L1 has long inspired linguists, teachers and, in particular, scientists interested in human cognition, who may see in it the inappropriate deployment of strategies made routine through the use of L1.

Teachers of oral foreign-language skills (particularly the verbo-tonal school), keen to address this tendency in learners, have proposed various means of intervention as part of approaches customarily grouped under the name phonetic correction. The result is a technical corpus of teaching procedures based essentially on practitioners' expertise.

Language teachers generally agree on the value of these techniques. Nevertheless, the objective study not only of their effectiveness but also, more fundamentally, of how they work has seen very little development, apart from the work of a few all-too-rare teams.

The current situation is therefore paradoxical: even though verbal exchanges are now at the centre of L2 classroom practice, the question of the pedagogical treatment of the acquisition of processes for cognitively managing phonic material has largely been overlooked in empirical research, and very little knowledge reliably grounded in experimental evidence is actually available. This is all the more regrettable as social demand for high-quality multilingual oral performance is growing steadily, and various commercial outfits now sell at a high price know-how that is often commonplace, supposedly founded on scientific evidence that has in fact not been demonstrated.

Our project aims to develop a research programme whose purpose is precisely to help fill this gap by implementing experimental designs capable of assessing the weight of the various causal elements involved in the acquisition of new phonic control skills, and applicable to any subject, whatever his or her intrinsic characteristics.

Although our thinking originates in observations made in concrete teaching and learning situations and draws on teachers' expertise, we firmly adopt a fundamental research perspective. Our object of study is the full set of extrinsic factors that can be manipulated to help the foreign-language learner master new possibilities of phonic control, whether or not these factors have been identified and/or deliberately exploited in teaching. We are thus at the heart of those cognitive processes "involved in the acquisition, perception, comprehension and production of spoken language […]" that Ferrand & Grainger (2004, p. 11) define precisely as the very object of the scientific concerns of cognitive psycholinguistics.

 

 

 

 
 

6-6(2012-08-24) Post-docs call for application at Brain and Language Research Institute, Aix en Provence France

Brain and Language Research Institute

               Post-docs call for application

                      http://www.blri.fr

>>> Deadline extension

The 'Brain and Language Research Institute' is a new 'Laboratoire d'Excellence' at Aix-Marseille Université. It federates 6 top-level labs in the domain of language studies, neurosciences, cognitive psychology, medicine and computer science. BLRI is now starting different interdisciplinary research programs investigating language production and perception and its cerebral correlates.

Applications are sought for one-year positions of Postdoctoral Research Fellow. Five research subjects are proposed (see detailed presentation available from http://www.blri.fr), all to be started fall 2012.

1. Introducing humour in vocal human-machine interaction systems
   Domains: linguistics and computer science
   Location: Aix-en-Provence (LPL) or Avignon (LIA)
   Contact: bea.priego-valverde@lpl-aix.fr; fabrice.lefevre@univ-avignon.fr

2. Handwriting sonification: a tool for early diagnostics and treatment of micrographia in Parkinson Disease
   Domains: neurosciences and medicine
   Location: Marseille (LNC)
   Contact: jean-luc.velay@univ-amu.fr; serge.pinto@lpl-aix.fr

3. Recording and processing the vocalizations of baboons
   Domains: psychology and linguistics
   Location: Marseille (LPC)
   Contact: arnaud.rey@univ-amu.fr; thierry.legou@lpl-aix.fr; joel.fagot@univ-amu.fr

4. Oculomotor and Visuo-attentional Prerequisites to Reading Development in Preschool Children in 4- and 5-year-olds
   Domains: psychology and linguistics
   Location: Aix-en-Provence (LPL)
   Contact: stephanie.ducrot@lpl-aix.fr; jonathan.grainger@univ-amu.fr

5. Phonetic alignment: Analysis and simulation
   Domains: computer science and linguistics
   Location: Aix-en-Provence (LPL) or Avignon (LIA)
   Contact: brigitte.bigi@lpl-aix.fr; georges.linares@univ-avignon.fr


Applications
------------
Candidates should send a detailed CV plus a 3-page research project corresponding to one of the subjects. The project should be prepared in coordination with the project supervisors (see contacts for each project).

  . Position: 1 year
  . Salary: 1,950 euros net per month (all taxes deducted, except income tax)
  . Deadline: September 15th
  . Starting date: not later than November 2012

Applications should be sent to philippe.blache@blri.fr with a copy to nadera.bureau@blri.fr.


6-7(2012-08-30) Post-doc position at LIMSI-CNRS in the Spoken Language Processing group, Paris

Post-doc position at LIMSI-CNRS in the Spoken Language Processing group

A post-doc position will be proposed at LIMSI-CNRS, in the context of the ANR-funded CHIST-ERA CAMOMILE Project (Collaborative Annotation of multi-MOdal, MultI-Lingual and multi-mEdia documents).
   

   

Description

    Human activity is constantly generating large volumes of heterogeneous data, in particular via the Web. These data can be collected and explored to gain new insights in social sciences, linguistics, economics, behavioural studies as well as artificial intelligence and computer sciences.
    In this regard, 3M (multimodal, multimedia, multilingual) data could be seen as a paradigm of sharing an object of study, human data, between many scientific domains. But, to be really useful, these data should be annotated and available in very large amounts. Annotated data is useful for computer sciences, which process human data with statistical machine learning methods, but also for social sciences, which increasingly use the large corpora available to support new insights, in a way that was not imaginable a few years ago. However, annotating data is costly as it involves a large amount of manual work, and 3M data, for which different modalities must be annotated at different levels of abstraction, is especially costly. Current annotation frameworks involve some local manual annotation, sometimes with the help of automatic tools (mainly pre-segmentation).
    The project aims at developing a first prototype of a collaborative annotation framework for 3M data, in which the manual annotation will be done remotely on many sites, while the final annotation will be localized on the main site. Furthermore, following the same principle, systems devoted to automatic processing of the modalities (speech, vision) present in the multimedia data will help the transcription by producing automatic annotations. These automatic annotations are produced remotely at each site of expertise and then combined locally to give meaningful help to the annotators.
    In order to develop this new annotation concept, we will test it on a practical case study: the problem of person annotation (who is speaking?, who is seen?) in video, which needs the collaboration of high-level automatic systems dealing with different media (video, speech, audio tracks, OCR, ...). The quality of the annotated data will be evaluated through the task of person retrieval.
    This new way of envisioning the annotation process should lead to methodologies, tools, instruments and data that are useful for the whole scientific community with an interest in 3M annotated data.
   

Skills

    A PhD in a field related to the project is required.
   

Contacts

   

   

Agenda

   

         
  • Starting date: Fall 2012
  • Duration of the project: 36 months

6-8(2012-08-30) A Post-doc position at Bruno Kessler Foundation, Center for Information Technology (Trento-Italy)

A Post-doc position is available in the Speech-acoustic scene analysis and interpretation - SHINE Unit at Bruno Kessler Foundation, Center for Information Technology.

The Bruno Kessler Foundation (FBK) conducts research activities in Information Technology, Materials and Microsystems, Theoretical Physics, Mathematics, Italian-Germanic historical studies, Religious studies and International Relations. Through its network, it also develops research in the fields of international relationships, conflict causes and effects, European economic institutions, behavioral economics and evaluative assessment of public policies.

Workplace description

The SHINE unit conducts research on acoustic signal processing and interpretation, mainly concerning speech signals acquired by multi-microphone systems in indoor environments. The research aims to make progress in the scientific areas of Acoustic Scene Analysis and Speech Interaction under noisy and reverberant conditions, in particular with a speaker at a distance from the microphones.

More information about SHINE unit is available at the following link: http://shine.fbk.eu 

 

Job description

The SHINE Research Unit is looking for a candidate to carry out research activities in the field of Distant Speech Recognition. Applications are invited for a post-doctoral researcher who will work under the DIRHA project funded by the EU (http://dirha.fbk.eu) and on other internal research activities. This project aims to study voice-based systems in domestic environments supporting natural speech interaction using distant microphones, e.g. for supporting motor-impaired persons. The main fields of research are multi-channel acoustic processing, distant speech recognition and understanding, speaker identification and verification, and spoken dialogue management.


Job requirements

  • PhD degree in computer science or engineering, involving speech processing;

  • background in one or more of the following areas: speech enhancement; noise robust speech recognition; adaptation techniques for acoustic modelling; experience of the design and construction of speech recognition systems; familiarity with software tools such as HTK, Kaldi or Praat; 

  • strong research track record with significant publications at leading international conferences or in journals;

  • skills in experimental work and development of algorithms;

  • highly motivated to undertake challenging applied research;

  • oral and written proficiency in English.

    In adherence to FBK's policy to promote equal opportunity and gender balance, in case of equal applications, female candidates will be given preference.

Employment

Type of contract: 30-month contract
Number of positions: 1
Gross salary: from 33.000 to 41.000 € per year (depending on the candidate’s experience)

Benefits: company-subsidized cafeteria or meal vouchers, internal car park, welcome office support for visa formalities, accommodation, social security, etc., reductions on bank accounts, public transportation, sport, accommodation and language course fees.

Start date: Autumn 2012
Place: Povo, Trento (Italy)

Application process

To apply online, please send your detailed CV (.pdf format) including a list of publications, a statement of research interests and contact information for at least 2 references. Please include in your CV your authorization for the handling of your personal information as per the Italian Personal data Protection Code, Legislative Decree no. 196/2003 June 2003.

Applications must be sent to jobs@fbk.eu

Emails should have the following reference code: SHINE_PostDoc2012_DSR

Application deadline: September 25th 2012

For more information, please contact: Maurizio Omologo (e-mail: omologo@fbk.eu)

Candidates who pass the preliminary CV screening will be contacted shortly for an interview. Applicants who are not selected will be notified of their exclusion at the end of the selection process.

Please note that FBK may contact short-listed candidates who were not selected for the current openings within a period of 6 months for any selection process for similar positions.

For transparency purposes, the name of the selected candidate, upon his/her acceptance of the position, will be published on the FBK website at the bottom of the selection notice. 


6-9(2012-09-07) 4 positions at Google's Dublin office as Speech Linguistic Project Managers
There are four temporary positions opening at Google's Dublin office as Speech Linguistic Project Managers for French, Italian, German and Spanish (see description below). The role would suit someone with an advanced degree in (Computational) Linguistics (Master's degree or Ph.D.) who is a native speaker of one of these languages.
 
These positions were recently advertised on the Linguist List (http://linguistlist.org/jobs/get-jobs.cfm?JobID=98660&SubID=4551801) where all the relevant information can be found. A description can be found below as well.
Job title:
Speech Linguistic Project Manager (French, German, Italian, Iberian Spanish)
 
Job description:
As a Linguistic Project Manager and a native speaker of one of the target languages, you will oversee and manage all work related to achieving high data quality for speech projects in your own language.
 
You will be based in the Dublin office, managing a team of Data Evaluators and working on a number of projects in support of Speech research: ASR, TTS, and NLP.
 
This includes:
- managing and overseeing the work of your team
- creating verbalisation rules, such as expanding URLs, email addresses, numbers
- providing expertise on pronunciation and phonotactics
- building and maintaining a database of speech recognition patterns
- creating pronunciations for new lexicon entries, maintaining the lexicon
- working with QA tools according to given guidelines and using in-house tools
 
Job requirements:
- native-level speaker of one of the target languages (with good command of the standard dialect) and fluent in English
- keen ear for phonetic nuances and attention to detail; knowledge of the language's phonology 
- must have attended elementary school in the country where the language is spoken 
- ability to quickly grasp technical concepts
- excellent oral and written communication skills
- good organizational skills, previous experience in managing external resources
- previous experience with speech/NLP-related projects a plus
- advanced degree in Linguistics, Computational Linguistics preferred
- also a plus: proficiency with HTML, XML, and some programming language; previous experience working in a Linux environment
 
Project duration: 6-9 months (with potential for extension)
 
For immediate consideration, please email your CV and cover letter in English (PDF format preferred) with 'Speech Linguistic Project Manager [language]' in the subject line.
 
Email Address for Applications: DataOpsMan@gmail.com 
Contact Person: Linne Ha
Closing date: open until filled

6-10(2012-09-26) Speech Recognition/Machine Learning Engineers at Cantab Research, Cambridge, UK

Speech Recognition/Machine Learning Engineers

Cantab Research was founded in 2006 and exists to supply automatic speech recognition products to a wide variety of customers. It has previously supplied SpinVox (now Nuance) to create a voicemail to text system that handles over a million calls a day. It currently supplies speech recognition to a medical transcription company and to several other companies with high potential novel applications.

We are expanding and seek a graduate/post-graduate with the following skills:

* Good degree in a numerate discipline (ideally MSc/MPhil/PhD)
* Excellent analytical ability, ability to conduct research experiments as well as develop code
* Practical experience of machine learning on large datasets
* C/C++, perl/python, matlab/octave, Linux
* Previous experience in speech recognition or large scale web information mining an advantage
* Keen to learn automatic speech recognition operation and applications

This role offers an interesting combination of the application of research, experimentation, and product development. The post involves extending Cantab's existing speech recognition software and applying it to new tasks. You will be joining a small but rapidly expanding team and enjoy the challenges and rewards of a startup culture.

Location: Cambridge, UK.  Full/part time and employment/contracting may be negotiated depending on experience.

Contact: Dr Tony Robinson (tonyr@cantabResearch.com)


6-11(2012-10-05) ASSOCIATE RESEARCH SCIENTIST POSITION at ETS Princeton, NJ, USA

ASSOCIATE RESEARCH SCIENTIST POSITION

Speech

Educational Testing Service

Headquartered in Princeton, NJ, ETS is the world’s premier educational measurement institution and a leader in educational research. As a nonprofit corporation and an innovator in developing tests for clients in education, government, and business, we are dedicated to advancing educational excellence for the communities we serve.

ETS's Research & Development division has an opening for a researcher in the NLP & Speech Group. The Group currently consists of about 15 Ph.D. level research scientists in areas related to NLP and speech. Its main focus is on foundational research as well as on development of new capabilities to automatically score written and spoken test responses in a wide range of ETS test programs including TOEFL(R) iBT and GRE(R).

PRIMARY RESPONSIBILITIES

  • Provide scientific and technical skills in conceptualizing, designing, obtaining support for, conducting, managing, and disseminating results of research projects in the field of speech technology, or portions of large-scale research studies or programs in the same field.
  • Develop and/or modify speech theories to conceptualize and implement new capabilities in automated scoring and speech-based analysis and evaluation systems which are used to improve assessments, learning tools and test development practices.
  • Apply scientific, technical and software engineering skills in designing and conducting research studies and capability development in support of educational products and services.
  • Develop and oversee the conduct of selected portions of research proposals and project budgets.
  • Design and conduct complex scientific studies, functioning as an expert in major facets of the projects.
  • Assist in the conduct of research projects by accomplishing directed tasks according to schedule and within budget.
  • Participate in dissemination activities through the publication of research papers, progress and technical reports, the presentation of seminars or other appropriate communication vehicles.
  • Develop professional relationships as a representative, consultant or advisor to external advisory and policy boards and councils, research organizations, educational institutions and educators.

REQUIREMENTS

A Ph.D. in Language Technologies, Natural Language Processing, Computer Science or Electrical Engineering, with strong emphasis on speech technology and preferably some education in linguistics is required.

Evidence of substantive research experience and/or experience in developing and deploying speech capabilities is required. Demonstrable contributions to new and/or modified theories of speech processing and their implementation in automated systems. Demonstrable expertise in the application of speech recognition systems and fluency in at least one major programming language (e.g., Java, Perl, C/C++, Python).

HOW TO APPLY

Please apply online at www.ets.org/careers – position #124337.

ETS offers competitive salaries, outstanding benefits, a stimulating work environment, and attractive growth potential. ETS is an Equal Opportunity, Affirmative Action Employer.


6-12(2012-10-05) Researcher in Speech Technology at Vicomtech-IK4, San Sebastian, Spain

Researcher in Speech Technology
Vicomtech-IK4, an international applied research centre in Visual Interaction and Communication Technologies located in San Sebastian (Spain) is looking for a Researcher in Speech Technology. We are looking for someone who combines experience in speech processing and software engineering, with research interests in multilingual Speech Recognition, Speech Synthesis, Voice Transformation and Conversion, and also motivated by the transfer of this knowledge into real world applications by building advanced research prototypes developed to solve real needs. The selected candidate will have an important role in the Human Speech and Language Technologies Department, including advanced research, project management responsibilities and technical leadership in high profile projects.
Requirements:
- Masters/Ph.D. degree (or equivalent) in Speech Technology or related field
- Experience in Speech Recognition, Speech Synthesis, Voice Transformation or Voice Conversion is desired.
- Good written and spoken Spanish and English. Any other languages will be valued.
If you are:
- An accomplished researcher with management abilities and interests.
- A team player with an ambitious and creative personality.
We offer:
- A multi-cultural research environment.
- A multidisciplinary research team.
- A group active in the international Human Speech and Language Technologies research field.
- The opportunity to develop management as well as research skills.
To apply, please submit your CV and a cover letter describing your experience and interest in the position to:
Dr.-Ing. Jorge Posada (Associate Director)
E-mail: jposada@vicomtech.org
Telephone: +34 943 30 92 30
http://www.vicomtech.org
Deadline for submission: Open until filled


6-13(2012-10-10) Dolby Research Beijing looking for world-class talent!

Dolby Research Beijing looking for world-class talent!

 

Be part of the exciting future of entertainment and add your talents to those of an amazing team. For more than 40 years, Dolby has led the way in developing innovative entertainment products and technologies used by consumers and professionals worldwide. Innovations from Dolby can be heard in consumer audio and video products, entertainment software, and professional sound applications, including music recording, broadcasting, and sound for motion pictures.


Our company philosophy encourages creativity, collaboration and a strong focus on creation, development and delivery of innovative technology solutions that enhance the entertainment experience. Our team-oriented research environment offers the opportunity for market-savvy innovators to apply their theoretical knowledge, awareness of technology trends and alertness to emerging market opportunities to help create technology solutions that are broadly applied in the marketplace through Dolby’s global market reach. We offer great benefits, including an assortment of life insurance and health coverage options, and the opportunity for innovators to make a difference and to experience the satisfaction of seeing technology solutions to which they contributed, in the marketplace!

 

 

Senior Research Engineer

(Audio/speech algorithm architecture and design)

Dolby Sound Technology Research, Beijing

 

 

 

Position Summary

 

This position is in the Research Organization of Dolby Laboratories (www.dolby.com) and is located in Beijing, China. The senior staff research engineer position focuses on the creation of audio signal processing technologies, covering the whole range of research from the underlying theoretical concepts to the development of prototypes that provide a proof of concept. As part of an international team, the senior staff research engineer will work on ideas exploring new horizons in audio processing, analysis, replay and organization. The researcher is responsible for performing fundamental new research, transferring technology to product groups, and drafting patent applications. The position includes project leadership for projects that are part of Dolby’s global technology initiatives. This requires efficient interaction with different functional divisions within the company. The position also requires mentoring more junior staff.

 

Dolby Laboratories is looking for a self-motivated, highly talented individual interested in applying his or her skills in technologies involving a fundamental understanding of the way that sound sources (audio and voice) are captured, manipulated, coded, delivered, enhanced and played back using digital signal processing techniques. Applications include pre-processing, coding and post-processing solutions in market areas such as consumer electronics, mobile, broadcast, PC and digital cinema applications and other technologies crucial to Dolby Laboratories’ success. The position involves working in cooperation with other technology developers/researchers within Dolby’s global research network, and the opportunity to propose new ideas for further investigation.

 

 

Education, Skills, Abilities, and Experience Required

 

  • M.S.E.E. (required) plus 3 years of applicable, hands-on commercial experience (strongly preferred), or Ph.D. in Electrical Engineering (desired) plus 3 years of closely relevant academic post-doc Research and Development experience
  • Demonstrated ability to create fundamentally new, novel (patentable) signal processing technologies and to envision applications of those technologies in the form of innovative product solutions
  • Strong innovator
  • Project leadership skills
  • Mentoring skills
  • Experienced in global project and collaboration work
  • Proficient in advanced theory and application of audio signal processing techniques
  • Highly skilled in C/C++ language and Matlab programming
  • Team-oriented work ethic and interest to work in cross-continental teams
  • Strong personal interest in sound technologies and in learning, researching, and creating relevant new technologies with high commercial impact
  • Independent, self-motivated worker requiring minimal supervision
  • Fluent in Chinese and English. Excellent communication skills
  • Good understanding of general acoustics

 

Strongly Desired

 

  • Experience working in a software development team, including software version control tools
  • Real-time Windows programming
  • Real-time audio processing
  • Willing to do occasional international travel
  • Personal interest in audio in entertainment applications

 

Please send your English and Chinese resume to cb@dolby.com

 


6-14(2012-10-11) Faculty Position at the Center for Spoken Language Understanding, Portland, Oregon


    Job title: Assistant, Associate, or Full Professor
    Institution: Oregon Health & Science University, Portland, Oregon
    Department: Center for Spoken Language Understanding
           

The Institute on Development & Disability/Center for Spoken Language Understanding invites applications at all ranks for a faculty position in Natural Language Processing, to include technologies for analysis of speech, language, or both. Special interest in applications to behavioral  manifestations of neurological disorders is essential.

       

The primary interest is to extend our existing program in developing behavioral technologies that allow early detection and remediation of a wide range of neurological disorders, including autism and Parkinson’s.

       

The Institute on Development & Disability/Center for Spoken Language Understanding is at the forefront of this new, exciting area of research. The faculty member will be expected to teach courses supporting the research program and appropriate background areas such as Machine Learning and Computational Linguistics. We seek a researcher with a well-developed program in Natural Language Processing, to collaborate with the CSLU team and with clinicians throughout OHSU. The appointee will be expected to maintain an independent, extramurally funded research program.

       

Requirements:          

  •  Ph.D.         
  • Experience with Computational Linguistics, Machine Learning and Natural Language Processing.         

   

Please contact: Jan van Santen, vansantj@ohsu.edu

    
   


6-15(2012-10-11) Research Programmer at the College of Pharmacy at the University of Minnesota
 
 
Brief Description: The College of Pharmacy at the University of Minnesota is seeking a talented, pro-active and innovative individual for a Research Programmer position to work on several projects in the Center for Clinical and Cognitive Neuropharmacology (C3N). C3N conducts interdisciplinary research focused on the cognitive effects of medications and neurodegenerative disorders such as Alzheimer's disease. Computerized assessment is used to measure these cognitive effects. The successful candidate for this position will be responsible for a variety of computer-related tasks, including creating and maintaining innovative computer-based neuropsychological testing applications that involve spontaneous speech and language collection and analysis. The successful candidate will also be responsible for creating and maintaining databases used to store and organize experimental samples, as well as web-enabled interfaces to the databases and data analysis tools. The successful candidate will also be expected to work with graduate and undergraduate students on specific programming and research projects to meet the needs of the Center.
 
 
Full Description is available here on the official University of Minnesota job posting site:

6-16(2012-10-12) Postdoctoral research position: Automatic identification of South African languages

Postdoctoral research position: Automatic identification of South African languages, University of Stellenbosch, South Africa

 

A postdoc position focussing on automatic language identification for the eleven official languages of South Africa is available in the Digital Signal Processing Group of the Department of Electrical and Electronic Engineering at the University of Stellenbosch. Specific project objectives include the development of a research system, the production of associated publishable outputs, and the development of a web-based demonstrator. The position is part of a bilateral project grant between the Netherlands and South Africa.

Applicants should hold a PhD in the field of Electronic/Electrical Engineering, Information Engineering, or Computer Science, or other relevant disciplines. Suitable candidates must have strong computer programming, analytical and mathematical skills, and be familiar with a Linux computing environment. Candidates must also be self-motivated and able to work independently. Finally, candidates must have excellent English writing skills and have an explicit interest in scientific research and publication.

The position will be available for one year, with a possible extension to a second year, depending on progress and available funds. The proposed starting date is not later than 15 January 2013.

Applications should include a covering letter, curriculum vitae, list of publications, research projects, conference participation and details of three contactable referees and should be sent as soon as possible to: Prof Thomas Niesler, Department of Electrical and Electronic Engineering, University of Stellenbosch, Private Bag X1, Matieland 7602. Applications can also be sent by email to: trn@sun.ac.za. The successful applicant will be subject to University policies and procedures.

Interested applicants are welcome to contact me at the above e-mail address for further information regarding the project.

 


6-17(2012-10-18) Dolby’s technology group in Beijing looking for software engineers!

Dolby’s technology group in Beijing looking for software engineers!

 

Job Title: Embedded SW-Engineer (Audio)

 

Summary Description

This position is in the Engineering organization of Dolby Laboratories and is located in Beijing, China. The main focus of this position is to implement Dolby’s audio technologies, including creating the reference code and porting it to embedded platforms such as ARM cores or TI DSPs. The position requires deep knowledge of signal processing algorithms, fixed-point algorithms and optimization techniques, including the use of assembly language, as well as an excellent understanding of DSP architectures.

 

We are looking for a highly motivated individual for whom working with different tool chains under various operating systems in close-to-hardware environments is fun and not a challenge.

 

The candidate will be part of the engineering team in Beijing and work closely together with other Dolby engineering entities in the US, Germany and Australia. We expect the candidate to build-up expert knowledge on highly efficient Dolby audio engines. Working in an international environment requires excellent verbal and written English communication skills.

Essential Job Functions:

  • Implement Dolby’s audio signal processing algorithms for both floating-point and fixed-point platforms.
  • Port and optimize audio signal processing algorithms to embedded fixed-point devices in a timely manner based on assigned portions of projects and existing architectures.
  • Write code, following best practices in embedded SW-engineering, leading to well documented, reliable and easy to maintain SW-components.
  • Validate and maintain correct behavior of SW-components via automated unit tests.
  • Serve as a team member with responsibility for maintaining embedded sub-components.
  • Work together with the development teams in the US, Germany and Australia to improve Dolby’s products.
  • Be a local resource on embedded Dolby audio technologies by combining an understanding of algorithmic behavior with a good knowledge of processor architectures.
  • Seek to increase knowledge by attending internal and external trainings and conferences.
  • Contribute ideas for new technologies, tools, or methodologies.
  • Share relevant information within the project team.
  • Provide technical assistance to non-engineering teams such as Research.
  • Promote a positive work environment.
  • Practice sensitivity in working with others.
  • Accept input from other team members.

Teamwork & Communications

                                                                                                                                                                                                                                                              

Education, Skills, Abilities, and Experience:

  • M.S. in Electrical Engineering, Computer Science or comparable field is required.
  • Professional experience in porting and optimization of signal processing algorithms to embedded platforms is a strong plus.
  • C/C++ programming skills under Windows and Linux environments are required.
  • Good understanding of development/debugging on embedded simulators/hardware devices is required.
  • Excellent English spoken and written communication skills are required.
  • Ability to meet timelines is required.
  • Good understanding of at least one assembly language is a strong plus.
  • Knowledge of scripting languages such as Perl or Python is a plus.
  • Familiarity with embedded real-time operating systems is a plus.

 

 

Please send your CV to cv.engineerin.beijing@dolby.com

 

 

 

 


6-18(2012-10-19) Post-doc at LABEX EFL Paris 3

As part of the LABEX EFL 'Empirical Foundations of Linguistics' (http://www.labex-efl.org/), a 10-year project launched in 2011, we are offering a 6-month full-time post-doctoral project within the research operation "Assessing phonetic and phonological complexity in motoric speech disorders".

The candidate will take part in research on phonetic and phonological complexity in the context of motor speech disorders.

He or she will work under the supervision of Cécile Fougeron and in collaboration with Dr Lise Crevier-Buchman at the Laboratoire de Phonétique et Phonologie, in Paris.

The candidate will hold a doctorate in phonetics or in Speech and Hearing Sciences, with experience in clinical phonetics. Training in acoustic phonetics and in experimentation with patients is required. Experience in physiological investigation using ultrasound will be a plus.

Applications should contain the following documents and be sent to martine.adda-decker_at_univ-paris3.fr before 21/11/2012:

  • a cover letter
  • an up-to-date curriculum vitae with a list of publications
  • the address of the website where the publications can be accessed
  • the names and e-mail addresses of two people who can provide references

For more information, see http://www.labex-efl.org/?q=fr/node/136
       

Contact: Cécile Fougeron (LPP-P3)

Contact address: cecile.fougeron@univ-paris3.fr

University: Université Paris 3

Level: postdoctoral researcher

Duration: 6 months

Salary: €12,000 net / 6 months

Specialties: clinical phonetics, acoustics, lingual ultrasound, dysarthria

Application deadline: 21 November 2012

Address for applications: martine.adda-decker_at_univ-paris3.fr

Application reference: EFL-PPC5

   

 
         


6-19(2012-10-20) Large-Scale Audio Indexing Researchers/Engineers: 2 W/M positions at IRCAM-Paris

Job Openings: Large-Scale Audio Indexing Researchers/Engineers: 2 W/M  positions at IRCAM

   

Starting: January 2013

   

Duration : 18 months

   

 

   

 

   

Position  description A

   

The hired Researcher will be in charge of the research and development of scalable technologies for supervised learning (i.e. scaling GMM, PCA or SVM algorithms) so that they are applicable to millions of annotated data items.

   

He/she will then be in charge of the application of the developed technologies for the training of large-scale music genre and music mood models and their application to large-scale music catalogues.

   

 

   

Required  profile for A:

   

  • High skill in audio indexing and data mining
  • Previous experience in scalable machine-learning models
  • High skill in Matlab programming, skills in C/C++ programming
  • Skill in audio signal processing (spectral analysis, audio-feature extraction, parameter estimation)
  • Good knowledge of Linux, Windows, MacOS environments
  • High productivity, methodical work, excellent programming style.

                       

 

   

Position  description B

   

The hired Engineer/Researcher will be in charge of the development of the framework for scalable storage, management and access of distributed data (audio and meta-data). He/she will also be in charge of the development of scalable search algorithms.

   

 

   

Required profile for B:

   

  • High skill in database management systems
  • High skill in indexing technologies (hash-table, m-trees, …)
  • Good knowledge of Linux, Windows, MacOS environments
  • High productivity, methodical work, excellent programming style.

               

 

   

The hired Engineers/Researchers A and B will also collaborate with the development team and participate in the project activities (evaluation of technologies, meetings, specifications, reports).

   

 

   

 

   

Introduction to IRCAM

   

IRCAM is a leading non-profit organization associated with Centre Pompidou, dedicated to music production, R&D and education in sound and music technologies. It hosts composers, researchers and students from many countries cooperating in contemporary music production, scientific and applied research. The main topics addressed in its R&D department include acoustics, audio signal processing, computer music, interaction technologies, and musicology. IRCAM is located in the centre of Paris near the Centre Pompidou at 1, Place Igor Stravinsky 75004 Paris.

   

 

   

Salary

   

According to background and  experience

   

 

   

Applications

   

Please send an application letter together with your resume and any suitable information addressing the above issues, preferably by email, to: peeters_a_t_ircam dot fr with cc to vinet_a_t_ircam dot fr, roebel_at_ircam_dot_fr

   

 

   

 


6-20(2012-11-03) Research Scientist at Yandex Zurich

Title: Research Scientist

Opening in our Zurich (Switzerland) office. (Working language: English)

Yandex (company.yandex.com) is the leading search engine in Russia with an extensive set of added value services.

We are seeking an experienced and motivated person to join our team developing basic technology and applications.
The ideal candidate will be able to take responsibility for developing and implementing advanced modules resulting in improved speech recognition performance.

Responsibilities:
-Independently develop new and improved algorithms for speech recognition
-Analyze speech recognition performance and implement solutions to provide optimum accuracy
-Use, improve and create research tools to create, update and optimize speech recognition systems for multiple domains
-Work with the team to design future products


Required:
-Higher degree in speech science, machine learning, or related field
-Experience developing ASR applications - training, tuning, and optimization
-Software development experience
-Programming experience in C/C++
-Excellent communication skills in English
-Willing to relocate to Switzerland


Desirable:
-Experience in advanced aspects of speech recognition (e.g. noise robust ASR, adaptation, discriminative training, decoding methods, etc.)
-Knowledge of scripting languages (especially python)
-Experience in commercial projects in the area of speech recognition and other speech technologies.
-Experience with applications for GPU is a plus
-Background in natural language processing, machine learning and/or computational linguistics is a plus


Interested candidates, please send application to:
barbara@yandex-team.ru


6-21(2012-11-08) Postdoc position at IBM Language and Knowledge Center, Trento, Italy

The newly established IBM Language and Knowledge Center, Trento, Italy, has a postdoc position in the following area:

-Natural Language Dialog

The postdoc scholar will be part of a research project aiming at designing machines that interact with humans and support them in complex and large-scale knowledge and decision-making tasks. The team includes researchers from IBM and the TrentoRise Human Language Technology Center founded by the University of Trento and FBK. Candidates with a strong research background in at least one of the following:

- Conversational Dialogue Systems
- Statistical models of Dialogue
- Natural Language Understanding
- Machine Learning
- Question Answering Systems

are invited to apply.

The official postdoc position application site:

http://www.trentorise.eu/call-for-participation/bando-di-selezione-l-call-positions

To enquire about the position, send an email with your CV to:

Prof. Giuseppe Riccardi
sisl-jobs@disi.unitn.it
Subject: Postdoc Position at IBM, Trento

Deadline: February 5, 2013


6-22(2012-10-12) PhD position on The influence of robots on the development of language, New Zealand

PhD Position: The influence of robots on the development of language

 

Job Posting

 

Project description

 

The ‘Wordovators’ project is a three-year project funded by the John Templeton Foundation. The project will conduct large-scale experiments in the form of computerized word games. These games will be designed to probe the factors underpinning word creation and creativity, and how these develop through the life-span. One strand of the project will probe particular issues surrounding interactions between people and humanoid robots. How are new words created and adopted in contexts involving such interactions? This PhD position is for a highly motivated student to join the project team and conduct work that explores the ways in which robots might shape human languages. These studies will analyze the factors and processes that might contribute to the influence of robots on the vocabularies of English and of artificial languages in imaginary worlds.

 

This project is a collaboration between the University of Canterbury, New Zealand, and Northwestern University, USA. The PhD candidate will enroll for a PhD degree in the HIT Lab NZ at the University of Canterbury, and will be primarily supervised by Dr Christoph Bartneck (HIT Lab NZ). Other associated faculty are Professor Jen Hay (NZILBB), Janet Pierrehumbert (Northwestern University / Adjunct Professor NZILBB), and Professor Stephanie Stokes (NZILBB). The PhD student will be encouraged to visit Northwestern University regularly.

 

 

Your skills

You should have an interest in human language and have a strong background in robotics or computer science.

The HIT Lab NZ

The Human Interface Technology Laboratory New Zealand (HIT Lab NZ) is a world-leading research institution developing and commercializing technology that improves human-computer interaction. The HIT Lab NZ has over 50 staff and students and has extensive experience in Human Computer Interaction and Science & Technology Studies. The HIT Lab NZ is located at the University of Canterbury in Christchurch, New Zealand. The University of Canterbury has the top Engineering School in New Zealand, including a highly ranked Department of Computer Science. For more information about the HIT Lab NZ see http://www.hitlabnz.org/.

 

NZILBB

The HIT Lab NZ at the University of Canterbury is affiliated with the New Zealand Institute of Language, Brain and Behaviour (NZILBB). NZILBB is a multi-disciplinary centre dedicated to the study of human language. The researchers come from a wide range of disciplines, forging connections across linguistics, speech production and perception, language acquisition, language disorders, social cognition, memory, brain imaging, cognitive science, bilingual education, and interface technologies. More information is available at: http://www.nzilbb.canterbury.ac.nz/.

 

Christchurch

 

Christchurch is the second largest city in New Zealand and offers an exciting and easy lifestyle for students. It is the most affordable major city to live in. It is easy to get around whether you are biking, walking, driving or using the excellent public transport system. Christchurch also offers outstanding opportunities for outdoor activities, and is close to both surf beaches and ski-fields.

Appointment and Scholarship Support

The PhD scholarship is full time for a duration of three years with an annual scholarship of $25,000 NZD. The scholarship will also cover the tuition fees.

The research in this project must be concluded with writing a PhD thesis within the Human Interface Technology PhD program of the HIT Lab NZ. For more information about the PhD program in Human Interface Technology, please see http://www.hitlabnz.org/index.php/education/phd-program.

Further Information and Application

Further information can be obtained by contacting Christoph Bartneck (christoph.bartneck@canterbury.ac.nz). Information about the HIT Lab NZ is available at: http://www.hitlabnz.org. Please upload your application as one PDF file at http://www.hitlabnz.org/index.php/jobs/job/37/. Your application must include a letter explaining your specific interest in the project, an extensive curriculum vitae, your academic records, and a list of two references. Applications will be accepted until November 15th, 2012, or until the position is filled.

 

International applicants will be required to arrange their NZ student visa after receiving an offer of a place. Please check http://www.immigration.govt.nz for information about what type of visa might be most suitable and the process of acquiring it. The university has various types of accommodation available on campus. Please check http://www.canterbury.ac.nz/accom/ for information about the options and prices. International students should also consult the International Student website at http://www.canterbury.ac.nz/international/ to learn about the cost of living, fees, and insurance.

 

 

 


6-23(2012-11-15) Faculty positions at CLSP at the Johns Hopkins University in Baltimore, USA

The Center for Language and Speech Processing at the Johns Hopkins University in Baltimore, USA (http://www.clsp.jhu.edu), seeks applicants for a tenure-track or tenured faculty position in speech and language processing. Rank will depend on the experience and accomplishments of the candidate.

Applicants must have a Ph.D. in a relevant discipline and will be expected to establish a strong, independent, multidisciplinary, internationally recognized research program. Commitment to quality teaching at the undergraduate and graduate levels is required. We are committed to building a diverse educational environment; women and minorities are especially encouraged to apply.

Prospective candidates should email rscully@jhu.edu for further information.

>>>>>>>>>>>>>>

Hynek Hermansky
Julian S. Smith Professor in Electrical Engineering
Director
Center for Language and Speech Processing
The Johns Hopkins University
3400 N. Charles Street, Hackerman Hall
Baltimore, Maryland 21218
410-516-6766
hynek@jhu.edu

Admin. support:
Ruth Scally
410-516-4237
rscally1@jhu.edu


6-24(2012-11-26) INRIA-Internship for Master2 Students

INRIA-Internship for Master2 Students

Title:

Speech analysis for Parkinson's disease detection

Description:

Parkinson's disease (PD) is one of the most common neurodegenerative disorders, and its clinical diagnosis, particularly at an early stage, is still a difficult task. Recent research has shown that the speech signal may be useful for discriminating people with PD from healthy ones, based on clinical evidence which suggests that the former typically exhibit some form of vocal disorder. In fact, vocal disorder may be amongst the earliest PD symptoms, detectable up to five years prior to clinical diagnosis. The range of symptoms present in speech includes reduced loudness, increased vocal tremor, and breathiness (noise). Vocal impairment relevant to PD is described as dysphonia (inability to produce normal vocal sounds) and dysarthria (difficulty in pronouncing words). The use of sustained vowels, where the speaker is requested to sustain phonation for as long as possible, attempting to maintain steady frequency and amplitude at a comfortable level, is commonplace in clinical practice. Research has shown that the sustained vowel “aaaaa” is sufficient for many voice assessment applications, including PD status prediction.

The first goal of this internship is to implement and improve some state-of-the-art algorithms for dysphonia measures and use them within an appropriate classifier (such as an SVM) to discriminate between disordered and healthy voices. These measures are based on linear and nonlinear speech analysis and are well documented in [1]. The experiments will be carried out on the well-established Kay Elemetrics Disordered Voice Database (http://www.kayelemetrics.com/).
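As an illustration of what a dysphonia measure can look like, the sketch below computes two classical perturbation measures, local jitter (cycle-to-cycle variation of the glottal period) and local shimmer (cycle-to-cycle variation of the peak amplitude), from a sustained vowel. It is a minimal sketch and not necessarily one of the algorithms of [1]: the function names are hypothetical, the input values are made up, and it assumes that the glottal cycles have already been extracted. Such values could then serve as input features for a classifier such as an SVM.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def dysphonia_features(periods_s, amplitudes):
    """Return (local jitter, local shimmer) for one sustained-vowel recording.

    periods_s  -- duration of each glottal cycle in seconds
    amplitudes -- peak amplitude of each glottal cycle
    """
    return local_perturbation(periods_s), local_perturbation(amplitudes)


# Made-up cycle measurements: a perfectly steady voice and a slightly unstable one.
steady = dysphonia_features([0.0100] * 4, [1.0] * 4)
unstable = dysphonia_features([0.0100, 0.0104, 0.0097, 0.0103],
                              [1.0, 0.9, 1.1, 0.95])
print("steady  :", steady)
print("unstable:", unstable)
```

A perfectly periodic signal gives zero for both measures; larger values indicate less stable phonation.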

The second goal is to develop new dysphonia measures based on novel nonlinear speech analysis algorithms recently developed in the GeoStat team [2]. These algorithms have shown significant improvements w.r.t. state-of-the-art techniques in many applications, including speech segmentation, glottal inverse filtering and sparse modeling.

The work of this internship will be conducted in collaboration with Dr. Max Little (MediaLab of MIT and Imperial College of London) and should lead to a PhD fellowship proposal.

References:

[1] A. Tsanas, M.A. Little, P.E. McSharry, J. Spielman, L.O. Ramig. Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease. IEEE Transactions on Biomedical Engineering, 59(5):1264-1271, 2012.

[2] PhD thesis of Vahid Khanagha. INRIA Bordeaux-Sud Ouest. January 2013.

Prerequisites:

A good level in mathematics and signal/speech processing is necessary, as well as Matlab and C/C++ programming. Knowledge of machine learning would be an advantage.

Supervisor:

Khalid Daoudi (khalid.daoudi@inria.fr), GeoStat team (http://geostat.bordeaux.inria.fr).

Location:

INRIA Bordeaux Sud-Ouest (http://www.inria.fr/bordeaux), Bordeaux, France.

Starting date:

Feb/March 2012.

Duration:

6 months

Salary:

1200 euros / month


6-25(2012-12-07) Doctoral and Post-doctoral Positions in Signal Processing for Hearing Instruments, Bochum, Germany

Doctoral and Post-doctoral Positions in Signal Processing for Hearing Instruments

Position Description

The ITN ICanHear, starting on 1 January 2013, will provide cutting-edge research projects for 12 doctoral and 5 post-doctoral research fellows in digital signal processing for hearing instruments. ICanHear aims to develop models based on emerging knowledge about higher-level processing within the auditory pathway and exploit that knowledge to develop creative solutions that will improve the performance of hearing instruments.

Attractive grants and a wide variety of international training activities, including collaborations with ICanHear Associated Partners in the U.K., Switzerland, Belgium, U.S.A., and Canada, will be made available to successful candidates, who will stay in the network for a period of 12 to 36 months.

 

Research and training positions will be available in the following ICanHear labs:

  • Institute of Communication Acoustics, Ruhr-Universität Bochum (DE)

  • Experimental Oto-rhino-laryngology (ExpORL), Katholieke Universiteit Leuven (BE)

  • Institute of Sound and Vibration Research, University of Southampton (UK)

  • Hearing Systems Group, Denmark Technical University, Lyngby (DK)

  • Laboratory for Experimental Audiology, University Hospital Zurich (CH)

  • Signal Processing Group, Siemens Hearing Instruments (DE)

  • Cochlear Research and Development, Cochlear Ltd., Mechelen (BE)

 

Requirements for Candidates and Application procedure:

Early-stage (doctoral) Research Fellows have less than four years' experience in research after obtaining a Master's degree in engineering, computer science, or similar.

Experienced (post-doctoral) Research Fellows are already in possession of a doctoral degree or have at least 4 years but less than 5 years of research experience in engineering and/or hearing research.

In order to ensure transnational mobility, candidates must not have resided for more than 12 months (during the last 3 years) in the country of the host institution they wish to apply to. For all positions, excellent English language skills are required.

To apply please send in the following documents via e-mail to the ICanHear coordination office (icanhear@rub.de): CV, certified copies of all relevant diplomas and transcripts, two letters of recommendation, proof of proficiency in English, letter of motivation (research interest, reasons for applying to programme and host). For further information on research projects available, application details and eligibility please visit the ICanHear web-site (http://www.icanhear-itn.eu) or contact the project coordinator Rainer Martin (rainer.martin@rub.de).

 


6-26(2012-12-15) Technician in scientific instrumentation, experimentation and measurement, Aix-en-Provence, France

NOEMI CAMPAIGN, WINTER 2012-2013: JOB PROFILE

Description of the unit

Unit code: UMR 7309
Unit name: Laboratoire Parole et Langage
Director: Noël NGUYEN
City: Aix-en-Provence
Regional delegation: DR12
Institute: INSHS

Description of the position

NOEMI number: T54030
Corps: Technicien (technician)
BAP: C
Job type (emploi-type): C4B21 - Technician in scientific instrumentation, experimentation and measurement
Function: Platform technician

Mission

Within the Laboratoire Parole et Langage (LPL), and assigned to the Centre d’Expérimentation sur la Parole (CEP), the technician will be in charge of supporting experiments in collaboration with the platform coordinator.

Activities

The main activity is to provide day-to-day support for the operation of the platform, in particular:
- managing the loan and tracking of equipment used on the platform or off site,
- setting up and assembling sub-assemblies (audio and video in particular) for experiments,
- assisting experimenters in running experiments according to a defined protocol,
- modifying or adapting experimental set-ups,
- carrying out maintenance and first-level interventions, fault detection and diagnosis,
- making audio and video recordings (set-up and the recording itself),
- managing the consumables needed for the experiments,
- using instrument-control software applications.

Skills

The candidate must show strong motivation for this support position, which is essential to the operation of the Centre d’Expérimentation sur la Parole.

Basic training in electronics and/or physical measurement is desirable, in order to build, where needed, elementary modules for synchronizing instruments, to synchronize audio and video recording systems, and to set up and assemble sub-assemblies for experimental set-ups. The candidate should enjoy teamwork, since he or she will work closely with the platform coordinator. The person must be able to learn new techniques and must have good interpersonal skills, as he or she will be in contact with a large number of users. He or she must be rigorous in following the procedures in place. Adherence to the health and safety rules in place is essential.

Context

The Laboratoire Parole et Langage is a research unit of the CNRS and Aix-Marseille Université. It hosts phoneticians, linguists, computer scientists, psychologists, neuroscientists, physicists and physicians. The LPL's activities focus on the study of the mechanisms of production and perception of language and speech. The LPL stands out for its research methods, which rely on experimentation, instrumental investigation and formalization. This is an original approach in this scientific field, which draws on the humanities, the life sciences and the engineering sciences. Beyond a strong activity in basic research, this particularity explains the importance of the applications developed from work carried out in written-text processing, intelligibility of the spoken message, high-quality text-to-speech conversion, and the assessment and rehabilitation of voice or language disorders.

These characteristics make the Laboratoire Parole et Langage a research unit suited to the scientific challenges of the language sciences, while also being involved in their technological stakes. The LPL currently has more than 80 permanent staff (researchers, teaching researchers, engineers, technicians, administrative staff), plus 40 doctoral students, 20 of whom hold grants. It is the largest French laboratory in this scientific field and one of the leading ones in Europe. The LPL now has a technical platform bringing together a set of instruments for investigating the production and perception of speech: electroencephalography, eye tracking, articulography, electropalatography, articulatory assessment, etc. This resource, unique in Europe, is shared within the Centre d’Expérimentation sur la Parole (http://www.lpl.univ-aix.fr/~cep/), the technical platform to which the position will be assigned.

6-27(2012-12-16) Master project IRISA Rennes France

Computer Science Internship

CORDIAL group

Title :

Voice Conversion from non-parallel corpora

Description :

The main goal of a voice conversion system (VCS) is to transform the speech signal uttered by one speaker (the source speaker) so that it sounds as if it had been uttered by another person (the target speaker). Such techniques have many applications. For example, a VCS can be combined with a Text-To-Speech system in order to produce multiple high-quality synthetic voices. In the entertainment domain, a VCS can be used to dub an actor with their own voice.

State-of-the-art VCS use Gaussian Mixture Models (GMM) to capture the transformation from the acoustic space of the source to the acoustic space of the target. Most of the models are source-target joint models that are trained on paired source-target observations. Those paired observations are often gathered from parallel corpora, that is, speech signals resulting from the two speakers uttering the same set of sentences. Parallel corpora are hard to come by. Moreover, they do not guarantee that the pairing of vectors is accurate. Indeed, the pairing process is unsupervised and uses Dynamic Time Warping (DTW) under the strong (and unrealistic) hypothesis that the two speakers truly uttered the same sentence, with the same speaking style. This assumption is often wrong and results in non-discriminant models that tend to over-smooth the speakers' distinctive characteristics.

The goal of this Master subject is to eliminate the use of parallel corpora in the process of training joint GMMs for voice conversion. We propose pairing speech segments on high-level speech descriptors such as those used in Unit Selection Text-To-Speech. Those descriptors contain not only segmental information (acoustic class, for example) but also supra-segmental information such as phoneme context, speed, prosody, power, etc. In a first step, both source and target corpora are segmented and tagged with descriptors. In a second step, each class from one corpus is paired with the equivalent class from the other corpus. Finally, a classical DTW algorithm can be applied to each paired class. The expected result is a set of transform models that both take speaker variability into account and are more robust to pairing errors.
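The last two steps described above (pairing classes, then aligning frames inside each paired class) can be sketched as follows. This is only an illustration under simplifying assumptions: each class is represented by a single sequence of one-dimensional frames, the class labels and values are made up, and the function names `dtw_align` and `pair_frames` are hypothetical. A real system would use spectral feature vectors and a suitable distance.

```python
def dtw_align(source, target, dist=lambda a, b: abs(a - b)):
    """Classical DTW between two non-empty sequences; returns (total cost, path)."""
    n, m = len(source), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were matched together.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def pair_frames(source_by_class, target_by_class):
    """Pair frames only within classes that exist in both corpora."""
    pairs = []
    for label in sorted(source_by_class.keys() & target_by_class.keys()):
        src, tgt = source_by_class[label], target_by_class[label]
        _, path = dtw_align(src, tgt)
        pairs.extend((label, src[i], tgt[j]) for i, j in path)
    return pairs


# Made-up corpora: only class "a" is present for both speakers.
source = {"a": [1.0, 2.0, 3.0], "s": [5.0]}
target = {"a": [1.0, 3.0], "t": [9.0]}
for label, src_frame, tgt_frame in pair_frames(source, target):
    print(label, src_frame, tgt_frame)
```

The resulting frame pairs are what a joint source-target GMM would be trained on; classes without a counterpart in the other corpus are simply skipped in this sketch.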

Keywords :

Voice Conversion, Gaussian Mixture Models

Contacts :

Vincent Barreaud (vincent.barreaud@irisa.fr)

Bibliography:

[1] H. Benisty and D. Malah. Voice conversion using GMM with enhanced global variance. In Conference of the International Speech Communication Association (Interspeech), pages 669-672, 2011.

[2] L. Mesbahi, V. Barreaud, and O. Boëffard. Non-parallel hierarchical training for voice conversion. In Proceedings of the 16th European Signal Processing Conference, Lausanne, Switzerland, 2008.

[3] Y. Stylianou, O. Cappé, and E. Moulines. Continuous probabilistic transform for voice conversion. IEEE Transactions on Speech and Audio Processing, 6(2):131-142, 1998.

 


6-28(2012-12-16) Master project 2 IRISA Rennes France

Computer Science Internship

CORDIAL group

Title :

Unit-selection speech synthesis guided by a stochastic model of spectral and prosodic

parameters.

A Text-To-Speech (TTS) system produces a speech signal corresponding to the vocalization of a given text. Such a system is composed of a linguistic processing stage followed by an acoustic one, which complies as far as possible with the linguistic directives. For the second stage, the most widely used approaches are:

- the corpus-based synthesis approach, which relies on the selection and concatenation of unit sequences extracted from a large continuous speech corpus. It has been popular for 20 years, yielding an unmatched sound quality but still bearing some artefacts due to spectral discontinuities.

- the statistical approach. A new generation of TTS systems has emerged in recent years, reintroducing rule-based systems. The rules are no longer deterministic as in the first systems of the 1950s; they are replaced by stochastic models. HTS, an HMM-based speech synthesis system, is currently the most widely used statistical system. HTS-type systems yield a good acoustic continuum, but with a sound quality that strongly depends on the underlying acoustic model.

Recently, some hybrid synthesis systems have been proposed, combining the statistical approach with unit selection. They use the acoustic descriptions and the melodic contours generated by a statistical system to drive the cost function during the natural speech unit selection phase, or substitute poor-quality natural speech units with units derived from a statistical system.

The framework of this subject is corpus-based TTS. Given the combinatorial problem posed by the search for an optimal unit sequence with blind sequencing, the work consists in determining heuristics that reduce the search space and satisfy a real-time objective. These heuristics, based on spectral and prosodic parameters generated by HTS, will make it possible to implement pre-selection filters or to propose new cost functions within the corpus-based system developed by the Cordial group. The output of the hybrid system will be evaluated and compared, via listening tests, with standard systems such as HTS and a corpus-based system.
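To make the search problem concrete, the sketch below shows a generic Viterbi search over candidate units with a target cost and a concatenation (join) cost, plus an optional pre-selection filter that keeps only the few candidates closest to the target at each position. It is a minimal sketch and not the Cordial group's system: units are reduced to a single made-up pitch value, the cost functions are arbitrary, and the name `select_units` and its `beam` parameter are hypothetical.

```python
def select_units(targets, candidates, target_cost, join_cost, beam=None):
    """Return (total cost, unit sequence) minimising target + join costs.

    targets    -- one target specification per position (e.g. predicted by a statistical model)
    candidates -- one non-empty list of candidate units per position
    beam       -- if set, keep only the `beam` candidates with the lowest target cost
                  at each position (a simple pre-selection filter)
    """
    columns = []  # per position: list of (cumulative cost, unit, index of best predecessor)
    for t, (spec, units) in enumerate(zip(targets, candidates)):
        kept = sorted(units, key=lambda u: target_cost(spec, u))
        if beam is not None:
            kept = kept[:beam]
        column = []
        for unit in kept:
            tc = target_cost(spec, unit)
            if t == 0:
                column.append((tc, unit, None))
            else:
                best_prev, back = min(
                    (prev_cost + join_cost(prev_unit, unit), k)
                    for k, (prev_cost, prev_unit, _) in enumerate(columns[-1])
                )
                column.append((best_prev + tc, unit, back))
        columns.append(column)
    # Backtrack from the cheapest final unit.
    k = min(range(len(columns[-1])), key=lambda idx: columns[-1][idx][0])
    total = columns[-1][k][0]
    sequence = []
    for column in reversed(columns):
        _, unit, k = column[k]
        sequence.append(unit)
    sequence.reverse()
    return total, sequence


# Made-up example: units and targets are pitch values in Hz.
targets = [100, 110, 120]
candidates = [[98, 105], [104, 111, 130], [119, 125]]
tcost = lambda spec, unit: abs(spec - unit)
jcost = lambda prev, unit: 0.5 * abs(prev - unit)

print(select_units(targets, candidates, tcost, jcost))          # (14.0, [105, 111, 119])
print(select_units(targets, candidates, tcost, jcost, beam=1))  # (14.5, [98, 111, 119])
```

In this toy case the pre-selection filter reduces the number of join-cost evaluations but returns a slightly more expensive sequence, which is the trade-off between search-space reduction and optimality that such heuristics have to manage.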

Keywords :

TTS, Corpus based speech synthesis, Statistical Learning, Experiments.

Contacts :

Olivier Boëffard, Nelly Barbot, Damien Lolive (firstname.lastname@irisa.fr)

Bibliography :

[1] A. W. Black and K. A. Lenzo, Optimal data selection for unit selection synthesis, 4th ISCA Tutorial and Research Workshop on Speech Synthesis, 2001.

[2] H. Kawai, T. Toda, J. Ni, M. Tsuzaki and K. Tokuda, Ximera: a new TTS from ATR based on corpus-based technologies, ISCA Tutorial and Research Workshop on Speech Synthesis, 2004.

[3] S. Rouibia and O. Rosec, Unit selection for speech synthesis based on a new acoustic target cost, Interspeech, 2005.

[4] H. Zen, K. Tokuda and A. W. Black, Statistical parametric speech synthesis, Speech Communication, v.51, n.11, pages 1039-1064, 2009.

[5] H. Silen, E. Helander, J. Nurminen, K. Koppinen and M. Gabbouj, Using Robust Viterbi Algorithm and HMM-Modeling in Unit Selection TTS to Replace Units of Poor Quality, Interspeech, 2010.

 


6-29(2012-12-16) Master project 3 IRISA Rennes France

 Computer Science Internship

CORDIAL group

Title: Grapheme-to-phoneme conversion adaptation using conditional random fields

Description:

Grapheme-to-phoneme conversion consists in generating possible pronunciations for an isolated word or for a sequence of words. More formally, this conversion is a transliteration of a sequence of graphemes, i.e., letters, into a sequence of phonemes, symbolic units that represent the elementary sounds of a language. Grapheme-to-phoneme converters are used in speech processing

- either to help automatic speech recognition systems to decode words from a speech signal

- or as a means of telling speech synthesizers how a written input should be acoustically produced.

A problem with such tools is that they are trained on large and varied amounts of aligned sequences of graphemes and phonemes, leading to generic ways of pronouncing words in a given language. As a consequence, they are not adequate as soon as one wants to recognize or synthesize specific voices, for instance accented speech, stressed speech, dictating voices versus chatting voices, etc. [1].

While multiple methods have been proposed for grapheme-to-phoneme conversion [2, 3], the primary goal of this internship is to propose grapheme-to-phoneme models which can easily be adapted to conditions specified by the user. More precisely, the use of conditional random fields (CRF) will be studied to model the generic French pronunciation and variants of it [4]. CRFs are state-of-the-art statistical tools widely used for labelling problems in natural language processing [5]. A further important goal is to be able to automatically characterize the distinctive pronunciation features of a given specific voice as compared to a generic voice. This means highlighting and generalizing the differences that can be observed between two sequences of phonemes derived from the same sequence of graphemes.
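The comparison step mentioned last, highlighting the differences between two phoneme sequences derived from the same graphemes, can be illustrated with a simple edit-operation alignment. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; it is not a CRF and not the method proposed for the internship, and the function name and the example pronunciations are hypothetical.

```python
from difflib import SequenceMatcher


def pronunciation_differences(generic, specific):
    """List the edit operations that turn the generic phoneme sequence into the specific one."""
    matcher = SequenceMatcher(a=generic, b=specific, autojunk=False)
    return [
        (tag, generic[i1:i2], specific[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical pair for the French word "petit": careful versus casual pronunciation.
generic = ["p", "ə", "t", "i"]
casual = ["p", "t", "i"]
print(pronunciation_differences(generic, casual))  # [('delete', ['ə'], [])]
```

Collecting such operations over many words, together with their grapheme and phoneme context, would give raw material from which voice-specific pronunciation patterns could be generalized.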

Results of this internship would be integrated into the speech synthesis platform of the team in order to easily and automatically simulate and imitate specific voices.

Technical skills:

C/C++ and a scripting language (e.g., Perl or Python)

Keywords:

Natural language processing, speech processing, machine learning, statistical learning

Contact:

Gwenole Lecorve (gwenole.lecorve@irisa.fr)

References:

[1] B. Hutchinson and J. Droppo. Learning non-parametric models of pronunciation. In Proceedings of ICASSP, 2011.

[2] M. Bisani and H. Ney. Joint-sequence models for grapheme-to-phoneme conversion. In Speech Communication, 2008.

[3] S. Hahn, P. Lehnen, and H. Ney. Powerful extensions to CRFs for grapheme to phoneme conversion. In Proceedings of ICASSP, 2011.

[4] Irina Illina, Dominique Fohr, and Denis Jouvet. Multiple pronunciation generation using grapheme-to-phoneme conversion based on conditional random fields. In Proceedings of SPECOM, 2011.

[5] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML, 2001.


6-30(2013-01-14) Ph.D. Researcher in Speech Synthesis, Trinity College, Dublin, Ireland

Post Specification

Post Title:

Ph.D. Researcher in Speech Synthesis

Post Status:

3 years

Department/Faculty:

Centre for Language and Communication Studies (CLCS)

Location:

Phonetics and Speech Laboratory

Salary:

€16,000 per annum (plus fees paid)

Closing Date:

31st January 2013

Post Summary

A Ph.D. Researcher is required to work in the area of speech synthesis at the Phonetics and Speech Laboratory, School of Linguistic, Speech and Communication Sciences. The position will involve carrying out research on the topic of Hidden Markov Model (HMM)-based speech synthesis. Specifically, we are looking for a researcher to develop source-filter based acoustic modelling for HMM-based speech synthesis which is closely related to the human speech production process and which can facilitate modification of voice source and vocal tract filter components at synthesis time.

Background to the Post

Much of the research carried out to date in the Phonetics and Speech Laboratory has been concerned with the role of the voice source in speech. This research involves the development of accurate voice source processing both as a window on human speech production and for exploitation in voice-sensitive technology, particularly synthesis. The laboratory team is interdisciplinary and includes engineers, linguists, phoneticians and technologists.

This post will in the main be funded by the on-going Abair project, which has developed the first speech synthesisers for Irish (www.abair.ie), and the researcher will exploit the current Abair synthesis platform. In this project the aim is to deliver multi-dialect synthesis with multiple personages and voices that can be made appropriate to different contexts of use. The post will also be linked to the FastNet project, which aims at voice-sensitive speech technologies.

A specific goal of our laboratory team is to leverage our expertise on the voice by improving the naturalness of parametric speech synthesis, as well as making more flexible synthesis platforms which can allow modifications of voice characteristics (e.g., for creating different personalities/characters, different forms of expression, etc.).

Standard duties of the Post

* Initially the researcher will be required to attend some lectures as part of the Masters programme on Speech and Language Processing. This and a supervised reading programme will provide a background in the area of voice production, analysis and synthesis.

* In the very early stages the researcher will be required to develop synthetic voices, using the Irish corpora, with the standard HMM-based synthesis platform (i.e. HTS). Note that working with the Irish corpora does not require a background in the Irish language, as there will be collaboration with experts in this field.

* The researcher will be required to familiarise themselves with existing speech synthesis platforms which provide explicit modelling of the voice source (e.g., Cabral et al. 2011, Raitio et al. 2011, Anumanchipalli et al. 2010).

* The researcher will then need first to implement similar versions of these systems and then to work towards developing novel vocoding methods which would allow full parametric flexibility of both voice source and vocal tract filter components at synthesis time.

Person Specification

Qualifications

* Bachelor's degree in Electrical Engineering, Computer Science with specialisation in speech signal processing, or related areas.

Knowledge & Experience (Essential & Desirable)

* Strong digital signal processing skills (Essential)

* Good knowledge of HTS, including previous experience developing synthetic voices (Essential)

* Knowledge of speech production and perception (Desirable)

* Experience in speech recognition (Desirable)

Skills & Competencies

* Good knowledge of written and spoken English.

Benefits

* Opportunity to work with a world-class inter-disciplinary speech research group.

To apply, please email a brief cover letter and CV, including the names and addresses of two academic referees, to: kanejo@tcd.ie and to cegobl@tcd.ie

 





© Copyright 2024 - ISCA International Speech Communication Association - All right reserved.
