ISCA - International Speech
Communication Association



ISCApad #146

Tuesday, August 10, 2010 by Chris Wellekens

6 Jobs
6-1(2010-03-04) Post doctoral Position in Speech Recognition (Grenoble, France)

Post doctoral Position in Speech Recognition (Grenoble, France)
Title: Application and Optimization of Speech Detection and Recognition Algorithms in Smart Homes
URLs: http://getalp.imag.fr/ http://sweet-home.imag.fr/
Start Date: October 2010
Duration and salary: 12 months, 1900 euros
Keywords: Speech Recognition, Home Automation, Smart Homes

Description: The GETALP Team of the Laboratory of Informatics of Grenoble invites applications
for a full-time post-doctoral researcher to work on the SWEET-HOME ('Système
Domotique d’Assistance au Domicile') national French project funded by the
ANR ('Agence Nationale de la Recherche'). This project aims to provide adequate
support to people who need assistance for independent living, such as elderly or disabled
persons (e.g., with Alzheimer's disease or cognitive deficiencies). This assistance is usually
provided through sensing technology (e.g., microphones, infra-red presence sensors,
door contacts, etc.) which detects critical situations in order to trigger the appropriate
action to support the inhabitant (a call to an emergency service, a call to relatives, etc.).
A few microphones are set in an experimental apartment in order to recognize sounds
and speech in real time. The recognition is challenging given that the speaker may
be far from the microphone and because of additive noise and reverberation. Consequently,
the position requires significant experience in speech recognition. The project consortium
is composed of the LIG (Joseph Fourier University), the ESIGETEL and the
Theoris, Technosens and Camera-Contact companies. The experimental apartment
DOMUS of the Carnot Institute of Grenoble will be used by the consortium during
this project.
Requirements: The successful candidate will have been awarded a PhD degree in computer science
or signal processing, involving automatic speech recognition. Expertise in environmental
robustness or independent component analysis (ICA) is a bonus, as is any other
experience relevant to signal processing. The candidate will have a strong research
track record with significant publications at leading international conferences or in
journals. She/He will be highly motivated to undertake challenging applied research.
Moderate level in French language is required as the project language will be French.
Applications: Please send to the address below (i) a one-page statement of your research interests
and motivation, (ii) your CV and (iii) references, before the 1st of July 2010.


6-2(2010-03-05) Post-doctoral position: Acoustic to articulatory mapping of fricative sounds, Nancy, France

Acoustic to articulatory mapping of fricative sounds

Post-doctoral position

Nancy (France)

Environment

This subject deals with acoustic-to-articulatory mapping [Maeda et al. 2006], i.e. the recovery of the vocal tract shape from the speech signal, possibly supplemented by images of the speaker’s face. This is one of the great challenges in the domain of automatic speech processing that has not yet received a satisfactory answer. The development of efficient algorithms would open new directions of research in the domains of second language learning, language acquisition and automatic speech recognition.

The objective is to develop inversion algorithms for fricative sounds. Numerical simulation models now exist for fricatives; their acoustics and dynamics are better known than those of stops, so fricatives will be the first category of sounds to be inverted after vowels, for which the Speech group has already developed efficient algorithms.

The production of fricatives differs from that of vowels in two respects:

  • The vocal tract is not excited by the vibration of the vocal cords located at the larynx, but by a noise source. This noise originates in the turbulent air flow downstream of the constriction formed by the tongue and the palate.
  • Only the cavity downstream of the constriction is excited by the source.

The approach proposed is analysis-by-synthesis. This means that the signal, or the speech spectrum, is compared to a signal or a spectrum synthesized by means of a speech production model which incorporates two components: an articulatory model intended to approximate the geometry of the vocal tract, and an acoustic simulation intended to generate a spectrum or a signal from the vocal tract geometry and the noise source. The articulatory model is geometrically adapted to a speaker from MRI images and is used to build a table made up of pairs associating an articulatory vector with the corresponding acoustic image vector. During inversion, all the articulatory shapes whose acoustic parameters are close to those observed in the speech signal are recovered. Inversion is thus an advanced table-lookup method which we used successfully for vowels [Ouni & Laprie 2005] [Potard et al. 2008].
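As a purely illustrative sketch of this table-lookup inversion (the toy forward map below merely stands in for the real articulatory model and acoustic simulation; all names and dimensions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forward map standing in for the acoustic simulation:
# maps a 2-D "articulatory" vector to a 3-D "acoustic" image vector.
def synthesize(articulatory):
    x, y = articulatory
    return np.array([np.sin(x) + y, np.cos(y) * x, x - y])

# Build the codebook: pairs (articulatory vector, acoustic image).
articulatory_table = rng.uniform(-1.0, 1.0, size=(20000, 2))
acoustic_table = np.array([synthesize(a) for a in articulatory_table])

def invert(observed_acoustic, tolerance=0.15):
    """Return every articulatory candidate whose acoustic image lies
    within `tolerance` (Euclidean distance) of the observation."""
    distances = np.linalg.norm(acoustic_table - observed_acoustic, axis=1)
    return articulatory_table[distances < tolerance]

# Invert the acoustic image of a known articulatory configuration:
# the candidates recovered should cluster around `target`.
target = np.array([0.3, -0.2])
candidates = invert(synthesize(target))
print(len(candidates))
```

Note that the lookup deliberately returns all matching shapes, not just the nearest one: the acoustic-to-articulatory map is one-to-many, and the ambiguity is resolved later (e.g. by the phonetic constraints of [Potard et al. 2008]).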

Objectives

The success of an analysis-by-synthesis method relies on the implicit assumption that synthesis can correctly approximate the speech production process of the speaker whose speech is inverted. Fairly realistic acoustic simulations of fricative sounds exist, but they strongly depend on the precision of the geometrical approximation of the vocal tract used as input. Articulatory models of the vocal tract also exist which yield very good results for vowels. On the other hand, these models are inadequate for consonants, which often require a very accurate articulation at the front part of the vocal tract. The first part of the work will concern the elaboration of articulatory models adapted to the production of both consonants and vowels. The validation will consist of piloting the acoustic simulation from the geometry and assessing the quality of the synthetic speech signal with respect to the natural one. This work will be carried out on X-ray films for which the acoustic signal recorded during acquisition is of sufficiently good quality.

The second part of the work will address several aspects of the inversion strategy. Firstly, it is now accepted that spectral parameters involving a fairly marked smoothing and frequency integration have to be used, as is the case for MFCC (Mel Frequency Cepstral Coefficient) vectors. However, the spectral distance best suited to comparing natural and synthetic spectra remains to be investigated. Another solution consists in modeling the source so as to limit its impact on the computation of the spectral distance.
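One standard family of such distances is a weighted (liftered) Euclidean distance between cepstral vectors. A minimal sketch, assuming the MFCC vectors have already been extracted (the vectors and weights below are hypothetical):

```python
import numpy as np

def liftered_cepstral_distance(c_nat, c_syn, weights=None):
    """Weighted Euclidean distance between two cepstral vectors.
    `weights` (liftering) can emphasise or attenuate individual
    coefficients; uniform weights give the plain cepstral distance."""
    c_nat, c_syn = np.asarray(c_nat), np.asarray(c_syn)
    if weights is None:
        weights = np.ones_like(c_nat)
    diff = c_nat - c_syn
    return np.sqrt(np.sum(weights * diff ** 2))

# Two hypothetical 13-dimensional MFCC vectors (natural vs. synthetic frame).
natural = np.linspace(1.0, 0.1, 13)
synthetic = natural + 0.05
print(liftered_cepstral_distance(natural, natural))       # identical frames
print(liftered_cepstral_distance(natural, synthetic) > 0) # mismatched frames
```

The choice of weights is exactly the open question raised above: different lifters change how much low-order (spectral envelope) versus high-order (fine structure) coefficients contribute to the comparison.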

The second point is about the construction of the articulatory table, which has to be revisited for two reasons: (i) only the cavity downstream of the constriction plays an acoustic role; (ii) the location of the noise source is an additional parameter, but one that depends on the other articulatory parameters. The third point concerns the way the vocalic context is taken into account. Indeed, the context is likely to provide important information about the vocal tract deformations before and after the fricative sound, and thus constraints for inversion.

A very complete software environment already exists in the Speech group for acoustic-to-articulatory inversion, which can be exploited by the post-doctoral student.

References

[Ouni & Laprie 2005] S. Ouni and Y. Laprie, Modeling the articulatory space using a hypercube codebook for acoustic-to-articulatory inversion, Journal of the Acoustical Society of America, Vol. 118, pp. 444-460, 2005.

[Potard et al. 2008] B. Potard, Y. Laprie and S. Ouni, Incorporation of phonetic constraints in acoustic-to-articulatory inversion, Journal of the Acoustical Society of America, 123(4), pp. 2310-2323, 2008.

[Maeda et al. 2006] Technology inventory of audiovisual-to-articulatory inversion, http://aspi.loria.fr/Save/survey-1.pdf

 

Skill and profile

Knowledge of speech processing and articulatory modeling.

Supervision and contact:

Yves Laprie (Yves.Laprie@loria.fr)

Duration:

1 year (possibly extendable)

Important and useful links

The PhD should have been defended no more than a year before the recruitment date.


6-3(2010-03-12) Invitation to join the graduate team at the CLSP (Johns Hopkins U.) for the summer school

Workshops

 

Undergraduate Team Members

The Center for Language and Speech Processing at the Johns Hopkins University is seeking outstanding members of the current junior class to participate in a summer workshop on language engineering from June 7th to July 30th, 2010.

No limitation is placed on the undergraduate major. Only enthusiasm for research, relevant skills, past academic and employment record, and the strength of letters of recommendation will be considered. Students of Biomedical Engineering, Computer Science, Cognitive Science, Electrical Engineering, Linguistics, Mathematics, Physics, Psychology, etc. may apply. Women and minorities are encouraged to apply. The workshop is open to both US and international students.

The workshop offers each undergraduate team member:
  • An opportunity to explore an exciting new area of research.
  • A two-week tutorial on speech and language technology.
  • Mentoring by an experienced researcher.
  • Use of a computer workstation throughout the workshop.
  • A $5000 stipend and $2520 towards per diem expenses.
  • Private furnished accommodation for the duration of the workshop.
  • Travel expenses to and from the workshop venue.
  • Participation in project planning activities.

The eight-week workshop provides a stimulating and enriching intellectual environment, and we hope it will encourage students to eventually pursue graduate study in the field of human language technologies.


 

The 2010 Workshop Teams

 

 

Selection Criteria

 

Four to eight undergraduate students will be selected for next summer's workshop. It is expected that they will be members of the current junior class. Applicants must be proficient in computer usage, including programming in C, C++, Perl or Python, and have exposure to basic probability or statistics. Knowledge of the following will be considered, but is not a prerequisite: Linguistics, Speech Communication, Natural Language Processing, Cognitive Science, Machine Learning, Digital Signal Processing, Signals and Systems, Linear Algebra, Data Structures, Foreign Languages, or MATLAB or similar software.


 


 

Equal Opportunity Policy

The Johns Hopkins University admits students of any race, color, sex, religion, national or ethnic origin, age, disability or veteran status to all of the rights, privileges, programs, benefits and activities generally accorded or made available to students at the University. It does not discriminate on the basis of race, color, sex, religion, sexual orientation, national or ethnic origin, age, disability or veteran status in any student program or activity, including the administration of its educational policies, admission policies, scholarship and loan programs, and athletic and other University-administered programs or in employment. Accordingly, the University does not take into consideration personal factors that are irrelevant to the program involved.

Questions regarding access to programs following Title VI, Title IX, and Section 504 should be referred to the Office of Institutional Equity, 205 Garland Hall, (410) 516-8075.

 

Policy on the Reserve Officer Training Corps.

Present Department of Defense policy governing participation in university-based ROTC programs discriminates on the basis of sexual orientation. Such discrimination is inconsistent with the Johns Hopkins University non-discrimination policy. Because ROTC is a valuable component of the University that provides an opportunity for many students to afford a Hopkins education, to train for a career and to become positive forces in the military, the University, after careful study, has decided to continue the ROTC program and to encourage a change in federal policy that brings it into conformity with the University's policy.


6-4(2010-03-11) Post-doc position in speech coding for speech synthesis (Orange Labs, Lannion / Univ. of Crete)

Post-doctoral position in speech coding for speech synthesis

A post-doctoral research position in the field of speech synthesis is open at France Telecom-Orange Labs in Lannion, France. This study will involve the design and implementation of new speech coding methods particularly suited for speech synthesis. The objective of this work is twofold: to propose new algorithms for compressing acoustic inventories in concatenative synthesis; to implement the building blocks for speech coding/decoding in the context of parametric synthesis (HMM-based).
This one-year post-doctoral contract is part of a collaboration between Orange Labs (France) and the University of Crete (Greece). Travel between these two entities should thus be expected, since the work will be carried out at both sites.
Required Skills:

Excellent knowledge of signal processing and speech coding;

Extensive experience with C, C++ programming;

Good familiarity with Linux and Windows development environments.
Knowledge of sinusoidal speech modelling and coding will be considered an advantage.
Salary: around 2300 € net per month depending on experience.
Closing date for applications: May 30th 2010.
Starting date: June/September 2010
Please send applications (CV+ 2 ref letters) or questions to:
Olivier Rosec
Tel: +33 2 96 05 20 67
olivier.rosec@orange-ftgroup.com
Yannis Stylianou
Tel: +30 2810 391713
styliano@ics.forth.gr


6-5(2010-03-11) Post-doc position in speech synthesis (Orange Labs, Lannion, France)

Post-doctoral position in speech synthesis

A post-doctoral research position in the field of speech synthesis is open at Orange Labs in Lannion, France. This study will involve the design and implementation of a new hybrid speech synthesis system combining HMM-based synthesis and unit selection synthesis. The successful candidate will: first, develop a toolkit for training HMM models from the acoustic data available at Orange Labs; second, implement the acoustic parameter generation in the Orange Labs speech synthesizer; third, propose, design and implement a hybrid speech synthesis system combining selected and HMM-based units.
Required Skills:
- PhD in computer science or electrical engineering
- Strong knowledge of machine learning (including HMMs)
- Extensive experience with C/C++ programming
- Knowledge of HTK/HTS is a plus
Salary: around 2300 € per month depending on experience.
Closing date: April 30th 2010.
Contact:
Cedric BOIDIN
Tel: +33 2 96 05 33 53
cedric.boidin@orange-ftgroup.com


6-6(2010-03-11) PhD opportunity in speech transformation (Orange Labs, Lannion / Univ. of Crete)

PhD Opportunity in Speech Transformation

A full-time 3 year PhD position is available at France Telecom – Orange Labs in Lannion, France.
The position is within Orange Labs speech synthesis team and under academic supervision by Prof. Stylianou from Multimedia Informatics Laboratory at the University of Crete in Heraklion, Greece. Both labs conduct world class research in speech processing in areas like speech synthesis, speech transformation, voice conversion and speech coding.


Starting date: September 2010/January 2011
Application dates: March 30th 2010/October 30th 2010
Research fields: Speech processing, speech synthesis, pattern recognition, statistical signal processing, machine learning.


Project Description:
Speech transformation refers to the various modifications one may apply to the sound produced by a person, speaking or singing. It covers a wide area of research from speech production modeling and understanding to perception of speech, from natural language processing, modeling and control of speaking style, to pattern recognition and statistical signal processing. Speech Transformation has many potential applications in areas like entertainment, film and music industry, toys, chat rooms and games, dialog systems, security and speaker individuality for interpreting telephony, high-end hearing aids, vocal pathology and voice restoration.
In speech transformation, the majority of work is dedicated to pitch modification as well as to timbre transformation. Many techniques have been suggested in the literature, among which methods based on PSOLA, Sinusoidal Modeling, Harmonic plus Noise Model, Phase Vocoder and STRAIGHT. The above methods yield high quality for moderate pitch modifications and for well-mastered spectral envelope modifications. For more sophisticated transformations, the output speech cannot be considered natural.
During this thesis, we will focus on the re-definition of pitch and timbre modification in order to develop a high quality speech modification system. This will be designed and developed in the context of a quasi-harmonic speech representation which was recently suggested for high-quality speech analysis and synthesis purposes.
Salary: around 1700 € net per month.
Please send applications (CV+ 2 ref letters) or questions to:
Yannis Stylianou
Tel: +30 2810 391713
styliano@ics.forth.gr
Olivier Rosec
Tel: +33 2 96 05 20 67
olivier.rosec@orange-ftgroup.com


6-7(2010-03-19) Post-doc speech recognition position at INRIA Nancy

Title : Bayesian networks for modeling and handling variability sources in speech recognition

 

- Location: INRIA Nancy Grand Est research center --- LORIA Laboratory, NANCY, France

 

- Project-team: PAROLE

Contact: Denis Jouvet  (denis.jouvet@loria.fr)

 

                In state-of-the-art speech recognition systems, Hidden Markov Models (HMMs) are used to model the acoustic realization of the sounds. The decoding process compares the unknown speech signal to sequences of these acoustic models to find the best matching sequence, which determines the recognized words. Lexical and grammatical constraints are taken into account during decoding; they limit the number of model sequences considered in the comparisons, which nevertheless remains very large. Hence precise acoustic models are necessary for achieving good speech recognition performance. To obtain reliable parameters, the HMM-based acoustic models are trained on very large speech corpora. However, speech recognition performance is very dependent on the acoustic environment: good performance is achieved when the acoustic environment matches that of the training data, and performance degrades as the acoustic environment diverges from it.
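As a schematic illustration of this best-matching-sequence search (not the actual decoder used in the project), a minimal Viterbi decoder over HMM states can be sketched as follows; the toy two-state example is invented:

```python
import numpy as np

def viterbi(log_emissions, log_trans, log_init):
    """Best state sequence through an HMM.
    log_emissions: (T, N) log-likelihood of each observation under each state.
    log_trans:     (N, N) log transition probabilities (prev -> next).
    log_init:      (N,)   log initial state probabilities."""
    T, N = log_emissions.shape
    delta = log_init + log_emissions[0]
    backpointers = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans        # score of prev -> next
        backpointers[t] = np.argmax(scores, axis=0)
        delta = scores[backpointers[t], np.arange(N)] + log_emissions[t]
    # Trace the best path backwards from the highest-scoring final state.
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(backpointers[t, path[-1]]))
    return path[::-1]

# Toy 2-state example: state 0 favours observations near 0, state 1 near 1.
obs = np.array([0.0, 0.1, 0.9, 1.0])
log_em = np.stack([-(obs - 0.0) ** 2, -(obs - 1.0) ** 2], axis=1)
log_tr = np.log(np.array([[0.8, 0.2], [0.2, 0.8]]))
log_in = np.log(np.array([0.5, 0.5]))
print(viterbi(log_em, log_tr, log_in))
```

Real decoders search over networks of phone HMMs constrained by the lexicon and grammar, but the dynamic-programming core is the same.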

                The acoustic environment depends on many variability sources which affect the acoustic signal. These include the speaker's gender (male/female), individual speaker characteristics, speech loudness, speaking rate, the microphone, the transmission channel, and of course noise, to name only a few [Benzeghiba et al., 2007]. Using a training corpus which exhibits too many different variability sources (for example, many different noise levels or very different channel coding schemes) makes the acoustic models less discriminative, and thus lowers speech recognition performance. Conversely, having many sets of acoustic models, each dedicated to a specific environment condition, raises training problems. Indeed, because each training subset is restricted to a specific environment condition, its size is much smaller, and consequently it might be impossible to reliably train some parameters of the acoustic models associated with that condition.

                In recent years, Dynamic Bayesian Networks (DBNs) have been applied to speech recognition. In such an approach, certain model parameters are made dependent on auxiliary features, such as articulatory information [Stephenson et al., 2000], pitch and energy [Stephenson et al., 2004], speaking rate [Shinozaki & Furui, 2003] or some hidden factor related to a clustering of the training speech data [Korkmazsky et al., 2004]. The approach has also been investigated for multiband speech recognition and non-native speech recognition, as well as for taking estimates of speaker classes into account in continuous speech recognition [Cloarec & Jouvet, 2008]. Although the above experiments were conducted on limited-vocabulary tasks, they showed that Dynamic Bayesian Networks provide a way of handling some variability sources in the acoustic modeling.

 

The objective of the work is to further investigate the application of Dynamic Bayesian Networks (DBNs) to continuous speech recognition with large vocabularies. The aim is to estimate the current acoustic environment condition dynamically, and to constrain the acoustic space used during decoding accordingly. The underlying idea is to be able to handle various ranges of acoustic space constraints during decoding. Hence, when the estimate of the acoustic environment condition is reliable, the corresponding condition-specific constraints can be used (leading, for example, to model parameters associated with a class of very similar speakers in a given environment). Conversely, when the estimate is less reliable, more tolerant constraints should be used (leading, for example, to model parameters associated with a broader class of speakers or with several environment conditions).

                Within the formalism of Dynamic Bayesian Networks, the work to be carried out is the following. The first aspect concerns the optimization of the classification of the training data, together with associated methods for automatically estimating the classes that best match unknown test data. The second aspect involves the development of confidence measures associated with the classification of test sentences, and the integration of these confidence measures into the DBN modeling (in order to constrain the acoustic space for decoding more or less tightly, according to the reliability of the environment condition estimate).

 


6-8(2010-03-24) Post-docs in speech synthesis and in speech coding at Orange Labs, Lannion, France

Post-doctoral position in speech synthesis

A post-doctoral research position in the field of speech synthesis is open at Orange Labs, Lannion, France.

The study will concern the design and implementation of a hybrid synthesis system combining unit selection and HMM-based synthesis. The first objective of the work is to develop a set of tools for training HMM models from Orange Labs acoustic databases. The second is to implement the acoustic parameter generation functionality in the Orange Labs real-time synthesizer. Finally, the third part will consist in designing and implementing a hybrid system combining selected units and units generated from HMM models.

Required skills:
 - PhD in computer science
 - Very good knowledge of machine learning (in particular HMMs)
 - Very good command of C/C++ programming
 - Knowledge of HTK/HTS is a plus

Salary:
around €2300 net per month, depending on experience.

Closing date: April 30th, 2010.

Contact: 
Cedric BOIDIN Tel: +33 2 96 05 33 53 
cedric.boidin@orange-ftgroup.com  

===========================================================

Post-doctoral position in speech coding for speech synthesis

A post-doctoral research position in the field of speech synthesis is open at Orange Labs, Lannion, France.

The study will concern the design and implementation of new speech coding methods suited to speech synthesis. On the one hand, the objective is to propose new algorithms for compressing acoustic inventories for concatenative synthesis. On the other hand, the work will also consist in implementing speech coding/decoding building blocks dedicated to parametric (HMM-based) synthesis. This one-year post-doctoral contract is part of a close collaboration between Orange Labs and the University of Crete; travel between the two sites is therefore to be expected.

Required skills:
 - Excellent knowledge of signal processing and speech coding;
 - Very good knowledge of C and C++ programming;
 - Good command of Linux and Windows development environments.

Salary:
around €2300 net per month, depending on experience.

Closing date: June 30th, 2010.

Contacts:  
Olivier Rosec  Tel: +33 2 96 05 20 67 olivier.rosec@orange-ftgroup.com 
Yannis Stylianou Tel: +30 2810 391713 styliano@ics.forth.gr


6-9(2010-03-25) PhD grant at INRIA Loria Nancy France

PhD Thesis position at INRIA Nancy

Motivations


Through a collaboration with a company that sells documentary rushes,
we are interested in indexing these rushes using automatic recognition
of their dialogues.

The speech team has developed a system for automatic transcription
of broadcast news: ANTS.

Automatic transcription systems are now reliable for transcribing
read or 'prepared' speech such as broadcast news, but their performance
decreases on spontaneously uttered speech [1, 4, 5].
Spontaneous speech is characterized by:
- speech disfluencies (filled pauses, repetitions, repairs, false starts and partial words),
- pronunciation variants such as word and syllable contractions (/want to/ > /wanna/),
- speaking rate variations (reducing the articulation of some phonemes and lengthening others),
- live environments (laughs, applause) and simultaneous speech.
In addition to disfluencies, spontaneous speech is characterized by ungrammatical sentences and a language register which
is difficult to model because of the small amount of available transcribed data. Therefore, processing spontaneous speech is one of
the challenges of Automatic Speech Recognition (ASR).

Subject
The purpose of this thesis is to take into account the specific phenomena
related to spontaneous speech, such as hesitations, pauses and false starts, in order to improve the recognition rate [4, 6, 7].
To do this, it will be necessary to model these specific phenomena.

We have a speech corpus in which these events have been
labeled. This corpus will be used to select parameters, estimate models and evaluate the results.

Scope of Work
The work will be done within the Speech team of Inria-Loria.
The student will use the software ANTS for automatic speech recognition developed by the team.

Profile of candidate
The applicants for this PhD position should be fluent in English or in French. Competence in French is optional, though applicants will be encouraged to acquire this skill during training.
Strong software skills are required, especially Unix/linux, C, Java, and a scripting language such as Perl or Python.

Contact:
fohr@loria.fr or illina@loria.fr or mella@loria.fr

[1] S. Galliano, E. Geoffrois, D. Mostefa, K. Choukri, J.-F. Bonastre and G. Gravier, The ESTER Phase II Evaluation Campaign for Rich Transcription of French Broadcast News, EUROSPEECH 2005.
[2] I. Illina, D. Fohr, O. Mella and C. Cerisara, The Automatic News Transcription System: ANTS, some real-time experiments, ICSLP 2004.
[3] D. Fohr, O. Mella, I. Illina and C. Cerisara, Experiments on the accuracy of phone models and liaison processing in a French broadcast news transcription system, ICSLP 2004.
[4] J.-L. Gauvain, G. Adda, L. Lamel, F. Lefevre and H. Schwenk, Transcription de la parole conversationnelle, Revue TAL, vol. 45, no. 3.
[5] M. Garnier-Rizet, G. Adda, F. Cailliau, J.-L. Gauvain, S. Guillemin-Lanne, L. Lamel, S. Vanni, C. Waaste-Richard, CallSurf: Automatic transcription, indexing and structuration of call center conversational speech for knowledge extraction and query by content, LREC 2008.
[6] J. Ogata, M. Goto, The use of acoustically detected filled and silent pauses in spontaneous speech recognition, ICASSP 2009.
[7] F. Stouten, J. Duchateau, J.-P. Martens and P. Wambacq, Coping with disfluencies in spontaneous speech recognition: Acoustic detection and linguistic context manipulation, Speech Communication, vol. 48, 2006.


6-10(2010-04-08) Post-Doctoral Position at EURECOM, Sophia Antipolis, France

Post-Doctoral Position at EURECOM, Sophia Antipolis, France

Title:      Adaptable speech activated robot-interface for the elderly
Department: Multimedia Communications
URL:        http://www.eurecom.fr/mm.en.htm

Start date: Early summer 2010
Duration:   18 months

Description: EURECOM’s Multimedia Communications Department invites applications for a full-time, 18-month post-doctoral position related to a project recently awarded from the EU Ambient Assisted Living joint research and development funding programme.  The Adaptable Ambient LIving ASsistant (ALIAS) project aims to develop a mobile robot system that interacts with elderly users, monitors and provides cognitive assistance in daily life, and promotes social inclusion by creating connections to people and events in the wider world.  One of ALIAS’s goals involves the development of an adaptable speech interface and is to be developed through this research position.  It requires the development of speaker diarization, localization and speech recognition systems in order to identify and track users, in addition to speech synthesis and recognition to communicate and recognize spoken commands.  All of these technologies will be integrated into a dedicated dialogue manager.

Requirements: The successful candidate will have been awarded a PhD degree in a relevant field of speech processing prior to their joining Eurecom.  You will have a strong research track record with significant publications at leading international conferences and/or in journals. Experience of collaborative research and development projects at the European level is desirable.  You will be highly motivated to undertake challenging, applied research and have excellent English language speaking and writing skills.   French language skills are a bonus.

Applications: Please send to the address below (i) a one page statement of research interests and motivation, (ii) your CV and (iii) contact details for two referees (preferably one from your PhD or most recent research supervisor) before 31st May 2010.

Contact:        Dr Nicholas Evans
Postal address: 2229 Route des Crêtes BP 193,
                F-06904 Sophia Antipolis cedex, France
Email address:  nicholas.evans@eurecom.fr
Web address:    http://www.eurecom.fr/main/institute/job.en.htm
Phone:          +33/0 4 93 00 81 14
Fax:            +33/0 4 93 00 82 00

EURECOM is located in Sophia Antipolis, a vibrant science park on the French Riviera. It is in close proximity with a large number of research units of leading multi-national corporations in the telecommunications, semiconductor and biotechnology sectors, as well as other outstanding research and teaching institutions. A freethinking, multinational population and the unique geographic location provide a quality of life without equal.


6-11(2010-04-20) 2-year post-doc at TELECOM ParisTech

2-year Post-Doc/Research position in Audio and Multimedia scene analysis using multiple sensors

Deadline: 30/09/2010
Contact: Firstname.Lastname@telecom-paristech.fr
http://www.tsi.telecom-paristech.fr/en/open-positions-phd-thesis-internships/

Place: TELECOM ParisTech (ENST), Paris, France (http://www.telecom-paristech.fr/)
Duration: 2 years (1 year, renewable for a second year)
Start: any date from September 1st, 2010
Salary: according to background and experience

Position description

The position is supported by the European Network of Excellence project “3Dlife”, which aims to integrate research conducted within Europe in the field of the Media Internet. In this framework, the research conducted encompasses all aspects related to the analysis/synthesis of 3D audiovisual content for 3D model animation and the creation of virtual humans and virtual environments.
The role of the Post-Doc/researcher will consist, on the one hand, in participating in the network's collaborative integration activities, and on the other hand in conducting forefront research in the domain of audio and multimedia scene analysis using multiple sensors. For one of the use cases envisaged (the analysis of multimedia dance scenes), the signals are of different natures (music, videos and potentially also electrical sensor output signals) and are captured by multiple sensors of potentially variable quality. Specific attention will be devoted to the development of innovative statistical fusion approaches capable of processing information at multiple semantic levels (from low-level features to high-level musical or video concepts). Machine learning methods such as Bayesian networks, support vector machines, Markov and semi-Markov models, or boosting are amongst the statistical frameworks of particular interest for this research.
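As a toy illustration of one possible fusion approach at the decision level (a simple reliability-weighted late fusion; the sensors, scores and weights below are invented for the example, and this is not the project's actual method):

```python
import numpy as np

def late_fusion(scores, weights):
    """Combine per-sensor class scores by a reliability-weighted average.
    scores:  (n_sensors, n_classes) array of per-sensor class scores.
    weights: (n_sensors,) non-negative sensor reliabilities."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    fused = weights @ scores / weights.sum()
    return int(np.argmax(fused)), fused

# Hypothetical scores from three sensors (audio, video, wearable), 2 classes.
scores = [[0.8, 0.2],   # audio strongly favours class 0
          [0.3, 0.7],   # video favours class 1
          [0.3, 0.7]]   # wearable favours class 1
# Trusting the audio sensor more can flip the fused decision.
label_eq, _ = late_fusion(scores, [1.0, 1.0, 1.0])
label_audio, _ = late_fusion(scores, [5.0, 1.0, 1.0])
print(label_eq, label_audio)
```

The research described above goes well beyond such score averaging (e.g. learning the fusion with Bayesian networks or boosting), but the example shows why sensor reliability must enter the combination.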

*Candidate Profile*

As minimum requirements, the candidate will have:
· A PhD in audio or multimedia signal processing, speech processing, statistics, machine learning, computer science, electrical engineering, or a related discipline
· Some knowledge of audio signal processing
· Programming skills, in particular in Matlab (knowledge of Python would be a plus)

The ideal candidate would also have:
· Solid knowledge of machine learning techniques, in particular classification, temporal sequence segmentation and multi-sensor information fusion
· The ability to take on research project management responsibilities and work in a multi-partner, international collaborative environment
· Strong communication skills in English
*Contacts*

Interested applicants may contact Gaël Richard or Slim Essid for more information, or directly email an application letter including a curriculum vitae, a list of publications and a statement of research interests.
- Gaël Richard (firstname.lastname@telecom-Paristech.fr); +33 1 45 81 73 65
- Slim Essid (firstname.lastname@Telecom-ParisTech.fr)

*More info on 3Dlife at:*
 http://www.3dlife-noe.eu/


6-12(2010-04-21) Post doc at LIMSI Paris
Post-doctoral position at LIMSI – Audio & Acoustic Group
 
The Audio & Acoustic group at LIMSI (http://www.limsi.fr/Scientifique/aa/) is currently recruiting for a 1-year postdoctoral research position on a CNRS grant. Two research subjects are available for this single position. Candidates should send a CV, a letter of motivation for the selected topic, and at least 2 references to Christophe d’Alessandro: cda@limsi.fr. Letters of motivation should cite previous experience, relevance, and specific interests related to the details of the project. These documents should be received by the end of May. Notification will be made by the end of June. The selected candidate should be available to start between 1 July and 1 October.
 
 
 
Research Subject 1- A study on expressive prosody and voice quality using a gesturally driven voice synthesizer.
 
The modeling and analysis of expressive prosody raise many problems, both on the perceptual side and on the signal measurement side. In particular, the analysis of voice source quality changes in expressive speech faces the limitations of inversion procedures. The Audio & Acoustic group at LIMSI has developed a real-time version of the CALM voice source synthesizer (Doval et al., 2003; d’Alessandro et al., 2006; Le Beux, 2009), mapped onto several gestural controllers (graphic tablet, joystick, cyber glove...). This device constitutes a powerful tool for the analysis of expressive voice quality in an analysis-by-synthesis paradigm.
 
Hand gestures have been proven adequate for controlling prosodic variations (d’Alessandro et al., 2007). By playing this speech synthesizer like a musical instrument through a gestural device, a user is able to generate language-specific interjections based on vocalic expressive non-words. Such non-words are meaningful in a given language and culture and convey strong cues to meaning during a spoken interaction (see Wierzbicka, 1992, and also Contini, 1989, for Sardinian, or Campbell, 2007, for Japanese).
 
The proposed project aims at acquiring data from such gesturally driven speech production. The analysis of the synthesizer’s parameters, in light of perception test results, may then help to gain a better understanding of the use of voice quality variations in expressive speech. The different stages of the project (gestural production of expressive speech, subjective evaluation of the productions, modeling of the acoustic parameters of the voice) require different skills, and the successful candidate will be able to focus on parts of the project according to his/her own research interests. This project may also be extended towards the use and evaluation of an immersive 3D expressive speech synthesizer (see Research Subject 2).
 
 
 
The successful candidate will have a PhD in phonetics, language sciences, psycholinguistics or any related field (with a strong emphasis on speech prosody analysis), and/or a PhD in signal processing or natural language processing (with a good knowledge of acoustic voice analysis). Musical training/practice would be an advantage.
 
 
 
References:
 
d'Alessandro, C., Rilliard, A. & Le Beux, S. (2007). Computerized chironomy: evaluation of hand-controlled intonation reiteration. INTERSPEECH 2007, Antwerp, Belgium, 1270-1273.

d'Alessandro, N., d'Alessandro, C., Le Beux, S. & Doval, B. (2006). Real-time CALM synthesizer: new approaches in hands-controlled voice synthesis. New Interfaces for Musical Expression, Paris, France, 266-271.

Contini, M. (1989). L’interjection en sarde. Une approche linguistique. In Espaces romans. Études de dialectologie et de géolinguistique offertes à Gaston Tuaillon, Volume 2, ELLUG: Grenoble, 320-329.

Campbell, N. (2007). The role and use of speech gestures in discourse. Archives of Acoustics, 32(4), 803-814.

Doval, B., d'Alessandro, C. & Henrich, N. (2003). The voice source as a causal/anticausal linear filter. VOQUAL'03 - Voice Quality: functions, analysis and synthesis, Geneva, Switzerland.

Le Beux, S. (2009). Contrôle gestuel de la prosodie et de la qualité vocale. PhD thesis, Université Paris Sud/LIMSI, Orsay, France.

Wierzbicka, A. (1992). The semantics of interjection. Journal of Pragmatics, 18, 159-192.
 
 
 
 
 
Research Subject 2- Vocal directivity, real and virtual, study and practice
 
Directivity of the human voice has been the topic of recent research efforts. This 1-year post-doc position concerns the development and combination of recent efforts at LIMSI pertaining to the measurement and understanding of vocal directivity, and its integration into an immersive virtual environment. Preliminary studies have recently shown that vocal directivity patterns vary significantly between phonemes. Relying predominantly on several recently acquired databases of sung and spoken voice directivity, a detailed analysis shall be carried out. Two branches of research then follow. First, it is hoped that a numerical model (using Boundary Element Method modeling) can be employed to validate conclusions on the physical basis for the directivity variations. The second direction concerns the incorporation of voice directivity patterns into an immersive virtual environment text-to-speech simulator. These directivity variations have also been found to be perceptible, and there is an interest in studying to what degree they matter to the perceived quality of immersive environments. As such, the work includes implementation of the effect as well as subjective evaluations. This integration can take the form of a text-to-speech synthesizer or an expressive voice synthesizer (see Research Subject 1). The proportion of effort allocated to these two aspects of the project will depend on the skills and interests of the chosen candidate.
 
 
 
A PhD in acoustics, audio signal processing, or a similar field is required, as is familiarity with measurement and analysis procedures. Familiarity with general software tools such as Matlab, real-time processing software such as Max/MSP or PureData, and BEM software would be a benefit. Candidates should be highly motivated and capable of working both independently and in a multi-disciplinary group environment.
 
 
 
References:
 
Katz, B.F.G. & d'Alessandro, C., 'Directivity measurements of the singing voice.' Proceedings of the 19th International Congress on Acoustics (ICA'2007), Madrid, 2-7 September 2007.

Katz, B.F.G., Prezat, F. & d'Alessandro, C., 'Human voice phoneme directivity pattern measurements.' Fourth Joint Meeting: ASA and ASJ, Honolulu, November 2006; J. Acoust. Soc. Am., Vol. 120(5), Pt. 2, November 2006.

Martin, J.-C., d'Alessandro, C., Jacquemin, C., Katz, B., Max, A., Pointal, L. & Rilliard, A., '3D audiovisual rendering and real-time interactive control of expressivity in a talking head.' Proceedings of the 7th International Conference on Intelligent Virtual Agents (IVA'2007), Paris, France, 17-19 September 2007.

Martin, J.-C., Jacquemin, C., Pointal, L., Katz, B., d'Alessandro, C., Max, A. & Courgeon, M., 'A 3D audio-visual animated agent for expressive conversational question answering.' International Conference on Auditory-Visual Speech Processing (AVSP'2007), eds. J. Vroomen, M. Swerts & E. Krahmer, Hilvarenbeek, The Netherlands, 31 August - 3 September 2007.

Misdariis, N., Lang, A., Katz, B. & Susini, P., 'Perceptual effects of radiation control with a multi-loudspeaker device.' Proceedings of the 155th ASA, 5th Forum Acusticum, & 2nd ASA-EAA Joint Conference, Paris, 29 June - 6 July 2008.

Noisternig, M., Katz, B.F.G., Siltanen, S. & Savioja, L., 'Framework for real-time auralization in architectural acoustics.' Acta Acustica united with Acustica, Vol. 94 (2008), pp. 1000-1015, doi 10.3813/aaa.918116.

Noisternig, M., Savioja, L. & Katz, B., 'Real-time auralization system based on beam-tracing and mixed-order Ambisonics.' Proceedings of the 155th ASA, 5th Forum Acusticum, & 2nd ASA-EAA Joint Conference, Paris, 29 June - 6 July 2008.

Noisternig, M., Katz, B. & d'Alessandro, C., 'Spatial rendering of audio-visual synthetic speech for use in immersive environments.' Proceedings of the 155th ASA, 5th Forum Acusticum, & 2nd ASA-EAA Joint Conference, Paris, 29 June - 6 July 2008.
 
 
 
LIMSI is located approximately 30 minutes south of Paris by commuter train (RER B). The laboratory accommodates approximately 120 permanent personnel (researchers, professors and assistant professors, engineers, technicians) and about sixty PhD candidates. It undertakes multidisciplinary research in Mechanical and Chemical Engineering and in Sciences and Technologies for Information and Communication. The research fields cover a wide disciplinary spectrum from thermodynamics to cognition, encompassing fluid mechanics, energetics, acoustics and voice synthesis, spoken language and text processing, vision, and virtual reality.

6-13(2010-04-21) Professor at the University of Amsterdam (The Netherlands)

Faculty of Humanities
The Faculty of Humanities provides education and conducts research with a strongly international profile in a large number of disciplines in the field of language and culture. Located in the heart of Amsterdam, the Faculty maintains close ties with many cultural institutes in the capital city. There are almost 1,000 employees affiliated with the Faculty, which has about 7,500 students.
The Department of Dutch Studies currently has a vacancy for a professor in
Speech Communication
1.0 FTE

Job description

The chair in Speech Communication is charged with teaching and research in the broad field of speech communication, which also includes argumentation and rhetoric in institutional contexts. The Faculty of Humanities consists of six departments: History, Archaeology and Area Studies; Art, Religion and Cultural Sciences; Media Studies; Dutch Studies; Language and Literature; and Philosophy. Each department is made up of sections comprising one or more professors and a number of other academic staff working in the relevant field.
The chair in Speech Communication is part of the Speech Communication, Argumentation Theory and Rhetoric section in the Department of Dutch Studies. The Department further comprises the Dutch Literature and Dutch Linguistics sections. At present, the Speech Communication, Argumentation Theory and Rhetoric section has a staff of more than 12 full-time equivalent positions (FTE). Financial developments permitting, additional staff may be recruited during the coming years.

Tasks

The teaching tasks of the professor of Speech Communication focus mainly on the BA and MA programmes in Dutch Language and Culture, the BA programme in Language and Communication, the dual MA programme in Text and Communication, the Research MA programme in Rhetoric, Argumentation Theory and Philosophy (RAP) and the MA programme track in Discourse and Argumentation Studies (DASA), along with several relevant minors and electives (for the curriculum, please see the UvA’s digital course catalogue: www.studiegids.uva.nl ). The Faculty’s BA programmes are taught within the College of Humanities, while MA and doctorate programmes are administered within the Graduate School for Humanities.
Research activities are to cover the broad field of speech communication, including argumentation and rhetoric in institutional contexts. Depending on the interests and specialisation of the appointee, these research activities will be based at either the Amsterdam School for Cultural Analysis (ASCA), the Amsterdam Center for Language and Communication (ACLC) or the interfaculty Institute for Logic, Language and Computation (ILLC).
In defining its research programme for the period 2009-2012, the Faculty has identified three research priority areas: Cultural Heritage and Identities (Cultureel Erfgoed en Identiteit), Cultural Transformations and Globalisation (Culturele Transformaties en Globalisering) and the interfaculty area of Cognitive Modelling and Learning (Cognitieve modellen en leerbaarheid). For further information about the Faculty research programme, please see: www.hum.uva.nl/onderzoek.

Profile

The candidate must be able to demonstrate a thorough knowledge of the field, as evidenced by his/her academic qualifications, publications and teaching experience. S/he has completed doctoral work on a topic in this or a related discipline and, as the prospective chair, has a good understanding of the domain as a whole.
The new professor is expected to both implement and further develop the section’s existing ambitious research profile in speech communication, argumentation theory and rhetoric. Specifically, that profile must be expanded to include the study of language usage, aspects of speech acts and stylistic features of written and oral communication. A further key part of this process will be the reassessment of the discipline’s educational objectives, with special emphasis on teaching in the Research Master’s programme.
The successful candidate will have wide-ranging experience of university teaching and of supervising students at all academic levels. In addition, s/he must be able to demonstrate familiarity and affinity with ICT developments relevant to teaching and research. S/he can draw on an existing national and international network in the relevant field.
Where education is concerned, the new professor will be responsible for developing and maintaining a high-quality and appealing contribution to the aforementioned study programmes. This shall be done in consultation with other staff. S/he will ensure that teaching programmes respond to society’s demand for graduates capable of making academic knowledge more accessible to a broad audience. The candidate must show willingness to collaborate with various other educational units both within the University and at other higher education institutions. In view of the Faculty’s general policy that academic staff should be capable of flexible deployment, the new professor must be prepared to teach in an interdisciplinary context as well as outside his/her direct field of expertise.
The successful candidate should have experience of teaching at all levels of university education and in all forms employed at the UvA (seminars, lectures and supervision of dissertations/theses and work experience placements) and also of the methods of assessment associated with each of these. S/he must possess teaching and educational skills of a high order and an approachable personality and manner. The candidate must have a fluent command of both Dutch and English; any appointee lacking this level of linguistic competence will be expected to acquire it within two years of taking up the post.
The importance that the Faculty attaches to this chair is reflected in the standards set for candidates in terms of research experience. The candidate must hold a doctorate degree earned either within this field or in a related discipline. S/he must be able to demonstrate a thorough knowledge of the field by reference to major contributions to international discussions in the broad domain of speech communication and to past publications, including articles in international academic journals and anthologies, as well as to contributions to the wider public debate.
In addition, the successful candidate will be expected to undertake new research, including both independent work and larger-scale projects involving partners outside the Department of Dutch Studies and Faculty of Humanities. S/he must be capable of recruiting the necessary indirect government or private sector funding for this. Further duties include the supervision of doctorate students and postdocs, and candidates are expected to possess experience relevant to the exercise of these responsibilities. In addition, the new professor will be expected to maintain close contacts with the field, or to be in a position to establish such contacts.
The appointee will have administrative responsibility for his/her own field of activity. First and foremost, this will require inspiring and supportive team leadership. By encouraging staff and providing constructive criticism, the new professor will help to advance the quality and effectiveness of University teaching and research. Specific means of achieving this will include regular team meetings with staff and annual consultations and assessment interviews.
In addition, the new professor will be expected to undertake general administrative and organisational duties both within and outside the Faculty. Substantial evidence of practical experience in these areas is extremely desirable.
In keeping with University policy, candidates should hold a Master’s or doctoral degree and have at least three years’ subsequent work experience at a university or academic research institute other than the UvA, preferably abroad.

Further information

For further information, please contact the secretary of the selection committee, Mr H.A. Mulder, tel. 020-525 3066, email H.A.Mulder@uva.nl, or the committee chairman, Prof. F.P. Weerman, tel. 020-525 4737, email F.P.Weerman@uva.nl.

Appointments

The initial appointment will be on a temporary basis for a period of no more than two years. Subject to satisfactory performance, this will be followed by a permanent appointment. The gross salary will normally conform to professorial scale 2 (between €4,904 and €7,142 per month on a full-time basis, in accordance with the salary scale established in January 2009). In certain cases, however, different terms of employment may be offered.

Application procedure

Please submit a letter of application in Dutch or English by no later than 15 May 2010, accompanied by a CV and list of publications. The application should be addressed to the selection committee for the chair in Speech Communication, c/o Mr H.A. Mulder, Office of the Faculty of Humanities, Spuistraat 210, 1012 VT Amsterdam, the Netherlands, and should be sent in an envelope marked ‘strictly confidential’.
Applications will be reviewed by the selection committee, headed by the chair of the Department of Dutch Studies, Prof. F.P. Weerman. The selection procedure includes a formal assessment and a trial public lecture, on the basis of which the committee makes a recommendation to the Dean of the Faculty of Humanities. The committee will make an initial selection before the summer recess and invite candidates for interviews in September.


6-14(2010-05-12) 2 PhD Positions at Vrije Universiteit of Brussel Belgium

PhD position in Audio Visual Signal Processing

ETRO – AVSP – Vrije Universiteit Brussel

 

PhD position in audiovisual crossmodal attention and multisensory integration.

Keywords: audio visual signal processing, scene analysis, cognitive vision.

 

The Vrije Universiteit Brussel (Brussels, Belgium; http://www.vub.ac.be), department of Electronics and Informatics (ETRO) has available a PhD position in the area of audio visual scene analysis and in particular in crossmodal attention and multisensory integration in the detection and tracking of spatio-temporal events in audiovisual streams.

 

The position is part of an ambitious European project aliz-e “Adaptive Strategies for Sustainable Long-Term Social Interaction”. The overall aim of the project is to develop the theory and practice behind embodied cognitive robots which are capable of maintaining believable multi-modal any-depth affective interactions with a young user over an extended and possibly discontinuous period of time.

 

Within this context, audiovisual attention plays an important role. Indeed, attention is the cognitive process of selectively concentrating on an aspect of the environment while ignoring others. The human selective attention mechanism enables us to concentrate on the most meaningful signals amongst all information provided by our audio-visual senses. The human auditory system is able to separate acoustic mixtures in order to create a perceptual stream for each sound source. It is widely assumed that this auditory scene analysis interacts with attention mechanisms that select a stream for attentional focus. In computer vision, attention mechanisms are mainly used to reduce the amount of data for complex computations. They employ a method of determining important, salient units of attention and select them sequentially for being subjected to these computations. The most common visual attention model is the bottom-up approach which uses basic features, conjunctions of features or even learned features as saliency information to guide visual attention. Attention can also be controlled by top-down or goal-driven information relevant to current behaviors. The deployment of attention is then determined by an interaction between bottom-up and top-down attention priming or setting.

Motivated by these models, the present research project aims at developing a conceptual framework for audio-visual selective attention in which the formation of groups and streams is heavily influenced by conscious and subconscious attention.
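As a rough illustration of the bottom-up, saliency-based attention described above, the following Python sketch (a minimal, hypothetical example, not part of the project code) computes a center-surround saliency map from a single intensity feature using differences of Gaussians; real bottom-up models combine many features, scales and normalization schemes.

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable Gaussian blur built from a 1-D kernel (no SciPy dependency).
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    blurred = np.apply_along_axis(lambda row: np.convolve(row, kernel, mode="same"), 0, img)
    return np.apply_along_axis(lambda row: np.convolve(row, kernel, mode="same"), 1, blurred)

def bottom_up_saliency(intensity):
    # Center-surround contrast at two scale pairs, normalized and averaged,
    # in the spirit of bottom-up saliency models (single feature: intensity).
    maps = []
    for center, surround in [(1, 4), (2, 8)]:
        contrast = np.abs(gaussian_blur(intensity, center) - gaussian_blur(intensity, surround))
        maps.append(contrast / (contrast.max() + 1e-12))
    return np.mean(maps, axis=0)

# A bright blob on a dark background should attract attention first.
image = np.zeros((64, 64))
image[30:34, 30:34] = 1.0
saliency = bottom_up_saliency(image)
peak = np.unravel_index(np.argmax(saliency), saliency.shape)  # most salient location
```

In a full model, such a map would be one input among several (auditory streams, motion, top-down goals) to the attentional selection stage.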

 

 

The position will be within the ETRO research group (http://www.etro.vub.ac.be) under supervision of Prof. Werner Verhelst and Prof. Hichem Sahli, but will also have close collaboration and regular interaction with the research groups participating in Aliz-e.

The ideal candidate is a team worker with theoretical knowledge and practical experience in audio and image processing, machine learning and/or data mining. He/she is a good programmer (preferably in Matlab or C++) and holds a two-year Master's degree in engineering science (electronics, informatics, artificial intelligence or another relevant discipline).

The position and research grant are available from June 2010. The position is for 4 years.

Applicants should send a letter explaining their research interests and experience, a complete curriculum vitae (with the relevant courses and grades), and an electronic copy of their master thesis (plus, optionally, reports of other relevant projects) to wverhels@etro.vub.ac.be

 

============================================================

Post Doc Position in Audio-Visual Signal Processing & Machine Learning

ETRO – AVSP – Vrije Universiteit Brussel

 

Post Doctoral Position in audiovisual signal processing and machine learning.

Keywords: audio visual signal processing, scene analysis, machine learning, affective human-robot interaction.

 

The Vrije Universiteit Brussel (Brussels, Belgium; http://www.vub.ac.be), department of Electronics and Informatics (ETRO) has available a Post Doctoral position in the area of audio visual signal processing and multi-modal affective interaction.

 

The position is part of an ambitious European project aliz-e “Adaptive Strategies for Sustainable Long-Term Social Interaction”. The overall aim of the project is to develop the theory and practice behind embodied cognitive robots which are capable of maintaining believable multi-modal any-depth affective interactions with a young user over an extended and possibly discontinuous period of time.

 

Skills:

- A PhD with concentration in relevant or closely related areas, such as audiovisual speech processing, audiovisual scene analysis, human-machine interaction, affective computing, or machine learning
- A track record of important publications
- The ability to generate new ideas and apply them to human-machine applications
- Good programming skills, especially for the implementation of complex algorithms
- High motivation and willingness to coordinate the work of 2-3 PhD students
- Proficiency in English, both written and spoken
- Knowledge of Dutch is a plus but not a requirement
 

The position is available from June 2010 at a competitive salary. The position is guaranteed for 3 years and can be extended. In addition, candidates that qualify for an Odysseus grant from the Research Foundation Flanders will be encouraged and supported to do so (http://www.fwo.be/Odysseusprogramma.aspx).

Applicants should send a letter explaining their research interests and experience, a complete curriculum vitae and recommendation letters to wverhels@etro.vub.ac.be

 


6-15(2010-05-12) Post doc at Universite de Bordeaux, Talence , France

Model selection for jump Markov systems (Post-Doc)

Deadline: 31/07/2010
http://www.ims-bordeaux.fr/IMS/pages/accueilEquipe.php?guidPage=NGEwMTNmYWVhODg3OA==&groupe=RECH_EXT
Location: IMS Laboratory, Signal Group, Talence, Bordeaux
Date: September 2010. Field: signal processing
Contacts: audrey.giremus@ims-bordeaux.fr, eric.grivel@ims-bordeaux.fr

Please provide a CV and 2 letters of reference.

The proposed post-doc concerns approaches for selecting relevant models in the context of estimation by so-called multiple-model algorithms. These approaches place several models in competition to describe the evolution of the state of a system to be estimated. The first algorithms proposed [1] considered linear Gaussian models and were therefore based on estimating the state vector by Kalman filtering. With the development of particle filtering methods [2], the problem extends to so-called jump Markov systems, whose evolution can be described by different probability laws. In this framework, the post-doc will examine the a priori choice of models for the evolution of the system state. If this choice is not dictated by physical considerations, several questions arise, such as:
- the optimal number of models to use,
- the validity of the selected models,
- the influence of the degree of overlap or similarity between these models.
In particular, the question is whether using a set of models that are very “different” from one another improves the estimation of the system state. The post-doc will therefore study and develop criteria for measuring the similarity between two models or, more generically, between two probability distributions, and will consider, among others, tools such as the Bayes factor and the Bayesian deviance [3].

[1] H. A. P. Blom and Y. Bar-Shalom, 'The interacting multiple model algorithm for systems with Markovian switching coefficients,' IEEE Trans. Autom. Control, vol. 33, no. 8, pp. 780-783, 1988.
[2] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, 'A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking,' IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 174-188, 2002.
[3] C. P. Robert, Le Choix Bayésien, Springer, 2005.
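To make the role of the Bayes factor in comparing competing state-evolution models concrete, here is a minimal Python sketch (an illustrative assumption, not project code): two linear Gaussian random-walk models with different process-noise variances are scored by their marginal likelihoods, computed via the prediction-error decomposition of the Kalman filter, and compared through the log Bayes factor.

```python
import numpy as np

def kalman_loglik(y, q, r):
    """Log marginal likelihood of observations y under the model
    x_t = x_{t-1} + w_t, w_t ~ N(0, q);  y_t = x_t + v_t, v_t ~ N(0, r),
    obtained from the Kalman filter's prediction-error decomposition."""
    mean, var, loglik = y[0], 1.0, 0.0
    for obs in y[1:]:
        var_pred = var + q                 # time update (random-walk state)
        s = var_pred + r                   # innovation variance
        innov = obs - mean
        loglik += -0.5 * (np.log(2.0 * np.pi * s) + innov**2 / s)
        gain = var_pred / s                # Kalman gain
        mean += gain * innov               # measurement update
        var = (1.0 - gain) * var_pred
    return loglik

rng = np.random.default_rng(0)
# Simulate data from the high process-noise model (q = 1.0, r = 0.25).
state = np.cumsum(rng.normal(0.0, 1.0, 200))
y = state + rng.normal(0.0, 0.5, 200)

# Log Bayes factor between two competing evolution models:
# positive values favour the first model (here, the one that generated the data).
log_bayes_factor = kalman_loglik(y, q=1.0, r=0.25) - kalman_loglik(y, q=0.01, r=0.25)
```

In the jump Markov setting of the project, the same idea generalizes: marginal likelihoods are approximated by particle filters rather than computed in closed form.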


6-16(2010-05-20) Associate professor at Nanjng Normal University China

Associate Professor or Lecturer positions in Phonetic Science and Speech Technology at Nanjing Normal University, China

 

The Department of Linguistic Science and Technology at Nanjing Normal University, China, invites applications for two positions at Associate Professor or Lecturer level in the area of Phonetic Sciences and Speech Technologies.

 

Nanjing Normal University (NNU) is situated in Nanjing, a city in China famous not only for its great history and culture but also for its excellence in education and academia. With Chinese-style buildings and a garden-like environment, the NNU campus is often called the “Most Beautiful Campus in the Orient.”

 

NNU is among the top 5 universities of China in the area of Linguistics. Placing a strong emphasis on interdisciplinary research, the Department of Linguistic Science and Technology at NNU is unique in that it bridges the studies of theoretical and applied linguistics, cognitive sciences, and information technologies. A new laboratory has recently been established in phonetic sciences and speech technologies, to stimulate closer collaboration between linguists, phoneticians, psychologists, and computer/engineering scientists. The laboratory is very well equipped, with a sound-proof recording studio, professional audio facilities, physiological instruments (e.g., EGG, EMG, EPG, an airflow and pressure module, and a nasality sensor), EEG for ERP studies, and Linux/Windows workstations.

 

We welcome interested colleagues to join us. The research can cover any related areas in phonetic sciences and speech technologies, including but not limited to speech production, speech perception, prosodic modeling, speech synthesis, automatic speech recognition and understanding, spoken language acquisition, and computer-aided language learning. Outstanding research support will be offered. The position level will be determined based on qualifications and experience.

 

Requirements:

* A PhD degree in a related discipline (e.g., linguistics, psychology, physics, applied mathematics, computer science, or electronic engineering) is preferred, though an MS degree with distinguished experience in R&D of speech technologies at world-class institutes/companies is also acceptable

* 3+ years’ experience and strong publication/patent record in phonetic sciences or speech technologies

* Good oral and written communication skills in both Chinese and English

* Good programming skills

* Teamwork spirit in a multidisciplinary group

* Candidates working in any related topics are encouraged to apply, but those who have backgrounds and research interests in both phonetic/linguistic sciences and speech technologies will be considered with preference

 

Interested candidates should submit a current CV, a detailed list of publications, copies of their best two or three publications, and the contact information of at least two references. The application and any further enquiries about the positions should be sent to Prof. Wentao GU by email (preferred) or regular mail at the following address:

 

Prof. Wentao GU

              Dept of Linguistic Science and Technology

              Nanjing Normal University

              122 Ning Hai Road, Nanjing

              Jiangsu 210097, China

              Phone:  +86-189-3687-2840

              Email:  wentaogu@gmail.com    wtgu@njnu.edu.cn

 

The positions will remain open until they are filled.

 


6-17(2010-05-21) Post doc at LORIA Nancy France

Title : Bayesian networks for modeling and handling variability sources in speech recognition

- Location: INRIA Nancy Grand Est research center --- LORIA Laboratory, NANCY, France
- Project-team: PAROLE
Contact: Denis Jouvet (denis.jouvet@loria.fr)

In state-of-the-art speech recognition systems, Hidden Markov Models (HMMs) are used to model the acoustic realization of the sounds. The decoding process compares the unknown speech signal to sequences of these acoustic models to find the best matching sequence, which determines the recognized words. Lexical and grammatical constraints are taken into account during the decoding process; they limit the number of model sequences considered in the comparisons, which nevertheless remains very large. Hence, precise acoustic models are necessary for achieving good speech recognition performance. To obtain reliable parameters, the HMM-based acoustic models are trained on very large speech corpora.

However, speech recognition performance depends strongly on the acoustic environment: good performance is achieved when the acoustic environment matches that of the training data, and performance degrades as the acoustic environment diverges from it. The acoustic environment depends on many variability sources which affect the acoustic signal. These include the speaker's gender (male/female), individual speaker characteristics, the speech loudness, the speaking rate, the microphone, the transmission channel and, of course, the noise, to name only a few of them [Benzeghiba et al., 2007]. Using a training corpus which exhibits too many different variability sources (for example many different noise levels, or very different channel speech coding schemes) makes the acoustic models less discriminative, and thus lowers speech recognition performance. Conversely, having many sets of acoustic models, each dedicated to a specific environment condition, raises training problems. Indeed, because each training subset is restricted to a specific environment condition, its size is much smaller, and consequently it might be impossible to reliably train some parameters of the acoustic models associated with that environment condition.
In recent years, Dynamic Bayesian Networks (DBNs) have been applied in speech recognition. In this approach, certain model parameters are made dependent on auxiliary features, such as articulatory information [Stephenson et al., 2000], pitch and energy [Stephenson et al., 2004], speaking rate [Shinozaki & Furui, 2003] or some hidden factor related to a clustering of the training speech data [Korkmazsky et al., 2004]. The approach has also been investigated for multiband speech recognition and non-native speech recognition, as well as for taking estimates of speaker classes into account in continuous speech recognition [Cloarec & Jouvet, 2008]. Although the above experiments were conducted on limited-vocabulary tasks, they showed that Dynamic Bayesian Networks provide a way of handling some variability sources in the acoustic modeling. The objective of the work is to further investigate the application of DBNs to large-vocabulary continuous speech recognition. The aim is to estimate the current acoustic environment condition dynamically, and to constrain the acoustic space used during decoding accordingly. The underlying idea is to be able to handle various ranges of acoustic space constraints during decoding. Hence, when the estimate of the acoustic environment condition is reliable, the corresponding specific condition constraints can be used (leading, for example, to model parameters associated with a class of very similar speakers in a given environment). Conversely, when the estimate is less reliable, more tolerant constraints should be used (leading, for example, to model parameters associated with a broader class of speakers or with several environment conditions). Within the formalism of Dynamic Bayesian Networks, the work to be carried out is the following.
The first aspect concerns the optimization of the classification of the training data, and associated methods for automatically estimating the classes that best match unknown test data. The second aspect involves the development of confidence measures associated with the classification of test sentences, and the integration of these confidence measures into the DBN modeling (in order to constrain the acoustic space for decoding more or less tightly, according to the reliability of the environment condition estimate).
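The back-off idea described above can be sketched very simply. The following is a minimal, hypothetical illustration (made-up class names, 1-D single-Gaussian "models", and a toy confidence threshold; it is not the project's actual DBN formulation): model parameters are conditioned on an environment/speaker class, and scoring falls back to a broader, pooled class when the class estimate is unreliable.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Hypothetical class hierarchy: specific classes back off to a broad
# (pooled) class trained on all environment conditions.
MODELS = {
    "broad":        {"mean": 0.0, "var": 4.0},   # trained on all data
    "male_clean":   {"mean": -1.0, "var": 1.0},  # trained on a subset
    "female_clean": {"mean": 1.5, "var": 1.0},
}

def score(x, env_class, confidence, threshold=0.8):
    """Score observation x; use the specific class model only if the
    environment estimate is confident enough, else back off."""
    chosen = env_class if confidence >= threshold else "broad"
    m = MODELS[chosen]
    return chosen, gauss_logpdf(x, m["mean"], m["var"])

# Confident environment estimate: the specific model is used.
print(score(-0.9, "male_clean", confidence=0.95)[0])  # male_clean
# Unreliable estimate: fall back to the pooled model.
print(score(-0.9, "male_clean", confidence=0.4)[0])   # broad
```

In the DBN setting, the hard threshold would be replaced by a distribution over the hidden class variable, but the sketch shows the intended behaviour: reliable estimates select tight, condition-specific parameters, unreliable ones select broader parameters.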

[Benzeghiba et al., 2007] M. Benzeghiba, R. de Mori, O. Deroo, S. Dupont, T. Erbes, D. Jouvet, L. Fissore, P. Laface, A. Mertins, C. Ris, R. Rose, V. Tyagi & C. Wellekens: 'Automatic speech recognition and speech variability: A review'; Speech Communication, Vol. 49, 2007, pp. 763-786.

[Cloarec & Jouvet, 2008] G. Cloarec & D. Jouvet: 'Modeling inter-speaker variability in speech recognition' ; Proc. ICASSP'2008, IEEE International Conference on Acoustics, Speech, and Signal Processing, 30 March – 4 April 2008, Las Vegas, Nevada, USA, pp. 4529-4532

[Korkmazsky et al., 2004] F. Korkmazsky, M. Deviren, D. Fohr & I. Illina: 'Hidden factor dynamic Bayesian networks for speech recognition'; Proc. ICSLP'2004, International Conference on Spoken Language Processing, 4-8 October 2004, Jeju Island, Korea, pp. 1134-1137.

[Shinozaki & Furui, 2003] T. Shinozaki & S. Furui: 'Hidden mode HMM using Bayesian network for modeling speaking rate fluctuation'; Proc. ASRU'2003, IEEE Workshop on Automatic Speech Recognition and Understanding, 30 November - 4 December 2003, US Virgin Islands, pp. 417-422.

 [Stephenson et al., 2000] T.A. Stephenson, H. Bourlard, S. Bengio & A.C. Morris: 'Automatic speech recognition using dynamic Bayesian networks with both acoustic and articulatory variables'; Proc. ICSLP'2000, International Conference on Spoken Language Processing, 2000, Beijing, China, vol. 2, pp. 951–954.

[Stephenson et al., 2004] T.A. Stephenson, M.M. Doss & H. Bourlard: 'Speech recognition with auxiliary information'; IEEE Transactions on Speech and Audio Processing, SAP-12 (3), 2004, pp. 189–203.

 

Top

6-18(2010-05-26) Post-doc position in Speech Recognition, Adaptation, Retrieval and Translation- Aalto University Finland

Post-doc position in Speech Recognition, Adaptation, Retrieval and Translation
In Aalto University School of Science and Technology (previously known as Helsinki University of Technology) in Department of Computer and Information Science

http://ics.tkk.fi/en/current/news/view/postdoc_position_in_the_speech_recognition_group/

or directly: http://www.cis.hut.fi/projects/speech/jobs10.shtml

We are looking for a postdoc to join our research group working on machine learning and probabilistic modeling in speech recognition, adaptation, retrieval and translation. The Speech Recognition group (led by Mikko Kurimo) belongs to the Adaptive Informatics Research Centre (led by Prof. Oja, 2006-), which is the successor of the Neural Networks Research Centre (led by Prof. Kohonen, 1995-2005).

We are happy to consider outstanding candidates interested in any of our research themes, for example:

·      large-vocabulary speech recognition

·      acoustic and language model adaptation

·      speech recognition in noisy environments

·      spoken document retrieval

·      speech translation based on unsupervised morpheme models

·      speech recognition in multimodal and multilingual interfaces

Postdoc: 1 year + extension possibilities. Starting date: around August 1, 2010. The position requires a relevant doctoral degree in CS or EE, the skills for doing excellent research in a group, and outstanding research experience in any of the research themes mentioned above. The candidate is expected to perform high-quality research and to assist in the supervision of our PhD students.

In Helsinki you will join an innovative international computational data analysis and ICT community. Among European cities, Helsinki is special in being clean, safe, liberal, Scandinavian and close to nature; in short, it offers a high standard of living. English is spoken everywhere. See, e.g., Visit Finland.

Please attach a CV including a list of publications and email addresses of 2-3 people willing to give more information. Include a brief description of research interests and send the application by email to

Mikko Kurimo, Mikko.Kurimo@tkk.fi
Adaptive Informatics Research Centre, Department of Information and Computer Science, Aalto University School of Science and Technology

 

Top

6-19(2010-05-26) Ph D grant at University of Nantes France

Fusion Strategies for Handwriting and Speech Modalities - Application to Mathematical Expression Recognition. PhD thesis. Deadline: 15/07/2010. christian.viard-gaudin@univ-nantes.fr http://www.projet-depart.org/index.php

Keywords: Handwriting recognition, Speech recognition, Data/decision fusion. IRCCyN - UMR CNRS 6597 - NANTES, IVC team

Description of the PhD thesis: Handwriting and speech are the two most common modalities of interaction for human beings. Each of them has specific features related to usability and expressibility, and requires dedicated tools and techniques for digitization. The goal of this PhD is to study fusion strategies for a multi-modal input system combining on-line handwriting and speech, so that extended facilities or increased performance is achieved with respect to a single modality. Several fusion methods will be investigated in order to take advantage of possible mutual disambiguation. They will range from early fusion to late fusion, to exploit as much as possible the redundancy and complementarity of the two streams. The joint analysis of handwritten documents and speech is quite a new area of research, and only a few works have emerged, concerning applications such as identity verification [1], whiteboard interaction [2], lecture note taking [3], and mathematical expression recognition [4]. More precisely, the focus of this thesis will be on mathematical expression recognition [4,5,6]. This is a very challenging domain where many difficulties have to be faced; specifically, the large number of symbols and the 2D layout of expressions have to be considered. Pattern recognition, machine learning and fusion techniques will play fundamental roles in this work. This PhD is part of the DEPART (Document Ecrit, Parole et Traduction) project funded by the Pays de la Loire Region. Applications, including a cover letter, CV, and contact information for references, should be emailed to christian.viard-gaudin@univ-nantes.fr
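As a toy illustration of the late-fusion end of the spectrum described above (the symbol set, posteriors, and weights below are all made up for the example, not taken from the project): each modality produces posteriors over candidate symbols, and the fused decision takes a weighted log-linear combination, letting one modality disambiguate the other.

```python
import math

def late_fusion(handwriting_post, speech_post, w_hw=0.6, w_sp=0.4):
    """Combine per-symbol posteriors from two recognizers by a
    weighted sum of log-probabilities; return the best symbol."""
    fused = {}
    for sym in handwriting_post:
        fused[sym] = (w_hw * math.log(handwriting_post[sym])
                      + w_sp * math.log(speech_post[sym]))
    return max(fused, key=fused.get)

# The handwriting recognizer confuses '2' and 'z'; speech disambiguates.
hw = {"2": 0.45, "z": 0.45, "7": 0.10}
sp = {"2": 0.80, "z": 0.05, "7": 0.15}
print(late_fusion(hw, sp))  # 2
```

Early fusion would instead concatenate the feature streams before a single classifier; the thesis is expected to compare such strategies along this whole range.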
 Qualification required:
• Master's degree in computer science or a related field such as electrical or telecommunications engineering, signal processing or machine learning
• Good programming skills in C++, Java, C, Unix/Linux
• High motivation in research and applications
• Good communication skills in English or French
• French knowledge is welcome but not mandatory

Starting date: September or October 2010
Place: Nantes (France).
The position is within IRCCyN IVC team in Nantes (Christian Viard-Gaudin, H. Mouchère) in collaboration with LIUM speech team in Le Mans (Simon Petitrenaud).  http://gdr-isis.org/rilk/gdr/Kiosque/poste.php?jobid=3802

Top

6-20(2010-06-08) Two Associate Professor positions in Speech Communication at KTH.

Two Associate Professor positions in Speech Communication at KTH.
The positions are placed in the School of Computer Science and
Communication, Department of  Speech, Music and Hearing.
Further information is available on:
http://www.kth.se/om/work-at-kth/vacancies/associate-professor-in-speech-communication-1.61450?l=en_UK
and
http://www.kth.se/om/work-at-kth/vacancies/associate-professor-in-speech-communication-with-specialization-in-multimodal-embodied-systems-1.61437?l=en_UK
Deadline for applications is June 28, 2010 

Top

6-21(2010-06-15) Ph D students at Dpt Applied Informatics at University of Bielefeld Germany

The Applied Informatics Group, Faculty of Technology, Bielefeld University is looking for PhD
candidates for a project position within the EU Initial Training Network 'RobotDoc' in the area of Social
Learning and Interaction.

We are looking for a PhD candidate for the following project:
Development of dialogical rules. The verbal and cognitive development of infants is rooted
in dialog with other people (e.g. parents, peers). However, there is little research on how
infants develop the capability for dialogue. We hypothesize that contingency is a fundamental
mechanism that may help infants to develop their basic interactive capabilities such as turn-taking
(Masataka 2003). These capabilities guide their attention to opening phases of a
dialogue, and even regulate their emotions. This project proposes to use the mechanism of
contingency to build a system that can learn dialogical rules through interaction by analyzing
the effects of its own dialogue contributions. In this project, experimental studies on children’s
dialogical capabilities are planned.

We invite applications from motivated young scientists coming from the areas of computer science,
linguistics, psychology, robotics, mathematics, cognitive science or similar, that are willing to contribute
to the cross-disciplinary research agenda of our research group. Research and development are
directed towards understanding the processes and functional constituents of cognitive interaction, and
establishing cognitive interfaces and robots that facilitate the use of complex technical systems.
Bielefeld University provides a unique environment for research in cognitive and intelligent systems by
bringing together researchers from all over the world in a variety of relevant disciplines under the roof
of central institutions such as the Excellence Center of Cognitive Interaction Technology (CITEC) or
the Research Institute for Cognition and Robotics (CoR-Lab).

Successful candidates should hold an academic degree (MSc/Diploma) in a related discipline and
have a background in experimental work (because of planned experimental studies with children) as
well as have a strong interest in social robotics. All applications should include: a short cover letter
indicating the motivation and research interests of the candidate, a CV including a list of publications,
and relevant certificates of academic qualification.

Bielefeld University is an equal opportunity employer. Women are especially encouraged to apply and
in the case of comparable competences and qualification, will be given preference. Bielefeld University
explicitly encourages disabled people to apply. Bielefeld University offers a family friendly environment
and special arrangements for child care and dual career opportunities.

Please send your application with reference to the offered position (RobotDoc) no later than
15.7.2010 to Ms Susanne Höke (shoeke@techfak.uni-bielefeld.de).

Contact:
Susanne Höke
AG Applied Informatics
Faculty of Technology
Universitätsstr. 21-23
33615 Bielefeld
Germany
Email: shoeke@techfak.uni-bielefeld.de

Top

6-22(2010-06-16) Postdoc position at IRISA Rennes France

Unsupervised paradigms for domain-independent video structure analysis

Post-Doc

Deadline: 30/06/2011

guillaume.gravier@irisa.fr, mathieu.ben@irisa.fr

http://www.irisa.fr/metiss/emploi/postdoc/postdoc_unsupervised

 
Video structure analysis consists in dividing a video into elementary structural units such as anchor shots or interviews. Most approaches to the problem of structure analysis follow a supervised train/detect paradigm. For example, machine learning techniques have widely been used for the detection of anchor shots, specific actions, etc. Such paradigms have proven highly efficient on specific contents but lack domain and genre independence. To overcome the limitation of current techniques, we will investigate unsupervised paradigms for robust video structure analysis.

In recent years, we have been working on discovery algorithms to find, in a totally unsupervised fashion, coherent or repeating elements in audio and video streams. In a very general way, the problem of unsupervised discovery can be seen as a particular case of a clustering problem. For instance, in audio contents, we have proposed variability-tolerant pattern matching techniques to discover repeating chunks of signal corresponding to word-like units [1]. In video contents, we have used audiovisual consistency between audio and visual clusters to discover structural elements such as anchor persons or guests' shots in games and talk shows.
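As a toy illustration of the kind of variability-tolerant pattern matching used to discover repeating units [1] (this is not the project's actual algorithm; the sequences below stand in for acoustic feature tracks): dynamic time warping (DTW) yields a small distance between two noisy, time-warped repetitions of the same pattern and a large distance for an unrelated pattern, which is enough to cluster repeats together.

```python
def dtw(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible alignment moves.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

pattern      = [0, 1, 3, 5, 3, 1, 0]
repeat_noisy = [0, 1, 2, 3, 5, 5, 3, 1, 0]   # same shape, warped in time
other        = [5, 4, 0, 0, 4, 5, 5]

print(dtw(pattern, repeat_noisy) < dtw(pattern, other))  # True
```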

In parallel, we have been working on topic segmentation of TV programs based on their automatic transcription, developing domain-independent methods robust to transcription errors, where no prior knowledge on topics is required [2]. In particular, robustness can be obtained relying on sources of information other than the transcribed speech material, such as audio events (pauses, speaker changes, etc.) or visual events (shot changes, anchor shots, etc.).

The goal of this post-doctoral position is to experiment further with unsupervised discovery paradigms for robust structure analysis. The post-doctoral researcher will lead research on the following topics:

1. Unsupervised discovery paradigms in audio and video contents: (a) improve current algorithms, both in performance and in computational burden; for example, one can rely on discriminative models built automatically from the result of an initial discovery step to improve performance. (b) Propose innovative solutions to define a mapping from discovered elements to semantically meaningful events.

2. Apply discovery paradigms to video segmentation and, in particular, to topic segmentation (accounting for structural elements, transcript-free segmentation, etc.).

The work will be carried out jointly in the Multimedia group and in the Speech and Audio Processing group at INRIA Rennes, France, in the framework of the OSEO-funded project QUAERO. The position is to be filled as soon as possible and for a duration of 1 year, renewable once. Prospective candidates should have a strong background in at least one of the following domains: pattern recognition preferably applied to speech or video processing, machine learning, multimedia, data mining. Salary depending on experience.


Contacts:

  Guillaume Gravier (guillaume.gravier@irisa.fr)

  Mathieu Ben (mathieu.ben@irisa.fr)


  For applications, please send a resume, a short summary of previous work and contacts for recommendation.


Links:

  INRIA Rennes: http://www.inria.fr/rennes

  Multimedia Group Texmex, http://www.irisa.fr/texmex

  Speech and Audio Processing Group Metiss, http://www.irisa.fr/metiss

  Quaero project: http://www.quaero.org


References:

 
[1] Armando Muscariello, Guillaume Gravier and Frédéric Bimbot. Audio keyword extraction by unsupervised word discovery. In Proc. Conf. of the Intl. Speech Communication Association (Interspeech), 2009.

[2] Camille Guinaudeau, Guillaume Gravier and Pascale Sébillot. Improving ASR-based topic segmentation of TV programs with confidence measures and semantic relations. Submitted to Proc. Conf. of the Intl. Speech Communication Association (Interspeech), 2010.
 
http://gdr-isis.org/rilk/gdr/Kiosque/poste.php?jobid=3853

Top

6-23(2010-06-18) Ph D position at Institut Eurecom Sophia Antipolis France

PhD position at EURECOM

Title:      A new co-modal approach to biometrics
Department: Multimedia Communications
URL:        http://www.eurecom.fr/mm.en.htm
Start date: October 2010
Duration:   Duration of the thesis

Description: This PhD thesis aims to pioneer the next generation of multi-modal biometric systems. Most current multi-modal approaches operate at the score or decision levels and thus involve individual, parallel classifiers. The aim of this work is to develop a ‘co-modal’ approach to biometrics which better combines biometric traits from the outset, within the enrolment and/or modelling stage. Furthermore, as a semi-supervised approach to machine learning, co-training has the potential to acquire vast volumes of training data from only small manually labelled training sets and thus to bring significant improvements in multi-modal biometrics performance through improved modelling and normalisation strategies. As a fundamentally new and adventurous approach to biometric modelling, and the first to apply co-training to multi-modal biometrics, this research program is an ideal opportunity to undertake exciting, cutting edge PhD research at one of Europe’s premier research institutions. A highly competitive salary and benefits package is offered for the duration of the PhD.
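The co-training mechanism mentioned above can be sketched in miniature. The following is a hypothetical illustration (1-D "views", nearest-centroid classifiers, and made-up data stand in for the two biometric modalities; it is not the thesis's intended system): each view's classifier labels the unlabeled example it is most confident about, and that newly labeled example augments the other view's training set.

```python
def centroid_classifier(train):
    """Train a nearest-centroid classifier on (value, label) pairs."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    cents = {y: sums[y] / counts[y] for y in sums}
    def predict(x):
        # Return (label, confidence); confidence = margin between centroids.
        d = sorted((abs(x - c), y) for y, c in cents.items())
        label = d[0][1]
        conf = d[1][0] - d[0][0] if len(d) > 1 else float("inf")
        return label, conf
    return predict

def co_train(labeled, unlabeled, rounds=3):
    """labeled: list of ((view1, view2), label); unlabeled: list of (v1, v2)."""
    train1 = [(v[0], y) for v, y in labeled]
    train2 = [(v[1], y) for v, y in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        f1 = centroid_classifier(train1)
        f2 = centroid_classifier(train2)
        # Each classifier picks the pool item it is most confident about
        # and teaches the OTHER view with its predicted label.
        best1 = max(pool, key=lambda v: f1(v[0])[1])
        train2.append((best1[1], f1(best1[0])[0]))
        pool.remove(best1)
        if pool:
            best2 = max(pool, key=lambda v: f2(v[1])[1])
            train1.append((best2[0], f2(best2[1])[0]))
            pool.remove(best2)
    return centroid_classifier(train1), centroid_classifier(train2)

# Two classes seen in two views; only two labeled examples to start from.
labeled = [((0.0, 10.0), "A"), ((4.0, 2.0), "B")]
unlabeled = [(0.5, 9.0), (3.5, 2.5), (0.2, 9.5), (3.8, 1.8)]
f1, f2 = co_train(labeled, unlabeled)
print(f1(0.3)[0], f2(9.2)[0])  # A A
```

After co-training, both single-view classifiers have absorbed all of the unlabeled pool, which is the appeal for multi-modal biometrics: large unlabeled enrolment sets can refine each modality's model.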

Requirements: The position calls for a candidate with a strong Master's degree in Engineering, Mathematics, Computer Science or another relevant subject, ideally with some experience of signal processing, speech processing, image and video processing, pattern recognition or biometrics. You will be highly motivated to undertake challenging, applied research and have excellent English speaking and writing skills. French language skills are a bonus.

Applications: Please send to the address below (i) your CV and (ii) contact details for two referees (preferably one from your final project supervisor) no later than 30th June 2010.

Contact:        Dr N. Evans and Prof. J-L. Dugelay
Postal address: 2229 Route des Crêtes BP 193, F-06904 Sophia Antipolis cedex, France

Email address: evans@eurecom.fr, dugelay@eurecom.fr
Web address:   http://www.eurecom.fr/main/institute/job.en.htm
Phone:         +33/0 4 93 00 81 14, +33/0 4 93 00 81 41
Fax:           +33/0 4 93 00 82 00

EURECOM is located in Sophia Antipolis, a vibrant science park on the French Riviera. It is in close proximity with a large number of research units of leading multi-national corporations in the telecommunications, semiconductor and biotechnology sectors, as well as other outstanding research and teaching institutions. A freethinking, multinational population and the unique geographic location provide a quality of life without equal. 

Top

6-24(2010-06-22) Technical director at ELDA

Technical Director

Working under the supervision of the Managing Director, he/she will be responsible for the development and management of technical projects related to language technologies, as well as partnerships, guaranteeing the timely and cost-effective execution of those projects.

He/she will be responsible for the management of project teams and the steering of all technical aspects.

He/she will organise, supervise and coordinate the technical activities, guaranteeing the sound scheduling of tasks. He/she will be in charge of establishing new contacts in order to ensure the development of the firm, and will negotiate business activities in collaboration with our Business Development Manager. He/she will thus help set up all necessary means, such as new competences, to develop the firm's activity. Most of the activity takes place within international R&D projects co-funded by the European Commission, the French ANR or private partners.

 Skills required:

  • Engineer/PhD degree, with a minimum of 5 years’ experience as a project manager in the field of information technologies (human language technologies)
  • Experience and/or excellent knowledge of European cooperation programmes in the field of human language technologies, as well as international programmes
  • Experience in project management, including the management of European projects
  • Experience and/or good knowledge of issues related to Language Resources and tools of natural language processing in general.
  • Service-, customer- and business-minded, with the ability to work in a team and to listen. Excellent communication and good social skills; strong written and oral abilities will be a plus. Knowledge of the strategic market orientations of human language technologies.
  • Proficiency in French and English

 

Candidates should have the citizenship (or residency papers) of a European Union country.

Salary: Commensurate with qualifications and experience.

Applicants should send (preferably via email) a cover letter addressing the points listed above together with a curriculum vitae to:

 

Khalid Choukri
ELRA / ELDA
55-57, rue Brillat-Savarin
75013 Paris
FRANCE
Fax: 01 43 13 33 30
Email: job@elda.org

 

For more information on ELRA/ELDA, visit the following web sites:

http://www.elda.org

http://www.elra.info

Top

6-25(2010-06-24) SOFTWARE ENGINEER AT HONDA RESEARCH INSTITUTE USA (MOUNTAIN VIEW, CA)

TITLE : SOFTWARE ENGINEER AT HONDA RESEARCH INSTITUTE USA (MOUNTAIN VIEW, CA)
We are seeking applications for a software engineering position to support research and development in the area of AI and Machine Learning at our Mountain View California office. The project goals are to design and develop our software system for spoken dialog systems and machine learning. The candidate will be involved in re-architecting and extending the existing software, closely collaborating with researchers.

Depending on interests and skills, the candidate will also have an opportunity to participate in our research projects on spoken dialog systems, probabilistic and symbolic reasoning, and decision making.


The candidate must have familiarity with AI/machine learning, and strong skills in software development in Java/C++. Desirable experience includes probabilistic and/or symbolic reasoning.

 
This position will last until March 2011, with the possibility of further extensions at the company's discretion. To apply for this position, please send a cover letter and your resume to: career2010@honda-ri.com.

Top

6-26(2010-06-24) PhD (3 years) position available at the Radboud University Nijmegen.

PhD (3 years) position available at the Radboud University Nijmegen.

Job description

The FP7 Marie Curie Initial Training Network 'BBfor2' (Bayesian Biometrics for Forensics) provides an opportunity for young researchers to study several biometric technologies in a forensic context.  The Network consists of 9 European research institutes and 3 associated partners.  The Network will provide regular workshops and Summer Schools, so that the 15 PhD students (Early Stage Researchers - ESRs) and PostDocs (Experienced Researchers - ERs) and senior researchers can exchange research experience, insights and ideas.   The main areas of research are Speaker Recognition, Face Recognition, Fingerprint Recognition, but also combinations of these techniques are studied.  The challenge of applying biometric techniques in a forensic context is to be able to deal with the uncontrolled quality of the evidence, and to provide calibrated likelihood scores.  The researchers in this Network will have the opportunity during their assignment to stay for some period at another Network institute and to get experience in an industrial or forensic laboratory. 

The PhD student will investigate automatic speaker recognition in the forensic environment. The research will include theoretical aspects, such as developing a general framework for evidence evaluation and reporting, and experimental aspects, such as conducting studies with automatic speaker recognition systems. The candidate will collaborate with other PhD students and senior researchers in the Network, in a highly interdisciplinary environment. Successful candidates hold a Master's degree in Computer Science, Engineering or another relevant discipline, with a strong background in pattern recognition and/or signal processing, and excellent communication and writing skills in English.

Requirements

Candidates should comply with the rules set forth for FP7 Marie Curie ITNs. Candidates should

-       be transferring from another country, i.e., not be of Dutch nationality, and not have resided more than 12 months in the last 3 years in The Netherlands.

-       be willing to work in at least one other country in the BBfor2 network.

-       have less than 4 years of research experience since their master degree, and not hold a PhD.

Organization

The project will be carried out within the Centre for Language and Speech Technology (CLST), a research unit within the Faculty of Arts of the Radboud University Nijmegen.  The CLST hosts a large international group of senior researchers and PhD students who do research at the frontier of science and develop innovative applications. 

Conditions of employment

The duration of the contract is 3 years. The PhD student will receive an initial contract for one year, with the possibility of prolongation for another 2 years. The salary is in accordance with the rules of the Marie Curie ITNs. The annual gross salary is EUR 25,000 in the first year and will grow to EUR 30,000 in the third year. In addition to the salary, travel allowances and career exploratory allowances are foreseen according to the generous Marie Curie ITN provisions. The Radboud University is an equal opportunity employer. Female researchers are strongly encouraged to apply for this vacancy.

Additional information

For further information about the position, please contact David van Leeuwen, d.vanleeuwen@let.ru.nl

 

Application

Letters of application, including an extensive CV (with reference to vacancy number 23.02.10, and preferably by e-mail), can be sent to: vacatures@let.ru.nl. Candidates can apply until August 15th, 2010.

Top

6-27(2010-06-24) PhD POSITION in PERSON RECOGNITION IN AUDIOVISUAL BROADCASTS Grenoble France

PhD POSITION in PERSON RECOGNITION IN AUDIOVISUAL BROADCASTS (36
months; starting Sept./Oct. 2010) IN GRENOBLE (France)
===================================================================
Key words: video information retrieval, spoken language processing, cross-modal fusion

The ANR QCOMPERE project is one of the three consortia that will participate in the REPERE
challenge. REPERE is a multimedia challenge on person recognition within audiovisual
broadcasts. Its general goals are to improve the state of the art in automatic processing of
multimedia documents and to create collaborations between specialists of the different modalities
involved in the challenge. More precisely, the participants in the REPERE challenge are expected
to build a system for identifying the persons in audiovisual broadcasts, relying on different
possible information sources: the image of the person, his/her voice, and the name written on the
image or pronounced. In order to take part in the evaluation, each consortium needs to address
four questions: who is seen (person identification in videos), who is speaking (speaker
identification in audio), whose name is written on screen (name identification in video using
OCR), and whose name is pronounced (name spotting or name identification in ASR), and must be
able to fuse the answers in a single system.

This PhD position focuses on the fusion of information for cross-modal (multi-modal) person
recognition in videos, as well as on name identification in videos using OCR.
The PhD will take place in the Laboratory of Informatics of Grenoble (LIG) that was created on
January 1, 2007. This laboratory gathers 500 researchers, lecturers-researchers, students and
post-docs, technical and administrative staff members. Research activities are structured around
24 autonomous research groups. Due to its multimodal dimension, this PhD would take place
between two different teams of the laboratory: MRIM and GETALP. The Multimedia
Information Indexing and Retrieval (MRIM) group specializes, as its name indicates, in
multimedia indexing. The GETALP group specializes in spoken and written natural language
processing. More details on the groups can be found on http://mrim.imag.fr/en/ and
http://getalp.imag.fr/

Applicants should hold a Master's degree in Computer Science and show a strong academic
background. They should be fluent in English. Competence in French is optional, though
applicants will be encouraged to acquire this skill during the PhD.

For further information, please contact Laurent Besacier (Laurent.Besacier at imag.fr) and
Georges Quénot (Georges.Quenot at imag.fr)

Top

6-28(2010-06-29) Post-doc, Université de Neuchâtel, Switzerland
One part-time (50%) POST-DOCTORAL position
within an FNS (Swiss National Science Foundation) project on the psycholinguistic,
neurolinguistic and electrophysiological (ERP) study of the cognitive processes
involved in language production.
Duties: collaboration within the FNS research project; independent research
on a related topic.
Starting date: 1 October 2010, or by arrangement.
Salary: statutory.
Duration of the appointment: 2 years.
Required degree: doctorate in psychology, linguistics, speech-language pathology
or language sciences.
Profile: scientific research on the cognitive processes involved in language
production, in experimental psycholinguistics and/or neurolinguistics
and/or functional neuroimaging.
Enquiries may be addressed by e-mail to:
Marina.Laganaro@unine.ch
Applications (CV and cover letter) should be sent to Marina Laganaro,
preferably by e-mail (Marina.Laganaro@unine.ch), by 30 July 2010.
Neuchâtel, 25 June 2010
Top

6-29(2010-06-30) Doctoral and postdoctoral opportunities in Forensic Voice Comparison Australia

Doctoral and postdoctoral opportunities in Forensic Voice Comparison

Doctoral students are sought in connection with the three-year, half-million-dollar Australian Research Council Linkage Project (LP100200142) 'Making demonstrably valid and reliable forensic voice comparison a practical everyday reality in Australia'. This is a unique opportunity to obtain multidisciplinary training in both acoustic-phonetic and automatic approaches to forensic voice comparison, and in the evaluation of forensic evidence within the new paradigm for forensic science: the likelihood-ratio framework with testing of validity and reliability.
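As a rough illustration of the likelihood-ratio framework mentioned above: the strength of evidence is the probability of the observed comparison score under the same-speaker hypothesis divided by its probability under the different-speaker hypothesis. The Gaussian score models and all the numbers below are toy assumptions, not parameters from this project.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def likelihood_ratio(evidence, same_mean, same_std, diff_mean, diff_std):
    """LR = p(evidence | same speaker) / p(evidence | different speakers)."""
    return (gaussian_pdf(evidence, same_mean, same_std)
            / gaussian_pdf(evidence, diff_mean, diff_std))

# Toy example: a comparison score of 0.8 under hypothetical score distributions.
lr = likelihood_ratio(0.8, same_mean=1.0, same_std=0.5,
                            diff_mean=-1.0, diff_std=0.5)
# lr > 1 supports the same-speaker hypothesis; lr < 1 supports different speakers.
```

In real casework the score distributions are estimated from relevant-population data, and the validity and reliability of the resulting likelihood ratios are themselves tested, which is precisely the point of the "new paradigm" the project refers to.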

The project will be centred at the to-be-formed Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, Australia http://www.ee.unsw.edu.au/. The University of New South Wales is one of Australia's major research institutions, attracting top national competitive research grants and with extensive international research links.

The lead investigator on this project is Dr. Geoffrey Stewart Morrison http://geoff-morrison.net/. More details about the project can be found at http://forensic-voice-comparison.net/.

 
Doctoral students


We are looking for two types of candidates:

1.      The ideal candidate would have a strong background in acoustic phonetics and also be (or have the potential to quickly become) knowledgeable and skilful in forensic science (especially the likelihood-ratio framework), programming (especially Matlab), statistics (especially Bayesian) and pattern recognition, and signal processing (especially speech processing).

2.      The ideal candidate would have a strong background in signal processing, especially speech processing, pattern recognition, and programming (especially Matlab), and also be (or have the potential to quickly become) knowledgeable and skilful in forensic science (especially the likelihood-ratio framework), acoustic phonetics, and statistics (especially Bayesian).

A candidate with a Master's degree specialising in forensic voice comparison within the likelihood-ratio framework would be highly favoured. An excellent command of spoken and written English is essential, and fluency in Spanish and/or Standard Chinese would be advantageous.
 
The start date is flexible but is likely to be early 2011. In some cases this may be dictated by scholarship rules. Applicants should submit a curriculum vitae and cover letter as soon as possible; both documents should be combined into a single PDF and sent to geoff-morrison@forensic-voice-comparison.net

If a suitable international candidate is found by 10 July 2010, we will make a decision at that time in order to facilitate an Endeavour Award application (see below; the deadline is tight because of the Australian Research Council's late announcement of Linkage Project funding). Otherwise, positions will remain open until filled. The number of positions is flexible, depending on candidates' ability to obtain external funding.

We have a small amount of money (AU$10k per year) within the project allocated to supporting PhD students. This is envisaged as a top-up to a more substantial scholarship. Potential students will have to apply for such a scholarship, but we will provide all possible assistance in preparing that application.

Citizens and permanent residents of Australia and New Zealand should apply for:

1. Australian Postgraduate Award and other scholarships

        http://www.grs.unsw.edu.au/scholarships/localschols/unswlocalschols.html

        The application closing date for a 2011 start will probably be mid-October 2010 (check the link regularly for updates).

International students should apply for:

1. Endeavour Awards

        http://www.deewr.gov.au/International/EndeavourAwards/Pages/Home.aspx

        The Australian Government's internationally competitive, merit-based scholarship program providing opportunities for citizens of the Asia-Pacific, Middle East, Europe and Americas to undertake study, research and professional development in Australia. The UNSW internal deadline for the next round of applications is 19 July 2010 (the following deadline will be January 2011).

2. UNSW International Research Scholarships

        http://www.grs.unsw.edu.au/scholarships/intschols/unswintschols.html

        These scholarships include full fees and some living allowance and are typically very competitive.  Please contact us before applying.

3. Any scholarships available from the student's home country (we have experience with applying for SSHRC Doctoral Fellowships). See also http://www.grs.unsw.edu.au/scholarships/homecountryschols.html

Postdoctoral Researchers

 We would also welcome applications from postdoctoral researchers who have external funding (e.g., SSHRC Postdoctoral Fellowship) and wish to join our team. We will assist with preparing applications for external funding.

Visiting Students/Researchers

We would also welcome applications for externally-funded shorter research visits from PhD students and researchers from other universities. We will assist with preparing applications for external funding (e.g., Endeavour Research Fellowships).

Enquiries should be addressed to geoff-morrison@forensic-voice-comparison.net

Top

6-30(2010-07-07) Two positions at ELDA

Two positions are currently available at ELDA.

 

Engineer in HLT Evaluation Department

 

He/she will be in charge of managing the evaluation activities in relation with the collection of Language Resources for evaluation, the evaluation of technology components, and in general, the setting up of an HLT evaluation infrastructure. As part of the HLT Evaluation Department, he/she will be working on European projects and will be involved in the evaluation of technology components related to information retrieval, information extraction, machine translation, etc.

Profile:

 

  • Engineer's or Master's degree (preference for a PhD) in computer science, electrical engineering, computational linguistics, information science, knowledge management or a similar field.
  • Experience and/or good knowledge of the information retrieval/information extraction programmes in Europe, the US and Japan.
  • Experience in project management, including the management of European projects.
  • Experience and/or good knowledge of issues related to Language Resources.
  • Ability to work independently and as part of a team, in particular the ability to supervise members of a multidisciplinary team.
  • Proficiency in English.

 

Programmer

ELDA offers a position for its Language Resource Production and Evaluation activities working in the framework of European projects. The position is related to a number of NLP activities within ELDA, with a focus on the development of web-service architectures for the automatic production and distribution of language resources. The candidate may also be involved in the creation of LR repositories, NLP applications development and/or evaluation, etc.

Profile:

  • Degree or MSc in computer science, computational linguistics, natural language processing or similar fields (preference for a PhD)
  • Good programming skills in C, C++, Perl and/or Java
  • Experience and/or knowledge of web services
  • Good knowledge of Linux and open source software
  • Experience and/or knowledge of NLP is a plus
  • Experience and/or knowledge of Machine Translation, Information Retrieval and related areas is a plus
  • Ability to work independently and as part of a team, in particular to collaborate with members of a multidisciplinary and multilingual team
  • Proficiency in French and English

Applicants should send (preferably via email) a cover letter addressing the points listed above together with a curriculum vitae to :

 

Khalid Choukri
ELRA / ELDA
55-57, rue Brillat-Savarin
75013 Paris
FRANCE
Fax: 01 43 13 33 30
E-mail: job@elda.org
Top

6-31(2010-07-07) PhD thesis at LORIA, Nancy, France (fluency in French required)
Thesis subject
Motivations
As part of a collaboration with a company that sells excerpts of video
documentaries (rushes), we are interested in the automatic recognition of the
dialogue in these rushes so that they can be indexed.
The Parole team has developed an automatic transcription system for broadcast
news: ANTS [2,3]. While current automatic transcription systems perform
satisfactorily on read or "prepared" speech (news bulletins, speeches), their
performance degrades sharply on spontaneous speech [1,4,5]. Compared with
prepared speech, spontaneous speech is characterized by:
• insertions (hesitations, pauses, false word starts, restarts),
• pronunciation variants such as the contraction of words or syllables
(/monsieur/ => /m'sieu/),
• variations in speaking rate (reduced articulation of some phonemes and
lengthening of others),
• difficult acoustic environments (overlapping speech, laughter, background
noise...).
These specificities are rarely, if ever, taken into account by current
recognition systems. All these phenomena cause recognition errors and can lead
to erroneous indexing.
Subject
The goal of the thesis is to take one or more of the specific phenomena
described above into account in order to improve the recognition rate [4,6,7].
The phenomena will be chosen and handled at the acoustic or linguistic level
according to the candidate's profile. The work will consist of:
• understanding the ANTS architecture,
• for the chosen phenomena, surveying the state of the art and proposing new
algorithms,
• building a spontaneous-speech recognition prototype and validating it on a
labelled spontaneous-speech corpus.
Work environment
The work will be carried out in the Parole team of Inria - Loria in Nancy
(http://parole.loria.fr). The student will use the ANTS automatic speech
recognition software developed by the team.
Desired profile
Candidates must be fluent in French and English and able to program in C or
Java in a Unix environment. Knowledge of stochastic modelling or automatic
speech processing is a plus.
Contacts: illina@loria.fr, fohr@loria.fr or mella@loria.fr
[1] S. Galliano, E. Geoffrois, D. Mostefa, K. Choukri, J.-F. Bonastre and G. Gravier, "The ESTER Phase II Evaluation Campaign for the Rich Transcription of French Broadcast News," EUROSPEECH 2005.
[2] I. Illina, D. Fohr, O. Mella and C. Cerisara, "The Automatic News Transcription System: ANTS, some real-time experiments," ICSLP 2004.
[3] D. Fohr, O. Mella, I. Illina and C. Cerisara, "Experiments on the accuracy of phone models and liaison processing in a French broadcast news transcription system," ICSLP 2004.
[4] J.-L. Gauvain, G. Adda, L. Lamel, F. Lefèvre and H. Schwenk, "Transcription de la parole conversationnelle," Revue TAL, vol. 45, no. 3.
[5] M. Garnier-Rizet, G. Adda, F. Cailliau, J.-L. Gauvain, S. Guillemin-Lanne, L. Lamel, S. Vanni and C. Waast-Richard, "CallSurf: Automatic transcription, indexing and structuration of call center conversational speech for knowledge extraction and query by content," LREC 2008.
[6] J. Ogata and M. Goto, "The use of acoustically detected filled and silent pauses in spontaneous speech recognition," ICASSP 2009.
[7] F. Stouten, J. Duchateau, J.-P. Martens and P. Wambacq, "Coping with disfluencies in spontaneous speech recognition: Acoustic detection and linguistic context manipulation," Speech Communication, vol. 48, 2006.
Top

6-32(2010-07-14) PhD position at LORIA, Nancy (in French)

Thesis subject

Motivations

As part of a collaboration with a company that sells excerpts of video
documentaries (rushes), we are interested in the automatic recognition of the
dialogue in these rushes so that they can be indexed.
The Parole team has developed an automatic transcription system for broadcast
news: ANTS [2,3]. While current automatic transcription systems perform
satisfactorily on read or "prepared" speech (news bulletins, speeches), their
performance degrades sharply on spontaneous speech [1,4,5].

Work environment

The work will be carried out in the Parole team of Inria - Loria in Nancy
(http://parole.loria.fr). The student will use the ANTS automatic speech
recognition software developed by the team.

Desired profile

Candidates must be fluent in French and English and able to program in C or
Java in a Unix environment. Knowledge of stochastic modelling or automatic
speech processing is a plus.

Contacts: illina@loria.fr, fohr@loria.fr or mella@loria.fr

[1] S. Galliano, E. Geoffrois, D. Mostefa, K. Choukri, J.-F. Bonastre and G. Gravier, "The ESTER Phase II Evaluation Campaign for the Rich Transcription of French Broadcast News," EUROSPEECH 2005.
[2] I. Illina, D. Fohr, O. Mella and C. Cerisara, "The Automatic News Transcription System: ANTS, some real-time experiments," ICSLP 2004.
[3] D. Fohr, O. Mella, I. Illina and C. Cerisara, "Experiments on the accuracy of phone models and liaison processing in a French broadcast news transcription system," ICSLP 2004.
[4] J.-L. Gauvain, G. Adda, L. Lamel, F. Lefèvre and H. Schwenk, "Transcription de la parole conversationnelle," Revue TAL, vol. 45, no. 3.
[5] M. Garnier-Rizet, G. Adda, F. Cailliau, J.-L. Gauvain, S. Guillemin-Lanne, L. Lamel, S. Vanni and C. Waast-Richard, "CallSurf: Automatic transcription, indexing and structuration of call center conversational speech for knowledge extraction and query by content," LREC 2008.
[6] J. Ogata and M. Goto, "The use of acoustically detected filled and silent pauses in spontaneous speech recognition," ICASSP 2009.
[7] F. Stouten, J. Duchateau, J.-P. Martens and P. Wambacq, "Coping with disfluencies in spontaneous speech recognition: Acoustic detection and linguistic context manipulation," Speech Communication, vol. 48, 2006.

Top

6-33(2010-07-20) PhD at IDIAP, Martigny, Switzerland
PhD POSITION IN PERSON SEGMENTATION AND CLUSTERING IN AUDIO-VIDEO STREAMS,
36 MONTHS, STARTING IN OCTOBER 2010,
AT IDIAP (MARTIGNY, SWITZERLAND) AND LIUM (LE MANS, FRANCE),
NET SALARY: €1700 + INDEMNITY
------------------------------------------------------------------------------------

Research areas: 
Audio/video segmentation and clustering, speaker recognition, face recognition, pattern recognition, machine learning, audio and image processing.

---
Description: 
The objective of the thesis is to investigate novel algorithms for the automatic segmentation and clustering of people in audio-visual documents. More precisely, the goal is to detect the people who appear in the documents, when they appear and/or when they speak, with whom they speak, and who they are. The work will build on and extend previous work at LIUM and IDIAP in speaker diarization, name recognition from automatic speech transcripts, and person detection, tracking and recognition, and will be expanded to address audio-visual identity association and the recognition of people's roles in TV shows. The work will be evaluated in the framework of the REPERE evaluation campaign, a challenge for audio and video person detection and recognition in TV broadcasts (news, debates, sitcoms), and will focus on segmentation and clustering targeting well-known people (anchors, journalists, known or introduced persons).
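To give a flavour of the clustering side of speaker diarization, the sketch below runs plain agglomerative clustering over segment feature vectors: each segment starts as its own cluster, and the closest pair of clusters is merged until no pair falls within a distance threshold. The Euclidean distance, toy features and threshold are illustrative stand-ins; real diarization systems use statistical distances between acoustic models (e.g. BIC- or i-vector-based).

```python
# Agglomerative clustering sketch, the usual backbone of speaker diarization.
# Segment features and the threshold here are hypothetical placeholders.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def centroid(cluster):
    n = len(cluster)
    return tuple(sum(v[i] for v in cluster) / n for i in range(len(cluster[0])))

def diarize(segments, threshold):
    """Group segment feature vectors into speaker clusters."""
    clusters = [[s] for s in segments]
    while len(clusters) > 1:
        # Find the closest pair of clusters by centroid distance.
        pairs = [(euclidean(centroid(clusters[i]), centroid(clusters[j])), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)
        if d > threshold:            # no pair close enough: stop merging
            break
        clusters[i] += clusters.pop(j)
    return clusters

# Two toy "speakers": segments near (0, 0) and segments near (5, 5).
segs = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
clusters = diarize(segs, threshold=1.0)
```

The stopping threshold plays the role that a model-selection criterion (such as BIC) plays in practice: it decides how many distinct speakers the system believes it has seen.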

---
Supervision and organization: 
The proposed position is funded by the ANR through the SODA project. It is a joint PhD position between IDIAP and LIUM, under the academic co-supervision of Profs. Paul Deléglise (LIUM), Jean-Marc Odobez (IDIAP) and Sylvain Meignier (LIUM). The candidate will work closely with a post-doctoral fellow on the same project.

The candidate will be registered as a student at the University of Le Mans and will divide his/her time between Le Mans and Martigny as needed. The position will start in October 2010 and the net salary will be €1700 a month. An indemnity (€500 per month for 18 months) will be provided to cover the extra cost of working at two different sites, as well as the higher cost of living in Martigny.

---
Requirement: 
Applicants should hold a strong university degree entitling them to start a doctorate (Master's degree or equivalent) in a relevant discipline (Computer Science, Human Language Technology, Machine Learning, etc.).

Applicants for this full-time 3 year PhD position should be fluent in English or in French. Competence in French is optional, though applicants will be encouraged to acquire this skill during training.

Very strong software skills are required, especially in Java, C, C++, Unix/Linux, and at least one scripting language such as Perl or Python.

---
Contact: 
Please send a curriculum vitae to Jean-Marc Odobez odobez@idiap.ch AND sylvain.meignier@lium.univ-lemans.fr
Top

6-34(2010-07-28) Post-doctoral position in model-based speech synthesis

Post Doctoral Speech Synthesis Research Associate Position

 

 

The Communication Analysis and Design Laboratory at Northeastern University is pleased to announce the availability of a postdoctoral research associate position, funded by the National Science Foundation Division of Computer and Information Systems. This project aims to build a personalized speech synthesizer for individuals with severe speech impairments by mining their residual source characteristics and morphing these vocal qualities with filter properties of a healthy talker. An initial prototype has been designed and implemented in MATLAB. Further work is required to refine the voice morphing and speech synthesis algorithms, to develop a front-end user interface and to assess system usability. The successful candidate will work on an interdisciplinary team toward the project goals.
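The source-filter idea behind the project, keeping one talker's residual excitation while imposing another talker's vocal-tract filter, can be sketched minimally as an excitation signal passed through an all-pole (autoregressive) filter. The impulse-train excitation and AR coefficients below are made-up placeholders, not values from the project's actual MATLAB prototype.

```python
# Source-filter sketch: "source" (excitation) from one talker, vocal-tract
# "filter" of another. The filter is a toy all-pole model with invented,
# stable coefficients.

def apply_all_pole_filter(excitation, ar_coeffs):
    """y[n] = x[n] - sum_k a[k] * y[n-k]  (direct-form all-pole filter)."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(ar_coeffs, start=1):
            if n - k >= 0:
                y -= a * out[n - k]
        out.append(y)
    return out

# Impaired talker's residual: a crude impulse train (one pulse per "period").
excitation = [1.0 if n % 8 == 0 else 0.0 for n in range(32)]
# Healthy talker's vocal-tract shape: hypothetical AR coefficients
# (poles at magnitude ~0.707, so the filter is stable).
healthy_filter = [-0.9, 0.5]
morphed = apply_all_pole_filter(excitation, healthy_filter)
```

In a real system the residual would be extracted from the impaired talker's recordings (e.g. by inverse filtering) and the filter estimated from the healthy talker by LPC analysis, frame by frame; the sketch only shows how the two halves recombine.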

 

Required Skills:

  • PhD in computer science, electrical engineering or a related field
  • Strong knowledge of machine learning and digital signal processing
  • Extensive experience with MATLAB and C/C++ programming
  • Experience with building graphical user interfaces
  • Knowledge of, and experience with, concatenative and/or model-based speech synthesis

This position is available immediately. Funding is available for up to two years on this project; additional funding may be available for work on related projects. Interested candidates should email and/or send the following to Rupal Patel, Director, Communication Analysis and Design Laboratory, 360 Huntington Avenue, Boston, MA 02115; r.patel@neu.edu; 617-373-5842: a cover letter stating your research interests and career goals, a CV, two letters of recommendation, and official transcripts of all postsecondary education.

 

 

 

 

Top


