ISCApad #148 |
Sunday, October 10, 2010 by Chris Wellekens |
6-1 | (2010-05-12) 2 PhD Positions at Vrije Universiteit Brussel, Belgium PhD position in Audio Visual Signal Processing ETRO – AVSP – Vrije Universiteit Brussel
PhD position in audiovisual crossmodal attention and multisensory integration. Keywords: audio visual signal processing, scene analysis, cognitive vision.
The Vrije Universiteit Brussel (Brussels, Belgium; http://www.vub.ac.be), department of Electronics and Informatics (ETRO) has available a PhD position in the area of audio visual scene analysis and in particular in crossmodal attention and multisensory integration in the detection and tracking of spatio-temporal events in audiovisual streams.
The position is part of an ambitious European project, Aliz-e (“Adaptive Strategies for Sustainable Long-Term Social Interaction”). The overall aim of the project is to develop the theory and practice behind embodied cognitive robots which are capable of maintaining believable multi-modal any-depth affective interactions with a young user over an extended and possibly discontinuous period of time.
Within this context, audiovisual attention plays an important role. Indeed, attention is the cognitive process of selectively concentrating on an aspect of the environment while ignoring others. The human selective attention mechanism enables us to concentrate on the most meaningful signals amongst all information provided by our audio-visual senses. The human auditory system is able to separate acoustic mixtures in order to create a perceptual stream for each sound source. It is widely assumed that this auditory scene analysis interacts with attention mechanisms that select a stream for attentional focus. In computer vision, attention mechanisms are mainly used to reduce the amount of data for complex computations. They employ a method of determining important, salient units of attention and select them sequentially for being subjected to these computations. The most common visual attention model is the bottom-up approach which uses basic features, conjunctions of features or even learned features as saliency information to guide visual attention. Attention can also be controlled by top-down or goal-driven information relevant to current behaviors. The deployment of attention is then determined by an interaction between bottom-up and top-down attention priming or setting. Motivated by these models, the present research project aims at developing a conceptual framework for audio-visual selective attention in which the formation of groups and streams is heavily influenced by conscious and subconscious attention.
The position will be within the ETRO research group (http://www.etro.vub.ac.be) under the supervision of Prof. Werner Verhelst and Prof. Hichem Sahli, and will involve close collaboration and regular interaction with the research groups participating in Aliz-e. The ideal candidate is a team worker with theoretical knowledge and practical experience in audio and image processing, machine learning and/or data mining. He/she is a good programmer (preferably MATLAB or C++) and holds a 2-year master's degree in engineering science (electronics, informatics, artificial intelligence or another relevant discipline). The position and research grant are available from June 2010. The position is for 4 years. Applicants should send a letter explaining their research interests and experience, a complete curriculum vitae (with the relevant courses and grades), and an electronic copy of their master's thesis (plus, optionally, reports of other relevant projects) to wverhels@etro.vub.ac.be
============================================================
Post Doc Position in Audio-Visual Signal Processing & Machine Learning ETRO – AVSP – Vrije Universiteit Brussel
Post Doctoral Position in audiovisual signal processing and machine learning. Keywords: audio visual signal processing, scene analysis, machine learning, affective human-robot interaction.
The Vrije Universiteit Brussel (Brussels, Belgium; http://www.vub.ac.be), department of Electronics and Informatics (ETRO) has available a Post Doctoral position in the area of audio visual signal processing and multi-modal affective interaction.
The position is part of an ambitious European project, Aliz-e (“Adaptive Strategies for Sustainable Long-Term Social Interaction”). The overall aim of the project is to develop the theory and practice behind embodied cognitive robots which are capable of maintaining believable multi-modal any-depth affective interactions with a young user over an extended and possibly discontinuous period of time.
Skills: PhD with a concentration in relevant or closely related areas, such as audiovisual speech processing, audiovisual scene analysis, human-machine interaction, affective computing, machine learning. The position is available from June 2010 at a competitive salary. The position is guaranteed for 3 years and can be extended. In addition, candidates who qualify for an Odysseus grant from the Research Foundation Flanders will be encouraged and supported to apply (http://www.fwo.be/Odysseusprogramma.aspx). Applicants should send a letter explaining their research interests and experience, a complete curriculum vitae and recommendation letters to wverhels@etro.vub.ac.be
6-2 | (2010-05-12) Post doc at Université de Bordeaux, Talence, France
Model selection for jump Markov systems
Post-doc. Deadline: 31/07/2010. Provide a CV with 2 letters from referees.
The proposed post-doctoral project addresses the selection of relevant models in the context of estimation by so-called multiple-model algorithms. These approaches put several models in competition to describe the evolution of the state of the system to be estimated. The first algorithms proposed [1] considered linear Gaussian models and were therefore based on estimating the state vector by Kalman filtering. With the development of particle filtering methods [2], the problem extends to so-called jump Markov systems, whose evolution can be described by different probability distributions. In this setting, the post-doctoral researcher will examine the a priori choice of the models for the evolution of the system state. If this choice is not dictated by physical considerations, several questions arise, such as:
- the optimal number of models to use,
- the validity of the selected models,
- the influence of the degree of overlap or similarity between these models.
It will thus be necessary to determine whether using a set of models that are very "different" from one another improves the estimation of the system state. The post-doctoral researcher will therefore study and develop criteria for measuring the similarity between two models, or more generally between two probability distributions, and will consider, among others, tools such as the Bayes factor or the Bayesian deviance [3].
[1] H. A. P. Blom, Y. Bar-Shalom, The interacting multiple model algorithm for systems with Markovian switching coefficients, IEEE Trans. Autom. Control, 33 (8), 1988, 780–783.
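As a rough illustration of what a similarity criterion between two probability distributions can look like, the sketch below computes a symmetrised Kullback-Leibler divergence between two univariate Gaussian models. This is an assumption chosen for simplicity, not one of the tools named in the posting (Bayes factor, Bayesian deviance); the function names `kl_gaussian` and `symmetric_kl` are hypothetical.

```python
import math


def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for univariate Gaussians p = N(mu_p, var_p) and q = N(mu_q, var_q)."""
    if var_p <= 0 or var_q <= 0:
        raise ValueError("variances must be positive")
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrised divergence: 0 for identical models, larger for more dissimilar ones."""
    return (kl_gaussian(mu_p, var_p, mu_q, var_q)
            + kl_gaussian(mu_q, var_q, mu_p, var_p))


if __name__ == "__main__":
    # Two candidate models with the same variance and means one unit apart.
    print(symmetric_kl(0.0, 1.0, 1.0, 1.0))
```

A value near zero would suggest that two candidate models overlap heavily; whether a set of well-separated models actually improves state estimation is the open question the posting raises.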
6-3 | (2010-05-20) Associate professor at Nanjing Normal University, China Associate Professor or Lecturer positions in Phonetic Science and Speech Technology at Nanjing Normal University, China
The Department of Linguistic Science and Technology at Nanjing Normal University, China, invites applications for two positions at Associate Professor or Lecturer level in the area of Phonetic Sciences and Speech Technologies.
Nanjing Normal University (NNU) is situated in Nanjing, a city in China famous not only for its great history and culture but also for its excellence in education and scholarship. With Chinese-style buildings and a garden-like environment, the NNU campus is often called the “Most Beautiful Campus in the Orient.”
NNU is among the top 5 universities of China in the area of Linguistics. Placing a strong emphasis on interdisciplinary research, the Department of Linguistic Science and Technology at NNU is unique in that it bridges the studies of theoretical and applied linguistics, cognitive sciences, and information technologies. A new laboratory has recently been established in phonetic sciences and speech technologies, to stimulate closer collaboration between linguists, phoneticians, psychologists, and computer/engineering scientists. The laboratory is very well equipped, with a sound-proof recording studio, professional audio facilities, physiological instruments (e.g., EGG, EMG, EPG, airflow and pressure module, and nasality sensor), EEG for ERP studies, and Linux/Windows workstations.
We welcome interested colleagues to join us. The research can cover any related areas in phonetic sciences and speech technologies, including but not limited to speech production, speech perception, prosodic modeling, speech synthesis, automatic speech recognition and understanding, spoken language acquisition, and computer-aided language learning. Outstanding research support will be offered. The position level will be determined based on qualifications and experience.
Requirements:
* A PhD degree in a related discipline (e.g., linguistics, psychology, physics, applied mathematics, computer sciences, and electronic engineering) is preferred, though an MS degree with distinguished experience in R&D of speech technologies at world-class institutes/companies is also acceptable
* 3+ years’ experience and a strong publication/patent record in phonetic sciences or speech technologies
* Good oral and written communication skills in both Chinese and English
* Good programming skills
* Team spirit in a multidisciplinary group
* Candidates working on any related topic are encouraged to apply, but preference will be given to those with backgrounds and research interests in both phonetic/linguistic sciences and speech technologies
Interested candidates should submit a current CV, a detailed list of publications, copies of their best two or three publications, and the contact information of at least two references. The application and any further enquiries about the positions should be sent to Prof. Wentao GU by email (preferred) or by regular mail to the following address:
Prof. Wentao GU
Dept of Linguistic Science and Technology
Nanjing Normal University
122 Ning Hai Road, Nanjing
Jiangsu 210097, China
Phone: +86-189-3687-2840
Email: wentaogu@gmail.com wtgu@njnu.edu.cn
The positions will remain open until they are filled.
6-4 | (2010-05-21) Post doc at LORIA Nancy France
Title: Bayesian networks for modeling and handling variability sources in speech recognition
Location: INRIA Nancy Grand Est research center, LORIA Laboratory, Nancy, France
In state-of-the-art speech recognition systems, Hidden Markov Models (HMM) are used to model the acoustic realization of sounds. The decoding process compares the unknown speech signal to sequences of these acoustic models to find the best matching sequence, which determines the recognized words. Lexical and grammatical constraints are taken into account during decoding; they limit the number of model sequences considered in the comparisons, which nevertheless remains very large. Hence precise acoustic models are necessary for achieving good speech recognition performance. To obtain reliable parameters, the HMM-based acoustic models are trained on very large speech corpora. However, speech recognition performance depends strongly on the acoustic environment: good performance is achieved when the acoustic environment matches that of the training data, and performance degrades when it differs. The acoustic environment depends on many variability sources that affect the acoustic signal. These include the speaker's gender (male/female), individual speaker characteristics, speech loudness, speaking rate, the microphone, the transmission channel, and of course noise, to name only a few [Benzeghiba et al, 2007]. Using a training corpus that exhibits too many different variability sources (for example, many different noise levels or widely differing channel speech coding schemes) makes the acoustic models less discriminative, and thus lowers speech recognition performance. Conversely, having many sets of acoustic models, each dedicated to a specific environment condition, raises training problems.
Indeed, because each training subset is restricted to a specific environment condition, its size becomes much smaller, and consequently it might be impossible to reliably train some parameters of the acoustic models associated with this environment condition. In recent years, Dynamic Bayesian Networks (DBN) have been applied in speech recognition. In such an approach, certain model parameters are made dependent on some auxiliary features, such as articulatory information [Stephenson et al., 2000], pitch and energy [Stephenson et al., 2004], speaking rate [Shinozaki & Furui, 2003] or some hidden factor related to a clustering of the training speech data [Korkmazsky et al., 2004]. The approach has also been investigated for multiband speech recognition, non-native speech recognition, and for taking estimates of speaker classes into account in continuous speech recognition [Cloarec & Jouvet, 2008]. Although the above experiments were conducted on limited-vocabulary tasks, they showed that Dynamic Bayesian Networks provide a way of handling some variability sources in the acoustic modeling. The objective of the work is to further investigate the application of Dynamic Bayesian Networks (DBN) to continuous speech recognition with large vocabularies. The aim is to estimate the current acoustic environment condition dynamically, and to constrain the acoustic space used during decoding accordingly. The underlying idea is to be able to handle a range of acoustic space constraints during decoding. Hence, when the acoustic environment condition estimation is reliable, the corresponding specific condition constraints can be used (leading, for example, to model parameters associated with a class of very similar speakers in a given environment).
Conversely, when the acoustic environment condition estimation is less reliable, more tolerant constraints should be used (leading, for example, to model parameters associated with a broader class of speakers or with several environment conditions). Within the formalism of Dynamic Bayesian Networks, the work to be carried out is the following. The first aspect concerns the optimization of the classification of the training data, and associated methods for automatically estimating the classes that best match unknown test data. The second aspect involves the development of confidence measures associated with the classification of test sentences, and the integration of these confidence measures in the DBN modeling (in order to constrain the acoustic space for decoding more or less tightly, according to the reliability of the environment condition estimation).
[Benzeghiba et al, 2007] M. Benzeghiba, R. de Mori, O. Deroo, S. Dupont, T. Erbes, D. Jouvet, L. Fissore, P. Laface, A. Mertins, C. Ris, R. Rose, V. Tyagi & C. Wellekens: 'Automatic speech recognition and speech variability: A review'; Speech Communication, Vol. 49, 2007, pp. 763-786.
[Cloarec & Jouvet, 2008] G. Cloarec & D. Jouvet: 'Modeling inter-speaker variability in speech recognition'; Proc. ICASSP'2008, IEEE International Conference on Acoustics, Speech, and Signal Processing, 30 March – 4 April 2008, Las Vegas, Nevada, USA, pp. 4529-4532.
[Korkmazsky et al., 2004] F. Korkmazsky, M. Deviren, D. Fohr & I. Illina: 'Hidden factor dynamic Bayesian networks for speech recognition'; Proc. ICSLP'2004, International Conference on Spoken Language Processing, 4-8 October 2004, Jeju Island, Korea, pp. 1134-1137.
[Shinozaki & Furui, 2003] T. Shinozaki & S. Furui: 'Hidden mode HMM using bayesian network for modeling speaking rate fluctuation'; Proc. ASRU'2003, IEEE Workshop on Automatic Speech Recognition and Understanding, 30 November - 4 December 2003, US Virgin Islands, pp. 417-422.
[Stephenson et al., 2000] T.A. Stephenson, H. Bourlard, S. Bengio & A.C. Morris: 'Automatic speech recognition using dynamic Bayesian networks with both acoustic and articulatory variables'; Proc. ICSLP'2000, International Conference on Spoken Language Processing, 2000, Beijing, China, vol. 2, pp. 951–954.
[Stephenson et al., 2004] T.A. Stephenson, M.M. Doss & H. Bourlard: 'Speech recognition with auxiliary information'; IEEE Transactions on Speech and Audio Processing, SAP-12 (3), 2004, pp. 189–203.
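The idea in this posting of tightening or loosening acoustic constraints according to how reliable the environment estimate is can be sketched very roughly as a confidence-thresholded back-off. This is a deliberate simplification with hard switching, not a Dynamic Bayesian Network and not the method of the LORIA team; the function name `select_model_set`, the class labels and the threshold value are all hypothetical.

```python
def select_model_set(class_posteriors, broad_model="generic", threshold=0.8):
    """Pick the acoustic model set to use for decoding.

    class_posteriors maps an environment/speaker class name to its estimated
    posterior probability. If the most likely class is estimated reliably
    (posterior >= threshold), its specific models are used; otherwise the
    broader, more tolerant model set is used.
    """
    if not class_posteriors:
        return broad_model
    best_class, best_prob = max(class_posteriors.items(), key=lambda kv: kv[1])
    return best_class if best_prob >= threshold else broad_model


if __name__ == "__main__":
    # Reliable estimate: use the class-specific models.
    print(select_model_set({"male_clean": 0.9, "female_clean": 0.1}))
    # Unreliable estimate: fall back to the broad model set.
    print(select_model_set({"male_clean": 0.55, "female_clean": 0.45}))
```

In a DBN formulation the switch would presumably be soft, with the confidence measure entering the model as a variable instead of a fixed threshold.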
6-5 | (2010-05-26) Post-doc position in Speech Recognition, Adaptation, Retrieval and Translation - Aalto University Finland
Post-doc position in Speech Recognition, Adaptation, Retrieval and Translation
http://ics.tkk.fi/en/current/news/view/postdoc_position_in_the_speech_recognition_group/ or directly: http://www.cis.hut.fi/projects/speech/jobs10.shtml
We are looking for a postdoc to join our research group working on machine learning and probabilistic modeling in speech recognition, adaptation, retrieval and translation. The speech recognition group (led by Mikko Kurimo) belongs to the Adaptive Informatics Research Centre (by Prof. Oja, 2006-), which is the successor to the Neural Networks Research Centre (by Prof. Kohonen, 1995-2005). We are happy to consider outstanding candidates interested in any of our research themes, for example:
· large-vocabulary speech recognition
· acoustic and language model adaptation
· speech recognition in noisy environments
· spoken document retrieval
· speech translation based on unsupervised morpheme models
· speech recognition in multimodal and multilingual interfaces
Postdoc: 1 year + extension possibilities. Starting date: near August 1, 2010.
The position requires a relevant doctoral degree in CS or EE, the skills to do excellent research in a group, and outstanding research experience in any of the research themes mentioned above. The candidate is expected to perform high-quality research and to assist in the supervision of our PhD students.
In Helsinki you will join the innovative international computational data analysis and ICT community. Among European cities, Helsinki is special in being clean, safe, liberal, Scandinavian, and close to nature; in short, it has a high standard of living. English is spoken everywhere. See, e.g., Visit Finland.
Please attach a CV including a list of publications and the email addresses of 2-3 people willing to give more information. Include a brief description of research interests and send the application by email to Mikko Kurimo, Mikko.Kurimo@tkk.fi
6-6 | (2010-05-26) PhD grant at University of Nantes France
Fusion Strategies for Handwriting and Speech Modalities – Application in Mathematical Expression Recognition
Thesis. Deadline: 15/07/2010
christian.viard-gaudin@univ-nantes.fr
http://www.projet-depart.org/index.php
Keywords: Handwriting recognition, Speech recognition, Data/decision fusion.
IRCCyN - UMR CNRS 6597 - NANTES, Equipe IVC
Description of the PhD thesis: Handwriting and speech are the two most common modalities of interaction for human beings. Each of them has specific features related to usability and expressibility, and requires dedicated tools and techniques for digitization. The goal of this PhD is to study fusion strategies for a multi-modal input system, combining on-line handwriting and speech, so that extended facilities or increased performance are achieved with respect to a single modality. Several fusion methods will be investigated in order to take advantage of possible mutual disambiguation. They will range from early fusion to late fusion, to exploit as much as possible the redundancy and complementarity of the two streams. The joint analysis of handwritten documents and speech is quite a new area of research, and only a few works have emerged concerning applications such as identity verification [1], white board interaction [2], lecture note taking [3], and mathematical expression recognition [4]. The focus of this thesis will be on mathematical expression recognition [4,5,6]. This is a very challenging domain in which many difficulties have to be faced; in particular, the large number of symbols and the 2D layout of expressions have to be considered. Pattern recognition, machine learning and fusion techniques will play fundamental roles in this work. This PhD is part of the DEPART (Document Ecrit, Parole et Traduction) project funded by the Pays de la Loire Region.
Applications, including a cover letter, a CV, and the contact information for references, should be emailed to christian.viard-gaudin@univ-nantes.fr
Starting date: September or October 2010
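To make the notion of late fusion and mutual disambiguation concrete, here is a minimal sketch that combines per-symbol probabilities from a handwriting recognizer and a speech recognizer with a weighted log-linear rule. It is only one possible late-fusion scheme under stated assumptions, not the approach of the DEPART project; the function name `late_fusion`, the symbol labels, the weight and the probability floor are hypothetical.

```python
import math


def late_fusion(hw_scores, sp_scores, hw_weight=0.5, floor=1e-6):
    """Fuse two recognizers' symbol probabilities at the decision level.

    hw_scores and sp_scores map candidate symbols to probabilities from the
    handwriting and speech recognizers. A symbol missing from one recognizer
    gets the small floor probability instead of zero. Returns the best symbol
    and the fused log-scores.
    """
    if not 0.0 <= hw_weight <= 1.0:
        raise ValueError("hw_weight must lie in [0, 1]")
    fused = {}
    for symbol in set(hw_scores) | set(sp_scores):
        p_hw = max(hw_scores.get(symbol, 0.0), floor)
        p_sp = max(sp_scores.get(symbol, 0.0), floor)
        fused[symbol] = (hw_weight * math.log(p_hw)
                         + (1.0 - hw_weight) * math.log(p_sp))
    best = max(fused, key=fused.get)
    return best, fused


if __name__ == "__main__":
    # Handwriting barely separates the letter "x" from the operator "times";
    # the spoken word resolves the ambiguity.
    handwriting = {"x": 0.55, "times": 0.45}
    speech = {"x": 0.10, "times": 0.90}
    best, _ = late_fusion(handwriting, speech)
    print(best)
```

Early fusion would instead merge the two feature streams before classification, which requires aligning pen strokes and audio in time.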
6-7 | (2010-06-08) Two Associate Professor positions in Speech Communication at KTH.
6-8 | (2010-06-15) Ph D students at Dpt Applied Informatics at University of Bielefeld Germany The Applied Informatics Group, Faculty of Technology, Bielefeld University is looking for PhD
6-9 | (2010-06-16) Postdoc position at IRISA Rennes France Unsupervised paradigms for domain-independent video structure analysis
6-10 | (2010-06-22) Technical director at ELDA
Technical Director
Working under the supervision of the Managing Director, he/she will be responsible for the development and management of technical projects related to language technologies, as well as partnerships, ensuring the timely and cost-effective execution of those projects. He/she will be responsible for managing project teams and steering all technical aspects. He/she will organise, supervise and coordinate the technical activities, ensuring that tasks are properly scheduled. He/she will be in charge of establishing new contacts in order to ensure the development of the firm, and will negotiate business activities in collaboration with our Business Development Manager. He/she will thus help set up all necessary means, such as new competences, to develop the activity of the firm. Most of the activity takes place within international R&D projects co-funded by the European Commission, the French ANR or private partners. Skills required:
Candidates should have the citizenship (or residency papers) of a European Union country. Salary: Commensurate with qualifications and experience. Applicants should send (preferably via email) a cover letter addressing the points listed above together with a curriculum vitae to:
Khalid Choukri
For more information on ELRA/ELDA, visit the following web sites:
6-11 | (2010-06-24) SOFTWARE ENGINEER AT HONDA RESEARCH INSTITUTE USA (MOUNTAIN VIEW, CA)
6-12 | (2010-06-24) PhD (3 years) position available at the Radboud University Nijmegen.
Job description
The FP7 Marie Curie Initial Training Network 'BBfor2' (Bayesian Biometrics for Forensics) provides an opportunity for young researchers to study several biometric technologies in a forensic context. The Network consists of 9 European research institutes and 3 associated partners. The Network will provide regular workshops and Summer Schools, so that the 15 PhD students (Early Stage Researchers - ESRs), PostDocs (Experienced Researchers - ERs) and senior researchers can exchange research experience, insights and ideas. The main areas of research are Speaker Recognition, Face Recognition and Fingerprint Recognition, but combinations of these techniques are also studied. The challenge of applying biometric techniques in a forensic context is to be able to deal with the uncontrolled quality of the evidence, and to provide calibrated likelihood scores. The researchers in this Network will have the opportunity during their assignment to stay for some period at another Network institute and to gain experience in an industrial or forensic laboratory.
Requirements
Candidates should comply with the rules set forward by the FP7 Marie Curie ITNs. Candidates should
- be transferring from another country, i.e., not be of Dutch nationality, and not have resided more than 12 months in the last 3 years in The Netherlands,
- be willing to work in at least one other country in the BBfor2 network,
- have less than 4 years of research experience since their master's degree, and not hold a PhD.
Organization
The project will be carried out within the Centre for Language and Speech Technology (CLST), a research unit within the Faculty of Arts of the Radboud University Nijmegen. The CLST hosts a large international group of senior researchers and PhD students who do research at the frontier of science and develop innovative applications.
Conditions of employment
The duration of the contract is 3 years. The PostDoc will receive an initial contract for the duration of one year, with the possibility of prolongation for another 2 years. The salary is in accordance with the rules of the Marie Curie ITNs. The annual gross salary is EUR 25,000 in the first year and will grow to EUR 30,000 in the third year. In addition to the salary, travel allowances and career exploratory allowances are foreseen according to generous Marie Curie ITN provisions. The Radboud University is an equal opportunity employer. Female researchers are strongly encouraged to apply for this vacancy.
Additional information
For further information about the position, please contact David van Leeuwen, d.vanleeuwen@let.ru.nl.
Application
Letters of application, including extensive CVs (with reference to the vacancy number 23.02.10, preferably by e-mail), can be sent to: vacatures@let.ru.nl. Candidates can apply until August 15th, 2010.
6-13 | (2010-06-24) PhD POSITION in PERSON RECOGNITION IN AUDIOVISUAL BROADCASTS Grenoble France PhD POSITION in PERSON RECOGNITION IN AUDIOVISUAL BROADCASTS (36
6-14 | (2010-06-29) Post doc at Université de Neuchâtel, Switzerland
1 POST-DOCTORAL RESEARCHER position
part-time (50%)
within an FNS (Swiss National Science Foundation) project on the psycholinguistic, neurolinguistic and electrophysiological (ERP) study of the cognitive processes involved in language production
Duties: Collaboration within the FNS research project, and independent research on a related topic.
Start date: 1 October 2010 or by arrangement
Salary: statutory
Duration of appointment: 2 years
Required degree: Doctorate in psychology, linguistics, speech therapy or language sciences.
Profile: Scientific research on the cognitive processes involved in language production, in the field of experimental psycholinguistics and/or neurolinguistics and/or functional neuroimaging.
Enquiries may be sent by e-mail to: Marina.Laganaro@unine.ch
Applications (CV and cover letter) should be sent to Marina Laganaro, preferably by e-mail (Marina.Laganaro@unine.ch), by 30 July 2010.
Neuchâtel, 25 June 2010
6-15 | (2010-06-30) Doctoral and postdoctoral opportunities in Forensic Voice Comparison Australia
6-16 | (2010-07-07) Two positions at ELDA Two positions are currently available at ELDA.
Engineer in HLT Evaluation Department
He/she will be in charge of managing the evaluation activities in relation with the collection of Language Resources for evaluation, the evaluation of technology components, and in general, the setting up of an HLT evaluation infrastructure. As part of the HLT Evaluation Department, he/she will be working on European projects and will be involved in the evaluation of technology components related to information retrieval, information extraction, machine translation, etc.
Programmer
ELDA offers a position for its Language Resource Production and Evaluation activities, working in the framework of European projects. The position is related to a number of NLP activities within ELDA, with a focus on the development of web-service architectures for the automatic production and distribution of language resources. The candidate may also be involved in the creation of LR repositories, NLP application development and/or evaluation, etc. Profile:
Applicants should send (preferably via email) a cover letter addressing the points listed above together with a curriculum vitae to :
Khalid Choukri
ELRA / ELDA
55-57, rue Brillat-Savarin
75013 Paris
FRANCE
Fax : 01 43 13 33 30
Email: job@elda.org
6-17 | (2010-07-07) PhD position at LORIA Nancy France (fluency in French required)
Thesis topic
Motivations
As part of a collaboration with a company that sells excerpts of video documentaries (rushes), we are interested in the automatic recognition of the dialogues in these rushes so that they can be indexed.
The Speech team has developed a system for the automatic transcription of news bulletins: ANTS [2,3]. While the performance of current automatic transcription systems is satisfactory for read or "prepared" speech (news bulletins, speeches), it degrades sharply for spontaneous speech [1,4,5]. Compared with prepared speech, spontaneous speech is characterized by:
• insertions (hesitations, pauses, false starts, repetitions),
• pronunciation variations such as the contraction of words or syllables (/monsieur/ => /m'sieu/),
• variations in speaking rate (reduced articulation of some phonemes and lengthening of others),
• difficult acoustic environments (overlapping speech, laughter, background noise...).
These characteristics are taken into account little or not at all by current recognition systems. All these phenomena cause recognition errors and can lead to incorrect indexing.
Topic
The aim of the thesis is to take into account one or more of the specific phenomena described above in order to improve the recognition rate [4,6,7]. The phenomena will be chosen and handled at the acoustic or linguistic level depending on the candidate's profile. The work will consist of:
• understanding the architecture of ANTS,
• for the chosen phenomena, reviewing the state of the art and proposing new algorithms,
• building a prototype spontaneous speech recognizer and validating it on a labelled corpus of spontaneous speech.
Working environment
The work will be carried out within the Parole (Speech) team of Inria - Loria in Nancy (http://parole.loria.fr). The student will use the ANTS automatic speech recognition software developed in the team.
Desired profile
Candidates must be fluent in French and English and able to program in C or Java in a Unix environment. Knowledge of stochastic modelling or automatic speech processing will be a plus.
Contacts: illina@loria.fr, fohr@loria.fr or mella@loria.fr
[1] S. Galliano, E. Geoffrois, D. Mostefa, K. Choukri, JF. Bonastre and G. Gravier, The ESTER Phase II Evaluation Campaign for Rich Transcription of French broadcast news, EUROSPEECH 2005.
[2] I. Illina, D. Fohr, O. Mella and C. Cerisara, The Automatic News Transcription System: ANTS some realtime experiments, ICSLP 2004.
[3] D. Fohr, O. Mella, I. Illina and C. Cerisara, Experiments on the accuracy of phone models and liaison processing in a French broadcast news transcription system, ICSLP 2004.
[4] J.-L. Gauvain, G. Adda, L. Lamel, L. F. Lefevre and H. Schwenk, Transcription de la parole conversationnelle, Revue TAL vol 45 n° 3.
[5] M. Garnier-Rizet, G. Adda, F. Cailliau, J.-L. Gauvain, S. Guillemin-Lanne, L. Lamel, S. Vanni, C. Waaste-Richard, CallSurf: Automatic transcription, indexing and structuration of call center conversational speech for knowledge extraction and query by content, LREC 2008.
[6] J. Ogata, M. Goto, The use of acoustically detected filled and silent pauses in spontaneous speech recognition, ICASSP 2009.
[7] F. Stouten, J. Duchateau, J.-P. Martens and P. Wambacq, Coping with disfluencies in spontaneous speech recognition: Acoustic detection and linguistic context manipulation, Speech Communication vol 48, 2006.
6-18 | (2010-07-14) PhD position at Loria Nancy (originally in French)
Thesis subject
Motivations
As part of a collaboration with a company that sells excerpts of video documentaries (rushes), we are interested in automatically recognizing the dialogues in these rushes so that they can be indexed. The Parole team has developed ANTS [2,3], a system for the automatic transcription of news bulletins. While the performance of current automatic transcription systems is satisfactory for read or "prepared" speech (news bulletins, speeches), it degrades sharply for spontaneous speech [1,4,5].
The work will be carried out in the Parole team of Inria - Loria in Nancy (http://parole.loria.fr). The student will use ANTS, the automatic speech recognition software developed by the team.
Desired profile
Candidates must be proficient in French and English and be able to program in C or Java in a Unix environment. Knowledge of stochastic modeling or automatic speech processing will be a plus.
Contacts
[1] S. Galliano, E. Geoffrois, D. Mostefa, K. Choukri, J.-F. Bonastre and G. Gravier, The ESTER Phase II Evaluation Campaign for Rich Transcription of French broadcast news, EUROSPEECH 2005.
[2] I. Illina, D. Fohr, O. Mella and C. Cerisara, The Automatic News Transcription System: ANTS some realtime experiments, ICSLP 2004.
[3] D. Fohr, O. Mella, I. Illina and C. Cerisara, Experiments on the accuracy of phone models and liaison processing in a French broadcast news transcription system, ICSLP 2004.
[4] J.-L. Gauvain, G. Adda, L. Lamel, L. F. Lefevre and H. Schwenk, Transcription de la parole conversationnelle, Revue TAL, vol. 45, no. 3.
[5] M. Garnier-Rizet, G. Adda, F. Cailliau, J.-L. Gauvain, S. Guillemin-Lanne, L. Lamel, S. Vanni, C. Waaste-Richard, CallSurf: Automatic transcription, indexing and structuration of call center conversational speech for knowledge extraction and query by content, LREC 2008.
[6] J. Ogata, M. Goto, The use of acoustically detected filled and silent pauses in spontaneous speech recognition, ICASSP 2009.
[7] F. Stouten, J. Duchateau, J.-P. Martens and P. Wambacq, Coping with disfluencies in spontaneous speech recognition: Acoustic detection and linguistic context manipulation, Speech Communication, vol. 48, 2006.
6-19 | (2010-07-20) PhD at IDIAP Martigny Switzerland
PhD POSITION in PERSON SEGMENTATION AND CLUSTERING IN AUDIO-VIDEO STREAMS, 36 MONTHS STARTING IN OCTOBER 2010, in IDIAP (MARTIGNY, SWITZERLAND) AND LIUM (LE MANS, FRANCE), NET SALARY: 1700€ + INDEMNITY
Research areas: Audio/video segmentation and clustering, speaker recognition, face recognition, pattern recognition, machine learning, audio and image processing.
Description: The objective of the thesis is to investigate novel algorithms for the automatic segmentation and clustering of people in audio-visual documents. More precisely, the goal is to detect the people who appear in the documents, when they appear and/or when they speak, with whom they speak, and who they are. The work will rely on and improve the existing expertise of LIUM and IDIAP in speaker diarization, name recognition from automatic speech transcripts, and person detection, tracking and recognition. It will be extended to address audio-visual identity association and the recognition of people's roles in TV shows. The work will be evaluated in the framework of the REPERE evaluation campaign, which is a challenge for audio and video person detection and recognition in TV broadcasts (journal debates, sitcoms), and will focus on segmentation and clustering targeting well-known people (anchors, journalists, known or introduced persons).
Supervision and organization: The proposed position is funded by the ANR in the SODA project. It is a joint PhD position within both IDIAP and LIUM, under academic co-supervision by Profs. Paul Deléglise (LIUM), Jean-Marc Odobez (IDIAP) and Sylvain Meignier (LIUM). The candidate will work closely with a post-doctoral fellow working on the same project. The candidate will be registered as a student at the University of Le Mans and will split their time between Le Mans and Martigny as needed. The position will start in October 2010 and the net salary will be €1700 a month. 18 months of indemnity (€500 per month) will be provided to support the extra cost of working at two different sites, as well as the higher cost of living in Martigny.
Requirement: Applicants should hold a strong university degree entitling them to start a doctorate (Master's degree or equivalent) in a relevant discipline (Computer Science, Human Language Technology, Machine Learning, etc.). Applicants for this full-time 3-year PhD position should be fluent in English or in French. Competence in French is optional, though applicants will be encouraged to acquire this skill during training. Very strong software skills are required, especially in Java, C, C++, Unix/Linux, and at least one scripting language such as Perl or Python.
Contact: Please send a curriculum vitae to Jean-Marc Odobez (odobez@idiap.ch) and Sylvain Meignier (sylvain.meignier@lium.univ-lemans.fr).
6-20 | (2010-07-28) PhD position in model-based speech synthesis
Post Doctoral Speech Synthesis Research Associate Position
The Communication Analysis and Design Laboratory at Northeastern University is pleased to announce the availability of a postdoctoral research associate position, funded by the National Science Foundation Division of Computer and Information Systems. This project aims to build a personalized speech synthesizer for individuals with severe speech impairments by mining their residual source characteristics and morphing these vocal qualities with filter properties of a healthy talker. An initial prototype has been designed and implemented in MATLAB. Further work is required to refine the voice morphing and speech synthesis algorithms, to develop a front-end user interface and to assess system usability. The successful candidate will work on an interdisciplinary team toward the project goals.
Required Skills:
• PhD in computer science or electrical engineering or related field
• Strong knowledge in machine learning and digital signal processing
• Extensive experience with MATLAB and C/C++ programming
• Experience with building graphical user interfaces
• Knowledge of, and experience with, concatenative and/or model-based speech synthesis
This position is available immediately. Funding is available for up to two years on this project. Additional funding may be available for work on related projects.
Interested candidates should email and/or send the following to Rupal Patel, Director, Communication Analysis and Design Laboratory, 360 Huntington Avenue, Boston, MA, 02115; r.patel@neu.edu; 617-373-5842: a cover letter stating your research interests and career goals, CV, two letters of recommendation, and official transcripts of all postsecondary education.
6-21 | (2010-08) Speech Synthesis Post Doctoral Research Associate Position
6-22 | (2010-09-08) European project in Basque country
6-23 | (2010-09-12) PhD positions at KTH
PhD Student Positions:
6-24 | (2010-09-27) Two positions at ELDA
Two positions are currently available at ELDA (reminder).
6-25 | (2010-10-01) Engineer/PhD position in automatic speech recognition for elderly people
Automatic speech recognition of elderly people for home assistance services