ISCApad #180
Monday, June 10, 2013 by Chris Wellekens
6-1 | (2012-12-07) Doctoral and Post-doctoral Positions in Signal Processing for Hearing Instruments, Bochum, Germany

Position Description
The ITN ICanHear, starting on 1 January 2013, will provide cutting-edge research projects for 12 doctoral and 5 post-doctoral research fellows in digital signal processing for hearing instruments. ICanHear aims to develop models based on emerging knowledge about higher-level processing within the auditory pathway and to exploit that knowledge to develop creative solutions that will improve the performance of hearing instruments. Attractive grants and a wide variety of international training activities, including collaborations with ICanHear Associated Partners in the U.K., Switzerland, Belgium, U.S.A., and Canada, will be made available to successful candidates, who will stay in the network for a period of 12 to 36 months.
Research and training positions will be available in the following ICanHear labs:
Requirements for Candidates and Application Procedure: Early-stage (doctoral) Research Fellows must have less than four years' research experience after obtaining a Master's degree in engineering, computer science, or a similar field. Experienced (post-doctoral) Research Fellows must already be in possession of a doctoral degree, or have at least 4 but less than 5 years of research experience in engineering and/or hearing research. In order to ensure transnational mobility, candidates may have resided no more than 12 months (during the last 3 years) in the country of the host institution they wish to apply to. Excellent English language skills are required for all positions. To apply, please send the following documents by e-mail to the ICanHear coordination office (icanhear@rub.de): CV, certified copies of all relevant diplomas and transcripts, two letters of recommendation, proof of proficiency in English, and a letter of motivation (research interests, reasons for applying to the programme and host). For further information on available research projects, application details and eligibility, please visit the ICanHear website (http://www.icanhear-itn.eu) or contact the project coordinator Rainer Martin (rainer.martin@rub.de).
6-2 | (2012-12-15) Technician in scientific instrumentation, experimentation and measurement, Aix-en-Provence, France

CNRS NOEMI CAMPAIGN, WINTER 2012-2013: POSITION PROFILE

Unit description. Unit code: UMR 7309. Unit name: Laboratoire Parole et Langage. Director: Noël NGUYEN. City: Aix-en-Provence. Regional delegation: DR12. Institute: INSHS.

Position description. NOEMI number: T54030. Grade: Technician. BAP: C. Job type: C4B21 - Technician in scientific instrumentation, experimentation and measurement. Function: platform technician.

Mission: Within the Laboratoire Parole et Langage (LPL), assigned to the Centre d'Expérimentation sur la Parole (CEP), the technician will be in charge of supporting experiments in collaboration with the platform coordinator.

Activities: The main activity consists in providing day-to-day support for the operation of the platform, in particular:
- managing the loan and tracking of equipment used on the platform or outside it,
- mounting and assembling sub-systems (notably audio and video) for running experiments,
- assisting experimenters during experimental sessions by applying a defined protocol,
- modifying or adapting experimental set-ups,
- carrying out maintenance and first-level interventions, including fault detection and diagnosis,
- making audio and video recordings (both set-up and the recording itself),
- managing the consumables needed for running experiments,
- using software applications for instrument control.

Skills: The candidate must be highly motivated for this support position, which is essential to the operation of the speech experimentation centre. Basic training in electronics and/or physical measurement is desirable, in order to build elementary synchronization modules between instruments, to synchronize the audio and video recording systems, and to mount and assemble sub-systems for experimental set-ups. The candidate should enjoy teamwork, as he or she will work in close contact with the platform coordinator. The candidate must be able to learn new techniques and have good interpersonal skills, since he or she will be in contact with a large number of users, must show great rigour in following the procedures in place, and must comply with the applicable health and safety rules.

Context: The Laboratoire Parole et Langage is a joint research unit of the CNRS and Aix-Marseille Université. It hosts phoneticians, linguists, computer scientists, psychologists, neuroscientists, physicists and physicians. The LPL's activities concern the study of the mechanisms of language and speech production and perception. The LPL is distinguished by research methods that combine experimentation, instrumental investigation and formalization, an original approach in this scientific field, which spans the humanities, the life sciences and the engineering sciences.
Beyond a strong fundamental research activity, this explains the importance of the applications developed from the laboratory's work in the fields of text processing, intelligibility of the spoken message, high-quality text-to-speech conversion, and the assessment and rehabilitation of voice and language disorders. These characteristics make the Laboratoire Parole et Langage a research unit well suited to the scientific challenges of the language sciences, while also being involved in their technological stakes. The LPL currently comprises more than 80 permanent staff (researchers, faculty members, engineers, technicians and administrative staff), plus 40 doctoral students, 20 of whom hold grants. It is the largest French laboratory in this scientific field and one of the leading ones in Europe. The LPL now has a technical platform grouping a set of instruments for investigating speech production and perception: electro-encephalography, eye tracking, articulography, electropalatography, articulatory assessment, etc. This resource, unique in Europe, is shared within the Centre d'Expérimentation sur la Parole (http://www.lpl.univ-aix.fr/~cep/), the technical platform to which the position will be assigned.
6-3 | (2012-12-16) Master project, IRISA Rennes, France

Computer Science Internship, CORDIAL group. Title: Voice conversion from non-parallel corpora.

Description: The main goal of a voice conversion system (VCS) is to transform the speech signal uttered by a speaker (the source speaker) so that it sounds as if it had been uttered by another person (the target speaker). The applications of such techniques are numerous. For example, a VCS can be combined with a Text-To-Speech system in order to produce multiple high-quality synthetic voices. In the entertainment domain, a VCS can be used to dub an actor with his or her own voice. State-of-the-art VCSs use Gaussian Mixture Models (GMMs) to capture the transformation from the acoustic space of the source to the acoustic space of the target. Most of the models are source-target joint models that are trained on paired source-target observations. Those paired observations are often gathered from parallel corpora, that is, speech signals resulting from the two speakers uttering the same set of sentences. Parallel corpora are hard to come by. Moreover, they do not guarantee that the pairing of vectors is accurate. Indeed, the pairing process is unsupervised and uses Dynamic Time Warping (DTW) under the strong (and unrealistic) hypothesis that the two speakers truly uttered the same sentence with the same speaking style. This assumption is often wrong and results in non-discriminant models that tend to over-smooth the speakers' distinctive characteristics. The goal of this Master project is to remove the need for parallel corpora when training joint GMMs for voice conversion. We suggest pairing speech segments on high-level speech descriptors such as those used in unit-selection Text-To-Speech. Those descriptors contain not only segmental information (acoustic class, for example) but also supra-segmental information such as phoneme context, speed, prosody, power, etc. In a first step, both source and target corpora are segmented and tagged with descriptors. In a second step, each class from one corpus is paired with the equivalent class from the other corpus. Finally, a classical DTW algorithm can be applied to each paired class. The expected result is transformation models that both take speaker variability into account and are more robust to pairing errors.

Keywords: Voice conversion, Gaussian Mixture Models.
Contact: Vincent Barreaud (vincent.barreaud@irisa.fr)
Bibliography:
[1] H. Benisty and D. Malah. Voice conversion using GMM with enhanced global variance. In Conference of the International Speech Communication Association (Interspeech), pages 669-672, 2011.
[2] L. Mesbahi, V. Barreaud, and O. Boëffard. Non-parallel hierarchical training for voice conversion. In Proceedings of the 16th European Signal Processing Conference, Lausanne, Switzerland, 2008.
[3] Y. Stylianou, O. Cappé, and E. Moulines. Continuous probabilistic transform for voice conversion. IEEE Transactions on Speech and Audio Processing, 6(2):131-142, 1998.
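For reference (drawn from the cited literature, not from the internship text itself), the joint-GMM conversion function of Stylianou et al. [3] has the form

```latex
\hat{y} = F(x) = \sum_{i=1}^{M} p_i(x)\,\Bigl[\mu_i^{y} + \Sigma_i^{yx}\bigl(\Sigma_i^{xx}\bigr)^{-1}\bigl(x - \mu_i^{x}\bigr)\Bigr],
\qquad
p_i(x) = \frac{\alpha_i\,\mathcal{N}\bigl(x;\,\mu_i^{x},\,\Sigma_i^{xx}\bigr)}{\sum_{j=1}^{M}\alpha_j\,\mathcal{N}\bigl(x;\,\mu_j^{x},\,\Sigma_j^{xx}\bigr)}
```

where the weights α_i, means μ_i and (cross-)covariances Σ_i of the M joint mixture components are estimated from paired source-target vectors (x, y). The cross-covariance terms Σ_i^{yx} are precisely what inaccurate pairing corrupts, which is why the class-based pairing proposed above directly conditions the quality of the learned transformation.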
6-4 | (2012-12-16) Master project 2, IRISA Rennes, France

Computer Science Internship, CORDIAL group. Title: Unit-selection speech synthesis guided by a stochastic model of spectral and prosodic parameters.

A Text-To-Speech system (TTS) produces a speech signal corresponding to the vocalization of a given text. Such a system is composed of a linguistic processing stage followed by an acoustic one, which complies as closely as possible with the linguistic directives. For the second step, the most widely used approaches are:
- the corpus-based synthesis approach, which relies on the selection and concatenation of unit sequences extracted from a large continuous speech corpus. It has been popular for 20 years, yielding an unmatched sound quality but still suffering from artefacts due to spectral discontinuities;
- the statistical approach. This new generation of TTS systems has emerged in recent years, reintroducing rule-based systems. The rules are no longer deterministic as in the first systems of the 1950s; they are replaced by stochastic models. HTS, an HMM-based speech synthesis system, is currently the most widely used statistical system. HTS-type systems yield a good acoustic continuum, but with a sound quality strongly dependent on the underlying acoustic model.
Recently, hybrid synthesis systems have been proposed, combining the statistical approach with the unit-selection method. The idea is to use the acoustic descriptions and melodic contours generated by a statistical system to drive the cost function during the natural-speech unit selection phase, or alternatively to substitute units derived from a statistical system for poor-quality natural speech units. The framework of this project is corpus-based TTS. Given the combinatorial problem raised by the search for an optimal unit sequence with blind sequencing, the work consists in determining heuristics to reduce the search space and meet a real-time objective. These heuristics, based on spectral and prosodic parameters generated by HTS, will make it possible to implement pre-selection filters or to propose new cost functions within the corpus-based system developed by the Cordial group. The output of the hybrid system will be evaluated and compared via listening tests with standard systems such as HTS and a corpus-based system.

Keywords: TTS, corpus-based speech synthesis, statistical learning, experiments.
Contact: Olivier Boëffard
Bibliography:
[1] A. W. Black and K. A. Lenzo. Optimal data selection for unit selection synthesis. 4th ISCA Tutorial and Research Workshop on Speech Synthesis, 2001.
[2] H. Kawai, T. Toda, J. Ni, M. Tsuzaki and K. Tokuda. Ximera: a new TTS from ATR based on corpus-based technologies. ISCA Tutorial and Research Workshop on Speech Synthesis, 2004.
[3] S. Rouibia and O. Rosec. Unit selection for speech synthesis based on a new acoustic target cost. Interspeech, 2005.
[4] H. Zen, K. Tokuda and A. W. Black. Statistical parametric speech synthesis. Speech Communication, 51(11):1039-1064, 2009.
[5] H. Silen, E. Helander, J. Nurminen, K. Koppinen and M. Gabbouj. Using Robust Viterbi Algorithm and HMM-Modeling in Unit Selection TTS to Replace Units of Poor Quality. Interspeech, 2010.
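To illustrate the search problem referred to above, here is a minimal Viterbi-style unit selection over per-position candidate lists; it is a sketch using assumed scalar 'units' and toy cost functions, not the Cordial group's actual system.

```python
# Minimal Viterbi-style unit selection (illustrative sketch, not the Cordial
# system). candidates[t] lists the candidate units for target position t; in
# the hybrid setting, target_cost would compare a unit's spectral/prosodic
# parameters with those generated by HTS for position t.

def select_units(candidates, target_cost, concat_cost):
    """Return the unit sequence minimising total target + concatenation cost."""
    n = len(candidates)
    cost = [[target_cost(0, u) for u in candidates[0]]]  # cost[t][j]: best path ending in unit j
    back = [[None] * len(candidates[0])]                 # back[t][j]: best predecessor index
    for t in range(1, n):
        cost_t, back_t = [], []
        for u in candidates[t]:
            i = min(range(len(candidates[t - 1])),
                    key=lambda p: cost[t - 1][p] + concat_cost(candidates[t - 1][p], u))
            cost_t.append(cost[t - 1][i] + concat_cost(candidates[t - 1][i], u) + target_cost(t, u))
            back_t.append(i)
        cost.append(cost_t)
        back.append(back_t)
    j = min(range(len(candidates[-1])), key=lambda k: cost[-1][k])  # cheapest final unit
    path = [j]
    for t in range(n - 1, 0, -1):                        # backtrack along predecessors
        j = back[t][j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]

# Toy usage with scalar 'units' and squared-error costs:
targets = [1.0, 2.0, 3.0]                                # e.g. HTS-generated acoustic targets
candidates = [[0.8, 1.5], [1.9, 2.6], [2.7, 3.5]]
print(select_units(candidates,
                   target_cost=lambda t, u: (u - targets[t]) ** 2,
                   concat_cost=lambda u, v: 0.1 * abs(u - v)))  # -> [0.8, 1.9, 2.7]
```

With k candidates per position and n targets the search is O(n·k²), which is exactly why pre-selection filters that shrink the candidate lists, of the kind this internship targets, matter for a real-time objective.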
6-5 | (2012-12-16) Master project 3, IRISA Rennes, France

Computer Science Internship, CORDIAL group. Title: Grapheme-to-phoneme conversion adaptation using conditional random fields.

Description: Grapheme-to-phoneme conversion consists in generating possible pronunciations for an isolated word or for a sequence of words. More formally, this conversion is a transliteration of a sequence of graphemes, i.e., letters, into a sequence of phonemes, the symbolic units representing the elementary sounds of a language. Grapheme-to-phoneme converters are used in speech processing either to help automatic speech recognition systems decode words from a speech signal, or as a means of telling speech synthesizers how a written input should be acoustically produced. A problem with such tools is that they are trained on large and varied amounts of aligned sequences of graphemes and phonemes, leading to generic manners of pronouncing words in a given language. As a consequence, they are not adequate as soon as one wants to recognize or synthesize specific voices, for instance accented speech, stressed speech, dictating voices versus chatting voices, etc. [1]. While multiple methods have been proposed for grapheme-to-phoneme conversion [2, 3], the primary goal of this internship is to propose grapheme-to-phoneme models which can easily be adapted under conditions specified by the user. More precisely, the use of conditional random fields (CRFs) will be studied to model the generic French pronunciation and variants of it [4]. CRFs are state-of-the-art statistical tools widely used for labelling problems in natural language processing [5]. A further important goal is to automatically characterize the distinctive pronunciation features of a given specific voice as compared to a generic voice. This means highlighting and generalizing differences between sequences of phonemes derived from the same sequence of graphemes. The results of this internship would be integrated into the team's speech synthesis platform in order to easily and automatically simulate and imitate specific voices.

Technical skills: C/C++ and a scripting language (e.g., Perl or Python).
Keywords: Natural language processing, speech processing, machine learning, statistical learning.
Contact: Gwenole Lecorve (gwenole.lecorve@irisa.fr)
References:
[1] B. Hutchinson and J. Droppo. Learning non-parametric models of pronunciation. In Proceedings of ICASSP, 2011.
[2] M. Bisani and H. Ney. Joint-sequence models for grapheme-to-phoneme conversion. Speech Communication, 2008.
[3] S. Hahn, P. Lehnen, and H. Ney. Powerful extensions to CRFs for grapheme to phoneme conversion. In Proceedings of ICASSP, 2011.
[4] I. Illina, D. Fohr, and D. Jouvet. Multiple pronunciation generation using grapheme-to-phoneme conversion based on conditional random fields. In Proceedings of SPECOM, 2011.
[5] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML, 2001.
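As a toy illustration of the CRF formulation, grapheme-to-phoneme conversion can be cast as sequence labelling over pre-aligned grapheme/phoneme pairs. The sketch below uses the third-party sklearn-crfsuite package; the package choice, the one-to-one alignment and the two-word lexicon are assumptions of this sketch, not part of the internship description.

```python
# Toy grapheme-to-phoneme conversion as linear-chain CRF labelling.
# Assumes a one-to-one grapheme/phoneme alignment ('_' = silent grapheme),
# which real systems obtain with joint-sequence alignment [2]; the tiny
# lexicon below is purely hypothetical.
import sklearn_crfsuite

def feats(word, i):
    """Features for grapheme word[i]: identity plus a one-letter context window."""
    return {'g': word[i],
            'g-1': word[i - 1] if i > 0 else '<s>',
            'g+1': word[i + 1] if i < len(word) - 1 else '</s>'}

lexicon = [('bateau', ['b', 'a', 't', 'o', '_', '_']),
           ('gateau', ['g', 'a', 't', 'o', '_', '_'])]
X = [[feats(w, i) for i in range(len(w))] for w, _ in lexicon]
y = [phones for _, phones in lexicon]

crf = sklearn_crfsuite.CRF(algorithm='lbfgs', max_iterations=50)
crf.fit(X, y)
print(crf.predict([[feats('bateau', i) for i in range(len('bateau'))]]))
```

Adaptation, in this framing, amounts to re-estimating or augmenting such a model with speaker-specific aligned data, so that the predicted phoneme sequences drift from the generic pronunciation towards the target voice's variants.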
6-6 | (2013-01-14) Ph.D. Researcher in Speech Synthesis, Trinity College, Dublin, Ireland

Post Specification
Post Title: Ph.D. Researcher in Speech Synthesis
Post Status: 3 years
Department/Faculty: Centre for Language and Communication Studies (CLCS)
Location: Phonetics and Speech Laboratory
Salary: €16,000 per annum (plus fees paid)
Closing Date: 31st January 2013

Post Summary
A Ph.D. Researcher is required to work in the area of speech synthesis at the Phonetics and Speech Laboratory, School of Linguistic, Speech and Communication Sciences. The position will involve carrying out research on the topic of Hidden Markov Model (HMM)-based speech synthesis. Specifically, we are looking for a researcher to work on developing source-filter based acoustic modelling for HMM-based speech synthesis which is closely related to the human speech production process and which can facilitate modification of voice source and vocal tract filter components at synthesis time.

Background to the Post
Much of the research carried out to date in the Phonetics and Speech Laboratory has been concerned with the role of the voice source in speech. This research involves the development of accurate voice source processing both as a window on human speech production and for exploitation in voice-sensitive technology, particularly synthesis. The laboratory team is interdisciplinary and includes engineers, linguists, phoneticians and technologists. This post will mainly be funded by the on-going Abair project, which has developed the first speech synthesisers for Irish (www.abair.ie), and the researcher will exploit the current Abair synthesis platform. In this project the aim is to deliver multi-dialect synthesis with multiple personages and voices that can be made appropriate to different contexts of use. The post will also be linked to the FastNet project, which aims at voice-sensitive speech technologies. A specific goal of our laboratory team is to leverage our expertise on the voice by improving the naturalness of parametric speech synthesis, as well as making more flexible synthesis platforms which can allow modifications of voice characteristics (e.g., for creating different personalities/characters, different forms of expression, etc.).

Standard duties of the Post
* Initially the researcher will be required to attend some lectures as part of the Masters programme on Speech and Language Processing. This and a supervised reading programme will provide a background in the area of voice production, analysis and synthesis.
* In the very early stages the researcher will be required to develop synthetic voices, using the Irish corpora, with the standard HMM-based synthesis platform (i.e. HTS). Note that working with the Irish corpora does not require a background in the Irish language, as there will be collaboration with experts in this field.
* The researcher will be required to familiarise themselves with existing speech synthesis platforms which provide explicit modelling of the voice source (e.g., Cabral et al. 2011, Raitio et al. 2011, Anumanchipalli et al. 2010).
* The researcher will then need to first implement similar versions of these systems and then work towards developing novel vocoding methods which would allow full parametric flexibility of both voice source and vocal tract filter components at synthesis time.

Person Specification
Qualifications
* Bachelor's degree in Electrical Engineering, Computer Science with specialisation in speech signal processing, or related areas.
Knowledge & Experience (Essential & Desirable)
* Strong digital signal processing skills (Essential)
* Good knowledge of HTS, including previous experience developing synthetic voices (Essential)
* Knowledge of speech production and perception (Desirable)
* Experience in speech recognition (Desirable)
Skills & Competencies
* Good knowledge of written and spoken English.
Benefits
* Opportunity to work with a world-class inter-disciplinary speech research group.

To apply, please email a brief cover letter and CV, including the names and addresses of two academic referees, to: kanejo@tcd.ie and cegobl@tcd.ie
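For readers unfamiliar with the source-filter decomposition this post centres on, the sketch below separates one speech frame into an excitation (source) signal and an all-pole (vocal tract) filter via LPC, modifies the source, and resynthesizes. The frame, model order and modification are illustrative assumptions; the vocoding methods cited above (e.g., Cabral et al. 2011, Raitio et al. 2011) are far more elaborate.

```python
# Minimal single-frame LPC source-filter decomposition (illustrative only).
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order):
    """Autocorrelation-method LPC: returns the error-filter polynomial A(z)."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))        # A(z) = 1 - sum_k a_k z^{-k}

fs = 16000
t = np.arange(fs // 100) / fs                 # one 10 ms frame
# Toy 'voiced' frame: two harmonics plus a little noise so the normal
# equations stay well-conditioned.
frame = (np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
         + 0.001 * np.random.default_rng(0).standard_normal(t.size))

A = lpc(frame, order=10)
residual = lfilter(A, [1.0], frame)           # inverse filtering -> source signal
modified = 0.5 * residual                     # modify the source only (e.g. scale its energy)
resynth = lfilter([1.0], A, modified)         # excite the unchanged vocal-tract filter
```

The point of the decomposition is that `residual` and `A` can be manipulated independently, which is the parametric flexibility the post asks the researcher to build into HMM-based synthesis.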
6-7 | (2013-01-12) Master's internship at IRIT, Toulouse, France

We are looking for a candidate for a Master 2 research internship, to be followed by 3 years of CIFRE funding with AIRBUS. Please contact me as soon as possible with a CV and a cover letter.
6-8 | (2013-01-20) Ph.D. Researcher in Speech Synthesis, Trinity College, Dublin, Ireland

Post Specification
Post Title: Ph.D. Researcher in Speech Synthesis
Post Status: 3 years
Department/Faculty: Centre for Language and Communication Studies (CLCS)
Location: Phonetics and Speech Laboratory
Salary: €16,000 per annum (plus fees paid)
Closing Date: 31st January 2013

Post Summary
A Ph.D. Researcher is required to work in the area of speech synthesis at the Phonetics and Speech Laboratory, School of Linguistic, Speech and Communication Sciences. The position will involve carrying out research on the topic of Hidden Markov Model (HMM)-based speech synthesis. Specifically, we are looking for a researcher to work on developing source-filter based acoustic modelling for HMM-based speech synthesis which is closely related to the human speech production process and which can facilitate modification of voice source and vocal tract filter components at synthesis time.

Background to the Post
Much of the research carried out to date in the Phonetics and Speech Laboratory has been concerned with the role of the voice source in speech. This research involves the development of accurate voice source processing both as a window on human speech production and for exploitation in voice-sensitive technology, particularly synthesis. The laboratory team is interdisciplinary and includes engineers, linguists, phoneticians and technologists. This post will mainly be funded by the on-going Abair project, which has developed the first speech synthesisers for Irish (www.abair.ie), and the researcher will exploit the current Abair synthesis platform. In this project the aim is to deliver multi-dialect synthesis with multiple personages and voices that can be made appropriate to different contexts of use. The post will also be linked to the FastNet project, which aims at voice-sensitive speech technologies. A specific goal of our laboratory team is to leverage our expertise on the voice by improving the naturalness of parametric speech synthesis, as well as making more flexible synthesis platforms which can allow modifications of voice characteristics (e.g., for creating different personalities/characters, different forms of expression, etc.).

Standard duties of the Post
Initially the researcher will be required to attend some lectures as part of the Masters programme on Speech and Language Processing. This and a supervised reading programme will provide a background in the area of voice production, analysis and synthesis.
6-9 | (2013-01-23) Tenure-track and Research Assistant Professor positions at the Toyota Technological Institute, Chicago
Toyota Technological Institute at Chicago (http://www.ttic.edu) is a philanthropically endowed academic computer science institute, dedicated to basic research and graduate education in computer science, located on the University of Chicago campus. TTIC opened for operation in 2003. It currently has 9 tenure-track/tenured faculty, 10 research faculty, and a number of adjoint/visiting faculty, and is growing. Regular faculty have a teaching load of one course per year and research faculty have no teaching responsibilities. Research faculty positions are endowed positions (not based on grant funding) and are for a term of 3 years. Applications are welcome in all areas of computer science, including speech and language processing, for both tenure-track and research faculty positions. Applications can be submitted online at http://www.ttic.edu/faculty-hiring.php. Additional questions can be directed to Karen Livescu at klivescu@ttic.edu.
6-10 | (2013-01-25) Postdoc at Orange Labs, Lannion, France: density of presence of people in television programs

The topic concerns the automatic detection of the presence of people in television programs, and introduces a qualitative and quantitative notion of this presence, summarized under the term 'presence density'.

Indeed, the people appearing in a television program do not all occupy an equivalent place during the program.

First of all, presence should be distinguished from mention; for example, several levels of presence can be identified:
- physical presence (or via live link-up) of the person in the program: the person speaks in the program (interview, etc.)
- presence through excerpts: the program shows excerpts from audiovisual documents in which the person speaks
- visual mention: the person does not speak but is shown (news report footage, excerpts)
- mention: the person is talked about but does not appear in the program

Next, the 'intensity' of this presence should be differentiated according to the role the person plays in the program: main subject, witness. This notion of intensity is orthogonal to the type of presence: a person can be present through mention only, yet be the central subject. It is the combination of the levels of presence and their intensity (role) that defines what we propose to call presence 'density'.

Finally, these notions are not necessarily constant throughout the program, and the segments during which a person's presence density is constant must be determined automatically. In practice, this makes it possible, for example, to extract from a program only the segment in which a given person is the main subject.

The post-doc's work will consist, first, in refining these notions of presence/intensity in order to formalize the associated automatic classification/segmentation problem. It will then involve annotating the available corpora of television programs according to the presence classes, and finally designing, developing and testing the algorithms to solve this problem.

The post-doc position is for 12 months, non-renewable, at Orange Labs Lannion, with a gross annual salary of €34k (approximately €2,150 net/month).

It must be the candidate's first employment contract after the PhD defense.

For further information: delphine.charlet@orange.com
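To make the classification/segmentation task concrete, one first-pass formalization (the frame-level label set here is an assumption of this sketch, not Orange's specification) is to classify each frame of the program into a (presence level, intensity) pair and then group constant runs into segments:

```python
# Sketch: derive constant-density segments from per-frame labels (illustrative).
# Each label pairs a presence level with an intensity/role, per the taxonomy above.

def constant_segments(frame_labels, frame_step=1.0):
    """Group consecutive identical labels into (start_s, end_s, label) segments."""
    segments, start = [], 0
    for i in range(1, len(frame_labels) + 1):
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            segments.append((start * frame_step, i * frame_step, frame_labels[start]))
            start = i
    return segments

labels = [('mention', 'central'), ('mention', 'central'),
          ('physical', 'central'), ('physical', 'central'), ('absent', None)]
print(constant_segments(labels))
# [(0.0, 2.0, ('mention', 'central')), (2.0, 4.0, ('physical', 'central')), (4.0, 5.0, ('absent', None))]
```

Deriving the per-frame labels from the audio and video streams is, of course, the hard part that the post-doc addresses; the run-length grouping itself is trivial once those labels exist.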
6-11 | (2013-01-30) 15 RESEARCH POSITIONS (PhD, Post-Doc and Research Programmer), Dublin, Ireland
6-12 | (2013-02-01) Ph.D. Research Assistant or Post-Doctoral Researcher, Cooperative State University in Karlsruhe, Germany This is a pre-announcement for a position in the Computer Science Department at the Cooperative State University in Karlsruhe, Germany, for a
Ph.D. Research Assistant or Post-Doctoral Researcher in the field of Automatic Language Processing for Education
to be filled immediately, with a salary according to TV-L E13 at 50% for 18 months. The opening is in Karlsruhe, Germany, as part of a joint research project between Karlsruhe Institute of Technology (KIT), the Cooperative State University (DHBW) and the University of Education (PH), sponsored by the DFG and involving speech technology for educational systems. (Working language: English or German)

Project cooperation partners are:
- Cooperative State University (Duale Hochschule, Karlsruhe)
- University of Education, Karlsruhe (Pädagogische Hochschule, Karlsruhe)
- Karlsruhe Institute of Technology (KIT)

Description
Starting as soon as possible, we are seeking an experienced and motivated person to join our team of researchers from the above mentioned institutes. The ideal candidate will have knowledge in computational linguistics and algorithm design. Responsibilities include the use and improvement of research tools to update and optimize algorithms applied to diagnostics in children's (German) writing, using speech recognition and speech synthesis tools. For further details of this work, please refer to publications at SLaTE 2011, Interspeech 2011, and WOCCI 2012 by authors Berkling, Stüker, and Fay. Joint and collaborative research between the partners will be very close, offering exposure to each research lab.
Candidates:
- Doctoral research candidates may apply and are welcome for joint research with their host institution.
- Experienced (post-doctoral) research candidates are already in possession of a doctoral degree, or have at least 3 years but less than 5 years of research experience in engineering and/or hearing research.

Requirements:
- Higher degree in speech science, linguistics, machine learning, or a related field
- Experience developing ASR applications: training, tuning, and optimization
- Software development experience (for example: Perl, TCL, Ruby, Java, C)
- Excellent communication skills in English
- Willingness and ability to spend 18 months in Germany, working in a team with the project partners
- Knowledge of German linguistics, phonics, graphemes and morphology, or willingness to learn
- Strong interest in computational linguistics, morphology and phonics for German

Desirable:
- Interest in education and language learning
- Interest in human-computer interaction and game mechanics
- Ability to create graphical interfaces in multi-player applications
- Experience with Ruby on Rails
Application Procedure: Non-EU candidates need to check their ability to reside in Germany. Interested candidates, please send your application (CV, certified copies of all relevant diplomas and transcripts, two letters of recommendation, proof of proficiency in English, and a letter of motivation covering research interests and your reason for applying) to: Berkling@dhbw-karlsruhe.de. The Cooperative State University is pursuing a gender equality policy. Women are therefore particularly encouraged to apply. If equally qualified, handicapped applicants will be preferred.
6-13 | (2013-02-20) Research Associate in Robust Speech Recognition at the University of Sheffield, UK
6-14 | (2013-02-10) Developer for Large-Scale Audio Indexing Technologies at IRCAM, Paris
Position: Developer for Large-Scale Audio Indexing Technologies: 1 W/M position at IRCAM
Starting: March 2013
Duration: 18 months
Position description: He/she will be in charge of the development of a framework for scalable storage, management and access of distributed data (audio and meta-data). He/she will also be in charge of the development of scalable search algorithms.
Required profile:
· High skill in database management systems
· High skill in scalable indexing technologies (hash tables, m-trees, ...)
· High skill in C++ development (including template-based meta-programming)
· Good knowledge of Linux, Mac OS X and Windows development environments (gcc, Intel and MSVC, svn)
· High productivity, methodical work, excellent programming style.
The developer will collaborate with the project team and participate in the project activities (evaluation of technologies, meetings, specifications, reports).

Introduction to IRCAM
IRCAM is a leading non-profit organization associated with the Centre Pompidou, dedicated to music production, R&D and education in sound and music technologies. It hosts composers, researchers and students from many countries cooperating in contemporary music production, scientific and applied research. The main topics addressed in its R&D department include acoustics, audio signal processing, computer music, interaction technologies, and musicology. IRCAM is located in the center of Paris near the Centre Pompidou, at 1, Place Igor Stravinsky, 75004 Paris.

Salary: According to background and experience.

Applications: Please send an application letter together with your resume and any suitable information addressing the above issues, preferably by email, to: peeters_at_ircam_dot_fr, with cc to vinet_at_ircam_dot_fr and roebel_at_ircam_dot_fr.
6-15 | (2013-02-14) INRIA PhD fellowship, Bordeaux, France

Proposal for an INRIA PhD fellowship (Cordi-S)
Title of the proposal: Nonlinear speech analysis for differential diagnosis between Parkinson's disease and Multiple-System Atrophy
Project Team INRIA: GeoStat (http://geostat.bordeaux.inria.fr/)
Author of the proposed research subject: Khalid Daoudi (khalid.daoudi@inria.fr)
Keywords: speech processing, nonlinear speech analysis, machine learning, voice pathology, dysphonia, dysarthria, Multiple-System Atrophy, Parkinson's disease.

Scientific context: Parkinson's disease (PD) is the most common neurodegenerative disorder after Alzheimer's disease. Prevalence is 1.5% of the population over age 65, and the disease affects about 143,000 people in France. Given the aging of the population, the prevalence is likely to increase over the next decade. Multiple-System Atrophy (MSA) is a rare and sporadic neurodegenerative adult disorder, of progressive evolution and of unknown etiology. MSA has a prevalence of 2 to 5 per 100,000 and has no effective treatment. It usually starts in the 6th decade and there is a slight male predominance. It takes 3 years on average from the first signs of the disease for a patient to require a walking aid, 4-6 years to be in a wheelchair and about 8 years to be bedridden. PD and MSA require different treatment and support. However, the differential diagnosis between PD and MSA is a very difficult task because, at the early stage of the diseases, patients look alike until signs such as dysautonomia become more clearly established in MSA patients. There is currently no valid clinical or biological marker for a clear distinction between the two diseases at an early stage.

Goal: Voice and speech disorders in Parkinson's disease are a clinical marker that coincides with motor disability and the onset of cognitive impairment. The terminology commonly used to describe these disorders is dysarthria [1]. Like PD patients, and depending on the areas of the brain that are damaged, people with MSA may also have speech disorders: difficulties of articulation, staccato rhythm, squeaky or muted voice. Dysarthria in MSA is more severe and appears earlier, in the sense that it requires earlier rehabilitation than in PD. Since dysarthria is an early symptom of both diseases, the purpose of this thesis is to use dysarthria, through digital processing of voice recordings of patients, as a means of objective discrimination between PD and MSA. The ultimate goal is to develop a numerical dysarthria measure, based on the analysis of the patients' speech signal, which allows objective discrimination between PD and MSA and would thus complement the tools currently available to neurologists in the differential diagnosis of the two diseases.

Project: Pathological voices, such as in PD and MSA, generally present high nonlinearity and turbulence. Nonlinear/turbulent phenomena are not naturally suited to linear signal processing, yet linear signal processing dominates current speech technology. From the methodological point of view, the goal of this thesis is therefore to investigate the framework of nonlinear and turbulent systems, which is better suited to analyzing the range of nonlinear and turbulent phenomena observed in pathological voices in general [2], and in PD and MSA voices in particular. We will adopt an approach based on novel nonlinear speech analysis algorithms recently developed in the GeoStat team [3], the goal being to extract relevant speech features in order to design new dysarthria measures that enable accurate discrimination between PD and MSA voices.
This will also require investigation of machine learning theory in order to develop robust classifiers (to discriminate between PD and MSA voices) and to relate speech measures to standard clinical ratings via regression. The PhD candidate will actively participate, in coordination with neurologists from the Parkinson's Center of Haut-Lévêque Hospital, in setting up the experimental protocol and data collection. The latter will consist in recording patients' voices using the DIANA or EVA2 workstation (http://www.sqlab.fr/).
References:
[1] P. Auzou, V. Rolland, S. Pinto, C. Ozsancak (eds.). Les dysarthries. Editions Solal, 2007.
[2] L. Baghai-Ravary, S. W. Beet. Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders. Springer, 2013.
[3] PhD thesis of Vahid Khanagha. GeoStat team, INRIA Bordeaux-Sud Ouest, January 2013. http://geostat.bordeaux.inria.fr/images/vahid%20khanagha%204737.pdf

Advisor: K. Daoudi
Duration: 3 years (starting fall 2013)
Prerequisites: A good level in signal/speech processing is necessary, as well as Matlab and C/C++ programming. Knowledge of machine learning would be a strong advantage.
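To make the machine-learning component concrete, the discrimination step might take the following shape; the feature vectors below are random placeholders standing in for the nonlinear dysarthria measures to be developed, so the resulting score is meaningless except as a pipeline check.

```python
# Sketch of the PD-vs-MSA discrimination step (illustrative; the features are
# placeholders for the nonlinear speech measures from the GeoStat algorithms [3]).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical per-patient feature vectors (e.g. nonlinearity/turbulence measures)
X_pd = rng.normal(0.0, 1.0, size=(40, 5))
X_msa = rng.normal(0.5, 1.0, size=(40, 5))
X = np.vstack([X_pd, X_msa])
y = np.array([0] * 40 + [1] * 40)               # 0 = PD, 1 = MSA

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
print(cross_val_score(clf, X, y, cv=5).mean())  # cross-validated accuracy
```

The regression to clinical ratings mentioned above would follow the same pattern with a regressor in place of the SVC.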
6-16 | (2013-02-19) Ph.D Student in Speech and Music Communication, KTH, Stockholm, Sweden

KTH School of Computer Science and Communication (CSC) announces a PhD position in Speech and Music Communication.

The Workplace
KTH in Stockholm is the largest and oldest technical university in Sweden. No less than one-third of Sweden's technical research and engineering education capacity at university level is provided by KTH. Education and research span from natural sciences to all branches of engineering and include architecture, industrial management and urban planning. There are a total of just over 15,000 first and second level students and more than 1,600 doctoral students. KTH has almost 4,300 employees. KTH Computer Science and Communication is one of the most outstanding research and teaching environments in Information Technology in Sweden, with activities at KTH and partly at Stockholm University. We conduct education and research in theoretical computer science, from theory building and analysis of mathematical models to algorithm construction, implementation and simulation. The applied computer science research and education deal with computer vision, robotics, machine learning, computational biology, neuroinformatics and neural networks, including high performance computing, visualization, and speech and music communication. The school also conducts applied research and training in media technology, human-computer interaction, interaction design and sustainable development. For more information about CSC, go to www.kth.se/csc.

Assignment
KTH School of Computer Science and Communication (CSC) announces a PhD position in Speech and Music Communication at the Department of Speech, Music and Hearing. The thesis work will be directed towards basic research on simulating the human voice through advanced computer models. It comprises theoretical as well as experimental studies of speech production. For more information about the research project: http://www.speech.kth.se/eunison This is a four-year time-limited position that can be extended up to a year with the inclusion of a maximum of 20% departmental duties, usually teaching. Doctoral students must be registered at KTH. Expected starting date: 2013-09-02.

Employment
Form of employment: Time-limited. Work time: Full time. The salary follows the directions provided by KTH. Start date: According to agreement, preferably 2013-09-02. Number of positions: 1.

Qualifications
The applicant should, at the time of application or no later than the expected starting date, possess a Master of Science degree in computer science, electrical engineering or engineering physics, or the equivalent. In addition, thorough knowledge of several of the following areas is required: programming, phonetics or speech technology, statistical methods, computer simulations and multi-physics. The applicant should demonstrate a high proficiency in both written and spoken English. Applicants must be strongly motivated for doctoral studies, possess the ability to work independently and perform critical analysis, and possess good cooperative and communicative abilities.

Application
Application deadline: March 22, 2013. Employer's reference: D-2013-0107. Applications via email are to be sent to: Camilla Johansson, e-mail jobs@csc.kth.se. Write the reference number in the email subject. (CV, etc. should be sent as attachments, as pdf files.) We also accept hard copy applications sent to: KTH, CSC, Att. Camilla Johansson, Lindstedtsvägen 3, 4th floor, SE-100 44 Stockholm, Sweden. The application must be written in English and contain:

Applicants are kindly asked to also fill in the form at http://www.speech.kth.se/eunison/phd-applicants.html We are currently gathering information to help improve our recruitment process. We would, therefore, be very grateful if you could include an answer to the following question within your application: Where did you initially come across this job advertisement?

Contact(s)
For enquiries about Ph.D studies and employment conditions please contact: Eva-Lena Åkerman, HR Manager. Phone: +46 8 790 91 06. Email: ela@csc.kth.se
For enquiries about the project please contact: Olov Engwall, Professor in Speech Communication. Telephone: +46 8 790 75 35. E-mail: engwall@kth.se

Union representative
Lars Abrahamsson, SACO. Phone: +46 8 790 7058. Email: lars.abrahamsson@ee.kth.se
6-17 | (2013-02-19) Maître de Conférences position in computer science at Université Paris-Sud / IUT d'Orsay, France

A Maître de Conférences (associate professor) position in computer science is open for competition at Université Paris-Sud, to be based at the IUT d'Orsay.
6-18 | (2013-02-25) 12-month post-doc, INRIA-LORIA, Nancy, France

Within the ANR project ContNomina (2013-2016), we offer a 12-month post-doc funded by the project: Detection of proper names in automatic speech transcriptions in French.

Although named entity recognition in English achieves excellent performance, this is not the case for other languages, for application domains with little training data, and for automatic transcriptions. The work is therefore to propose solutions that (i) exploit the confidence measures of the recognition system and the lexical context in order to locate proper names, whether or not they are present in the transcription lexicon; and (ii) compensate for the absence or scarcity of training data, which will give the developed system an incremental and self-adaptive character that is very important for long-term exploitation of the results, beyond the duration of the project itself. Bayesian generative models form an interesting theoretical framework to explore for this challenge.

Contacts: Irina Illina, coordinator of the ANR project ContNomina, INRIA-LORIA, Nancy, Parole team, tel. 03 54 95 84 90, illina@loria.fr; Christophe Cerisara, INRIA-LORIA, Nancy, Sinalp team, tel. 03 54 95 86 25, cerisara@loria.fr
6-19 | (2013-02-21) 4 positions for doctoral students, University of Gothenburg, Sweden

My department is now announcing four (not many, but still) positions for doctoral students. The positions provide four years of full funding (a full-time salary on the order of €2,500 per month), office space, a computer and the other technical resources necessary to do the job. You will find more information here:

Anders Eriksson, MSc, PhD.
6-20 | (2013-02-22) Post-doc at François Rabelais University, Tours, France
6-21 | (2013-02-22) Researcher/PostDoc/Intern positions at Telfonica Research Barcelona, Spain We are seeking candidates for Researcher/PostDoc/Intern positions to
strengthen and complement our efforts in the areas we currently work
on:
- Distributed systems and networking
- Human-computer interaction
- Mobile computing
- Multimedia analysis
- Speech processing
- Recommender systems
- User modeling and machine learning
- Security and privacy
The Telefonica Digital Research group was created in 2006 and follows
an open research model in collaboration with Universities and other
research institutions, promoting the dissemination of scientific
results both through publications in top-tier peer-reviewed
international journals and conferences and technology transfer. Our
multi-disciplinary and international team comprises more than 20 full
time researchers, holding PhD degrees in various disciplines of
computer science and electrical engineering.
The salaries we offer are competitive and will depend upon the
candidate's experience. We also offer great benefits and a stimulating
and friendly working atmosphere in one of the most vibrant cities in
the world, Barcelona (Spain).
You can find more information about the group here:
http://www.tid.es/en/Research/Pages/TIDResearchHome.aspx
To apply for a position at Telefonica Research Barcelona, please send
an e-mail with your CV and research statement to:
careers_research@tid.es
Applications submitted by March 10, 2013 will receive full
consideration, although we will continue to accept applications after
this date until all positions are filled up.
6-22 | (2013-03-01) Maître de Conférences position in computer science: statistical machine translation, Le Mans, France
6-23 | (2013-03-02) Researcher/PostDoc/Intern positions at Telefonica, Barcelona, Spain We are seeking candidates for Researcher/PostDoc/Intern positions to
strengthen and complement our efforts in the areas we currently work
on:
- Distributed systems and networking
- Human-computer interaction
- Mobile computing
- Multimedia analysis
- Speech processing
- Recommender systems
- User modeling and machine learning
- Security and privacy
The Telefonica Digital Research group was created in 2006 and follows
an open research model in collaboration with Universities and other
research institutions, promoting the dissemination of scientific
results both through publications in top-tier peer-reviewed
international journals and conferences and technology transfer. Our
multi-disciplinary and international team comprises more than 20 full
time researchers, holding PhD degrees in various disciplines of
computer science and electrical engineering.
The salaries we offer are competitive and will depend upon the
candidate's experience. We also offer great benefits and a stimulating
and friendly working atmosphere in one of the most vibrant cities in
the world, Barcelona (Spain).
You can find more information about the group here: http://www.tid.es/en/Research/Pages/TIDResearchHome.aspx
To apply for a position at Telefonica Research Barcelona, please send
an e-mail with your CV and research statement to: careers_research@tid.es
Applications submitted by March 10, 2013 will receive full
consideration, although we will continue to accept applications after
this date until all positions are filled up.
6-24 | (2013-03-03) Six positions at Nuance, Vienna, Austria
Nuance Healthcare, a division of Nuance Communications, is the market leader in providing clinical understanding solutions that accurately capture and transform the patient story into meaningful, actionable information. Thousands of hospitals, providers and payers worldwide trust Nuance speech-enabled clinical documentation and analytics solutions to facilitate smarter, more efficient decisions across the healthcare enterprise. These solutions are proven to increase clinician satisfaction and HIT adoption, supporting organizations to achieve Meaningful Use of EHR systems and transform to the accountable care model. Nuance Healthcare has been recognized as “Best-in-KLAS” 2004-2012 for Speech Recognition.
Research Scientist (m/f) – Computational Linguist NLP. Preferred location: Vienna, Austria.
As a Research Scientist (Computational Linguist) you will be part of the Healthcare Automatic Speech Recognition Research team. You will work on research and development of algorithms, resources and methods to support data collection and improve the accuracy of Nuance healthcare products.
Your Task.
Your Profile.
Preferred
Our offer.
We offer a competitive compensation package and a casual yet technically challenging work environment. Join our dynamic, entrepreneurial team and become part of our fast growing track of continuing success. Nuance is an Equal Opportunity Employer.
Does Nuance speak to you?
Please apply via the recruiting tool on our homepage, https://jobs-nuance.icims.com/jobs/9447/job, or via EMEAjobs@nuance.com, quoting reference number 9447-Research Scientist - Comp Linguist NLP. Please provide a CV, supporting documents and a letter of motivation, including preferred country, start date and salary expectations.
For more information visit us at www.nuance.com.
°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°
Research Scientist (m/f) – Language Modeling. Preferred location: Vienna, Austria.
As a Research Scientist you will be part of the Healthcare Automatic Speech Recognition Research team. You will work on research and development of ASR algorithms, resources and methods to improve the accuracy and performance of Nuance healthcare products.
Your Task.
Your Profile.
Preferred
Our offer.
We offer a competitive compensation package and a casual yet technically challenging work environment. Join our dynamic, entrepreneurial team and become part of our fast growing track of continuing success. Nuance is an Equal Opportunity Employer.
Does Nuance speak to you?
Please apply via the recruiting tool on our homepage, https://jobs-nuance.icims.com/jobs/9446/job, or via EMEAjobs@nuance.com, quoting reference number 9446-Research Scientist - LM. Please provide a CV, supporting documents and a letter of motivation, including preferred country, start date and salary expectations.
For more information visit us at www.nuance.com.
°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°
Speech Scientist (m/f) – Linguist. Preferred location: Vienna, Austria.
As a Speech Scientist (Computer Linguist) you will be part of the Healthcare Automatic Speech Recognition Research team. You will work on research and development of resources, algorithms and methods to improve the accuracy and performance of Nuance healthcare products.
Your Task.
Your Profile.
Preferred
Our offer.
We offer a competitive compensation package and a casual yet technically challenging work environment. Join our dynamic, entrepreneurial team and become part of our fast growing track of continuing success. Nuance is an Equal Opportunity Employer.
Does Nuance speak to you?
Please apply via the recruiting tool on our homepage, https://jobs-nuance.icims.com/jobs/9445/job, or via EMEAjobs@nuance.com, quoting reference number 9445-Speech Scientist Linguist. Please provide a CV, supporting documents and a letter of motivation, including preferred country, start date and salary expectations.
For more information visit us at www.nuance.com.
°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°
Research Scientist (m/f) – Acoustic Modeling. Preferred location: Vienna, Austria.
As a Research Scientist you will be part of the Healthcare Automatic Speech Recognition Research team. You will work on research and development of ASR algorithms, resources and methods to improve the accuracy and performance of Nuance healthcare products.
Your Task.
Your Profile.
Preferred
Our offer.
We offer a competitive compensation package and a casual yet technically challenging work environment. Join our dynamic, entrepreneurial team and become part of our fast growing track of continuing success. Nuance is an Equal Opportunity Employer.
Does Nuance speak to you?
Please apply via the recruiting tool on our homepage, https://jobs-nuance.icims.com/jobs/9444/job, or via EMEAjobs@nuance.com, quoting reference number 9444-Research Scientist - AM. Please provide a CV, supporting documents and a letter of motivation, including preferred country, start date and salary expectations.
For more information visit us at www.nuance.com.
°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°
Research Scientist (m/f) – Acoustic Modeling. Preferred location: Vienna, Austria.
As a Research Scientist you will be part of the Healthcare Automatic Speech Recognition Research team. You will work on research and development in the area of acoustic modeling to improve the accuracy and performance of Nuance healthcare products.
Your Task.
Your Profile.
Preferred
Our offer.
We offer a competitive compensation package and a casual yet technically challenging work environment. Join our dynamic, entrepreneurial team and become part of our fast growing track of continuing success. Nuance is an Equal Opportunity Employer.
Does Nuance speak to you?
Please apply via the recruiting tool on our homepage, https://jobs-nuance.icims.com/jobs/9443/job, or via EMEAjobs@nuance.com, quoting reference number 9443-Research Scientist AM. Please provide a CV, supporting documents and a letter of motivation, including preferred country, start date and salary expectations.
For more information visit us at www.nuance.com.
°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°
Research Scientist (m/f) – Automatic Speech Recognition. Preferred location: Vienna, Austria.
As a Research Scientist you will be part of the Healthcare Automatic Speech Recognition Research team. You will work on research and development of ASR algorithms, resources and methods to improve the accuracy and performance of Nuance healthcare products.
Your Task.
Your Profile.
or
Preferred Skills:
Our offer.
We offer a competitive compensation package and a casual yet technically challenging work environment. Join our dynamic, entrepreneurial team and become part of our fast growing track of continuing success. Nuance is an Equal Opportunity Employer.
Does Nuance speak to you?
Please apply via the recruiting tool on our homepage, https://jobs-nuance.icims.com/jobs/8047/job, or via EMEAjobs@nuance.com, quoting reference number 8047-Research Scientist - ASR. Please provide a CV, supporting documents and a letter of motivation, including preferred country, start date and salary expectations.
For more information visit us at www.nuance.com.
ILONA ALEXANDRA HOLTZ
Recruiter - Employment Specialist DACH, Human Resources
Nuance Communications Deutschland GmbH, Site Ulm
Soeflingerstr. 100, D-89077 Ulm, Germany
Fon +49 731 - 379 50 1166, Fax +49 731 - 379 50 1106 (Zentrale), Mobil +49 170 56 15 235
6-25 | (2013-03-13) Speech Data Evaluator for French at Google, Dublin, Ireland

Job title: Speech Data Evaluator for French (multiple positions), in Dublin.
Job description: As a Speech Data Evaluator and a native speaker of French, you will be part of a team based in Dublin, processing large amounts of linguistic data and carrying out a number of tasks to improve the quality of Google's speech synthesis and speech recognition in your own language. This includes:
Job requirements:
Project duration: 6-9 months (with potential for extension).
For immediate consideration, please email your CV and cover letter in English (PDF format preferred) with 'Speech Data Evaluator French' in the subject line.
Email address for applications: DataOpsMan@gmail.com
Contact information: Linne Ha
Closing date: open until filled
6-26 | (2013-03-15) Post-doc offer in Natural Language Processing at LIUM, Le Mans, France
6-27 | (2013-03-20) Postdoctoral position at LIMSI, France
6-28 | (2013-05-01) Two positions at CSTR at the University of Edinburgh, Scotland, UK

1. Marie Curie Research Fellow in Speech Synthesis and Speech Perception
'Using statistical parametric speech synthesis to investigate speech perception'
The Centre for Speech Technology Research (CSTR), University of Edinburgh

This is a rare opportunity to hold a prestigious individual fellowship in a world-leading research group at a top-ranked University, mentored by leading researchers in the field of speech technology. Marie Curie Experienced Research Fellowships are aimed at the most talented newly-qualified postdoctoral researchers, who have the potential to become leaders in their fields. This competitively salaried fellowship offers an outstanding young scientist the opportunity to kick-start his or her independent research career in speech technology, speech science or laboratory phonetics. This fellowship is part of the INSPIRE Network (http://www.inspire-itn.eu), and the project that the CSTR Fellow will spearhead involves developing statistical parametric speech synthesis into a toolbox that can be used to investigate issues in speech perception and understanding. There are excellent opportunities for collaborative working and joint publication with other members of the network, and generous funding for travel to visit partner sites and to attend conferences and workshops.

The successful candidate should have a PhD (or be near completion) in computer science, engineering, linguistics, mathematics, or a related discipline. He or she should have strong programming skills and experience with statistical parametric speech synthesis, as well as an appropriate level of ability and experience in machine learning. The fellowship is fixed-term for 12 months (to start as soon as possible). CSTR is a successful and well-funded group, and so there are excellent prospects for further employment after the completion of the fellowship. The Marie Curie programme places no restrictions on nationality: applicants can be of any nationality and currently resident in any country worldwide, provided they meet the eligibility requirements set out in the full job description (available online - URL below).

Salary: GBP 42,054 to GBP 46,731 plus mobility allowance
Informal enquiries about this position should be made to Prof Simon King (Simon.King@ed.ac.uk) or Dr Cassie Mayo (catherin@inf.ed.ac.uk).
Apply online: https://www.vacancies.ed.ac.uk/pls/corehrrecruit/erq_jobspec_version_4.jobspec?p_id=013062
Closing date: 10 Jun 2013

2. An Open Position for Postdoctoral Research Associate in Speech Synthesis
The Centre for Speech Technology Research (CSTR), University of Edinburgh

This post holder will contribute to our ongoing research in statistical parametric ('HMM-based') speech synthesis, working closely with Principal Investigators Dr. Junichi Yamagishi and Prof. Simon King, in addition to other CSTR researchers. The focus of this position will be to conduct research into methods for generating highly intelligible synthetic speech, for a variety of applications, in the context of three ongoing and intersecting projects in CSTR:

The 'SALB' project concerns the generation of extremely fast, but highly intelligible, synthetic speech for blind children. This is a joint project with the Telecommunications Research Centre Vienna (FTW) in Austria, and is funded by the Austrian Federal Ministry of Science and Research.
The 'Voice Bank' project concerns the building of synthetic speech using a very large set of recordings of amateur speakers (‘voice donors’) in order to produce personalised voices for people whose speech is disordered, due to Motor Neurone Disease. This is a joint project with the Euan MacDonald Centre for MND research, and is funded by the Medical Research Council. The main tasks will be to conduct research into automatic intelligibility assessment of disordered speech and to devise automatic methods for data selection from the large voice bank. The 'Simple4All' project is a large multi-site EU FP7 project led by CSTR which is developing methods for unsupervised and semi-supervised learning for speech synthesis, in order to create complete text-to-speech systems for any language or domain without relying on expensive linguistic resources, such as labelled data. The main tasks here will be to further the overall goals of the project, including contributing original research ideas. There is considerable flexibility in the research directions available within the Simple4All project and the potential for the post holder to form a network of international collaborators. The successful candidate should have a PhD (or be near completion) in computer science, engineering, linguistics, mathematics, or a related discipline. He or she should have strong programming skills and experience with statistical parametric speech synthesis. Whilst the advertised position is for 24 months (due to the particular projects that the post-holder will contribute to), CSTR is a stable, well-funded and successful group with a tradition of maintaining long-term support for ongoing lines of research and of building the careers of its research fellows. We expect to obtain further grant-funded research projects in the future. Informal enquiries about this position to either Dr. Junichi Yamagishi (jyamagis@inf.ed.ac.uk) or Prof. Simon King (Simon.King@ed.ac.uk). Apply Online: https://www.vacancies.ed.ac.uk/pls/corehrrecruit/erq_jobspec_version_4.jobspec?p_id=013063 Closing date: 10 Jun 2013
| |||||
6-29 | (2013-05-01) PhD or post-doc position at the University of Karlsruhe, Germany
A job opening, to be filled as soon as possible, is available within a project sponsored by the Deutsche Forschungsgemeinschaft (DFG) at the Department of Computer Science of the Cooperative State University in Karlsruhe, Germany, for a duration of up to 18 months at 50% employment, for a Ph.D. Research Assistant
or Post-Doctoral Researcher in the field of Automatic Language Processing for Education. This job opening is in the field of automatic speech recognition, as part of a joint research project between the Karlsruhe Institute of Technology (KIT), the Cooperative State University (DHBW) and the University of Education (PH), sponsored by the DFG and involving speech technology for an educational system. (Working language: English or German)
Description: Starting as soon as possible, we are seeking an experienced and motivated person to join our team of researchers from the above-mentioned institutes. The ideal candidate will have knowledge in computational linguistics and algorithm design. Responsibilities include the use and improvement of research tools to update and optimize algorithms applied to diagnostics in children's (German) writing, using speech recognition and speech synthesis tools. For further details of this work, please refer to publications at SLaTE 2011, Interspeech 2011, and WOCCI 2012 by the authors Berkling, Stüker, and Fay. Joint and collaborative research between the partners will be very close, offering exposure to each research lab.
Candidates: - Doctoral research candidates may apply and are welcome to carry out joint research with their host institution. - Experienced (post-doctoral) research candidates are already in possession of a doctoral degree or have at least 3 years but less than 5 years of research experience in engineering and/or hearing research.
Requirements: - Higher degree in speech science, linguistics, machine learning, or a related field - Experience developing ASR applications: training, tuning, and optimization - Software development experience (for example: Perl, TCL, Ruby, Java, C) - Excellent communication skills in English - Willingness and ability to spend 18 months in Germany, working in a team with project partners - Knowledge of German linguistics, phonics, graphemes, and morphology, or willingness to learn - Strong interest in computational linguistics, morphology, and phonics for German
Desirable: - Interest in education and language learning - Interest in human-computer interaction and game mechanics - Ability to create graphical interfaces for multi-player applications - Experience working with Ruby on Rails
The job will allow for interesting work within a modern and well-equipped environment in the heart of Karlsruhe. The salary level, depending on your circumstances, will be in line with the TV-L 13 tariff. KIT and the Cooperative State University pursue a gender equality policy; women are therefore particularly encouraged to apply. If equally qualified, handicapped applicants will be preferred (please submit your paperwork accordingly). Non-EU candidates need to check their ability to reside in Germany. Interested candidates should send their application (CV, certified copies of all relevant diplomas and transcripts, two letters of recommendation, proof of proficiency in English, and a letter of motivation stating research interests and the reason for applying), with notification of the job number, to be received on or before April 26, 2013. Send electronic applications to: berkling@dhbw-karlsruhe.de. Questions about details of the job can be directed to berkling@dhbw-karlsruhe.de.
| |||||
6-30 | (2013-05-01) PhD thesis at ParisTech. Note: the complete application file must be submitted on the EDITE web site by 22 May at the latest. Thesis subject: Processing of verbal content and sentiment analysis in human-agent interaction systems. Proposed by: Chloé Clavel. Thesis director: Catherine Pelachaud. Supervisor: Chloé Clavel. Research unit: UMR 5141, Laboratoire Traitement et Communication de l'Information. Domain: Signal and Image Processing Department. Area: natural language processing, human-machine dialogue. EDITE thematic track P: Signal, Image, SHS. Funding: EDITE grant (see conditions at http://edite-de-paris.fr/spip/spip.php?article172). Contacts: chloe.clavel@telecom-paristech.fr, catherine.pelachaud@telecom-paristech.fr
**Project: Sentiment analysis and opinion mining form a rapidly growing field, driven by the massive arrival on the web of textual data in which citizens express opinions (film reviews, forum discussions, tweets) (El-Bèze et al., 2010). Research in natural language processing is concentrating on the development of methods for detecting opinion in text, building on these new resources. The diversity of the data and of the industrial applications that call on these methods multiplies the scientific challenges to be met, notably taking into account the different contexts of utterance (e.g. social and political context, speaker personality) and defining the opinion phenomenon to be analysed according to the application context. These methods for sentiment analysis in text have also recently been extended to speech, through the analysis of automatic transcripts produced by speech recognition systems, for the indexing of radio broadcasts or call-centre conversations (Clavel et al., 2013), and can thus be correlated with acoustic/prosodic methods of emotion analysis (Clavel et al., 2010). Another rapidly growing scientific field, that of embodied conversational agents (ECAs), involves virtual characters interacting with humans. ECAs can take on the role of an assistant, like the conversational agents found on commercial web sites (Suignard, 2010), of a tutor in serious games (Chollet et al., 2012), or of a partner in video games. The major scientific challenge for this field is the integration, within the ECA, of the affective component of the interaction: on the one hand, taking into account the affective behaviours and social attitudes of the human, and on the other, generating them in a relevant way. For this thesis, we propose to work on the detection of opinions and sentiments in the context of multimodal interaction between a human and an embodied conversational agent, a subject so far little studied by the 'agent community'. Indeed, on the one hand, ECAs react to essentially non-verbal emotional content (Schröder et al., 2011), and on the other hand, 'assistant' ECAs react on the basis of informative verbal content (Suignard, 2010), without taking into account the opinions or sentiments expressed by the user.
Initial studies have addressed the recognition of affect in language in the context of interaction with an agent (Osherenko et al., 2009), but they have been carried out independently of the dialogue strategy. The developments of the thesis will be integrated into the GRETA platform, which rests on SAIBA, a unified global architecture developed by the agent community for the generation of multimodal behaviours (Niewiadomski et al., 2011). Greta communicates with the human by generating, on the agent's side, a wide range of expressive verbal and non-verbal behaviours (Bevacqua et al., 2012); it can simultaneously display facial expressions, gestures, gaze, and head movements. The platform was notably integrated within the SEMAINE project, with the development of a real-time human-agent interaction architecture (Schröder et al., 2011) that includes acoustic and video analyses, a dialogue management system and, on the synthesis side, the OpenMary text-to-speech system and the GRETA virtual agent. Following that project, the detection of opinions and sentiments envisaged in this thesis will feed the platform's multimodal interaction models. The multimodal dialogue strategy associated with these verbal-content inputs will have to be defined and integrated into the GRETA platform.
**Challenges: The thesis will address the joint development of methods for detecting opinions and sentiments and of human-agent dialogue strategies. The envisaged detection methods are hybrid ones, combining statistical learning and expert rules. For the dialogue strategies, the doctoral student will be able to build on the work carried out around the DISCO dialogue engine (Rich et al., 2012) and on the engine developed in the SEMAINE project (Schröder et al., 2011). The methods developed may also draw on analyses of human-human or Wizard-of-Oz corpora (McKeown et al., 2012), and an evaluation protocol for these methods will have to be put in place. In particular, to meet this objective, the thesis will have to address the following issues:
- the definition of the types of opinions and sentiments that are relevant as input to the dialogue engine; this will mean going beyond the classical distinction between positive and negative opinions, of little relevance in this context, by drawing on models from psycholinguistics (Martin and White, 2007);
- the identification of the lexical, syntactic, semantic and dialogic markers of opinions and sentiments;
- taking the context of utterance into account: the implemented rules may integrate different analysis windows: the sentence, the speaking turn, and the preceding turns;
- taking the real-time constraints of the interaction into account: dialogue strategies will be defined for the different analysis windows so as to offer interaction strategies at different levels of reactivity. For example, certain keywords could be used as real-time backchannel triggers (see the sketch below), and the planning of the agent's behaviour could be adjusted as the interaction unfolds.
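To make the last point concrete, here is a minimal sketch of a keyword-triggered backchannel combined with turn-level polarity aggregation. Everything in it is hypothetical: the tiny opinion lexicon, the scores and the emit_backchannel stand-in are illustrative placeholders, not part of the GRETA platform or of the thesis work itself.

```python
# Two levels of reactivity: an immediate, word-level reaction to strong
# opinion keywords, and a slower, turn-level polarity decision.
# Lexicon, scores and behaviours are invented for illustration.

OPINION_LEXICON = {
    "love": 1.0, "great": 0.8, "useful": 0.5,       # positive cues
    "hate": -1.0, "boring": -0.7, "useless": -0.8,  # negative cues
}

def emit_backchannel(kind):
    # Stand-in for sending a behaviour request to the agent platform.
    print(f"[agent] backchannel: {kind}")

def process_token(token, turn_scores):
    """Fast loop: react word by word while accumulating turn evidence."""
    score = OPINION_LEXICON.get(token.lower().strip(".,!?"))
    if score is not None:
        turn_scores.append(score)
        emit_backchannel("nod" if score > 0 else "frown")

def end_of_turn(turn_scores):
    """Slow loop: classify the whole turn once it is complete."""
    if not turn_scores:
        return "neutral"
    mean = sum(turn_scores) / len(turn_scores)
    return "positive" if mean > 0 else "negative"

scores = []
for word in "I love this game but the menus are useless".split():
    process_token(word, scores)
print("turn polarity:", end_of_turn(scores))
```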
**International dimension: This thesis work complements the work on non-verbal interaction carried out within the European FP7 project TARDIS, whose application area is serious games for job interview training (http://tardis.lip6.fr/presentation), and the work on social signal processing carried out within the SSPNET network of excellence (http://sspnet.eu/). A collaboration will also be set up with Candy Sidner, professor in the Computer Science department of Worcester Polytechnic Institute, an expert in computational models of verbal and non-verbal interaction and the originator of the DISCO dialogue engine (Rich et al., 2012).
**References: E. Bevacqua, E. de Sevin, S.J. Hyniewska, C. Pelachaud (2012), A listener model: Introducing personality traits, Journal on Multimodal User Interfaces, special issue Interacting ECAs, Elisabeth André, Marc Cavazza and Catherine Pelachaud (Guest Editors), 6:27-38, 2012. M. Chollet, M. Ochs and C. Pelachaud (2012), Interpersonal stance recognition using non-verbal signals on several time windows, Workshop Affect, Compagnon Artificiel, Interaction, Grenoble, November 2012, pp. 19-26. C. Clavel and G. Richard (2010), Reconnaissance acoustique des émotions, in Systèmes d'interactions émotionnelles, C. Pelachaud (ed.), chapter 5, 2010. C. Clavel, G. Adda, F. Cailliau, M. Garnier-Rizet, A. Cavet, G. Chapuis, S. Courcinous, C. Danesi, A-L. Daquo, M. Deldossi, S. Guillemin-Lanne, M. Seizou, P. Suignard (2013), Spontaneous Speech and Opinion Detection: Mining Call-centre Transcripts, in Language Resources and Evaluation, April 2013. M. El-Bèze, A. Jackiewicz, S. Hunston, Opinions, sentiments et jugements d'évaluation, Revue TAL 2010, Volume 51, Numéro 3. J.R. Martin, P.R.R. White (2007), Language of Evaluation: Appraisal in English, Palgrave Macmillan, November 2007. G. McKeown, M. Valstar, R. Cowie, M. Pantic, M. Schroder (2012), The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent, IEEE Transactions on Affective Computing, Volume 3, Issue 1, pp. 5-17, Jan.-March 2012. R. Niewiadomski, S. Hyniewska, C. Pelachaud (2011), Constraint-Based Model for Synthesis of Multimodal Sequential Expressions of Emotions, IEEE Transactions on Affective Computing, vol. 2, no. 3, pp. 134-146, July 2011. A. Osherenko, E. Andre, T. Vogt (2009), Affect sensing in speech: Studying fusion of linguistic and acoustic features, International Conference on Affective Computing and Intelligent Interaction and Workshops, 2009. C. Rich, C. L. Sidner (2012), Using Collaborative Discourse Theory to Partially Automate Dialogue Tree Authoring, IVA 2012, pp. 327-340. M. Schröder, E. Bevacqua, R. Cowie, F. Eyben, H. Gunes, D. Heylen, M. ter Maat, G. McKeown, S. Pammi, M. Pantic, C. Pelachaud, B. Schuller, E. de Sevin, M. Valstar, and M. Wöllmer (2011), Building Autonomous Sensitive Artificial Listeners, IEEE Transactions on Affective Computing, pp. 134-146, October 2011. P. Suignard (2010), NaviQuest : un outil pour naviguer dans une base de questions posées à un Agent Conversationnel, WACA, October 2010.
| |||||
6-31 | (2013-05-01) PhD: Visual articulatory biofeedback for speech therapy, Grenoble, France http://www.gipsa-lab.grenoble-inp.fr/transfert/propositions/1_2013-05-01_These_arc_retour_visuel_orthophonie.pdf
Funded PhD position. Visual articulatory feedback for the rehabilitation of speech disorders. Keywords: speech technology, 3D avatar, machine learning, augmented reality, speech therapy.
-- Pierre BADIN, DR2 CNRS Dept Parole & Cognition (ex ICP), GIPSA-lab, UMR 5216, CNRS – Grenoble University Address : GIPSA-lab / DPC, ENSE3, Domaine universitaire, 11 rue des Mathématiques, BP 46 - 38402 Saint Martin d’Hères cedex, France Email: Pierre.Badin@gipsa-lab.grenoble-inp.fr, Web site: http://www.gipsa-lab.inpg.fr/~pierre.badin Fax: Int + 33 (0)476.57.47.10 - Tel: Int + 33 (0)476.57.48.26
| |||||
6-32 | (2013-05-01) Open positions for Research Engineers in Speech and Language, Cambridge, UK. Position description: Open positions for Research Engineers in Speech and Language Technology. The Speech Technology Group at Toshiba Cambridge Research Lab (STG-CRL), in Cambridge UK, is looking for talented individuals to lead and contribute to our ongoing research in Statistical Speech and Language Processing, in specific areas such as Speech Recognition, Statistical Spoken Dialog and Speech Synthesis. The lab in Cambridge, in collaboration with other Toshiba groups and speech laboratories in China and in Japan, covers all aspects of speech technology at many levels: from basic and fundamental research to industrial development. We support our researchers in building their careers by providing them with the freedom to publish their results and by investing in innovation and creation to address real problems in speech and language technology. STG-CRL also has strong connections with EU universities, especially with the Cambridge University Engineering Department. Outstanding PhD-level candidates at all levels of experience are encouraged to apply. Candidates should be highly motivated, team-oriented and should have the ability to work independently. A strong mathematical background and excellent knowledge of statistics are required. Very good programming skills are desired. For the team leader positions in particular, researchers with a solid research track record, seniority, and international research experience will be considered. The Toshiba Cambridge Research Lab is located in the Science Park of the university city of Cambridge. To apply, send your CV and a covering letter to stg-jobs@crl.toshiba.co.uk. Informal enquiries about the open positions can be made to Prof. Yannis Stylianou (yannis.stylianou@crl.toshiba.co.uk). Closing date for applications is June 30th, 2013 (or until the posts are filled).
| |||||
6-33 | (2013-05-01) PhD student: Learning Pronunciation Variants in a Foreign Language (full time), Faculty of Arts, Radboud University Nijmegen, The Netherlands. Vacancy number: 23.12.13. Closing date: 24 May 2013
Responsibilities As a PhD student in this project you will investigate the interplay between exemplars and abstract representations, which is expected to vary with processing speed and experimental task, and to evolve during learning. You will address these issues with behavioural experiments examining how native speakers of Dutch learn pronunciation variants of French words with schwa deletion. Learning a foreign language implies learning pronunciation variants of words in that language. This includes the words' reduced pronunciation variants, which contain fewer and weaker sounds than the words' canonical variants (e.g. 'cpute' for English 'computer'), and which are highly frequent in casual conversations. The learner has to build mental representations (exemplars and possibly also abstract lexical representations) for these variants. Importantly, late learners will build representations that differ significantly from native listeners' representations, since reduction patterns in their native language will shape their interpretation of reduction patterns in the foreign language. The goal of this Vici project is to develop the first, fully specified, theory of how late learners of a foreign language build mental representations for pronunciation variants in that language. The dissertation will consist of an introduction, at least three experimental chapters that have been submitted to high-impact international journals, and a General Discussion.
What we expect from you · You have or shortly expect to obtain a Master's degree in a field related to speech processing, such as phonetics, linguistics, psychology, or cognitive neuroscience; · you have an excellent written and spoken command of English; · you have demonstrable knowledge of data analysis; · you preferably have knowledge of the phonetics / phonology of French; · you preferably have knowledge of the phonetics / phonology of Dutch.
What we have to offer We offer you: - full-time employment at the Faculty of Arts, Radboud University Nijmegen; - in addition to the salary: an 8% holiday allowance and an 8.3% end-of-year bonus; - the starting salary will amount to €2,042 gross per month on a full-time basis; the salary will increase to €2,612 gross per month on a full-time basis in the fourth year (salary scale P); - you will be appointed for an initial period of 18 months, after which your performance will be evaluated; - if the evaluation is positive, the contract will be extended by 2 years (on the basis of a 38-hour working week); - you will be classified as a PhD student (promovendus) in the Dutch university job-ranking system (UFO).
Further information - On the research group Speech Comprehension: http://www.ru.nl/speechcomprehension - On the project leader: http://mirjamernestus.nl - Or contact Prof. dr. Mirjam Ernestus, leader of the Vici project, telephone: +31 24 3612970, E-mail: m.ernestus@let.ru.nl
Applications It is Radboud University Nijmegen's policy to only accept applications by e-mail. Please send your application, including your letter of motivation, curriculum and transcripts of your university grades and stating vacancy number 23.12.13, to vacatures@let.ru.nl, for the attention of Mr drs. M.J.M. van Nijnatten, before 24 May 2013.
| |||||
6-34 | (2013-05-01) PhD position with scholarship - Silent speech interface, GIPSA-lab, Grenoble, France
Incremental speech synthesis for a real-time silent speech interface
Context: The design of a silent speech interface, i.e. a device allowing speech communication without the necessity of vocalizing the sound, has recently received considerable attention from the speech research community [1]. In the envisioned system, the speaker articulates normally but does not produce any audible sound. Application areas are in the medical field, as an aid for larynx-cancer patients, and in the telecommunication sector, in the form of a 'silent telephone', which could be used for confidential communication or in very noisy environments. In [2], we showed that ultrasound and video imaging can be efficiently combined to capture the articulatory movements during silent speech production; the ultrasound transducer and the video camera are placed respectively beneath the chin and in front of the lips. So far, our work has focused mainly on the estimation of the target spectrum from the visual articulatory data (using artificial neural networks, Gaussian mixture regression and hidden Markov modeling). The other challenging issue concerns the estimation of acceptable prosodic patterns (i.e. the intonation of the synthetic voice) from silent articulatory data only. To address this ill-posed problem, one solution consists of splitting the mapping process into two consecutive steps: (1) a visual speech recognition step which estimates the most likely sequence of words given the articulatory observations, and (2) a text-to-speech (TTS) synthesis step which generates the audio signal from the decoded word sequence. In that case, the target prosodic pattern is derived from the linguistic structure of the decoded sentence. The major drawback of this mapping method is that it cannot run in real time. In fact, while the visual speech recognition step can be done online (i.e. words are decoded a short time after they have been pronounced), standard TTS systems need to know the entire sentence to estimate the target prosody. This introduces a large delay between the (silent) articulation and the generation of the synthetic audio signal, which prevents the communication partners from having a fluent conversation. The main goal of this PhD project is to design a real-time silent speech interface, in which the delay between the articulatory gesture and the corresponding acoustic event has to be constant and as short as possible.
Goals: The goal of this PhD project is twofold: (1) Reducing the delay between the recognition and the synthesis steps, by designing a new generation of TTS system, called an 'incremental TTS system' [3]. This system should be able to synthesize the decoded words, with acceptable prosody, as soon as they are provided by the visual speech recognition system. (2) Designing experimental paradigms in order to evaluate the system in realistic communication situations (face-to-face, remote/telephone-like interaction, human-machine interaction). The goal is to study how a silent speaker benefits from the acoustic feedback provided by the incremental TTS and how he/she adapts his/her own articulation to maximize the efficiency of the communication.
Supervision: Dr. Thomas Hueber, Dr.
Gérard Bailly (CNRS/GIPSA-lab). Duration / Salary: 36 months (October 2013 - October 2016) / ~€1,400/month minimum (net salary). Research fields: multimodal signal processing, machine learning, interactive systems, experimental design. Background: Master's or engineer's degree in computer science, signal processing or applied mathematics. Skills: Good skills in mathematics (machine learning) and programming (Matlab, C, Max/MSP). Knowledge in speech processing or computational linguistics would be appreciated. To apply: send your CV, transcript of records of your Master grade and a cover letter to thomas.hueber@gipsa-lab.grenoble-inp.fr References: [1] B. Denby, T. Schultz, K. Honda, T. Hueber, et al., “Silent Speech Interfaces,” Speech Communication, vol. 52, no. 4, pp. 270-287, 2010. [2] T. Hueber, E. L. Benaroya, G. Chollet, et al., “Development of a Silent Speech Interface Driven by Ultrasound and Optical Images of the Tongue and Lips”, Speech Communication, vol. 52, no. 4, pp. 288-300, 2010. [3] Buschmeier H, Baumann T, Dosch B, Schlangen D, Kopp S., “Combining Incremental Language Generation and Incremental Speech Synthesis for Adaptive Information Presentation”, in Proc. of the 13th SIGdial meeting, pp. 295-303, 2012.
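As a rough illustration of the latency problem this project targets, the toy sketch below contrasts a sentence-level TTS, which must wait for the whole decoded utterance before it can compute prosody, with an incremental one that emits audio word by word. recognize_words() and synthesize() are hypothetical stand-ins for the visual speech recognizer and the synthesizer, not an existing API.

```python
import time

def recognize_words():
    # Stand-in for online visual speech recognition: words become
    # available one by one, shortly after being silently articulated.
    for word in ["bonjour", "comment", "allez", "vous"]:
        time.sleep(0.3)               # articulation + decoding time
        yield word

def synthesize(text):
    print(f"audio out: {text!r}")     # stand-in for waveform generation

def sentence_level_tts():
    words = list(recognize_words())   # blocks until the utterance ends
    synthesize(" ".join(words))       # prosody from the full sentence

def incremental_tts():
    for word in recognize_words():    # constant, short lag per word
        synthesize(word)              # prosody must be decided locally

print("sentence-level (one long delay):")
sentence_level_tts()
print("incremental (short, constant delay):")
incremental_tts()
```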
| |||||
6-35 | (2013-05-10) Postdoctoral fellow at Toronto Rehabilitation Institute, University of Toronto We are seeking a skilled postdoctoral fellow (PDF) whose expertise intersects automatic speech recognition (ASR) and human-robot interaction (HRI). The PDF will work with a team of internationally recognized researchers to create an automated speech-based dialogue system between computers and robotic systems, and individuals with dementia and other cognitive impairments. These systems will automatically adapt the vocabularies, language models, and acoustic models of the component ASR to data collected from individuals with Alzheimer’s disease. Moreover, this system will analyze the linguistic and acoustic features of a user’s voice to infer the user’s cognitive and linguistic abilities, and emotional state. These abilities and mental states will in turn be used to adapt a speech output system to be more tuned to the user.
Work will involve programming, data analysis, dissemination of results (e.g., papers and conferences), and partial supervision of graduate and undergraduate students. Some data collection may also be involved.
The successful applicant will have: 1) A doctoral degree in a relevant field of computer science, electrical engineering, biomedical engineering, or a relevant discipline; 2) Evidence of impact in research through a strong publication record in relevant venues; 3) Evidence of strong collaborative skills, including possibly supervision of junior researchers, students, or equivalent industrial experience; 4) Excellent interpersonal, written, and oral communication skills; 5) A strong technical background in machine learning, natural language processing, robotics, and human-computer interaction.
This work will be conducted at the Toronto Rehabilitation Institute, which is affiliated with the University of Toronto.
--== About the Toronto Rehabilitation Institute ==--
One of North America’s leading rehabilitation sciences centres, Toronto Rehabilitation Institute (TRI) is revolutionizing rehabilitation by helping people overcome the challenges of disabling injury, illness, or age-related health conditions to live active, healthier, more independent lives. It integrates innovative patient care, ground-breaking research and diverse education to build healthier communities and advance the role of rehabilitation in the health system. TRI, along with Toronto Western, Toronto General, and Princess Margaret Hospitals, is a member of the University Health Network and is affiliated with the University of Toronto.
If interested, please send a brief (1-2 page) statement of purpose, an up-to-date resume, and contact information for 3 references to Alex Mihailidis (alex.milhailidis@utoronto.ca) and Frank Rudzicz (frank@cs.toronto.edu) by 31 July 2013. The position will remain open until filled.
Frank Rudzicz, PhD. Scientist, Toronto Rehabilitation Institute; Assistant professor, Department of Computer Science, University of Toronto; Founder and Chief Science Officer, Thotra Incorporated >> http://www.cs.toronto.edu/~frank (personal) >> http://spoclab.ca (lab)
| |||||
6-36 | (2013-05-15) Post-doctorat dans le cadre du projet ANR DIADEMS, LABRI, Bordeaux France ance'Offre de post-doctorat dans le cadre du projet ANR DIADEMS (Description, Indexation, Accès aux Documents Ethnomusicologiques et Sonores).
- Sujet de post-doctorat : identification / classification instrumentale
Durée : 12 mois Salaire : environ 2000 €/mois Date de début souhaitée : septembre 2013
La reconnaissance automatique d'instrument et la classification par famille d'instruments est un domaine de recherche actif du MIR (Music Information Retrieval) [Hei09] [Kit07] [Her06] [Ess06]. Les principales techniques reposent sur des méthodes statistiques utilisant des paramètres audio de type MFCC. Nous souhaitons ici tracer une voie nouvelle, permettant de faire le lien entre le traitement de la parole et le traitement de la musique, en considérant l'interprétation musicale comme une phrase, et l'instrument ou l'instrumentiste comme un locuteur.
Ce travail s'effectuera en parallèle d'une thèse en cours sur la caractérisation et l'identification de la voix chantée. Au cours de cette thèse, nous avons proposé une méthode permettant d'identifier les segments contenant de la voix chantée dans des enregistrements polyphoniques (e.g. musique 'pop'). L'objet actuel d'étude est de déterminer quels sont les paramètres du signal les plus pertinents pour caractériser différents styles de chant.
Une des pistes que nous souhaitons poursuivre sera d'identifier l'instrument en suivant le vibrato, de manière similaire à ce qui est proposé pour la voix chantée. En insistant sur la dimension temporelle plutôt que spectrale, nous pourrons aussi observer comment s'enchainent les respirations, les attaques sonores ou les changements timbraux utilisés par le musicien. Ce travail exploratoire nécessitera dans un premier temps d'effectuer des expérimentations sur des bases de données simples (telles que [Fri97] et [Got03]) afin de valider notre approche avant d'appliquer nos algorithmes aux données du projet DIADEMS.
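For orientation, here is a minimal sketch of the MFCC-plus-statistical-model baseline referred to above, with one Gaussian mixture model per instrument, in direct analogy with GMM-based speaker identification. It assumes librosa and scikit-learn, and the file paths are placeholders; it illustrates the baseline, not the vibrato-tracking approach to be developed in the post-doc.

```python
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_frames(path):
    """Return one row of 13 MFCCs per analysis frame of the recording."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

# Train one GMM per instrument on labelled recordings (placeholder paths).
training = {"flute": ["flute1.wav"], "violin": ["violin1.wav"]}
models = {}
for name, files in training.items():
    frames = np.vstack([mfcc_frames(f) for f in files])
    models[name] = GaussianMixture(n_components=8).fit(frames)

# Identify an unknown recording by average per-frame log-likelihood.
test = mfcc_frames("unknown.wav")
print(max(models, key=lambda name: models[name].score(test)))
```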
- References:
[Hei09] Heittola, T., Klapuri, A., Virtanen, T., 'Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation,' in Proc. 10th Int. Society for Music Information Retrieval Conf. (ISMIR 2009), Kobe, Japan, 2009.
[Kit07] Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, and Hiroshi G. Okuno: 'Instrument Identification in Polyphonic Music: Feature Weighting to Minimize Influence of Sound Overlaps', EURASIP Journal on Advances in Signal Processing, Special Issue on Music Information Retrieval based on Signal Processing, Vol.2007, No.51979, pp.1--15, 2007.
[Her06] P. Herrera-Boyer, A. Klapuri, and M. Davy. Automatic classification of pitched musical instrument sounds. Signal Processing Methods for Music Transcription, pages 163–200. Springer, 2006.
[Ess06] S. Essid, G. Richard, and B. David. Instrument recognition in polyphonic music based on automatic taxonomies. IEEE Transactions on Audio, Speech & Language Processing, 14(1):68–80, 2006.
[Fri97] L. Fritts, “Musical Instrument Samples,” Univ. Iowa Electronic Music Studios, 1997–. [Online]. Available: http://theremin.music.uiowa.edu/MIS.html
[Got03] Goto M, Hashiguchi H, Nishimura T, Oka R. RWC music database: Music genre database and musical instrument sound database. ISMIR. 2003:229–230.
---------
Description of the DIADEMS project (Partners: LaBRI, IRIT, LESC, Parisson, LIMSI, MNHN, LAM-IJLRA)
The Laboratoire d'Ethnologie et de Sociologie Comparative (LESC), which comprises the Centre de Recherche en Ethnomusicologie (CREM) and the Centre d'Enseignement et de Recherche en Ethnologie Amérindienne (EREA), together with the Laboratoire d'Eco-anthropologie of the Muséum National d'Histoire Naturelle (MNHN), are faced with the need to index the sound collections they manage and to identify their contents, a long, tedious and costly task.
At the CNRS interdisciplinary summer school 'Sciences et Voix 2010', a convergence of interests emerged between acousticians, ethnomusicologists and computer scientists: advanced sound analysis tools developed by indexing specialists now exist that make it easier to locate, access and index content.
The context of the project is the indexing of, and improved access to, the LESC sound archives: the CREM collection, the EREA ethnolinguistic collection ('chanted-spoken' Maya material), and the MNHN collection (traditional African music). The project continues a reflection begun in 2007 on access to the sound data of research: since no open-source application existed on the market, the CREM-LESC, the LAM and the Phonothèque of the MMSH in Aix-en-Provence studied the design of an innovative, collaborative tool that answers domain-specific needs linked to the temporality of the documents while being suited to the requirements of the research sector. With the financial support of the CNRS Très Grand Equipement (TGE) ADONIS and of the Ministry of Culture, the Telemeta platform, developed by the company PARISSON, went online in May 2011: http://archives.crem-cnrs.fr . Basic signal processing tools are already available on this platform.
However, a set of advanced and innovative tools is needed to support the automatic or semi-automatic indexing of these sound data, which come from sometimes long recordings with very heterogeneous content and variable quality. The objective of the DIADEMS project is to provide some of these tools and to integrate them into Telemeta, in response to the needs of the users. The scientific objectives of the different partners are therefore complementary. The technology providers, IRIT, LIMSI, LaBRI and LAM, will have to: - Provide existing technologies such as speech detection, music detection, and speaker diarization. These tools aim to extract homogeneous segments of interest to the user. The systems will have to cope with the diversity of the collections studied in this project; their heterogeneity is linked to the recording conditions, to the genre and nature of the documents, and to their geographical origin. These 'state of the art' systems will have to be adapted to the needs of the users. - Propose innovative tools for exploring the content of homogeneous segments. Work on the opposition between spoken, declaimed and sung voice, on singing, on song structure and on musical similarity search is not yet mature. Genuine research work remains to be done, and having musicologists and ethnomusicologists at hand is a real asset. The ethnomusicologists, ethnolinguists, acousticians specialised in the voice, and specialised archivists will play an important role in the project as the future users of the indexing tools: the archivists must take ownership of the tools and contribute their experience in order to adapt these tools to their indexing needs.
A substantial exchange must take place between those who provide a tool, those who integrate it and those who use it. The effort must be put into the visualisation of the results, the final aim being strong support for indexing, making it effectively semi-automatic. For the ethnomusicologist and the musicologist, the objective goes beyond indexing: through iterations between them and the technology designers, the aim is to identify the relevant information extraction tools.
Jean-Luc Rouas LABRI 351, cours de la Libération 33405 Talence cedex France (+33) 5 40 00 35 08
| |||||
6-37 | (2013-05-16) PhD position, Avignon, France: Speaker recognition in noisy environments. In recent years, very good speaker recognition performance has been achieved, despite the presence of session variability. Session variability is taken into account at scoring time by using a covariance matrix that models it; this process is carried out in the i-vector space [1]. The i-vector concept has become a standard in speaker recognition. In the last international NIST 2012 evaluation, we were confronted with a new difficulty, additive noise [2], that is, ambient noise. Research on reducing the impact of noise in speaker recognition systems is motivated to a large extent by the need to deploy speaker recognition technologies on portable devices or over the Internet. While the technology promises an additional level of biometric security to protect the user, the practical implementation of these systems has to face many challenges. One of the most important challenges to overcome is environmental noise. Because of the mobility of these systems, noise sources can be highly time-varying and potentially unknown. We propose to work within this framework: to propose strategies for compensating the effect of additive noise, strategies that can intervene at different levels of the recognition process (at the signal level, at the acoustic model level, at the i-vector level, and at the scoring level).
In a second part of the work, we propose to place the system in the best possible conditions for robustness to noise. For example, the choice of the utterance to be pronounced by the speaker can influence the performance of the system [3]. Should the same utterance be used for all speakers, or, on the contrary, is each speaker best distinguished from the other speakers on a specific set of acoustic units? In the latter case, a strategy must be found for determining the set of acoustic units that best differentiates a given speaker from the others. Other noise robustness strategies will have to be proposed and studied within this thesis. One avenue to explore is the use of missing-feature theory, which has been used in the field of speech processing [4][5][6]. State-of-the-art speaker recognition systems are fundamentally based on the use of the UBM (Universal Background Model), a model that is too simple for the processing and modelling of speech. In the case of recognition in noisy conditions, the task becomes more complex, so it is legitimate to re-examine the adequacy of this model for the task. We propose to adapt an approach using HMMs (or another model) to this task, while taking advantage of recently proposed advances (factor analysis, i-vectors, ...). [1] Bousquet Pierre-Michel, Matrouf Driss and Bonastre Jean-François, « Intersession compensation and scoring methods in the i-vectors space for speaker recognition », Interspeech 2011, Florence. [2] Miranti Indar Mandasari, Mitchell McLaren and David A. van Leeuwen, « The Effect of noise on modern automatic speaker recognition systems », ICASSP 2012. [3] Anthony Larcher, Pierre-Michel Bousquet, Kong-Aik Lee, Driss Matrouf, Haizhou Li, Jean-François Bonastre, « I-vectors in the context of phonetically-constrained short utterances for speaker verification », ICASSP 2012: 4773-4776. [4] M.P. Cooke, P.G. Green, L. Josifovski, and A. Vizinho, « Robust ASR with unreliable data and minimal assumptions », in Proc. Robust'99, 1999. [5] M.P. Cooke, P.G. Green, L. Josifovski, and A. Vizinho, « Robust Automatic Speech Recognition with missing and unreliable acoustic data », Speech Communication, 2000. [6] B. Raj, M.L. Seltzer, and R.M. Stern, « Reconstruction of missing features for robust speech recognition », Speech Communication, 2004.
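As a back-of-envelope illustration of scoring in the i-vector space, and of why additive noise degrades it, the sketch below applies textbook length-normalised cosine scoring to synthetic vectors. This is a generic baseline, not the system described in [1]; real systems add session-variability compensation before scoring, and the dimensions and noise levels here are invented.

```python
import numpy as np

def length_norm(w):
    return w / np.linalg.norm(w)

def cosine_score(w_enrol, w_test):
    return float(length_norm(w_enrol) @ length_norm(w_test))

rng = np.random.default_rng(0)
dim = 400                                 # typical i-vector dimensionality
speaker = rng.normal(size=dim)            # idealised speaker factor
enrol = speaker + 0.3 * rng.normal(size=dim)        # clean enrolment
test_clean = speaker + 0.3 * rng.normal(size=dim)   # clean test
test_noisy = speaker + 1.5 * rng.normal(size=dim)   # noise shifts the i-vector

print("clean trial:", cosine_score(enrol, test_clean))
print("noisy trial:", cosine_score(enrol, test_noisy))  # score drops
```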
| |||||
6-38 | (2013-06-03) 2 PhD positions, Université d'Avignon. The labex Brain and Language Research Institute (BLRI) will fund two thesis topics starting next academic year, one of which may be the topic proposed below (depending on the applications selected).
Calendar: application deadline: 10 June.
Auditions of shortlisted candidates: 24 June
Grant: €1,684.93 gross per month (€1,368 net)
Application file: detailed CV; academic transcripts; statement of motivation and/or scientific project related to the topic. Scientific contacts: Corinne Fredouille and Christine Meunier
Administrative contact: nadera.bureau@blri.fr
Description of the topic:
Title: Detection of deviant zones in pathological speech: the contribution of automatic speech processing compared with human expertise
Supervisors: Corinne Fredouille, Christine Meunier
Host laboratory: Laboratoire Informatique d'Avignon (collaboration with the Laboratoire Parole et Langage, Aix-en-Provence)
Field and doctoral school: computer science, doctoral school ED536 of the University of Avignon
Dates: application deadline: 10 June; auditions of shortlisted candidates: 24 June
Grant: €1,684.93 monthly gross (€1,368 net)
Scientific description: While the definition of the range of variability in normal speech is a key issue for current linguistic theories, one way of dealing with its limits is to attempt to determine its frontiers through pathological variation. As Duffy and Kent (2001) put it, 'Science often takes advantages of nature's accidents to learn the principles of a process'. On this principle, knowledge of pathological speech, grounded in an understanding of the alteration phenomena observable in the speech production of patients suffering from speech disorders, becomes a necessity. Dysarthria is a group of speech disorders resulting from neurological impairments of speech motor control. Substantial variations occur in dysarthric speech due to a deficit in the spatio-temporal execution of speech movements that affects different levels of speech production (respiratory, laryngeal and supralaryngeal). The vast majority of research work dedicated to the study of dysarthric speech relies on perceptual analyses. The main reason is that a dysarthric patient is dysarthric because he/she sounds dysarthric. The best-known study internationally is that of Darley et al. (1975). It led to the organization of dysarthrias into 6 classes (completed with 2 additional classes by Duffy, 1995), on the basis of physiopathological clusters defined from the co-occurrence of the most deviant features perceived by a listening jury. The hypothesis underlying the building of these clusters is that a set of simultaneously disturbed features, connected with the neurological injuries, should reflect a typical physiopathological process. While this classification is still used today, notably to evaluate dysarthric speech in clinical practice, it remains controversial for two main reasons: the subjectivity of perceptual evaluation in general, and the difficulty for a human being, even a highly experienced one, of perceptually distinguishing and assessing the multiple dimensions to be taken into account when dealing with dysarthric speech. Consequently, from the 1980s to the present, various studies have aimed at combining these perceptual analyses with more objective and quantitative approaches, such as instrumental analyses based on acoustic or physiological measures (a review of the literature can be found in Kay, 2012). While instrumental analyses can rely on semi- or fully automatic processes, the in-depth acoustic analysis of speech needed to understand the deviant phenomena related to dysarthria still remains very time-consuming for a human expert. As a result, a significant proportion of studies in the literature are conducted on a limited number of patients or on a single pathology. However, the large variability of the deviant phenomena observed in dysarthric speech, according to the patient's pathology, the stage of the disease, or the severity of the dysarthria, requires the analysis of a large patient population.
The aim of this thesis is to study how automatic speech processing tools could make it possible to process large populations of dysarthric patients and to focus human experts' attention on well-identified deviant zones of the signal for further in-depth analysis. This work will rely on the automatic speech transcription system developed at the LIA and on its research activities on transcription quality measures (Lecouteux, 2008 and Senay, 2011). The granularity of the deviant zone detection (here, potentially the word or word sequence) will be refined in a second step by applying existing detection tools working at lower levels, down to the phoneme (Fredouille, 2011). This work will attempt to answer the following key questions: • Given the variability of the deviant phenomena observed in dysarthric speech and reported in the literature, which ones can an automatic detection system capture? • Is an automatic system able to highlight the same deviant phenomena as those a human expert detects perceptually? • Are the deviant zones detected by an automatic system relevant for phoneticians? • Is there a correlation between the types of deviant phenomena detected and the patient's physiopathology (e.g. hypokinetic features for Parkinson's disease, paralytic features for ALS, ...)? The work around the automatic speech transcription system should also open up new perspectives on the implementation of an objective system dedicated to the evaluation of dysarthric patients' intelligibility. This thesis work will be carried out within a close collaboration between the LIA (Corinne Fredouille) for its expertise on automatic speech processing systems, the LPL (Christine Meunier and Alain Ghio) for their expertise on acoustic-phonetic analyses and perceptual evaluations, and the hospitals of La Timone (Dr Danièle Robert) and Pays d'Aix (Pr. François Viallet) for their clinical expertise. It will be based on the dysarthric patient corpus designed for the ANR DesPhoAPady project (2009-2012; Fougeron, 2010). This corpus includes a large population of patients suffering from various pathologies (Parkinson's disease, ALS, cerebellar syndrome, ...) and different levels of dysarthria severity.
Bibliography: J. R. Duffy, R. D. Kent, « Darley's contributions to the understanding, differential diagnosis, and scientific study of the dysarthrias », Aphasiology 15(3):275-289, 2001. F. L. Darley, A. E. Aronson, J. R. Brown, « Motor Speech Disorders », Philadelphia: W.B. Saunders, 1975. J. R. Duffy, « Motor speech disorders: substrates, differential diagnosis and management », Motsby-Yearbook, St Louis, 1st edition, 1995. T. S. Kay, « Spectral analysis of stop consonants in individuals with dysarthria secondary to stroke », PhD thesis, Department of Communication Sciences and Disorders, Faculty of the Louisiana State University and Agricultural and Mechanical College, USA, 2012. B. Lecouteux, « Reconnaissance automatique de la parole guidée par des transcriptions a priori », Thèse de doctorat, Université d'Avignon et des Pays de Vaucluse, 2008. G. Senay, « Approches semi-automatiques pour la recherche d'information dans les documents audio », Thèse de doctorat, Université d'Avignon et des Pays de Vaucluse, 2011. C. Fredouille, G. Pouchoulin, « Automatic detection of abnormal zones in pathological speech », International Congress of Phonetic Sciences (ICPhS'11), Hong Kong, 17-21 August 2011. C. Fougeron et al., « Developing an acoustic-phonetic characterization of dysarthric speech in French », LREC'10 (international conference on Language Resources and Evaluation), Malta, May 2010.
Candidate application file: detailed CV; academic transcripts; statement of motivation and/or scientific project related to the topic. Scientific contacts: Corinne Fredouille and Christine Meunier. Administrative contact: nadera.bureau@blri.fr
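As a minimal sketch of the word-level granularity idea described above, the snippet below flags, for the expert's attention, zones where the word confidence produced by the transcription system falls below a threshold. The (word, start, end, confidence) tuples and the threshold are invented placeholders; a real system would use the transcription system's own confidence measures.

```python
def deviant_zones(words, threshold=0.5):
    """Merge consecutive low-confidence words into zones for expert review.

    words: list of (label, start_s, end_s, confidence) tuples.
    Returns a list of (start_s, end_s) zones.
    """
    zones, current = [], None
    for label, start, end, conf in words:
        if conf < threshold:
            current = (current[0], end) if current else (start, end)
        elif current:
            zones.append(current)
            current = None
    if current:
        zones.append(current)
    return zones

hyp = [("bonjour", 0.0, 0.5, 0.90), ("je", 0.5, 0.7, 0.40),
       ("voudrais", 0.7, 1.3, 0.30), ("un", 1.3, 1.4, 0.80),
       ("cafe", 1.4, 1.9, 0.95)]
print(deviant_zones(hyp))   # -> [(0.5, 1.3)]
```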
| |||||
6-39 | (2013-05-16) Internships in Natural Language Processing and Machine Translation, Dublin City University, Ireland At the Centre for Next Generation Localisation (CNGL) in Dublin, Ireland, we have a number of internships available covering a wide range of topics in Natural Language Processing and Machine Translation based at our Dublin City University site. The internships are available for both basic research and more applied research projects (including development-focused work).
| |||||
6-40 | (2013-05-17) Speech Technology Researcher/Developer (main focus: ASR), Liguwerk GmbH, Dresden, Germany. Vacant position; contact: Dr.-Ing. Rico Petrick.
| |||||
6-41 | (2013-06-05) Specialist in speech processing, CUI/University of Geneva. The CUI/University of Geneva seeks a qualified candidate for one open position as a specialist in speech processing.
|