ISCA - International Speech Communication Association



ISCApad #225

Saturday, March 11, 2017 by Chris Wellekens

6 Jobs
6-1(2016-10-03) PhD grant at the Laboratoire de Phonétique of the Université de Mons, Belgium

Doctoral grant offer

_____________________________________________________________________________________

Service de Métrologie et Sciences du Langage, Laboratoire de Phonétique,

Université de Mons, Mons, Belgium

_____________________________________________________________________________________

The Service de Métrologie et Sciences du Langage (the phonetics laboratory of the Université de Mons, Belgium) is looking for a specialist (M/F) in the study of human speech wishing to prepare a doctoral thesis within the laboratory.

Candidate profile (M/F):

Initial training:

Language sciences (linguistics, speech-language pathology, psychology of language, ...), either as primary training or at least as in-depth complementary training.

Entry level:

At least a master's degree ('bac+5', 300 credits) within the meaning of the decree of the French Community of Belgium organizing higher education.

Transversal skills:

Aptitude for teamwork, creativity, autonomy, scientific curiosity.

Good command of computer tools (spreadsheets, database management, word processors).

At least near-native command of written and spoken French.

Command of English at CEFR level C1 or above in all four skills.

Sensitivity to foreign-language issues, command of other languages, knowledge of phonetics (in particular objective speech analysis), and knowledge and skills in applied mathematics (in particular statistics) are additional assets.

Track record:

The recruited person can show grades equal to or above 'grande distinction' at both bachelor's and master's level.

Position profile:

The recruited person is awarded a one-year doctoral grant.

During this term, he or she undertakes to seek more extensive funding by competing for positions such as FRIA or FRESH researcher and/or FNRS research fellow ('aspirant').

At the end of the first year, if no such alternative funding has been obtained, the person is evaluated by the supervising committee. On this basis, he or she may (or may not) be granted an additional two-year term. During this new term, he or she undertakes to continue seeking external funding for the doctorate.

In no case may the cumulative duration of the doctoral grants exceed 4 years.

Starting date:

As soon as possible.

Interested candidates are asked to send, by 30 October at the latest, an application file comprising:

a cover letter;

a curriculum vitae;

any document deemed useful;

in PDF format (exclusively) to: bernard.harmegnies@umons.ac.be

Scientific project

Concepts such as fatigue and stress are frequently invoked in both the life sciences and the human sciences. They are characterized not only by the fact that their scope extends to human biochemistry as well as to the human psyche, but above all by the idea that an action on the mind can have repercussions on the body and vice versa. These notions nevertheless remain variably defined and diversely objectified, as does, in this context, the interaction between physiology and psyche.

The project within which the advertised doctorate is set aims to elucidate these complex relationships by studying the joint evolution, in a within-subject approach, of three types of variables: (i) situational variables (with both invoked and induced variation); (ii) biological markers of the state of the human subject (a metabonomic approach and specific biomarkers recorded in various biofluids); (iii) measures revealing the subject's language processing (management of speech in production and reception).

The contexts in which the observations are collected are those of the control of complex processes by human subjects, especially in aeronautics, a domain that is a rich source of 'problem situations' likely to trigger such phenomena.

The doctorate focuses more particularly on the situational variables and specifically targets those related to the languages used by the subject. It is known today that the physical reality of speech sounds is influenced by various language-related factors. These may be linked to the subject's community background (for example, diatopic, diastratic or diachronic variability), or even to exogenous actions deliberately aimed at modifying the phonic characteristics of speech sounds, for example in the context of teaching/learning (and/or using) non-native languages. Other determinants, endogenous to the subject, may also come into play, whether they originate in the cognitive sphere (language proficiency, multilingual expertise, etc.) or in the affective sphere (personal stance towards the languages used). The studies which, in various respects, demonstrate the action of these endogenous factors on vocal productions open the way to a position that is less descriptive and more centered on the indexical value of the observations: since these factors have an effect on the speech signal, detecting their marks in the signal opens the way to characterizing the speaker's state through the analysis of phonic productions alone.

The transport sector, and aeronautics in particular, has become gradually more interested in these prospects over recent decades; at a time when many incidents and accidents are attributable to human factors rather than to technical failures, the development of research that could contribute to alert systems able to detect alterations in the pilot's functional state, on the basis of variations in the speech signal alone, stands as a strategic challenge.

While several studies have indeed demonstrated the interest of these prospects, their results are extremely diverse and sometimes contradictory. This is probably explained, on the one hand, by the insufficient overall volume of data collected and, on the other hand, by the considerable methodological diversity that characterizes the field. From this point of view, three dimensions appear to require particular attention. First, these studies are usually restricted to English-language vocal productions, leaving in the shadows the other languages of aeronautical communication and, ipso facto, neglecting possible interactions between the language factor and the various factors studied; moreover, few take into consideration the frequently multilingual character of aeronautical communications and the fact that agents often have to express themselves in a non-native language; finally, none addresses the problem of the differential loss of phonic competence in L2 and L1 under adverse communication conditions.

The thesis will consequently focus both on the effects exerted by situational variables linked to complex process control situations on multilingual performance, and on the effects of various types of multilingualism on the efficiency of complex process control in multilingual contexts.


6-2(2016-10-06) Two positions at DefinedCrowd Corporation at its R&D center in Lisbon, Portugal

We’re DefinedCrowd Corporation (http://www.definedcrowd.com), headquartered in Seattle, Washington, with an R&D center in Lisbon, Portugal. We provide machine learning training data for artificial intelligence products and applications internationally. Our clients range from Fortune 500 companies to cutting-edge AI and robotics companies, and we have recently received investment from Amazon and Sony. We are a Microsoft Accelerator alumni company, and we are recognized as the fastest growing company in the big data arena for AI. As we continue to expand our international footprint, we are looking for talented new members to join this energetic, hardworking and fun team, both at the Seattle headquarters and at the Lisbon R&D center. As we continue to grow, we are constantly challenging ourselves to find the most efficient way to manage our business operations, to build the most suitable and sustainable models for our lines of business, and to create a highly recognized and trusted brand among both clients and crowd communities.

Current job offers:

 

Data Scientist (Seattle or Lisbon)

We’re looking for a Data Scientist with an NLP background, reporting to the CTO, to drive DefinedCrowd's data science strategy and machine-learning-based quality control.

 

Main requirements

 

  • MSc/PhD in Computer Science or equivalent with experience in Machine Learning (including DNNs), Artificial Intelligence or Statistics.

  • Creative thinking and the ability to turn ideas into technology.

  • NLP background or experience.

  • Proactivity, initiative, positive attitude, ability to solve problems creatively.

  • Excellent speaking and writing skills in English.

 

Nice to have

  • Enterprise experience in this area.

  • Experience with both supervised and unsupervised learning.

  • Data warehousing and ETL.

  • Knowledge of ML platforms such as Azure ML, Google Tensor, IBM Watson.

  • Experience in statistical analysis and respective tools (e.g. SPSS).

  • International publications in this area.


 

Technical Program Manager (Seattle or Lisbon)

We are looking for a Program Manager with an NLP background, reporting to the Program Manager Lead in Lisbon. He/she will be responsible for helping to manage customer deliverables, define data quality, work with the Engineering team on platform features, validate quality against customer requirements, and handle data reporting and analytics. Scripting and basic programming skills will be required.

Required skills:

  • BA or Masters in Linguistics or equivalent. Masters in Computational Linguistics is a plus.

  • Experience in project management: gathering requirements; planning; resource allocation; communication with developers.

  • Technical Skills: Python; Perl; Regular Expressions; SQL; Excel; Microsoft Office Suite;

  • Proficiency in more than 2 languages.

  • Proactivity, initiative, positive attitude, ability to solve problems creatively.

  • Excellent speaking and writing skills in English.

  • Good communication skills and ability to work with ambiguity.

  • Able to work flexible hours when interaction with the Seattle headquarters or with overseas customers and partners is needed.

 

Please submit your resume to hr@definedcrowd.com. We are looking forward to hearing from you.

Positions open until filled. Date: 09/10/2016

 


6-3(2016-10-20) Two Master research internships at LIMSI - CNRS, Orsay, France

Two Master research internships (with follow-up PhD scholarship) at LIMSI - CNRS, Orsay, France
Unsupervised Multimodal Character Identification in TV Series and Movies

Keywords : deep learning, speech processing, natural language processing, computer vision

Automatic character identification in multimedia videos is a broad and challenging problem. Person identities can serve as a foundation and building block for many higher-level video analysis tasks, for example semantic indexing, search and retrieval, interaction analysis and video summarization. The goal of this project is to exploit textual, audio and video information to automatically identify characters in TV series and movies without requiring any manual annotation for training character models. A fully automatic and unsupervised approach is especially appealing when considering the huge amount of available multimedia data (and its growth rate). Text, audio and video provide complementary cues to the identity of a person, and thus allow a person to be identified more reliably than from any single modality alone.

In this context, LIMSI (www.limsi.fr) proposes two projects, focusing on two different aspects of this multimodal problem. Depending on the outcome of the internship, both projects may lead to a PhD scholarship (one funding is already secured).

Project 1: natural language processing + speech processing

speaker A: 'Nice to meet you, I am Leonard, and this is Sheldon. We live across the hall.'
speaker B: 'Oh. Hi. I'm Penny.'

speaker A: 'Sheldon, what the hell are you doing?'
speaker C: 'I am not quite sure yet. I think I am on to something...'

Just looking at these two short conversations, a human can easily infer that speaker A is actually Leonard, speaker B is Penny and speaker C is Sheldon. The objective of this project is to combine natural language processing and speech processing to do the same automatically. Building blocks include automatic speech transcription, named entity detection, classification of names (first, second or third person) and speaker diarization. Preliminary work in this direction has already been published in [Bredin 2014] and [Haurilet 2016].

[Bredin 2014] Hervé Bredin, Antoine Laurent, Achintya Sarkar, Viet-Bac Le, Sophie Rosset, Claude Barras. Person Instance Graphs for Named Speaker Identification in TV Broadcast. Odyssey 2014, The Speaker and Language Recognition Workshop.
[Haurilet 2016] Monica-Laura Haurilet, Makarand Tapaswi, Ziad Al-Halah, Rainer Stiefelhagen. Naming TV Characters by Watching and Analyzing Dialogs. WACV 2016. IEEE Winter Conference on Applications of Computer Vision.
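
As a toy illustration of this kind of inference (a sketch under simplifying assumptions, not the project's actual pipeline: the two regular expressions, the dialogue format and the 'next speaker is the addressee' heuristic are all invented for the example):

    import re

    # Toy name propagation over a dialogue given as (speaker_label, utterance)
    # pairs. A real system would replace these hand-written rules with named
    # entity detection and first/second/third person classification, and would
    # propagate names through speaker diarization clusters.
    INTRO = re.compile(r"\bI am ([A-Z]\w*)")  # first person: names the current speaker
    VOCATIVE = re.compile(r"^(\w+), ")        # leading vocative: names the addressee

    def propagate_names(dialogue):
        names = {}
        for i, (speaker, text) in enumerate(dialogue):
            m = INTRO.search(text)
            if m:
                names[speaker] = m.group(1)
            m = VOCATIVE.match(text)
            if m and i + 1 < len(dialogue):
                # crude heuristic: the addressee is the next speaker to talk
                names.setdefault(dialogue[i + 1][0], m.group(1))
        return names

    dialogue = [
        ("A", "Nice to meet you, I am Leonard, and this is Sheldon."),
        ("B", "Oh. Hi. I am Penny."),
        ("A", "Sheldon, what the hell are you doing?"),
        ("C", "I am not quite sure yet."),
    ]
    print(propagate_names(dialogue))  # {'A': 'Leonard', 'B': 'Penny', 'C': 'Sheldon'}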

Project 2: speech processing + computer vision

This project aims at improving (acoustic) speaker diarization using the visual modality. Indeed, a recent paper [Bredin 2016] showed that recent advances in deep learning for computer vision lead to very reliable face clustering performance, whereas speaker diarization still performs poorly on TV series and movies (mostly because the current state of the art has not been designed to process this kind of content).

The first task is to design deep learning approaches (based on recurrent neural networks) to address talking-face detection (i.e. deciding, among all visible people, which one is currently speaking) by combining the audio and visual (e.g. lip motion) streams. The second task is to combine talking-face detection and face clustering to guide and improve speaker diarization (i.e. who speaks when?). Read [Bredin 2016] for more information on this kind of approach.

[Bredin 2016] Hervé Bredin, Grégory Gelly. Improving speaker diarization of TV series using talking-face detection and clustering. ACM Multimedia 2016, 24th ACM International Conference on Multimedia.
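
As a rough illustration of the talking-face idea (a deliberately simple baseline, not the model of [Bredin 2016]; the array shapes and the correlation criterion are assumptions made for the example):

    import numpy as np

    def talking_face_scores(audio_energy, lip_motion):
        """audio_energy: (T,) per-frame audio energy.
        lip_motion: (n_faces, T) per-frame lip-motion magnitude for each
        visible face (e.g. from facial landmark displacements). Returns one
        score per face; the argmax is the most plausible current speaker."""
        a = (audio_energy - audio_energy.mean()) / (audio_energy.std() + 1e-8)
        scores = []
        for face in lip_motion:
            f = (face - face.mean()) / (face.std() + 1e-8)
            scores.append(float(np.mean(a * f)))  # Pearson correlation
        return np.array(scores)

    # Toy check: face 0 moves its lips in sync with the audio, face 1 does not.
    rng = np.random.default_rng(1)
    audio = np.abs(rng.normal(size=200))
    lips = np.vstack([audio + 0.3 * rng.normal(size=200),  # talking face
                      np.abs(rng.normal(size=200))])       # silent face
    print(talking_face_scores(audio, lips).argmax())       # 0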

Profile: Master student in machine learning (experience in natural language processing, computer vision and/or speech processing is appreciated)
Location: LIMSI - CNRS, Orsay, France
Duration: 5/6 months
Salary: according to current regulations
Contact: Hervé Bredin (bredin@limsi.fr) with CV + cover letter + reference letter(s)


6-4(2016-10-23) Vacancy for a one-year post-doctoral researcher position at the Radboud University, Nijmegen, The Netherlands

Vacancy for a one-year post-doctoral researcher position
           at the Radboud University, Nijmegen, The Netherlands
     *** Computational modelling of human spoken-word recognition  ***

As a post-doctoral researcher, you will join the NWO-Vidi funded project
'Ignoring the merry in marry: The effect of individual differences in
attention and proficiency on non-native spoken-word recognition in noise'
headed by Dr. Odette Scharenborg. This project investigates the effect of
noise on non-native spoken-word recognition using a range of tasks tapping
into different processes underlying human spoken-word recognition, and the
effect of individual differences in attention and proficiency on
non-native spoken-word recognition in noise.

You will conduct research on or related to the computational modelling of
human spoken-word recognition. Research will focus on determining the best
method for the automatic classification of speech segments, building a
computational model of non-native human spoken-word recognition using Deep
Neural Networks, running simulations and comparing the model's output with
existing human data. You will communicate your findings through papers in
peer-reviewed research journals and at international conferences.

What we expect:
- You hold a PhD in artificial intelligence, computer science,
computational (psycho)linguistics, or a related discipline
- You have a good knowledge of speech and human and/or automatic speech
processing
- You have experience with computational modelling
- You have experience with Deep Neural Networks
- You have a good command of spoken and written English
- You have multiple journal publications, two of which as a first author
- You are a team player who enjoys working with people from different
backgrounds

More information:
Odette Scharenborg
O.Scharenborg@let.ru.nl
http://odettescharenborg.ruhosting.nl

How to apply:
Please note: Only job applications uploaded via the university website are
taken into consideration. The job vacancy will soon become available via
http://www.ru.nl/werken/alle-vacatures/

The application should consist of:
-Motivation letter
-CV
-List of publications
-Names of two referees

Closing date: Sunday 20 November 2016.

Starting date: The preferred starting date is 1 January 2017.


6-5(2016-11-02) PhD stipends at The Centre for Acoustic Signal Processing Research, Aalborg University, Denmark

PhD stipends at The Centre for Acoustic Signal Processing Research, Aalborg University, Denmark

 

The Centre for Acoustic Signal Processing Research (CASPR) will have a number of fully funded PhD stipends available in 2017.

 

We are looking for highly motivated, independent, and outstanding students who wish to complete a successful 3-year PhD programme at Aalborg University. The ideal candidates must have strong expertise in one or more of the following disciplines: statistical signal processing, auditory perception, machine learning, information theory, or estimation theory. Good English verbal and written skills are a must. Excellent undergraduate and master degree grades are desired.

 

PhD positions in Denmark are fully funded, i.e. there are no tuition fees, and come with a salary. The salary is subject to a pay grade system based on prior working experience since completing your undergraduate degree. The yearly gross salary is in the range of 41,500 to 50,100 euros.

 

You may obtain further information about the PhD stipends from Associate Professor Jan Østergaard (jo@es.aau.dk), Associate Professor Zheng-Hua Tan (zt@es.aau.dk), or Professor Jesper Jensen (jje@es.aau.dk), CASPR, Aalborg University, concerning the scientific aspects of the stipends.

 

Webpage for the positions: http://caspr.es.aau.dk/open-positions/


6-6(2016-11-09) Internship on Search Engine Development at ELDA in Paris (France),

ELDA is opening an internship on Search Engine Development in Paris (France), starting in January 2017 for 6 months.

Profile

  • MSc. in Computer Science;
  • Basic knowledge in data structures and algorithms;
  • Basic knowledge in Web applications architecture;
  • Python and / or JavaScript language skills;
  • Technical English
  • Hands-on knowledge of a database system (ideally PostgreSQL);
  • Knowledge of a search engine (Solr, Elasticsearch, Lucene) will be appreciated.

Duties

In the software development team at ELDA and under the supervision of an Engineer specialised in Natural Language Processing and Web Application Development, you will participate in the following tasks:

  • produce a state-of-the-art overview of the most powerful search engines currently available, such as Solr, Elasticsearch, or the full-text search features provided by current database systems such as PostgreSQL;
  • help specify the full-text search needs for the LREC conference proceedings;
  • help choose the technical solution that best fits the context;
  • participate in the design of a database structure (data schema) for the contents of the LREC proceedings web sites;
  • harvest the LREC proceedings sites and populate the aforementioned database with all the relevant information extracted from their contents;
  • implement a search solution that is exhaustive, robust and works throughout all the LREC proceedings (see the sketch after the next paragraph).

You will also participate in the regular meetings of the software development team at ELDA.
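
By way of illustration, here is a minimal sketch of the PostgreSQL full-text route mentioned above, using psycopg2; the schema (a papers table with title and abstract columns) and the connection parameters are invented for the example:

    import psycopg2

    # Hypothetical schema: papers(title TEXT, abstract TEXT), one row per
    # LREC article harvested from the proceedings sites.
    conn = psycopg2.connect(dbname="lrec", user="elda")
    cur = conn.cursor()

    query = "speech corpora annotation"
    cur.execute(
        """
        SELECT title,
               ts_rank(to_tsvector('english', abstract),
                       plainto_tsquery('english', %s)) AS rank
        FROM papers
        WHERE to_tsvector('english', abstract) @@ plainto_tsquery('english', %s)
        ORDER BY rank DESC
        LIMIT 10;
        """,
        (query, query),
    )
    for title, rank in cur.fetchall():
        print(f"{rank:.3f}  {title}")

A dedicated engine such as Solr or Elasticsearch may be preferable for larger collections or more advanced ranking; the state-of-the-art overview is precisely meant to settle that choice.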

Application

This 6-month internship is based in Paris 13th district (Les Gobelins).
It should start in January 2017.

Applicants should email a cover letter addressing the points listed above together with a curriculum vitæ to Vladimir Popescu (vladimir@elda.org).

The internship is subject to a monthly allowance, commensurate with the candidate's educational qualifications and according to the French laws.

http://www.elra.info/en/opportunities/internships/

-*-*-*-*-*-*-*-

ELDA ('Evaluations and Language resources Distribution Agency') is a key player in the Human Language Technology domain. As the operational body of ELRA, the European Language Resources Association, a European not-for-profit organisation promoting language resources in a European context, ELDA is in charge of executing a number of tasks on behalf of ELRA, including both the distribution and the production of Language Resources. Within production projects, ELDA often coordinates resource annotation, as well as performing quality control of these annotations.

Thus, ELDA supports the organization of ELRA's biennial international scientific conference, LREC, the Language Resources and Evaluation Conference, which brings together an increasing number (1,200+) of top-tier researchers from all over the world, who submit and present scientific research articles.

In order to ease navigation in this large collection of scientific articles, ELDA has set up a set of Web sites gathering the articles themselves, as well as the corresponding metadata (authors, titles, article abstracts, etc.).

In this context, ELDA wants to consolidate these sites, allowing users to rely on a robust and exhaustive search throughout the article collections of all the editions of the LREC conference.


6-7(2016-11-10) PhD offer at LIMSI, Orsay, France

Subject: CIFRE PhD offer, conversational agents

PhD topic: Generation for an adaptive conversational agent
Supervisors: Sophie Rosset and Anne-Laure Ligozat

Within the Nanolifi project (a conversational agent serving the city), LIMSI is recruiting a PhD student in computer science.

The main objective of the thesis is to model dialogue activities in an open-ended human-robot conversational setting. The approach will rely on unsupervised learning methods, for example zero-shot learning or reinforcement learning. The semantic and dialogic representation used will have to integrate linguistic information such as that provided by available natural language processing tools. The evaluation measure will have to take into account the notion of engagement of both the user and the system. The knowledge base supporting the application will be built from documents containing information about the city.

Full description
======================
http://www.adum.fr/as/ed/voirpropositionT.pl?site=PSaclay&matricule_prop=13069

Prerequisites
===========
* Master 2 in Computer Science (or equivalent), with at least one specialization among:
  * Machine learning
  * Natural language processing
  * Speech processing


Practical information
=======================

* Start of the PhD: early 2017
* Enrolment at EDSTIC, Université Paris-Saclay
* CIFRE funding

Application
=============

Application file: cover letter, CV, Bachelor's and Master's transcripts.
Copy to Sophie Rosset (rosset@limsi.fr) and Anne-Laure Ligozat (annlor@limsi.fr).


6-8(2016-11-19) Master research internship within the ANR VOCADOM project, Grenoble, France

The GETALP team of LIG offers a Master's research internship within the ANR VOCADOM project.

-- 

2016-2017 M2R INFO GETALP

 
M2R Internship in Computer Science

Large-vocabulary voice command recognition for home automation using deep learning and word embeddings

Supervisors: Michel Vacher, François Portet, Benjamin Lecouteux
Keywords: automatic speech recognition, deep neural networks, word embeddings, home automation
Internship duration: 5 months, with the possibility of continuing as a PhD (ANR funding)
Internship location: GETALP team of LIG, IMAG building, Domaine universitaire de Grenoble
Internship context
Within the VOCADOM project funded by the ANR, we aim to design a home automation system that is controlled remotely by voice and can be used in real conditions (noise, presence of several people). Beyond noise issues, one of the challenges is to constantly adapt to the end users by taking into account all the available information (audio and home automation sensors). One of these adaptations consists in understanding the users' spoken utterances without constraints on the vocabulary and syntax they will use. For example, the standard command to switch on the light is 'Nestor allume la lumière' ('Nestor, turn on the light'), but it could be 'Nestor, on n'y voit rien' ('Nestor, we can't see a thing') or 'S'il te plaît Nestor allume la lumière' ('Please, Nestor, turn on the light').
Internship topic
The goal of this internship is to enable this 'spoken utterance' <-> 'home automation command' association with as few lexical and syntactic constraints as possible. Previous studies used a phonetic Levenshtein distance, which is well suited when pronunciations are close. We therefore propose to operate not only at the phonetic level but also at the lexical level, by working on the output lattice of the speech recognition decoder.
The proposed work will start with a review of the literature of this domain. The second step will be to develop a method for exploring the lattice at the output of the KALDI ASR decoder. This method will use an external word model based on word embeddings (learned with deep learning) in order to compute a proximity score between a known utterance and a new utterance. The developed system will then be evaluated on a synthetic corpus (built by speech synthesis) and on the corpus recorded in the laboratory's smart home.
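As a minimal illustration of the intended proximity scoring (a toy sketch only: the random vectors below merely stand in for embeddings learned on real data, and the command inventory is invented):

    import numpy as np

    # Toy stand-in for learned word embeddings (word -> vector); in the
    # internship these would be trained on a large French corpus.
    rng = np.random.default_rng(0)
    vocab = ["nestor", "allume", "éteins", "la", "lumière"]
    emb = {w: rng.normal(size=50) for w in vocab}

    def sent_vec(words):
        vs = [emb[w] for w in words if w in emb]
        return np.mean(vs, axis=0) if vs else np.zeros(50)

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

    known_commands = {
        "turn_light_on":  "nestor allume la lumière".split(),
        "turn_light_off": "nestor éteins la lumière".split(),
    }

    # A decoded hypothesis (in practice, read off the KALDI output lattice):
    hypothesis = "nestor allume la lumière".split()
    scores = {cmd: cosine(sent_vec(hypothesis), sent_vec(words))
              for cmd, words in known_commands.items()}
    print(max(scores, key=scores.get))  # turn_light_on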
The recruited student will be able to build on previous studies, in which recordings were made in a real flat with several rooms, each equipped with microphones. Participants played realistic scenarios of activities of daily living (ADL) (sleeping, getting up, washing, preparing a meal, having lunch, relaxing, going out, ...). This allowed us to collect a realistic corpus containing voice commands recorded in a variety of conditions.
Desired skills: C++ programming, good command of French.

References
M. Vacher, S. Caffiau, F. Portet, B. Meillon, C. Roux, E. Elias, B. Lecouteux, P. Chahuara (2015). 'Evaluation of a context-aware voice interface for Ambient Assisted Living: qualitative user study vs. quantitative system evaluation', ACM Transactions on Speech and Language Processing, Special Issue on Speech and Language Processing for AT (Part 3), 7(2), pp. 5:1-5:36.
D. Povey, L. Burget, M. Agarwal, P. Akyazi, F. Kai, A. Ghoshal, O. Glembek, N. Goel, M. Karafiát, A. Rastrow, R. C. Rose, P. Schwarz, and S. Thomas (2011). 'The subspace Gaussian mixture model - a structured model for speech recognition', Computer Speech & Language, vol. 25, no. 2, pp. 404-439.
Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). 'A neural probabilistic language model'. Journal of Machine Learning Research, 3(Feb), 1137-1155.

-------------------------------------------------------------------------

Michel VACHER
Ingénieur de Recherche CNRS - HDR - Directeur adjoint Institut Carnot LSI (http://www.carnot-lsi.com)
Laboratoire d'Informatique de Grenoble - LIG - UMR 5217
Groupe d'Etude en Traduction/Traitement Automatique des Langues et de la Parole
700 avenue Centrale - Bâtiment IMAG - Bureau 330, Domaine Universitaire - 38401 St Martin d'Hères
Lab URL: http://www.liglab.fr/ - personal: http://lig-membres.imag.fr/vacher/
Email: Michel.Vacher@imag.fr | tel: 0033 (0)4 57 42 14 38
Postal address: LIG - Bâtiment IMAG - CS 40700 - 38058 GRENOBLE CEDEX 9


6-9(2016-11-20) 3 subjects for Master Recherche 2 internships in NLP at Loria, Nancy, for 2016-2017

3 subjects for Master Recherche 2 internships in NLP at Loria, Nancy, for 2016-2017:



Subject 1

Title: Data selection for the training of deep neural networks in the framework of automatic speech recognition

Supervisor: Irina Illina

Team and lab: MultiSpeech

Contact: illina@loria.fr

Co-supervisor: Dominique Fohr

Team and lab: MultiSpeech

Contact : dominique.fohr@loria.fr

 

Motivations and context

More and more audio/video content appears on the Internet every day: about 300 hours of multimedia are uploaded per minute. Nobody can go through this quantity of data. Within these multimedia sources, audio data represents a very important part. The classical approach to spoken content retrieval from audio documents is automatic speech recognition followed by text retrieval. In this internship, we will focus on the speech recognition system.

One of the important modules of an automatic speech recognition system is the acoustic model: it models the sounds of speech, mainly phonemes. Currently, the best performing models are based on deep neural networks. These models are trained on a very large amount of audio data, because the models contain millions of parameters to estimate. For training acoustic models, it is necessary to have audio documents for which the exact text transcript is available (supervised training).

The Multi-Genre Broadcast (MGB) Challenge is an evaluation of speech recognition systems, using TV recordings in English or Arabic. The speech data is broad and multi-genre, spanning the whole range of TV output, and represents a challenging task for speech technology. The speech data covers the multiple genres in broadcast TV, categorized in terms of 8 genres: advice, children's, comedy, competition, documentary, drama, events and news.

The problem with the MGB data challenge is that the exact transcription of the audio documents is not available. Only the subtitles of the TV recordings are given. These subtitles are sometimes far from what is actually pronounced: some words may be omitted, hesitations are rarely transcribed and some sentences are reformulated.

In this internship, we will focus on the problem of data selection for efficient acoustic model training.

Objectives

A subtitle is composed of a text, a start time of appearance (timecode) on the screen and an end time of appearance. These start and end times are given relative to the beginning of the program. It is easy to associate a subtitle with the corresponding audio segment.

We have at our disposal a very large audio corpus with the corresponding subtitles, and we want to develop data selection methods for obtaining high-performance acoustic models, that is, models giving a word error rate as small as possible. If we use all the training data, the errors in the subtitles will lead to poor quality acoustic models and therefore a high recognition word error rate.

We propose to use a deep neural network (DNN) to classify the segments into two categories: audio segments corresponding to their subtitles and audio segments not corresponding to their subtitles. The student will analyze which information, acoustic and/or linguistic, is relevant to this selection task and can be used as input to the DNN.
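
For illustration, such a selector could be a small feed-forward network over per-segment features (a minimal sketch only; the feature vector, its dimension and the architecture are placeholders, not the team's system):

    import tensorflow as tf

    # Binary selector: does this audio segment match its subtitle?
    # Input: a fixed-size per-segment feature vector (e.g. ASR confidence,
    # alignment scores, acoustic statistics); the dimension 64 is arbitrary.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64,)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),  # P(segment matches subtitle)
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    # model.fit(X_train, y_train, ...) on segments with trusted labels;
    # segments scored below a chosen threshold are discarded from training.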

The student will validate the proposed approaches using the automatic transcription system of TV broadcast developed in our team.

Required skills: background in statistics and natural language processing, and programming skills (Perl, Python).

Localization and contacts: Loria laboratory, Speech team, Nancy, France

irina.illina@loria.fr dominique.fohr@loria.fr

Candidates should email a detailed CV with diplomas.

===========================================================================

Subject 2

Title:  Domain adaptation of neural network language model for speech recognition

Supervisor: Irina Illina

Team and lab: MultiSpeech

Contact: illina@loria.fr

Co-supervisor: Dominique Fohr

Team and lab: MultiSpeech

Contact : dominique.fohr@loria.fr

Motivation and Context

Language models (LMs) play a key role in modern automatic speech recognition systems and ensure that the output respects the patterns of the language. In state-of-the-art systems, the language model is a combination of n-gram LMs and neural network LMs, because they are complementary. These LMs are trained on huge text corpora.

Language models are usually trained on a corpus of varied texts, which provides average performance on all types of data. However, document content is generally heavily influenced by the domain, which can include topic, genre (documentary, news, etc.) and speaking style. It has been shown that adapting LMs on small amounts of matched in-domain text data provides significant improvements in both perplexity and word error rate. The objective of the internship is to adapt a neural-network-based language model to the domain of an audio document to be recognized. For this, we will use a small domain-specific text corpus.

The Multi-Genre Broadcast (MGB) Challenge is an evaluation campaign of speech recognition systems, using TV recordings in English or Arabic. The speech data is broad and multi-genre, spanning the whole range of TV output, and represents a challenging task for speech technology. Speech data covers the multiple genres in broadcast TV, categorized in terms of 8 genres: advice, children's, comedy, competition, documentary, drama, events and news.

During the internship, the student will develop LM adaptation methods in the context of the MGB data.

Goals and Objectives

Neural network LM adaptation can be categorized as either feature-based or model-based.

In feature-based adaptation, the input of the neural network is augmented with auxiliary features, which model domain or topic information, etc. However, these auxiliary features must be learned during the training of the LM and thus require retraining the whole model.

Model-based adaptation consists in adding complementary layers and training these layers with domain-specific adaptation data. An advantage of this method is that full retraining is not necessary. Another model-based adaptation method is fine-tuning: after training the model on the whole training data, the model is tuned on the target-domain data. The downside of this approach is the lack of a dedicated optimization objective for the adaptation.
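
To make the two model-based options concrete, here is a schematic Keras sketch (the architecture, sizes and layer placement are invented for illustration and do not describe an existing system):

    import tensorflow as tf

    VOCAB, EMB, HIDDEN = 50000, 256, 512

    # A generic recurrent LM, assumed already trained on the large
    # out-of-domain corpus.
    inputs = tf.keras.Input(shape=(None,), dtype='int32')
    x = tf.keras.layers.Embedding(VOCAB, EMB)(inputs)
    x = tf.keras.layers.LSTM(HIDDEN, return_sequences=True)(x)
    outputs = tf.keras.layers.Dense(VOCAB, activation='softmax')(x)
    base = tf.keras.Model(inputs, outputs)

    # Option 1 (complementary layers): freeze the trained stack, insert new
    # trainable layers, and train only those on the in-domain data.
    base.trainable = False
    h = base.layers[-2].output                                  # LSTM states
    h = tf.keras.layers.Dense(HIDDEN, activation='relu')(h)     # adaptation layer
    new_out = tf.keras.layers.Dense(VOCAB, activation='softmax')(h)
    adapted = tf.keras.Model(base.input, new_out)

    # Option 2 (fine-tuning): unfreeze everything and continue training on
    # the in-domain data, typically with a small learning rate.
    base.trainable = True
    base.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                 loss='sparse_categorical_crossentropy')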

During the internship, the student will perform a bibliographic study on model adaptation approaches. Depending on the pros and cons of these approaches, we will propose a method specific to MGB data. This method may include changing the architecture of the neural network.

The student will validate the proposed approaches using the automatic transcription system of radio broadcast developed in our team.

Required skills: background in statistics and natural language processing, and programming skills (Perl, Python).

Localization and contacts: Loria laboratory, Speech team, Nancy, France

irina.illina@loria.fr dominique.fohr@loria.fr

Candidates should email a detailed CV with diplomas.

===========================================================================

Subject 3

Title: Using Wikipedia to search for proper names relevant to the audio document transcription

Supervisor: Irina Illina

Team and lab: MultiSpeech

Contact: illina@loria.fr

Co-supervisor: Dominique Fohr

Team and lab: MultiSpeech

Contact : dominique.fohr@loria.fr

Motivation and Context

More and more audio/video content appears on the Internet every day: about 300 hours of multimedia are uploaded per minute. Nobody can go through this quantity of data. Within these multimedia sources, audio data represents a very important part. The classical approach to spoken content retrieval from audio documents is automatic speech recognition followed by text retrieval.

An automatic speech recognition system uses a lexicon containing the most frequent words of the language, and only the words in the lexicon can be recognized by the system. New proper names (PNs) appear constantly, requiring dynamic updates of the lexicons used by the speech recognition system. These PNs evolve over time, and no vocabulary will ever contain all existing PNs. These missing proper names can be very important for the understanding of the test document.

In this study, we will focus on the problem of proper names in automatic recognition systems. The problem is to find relevant proper names for the audio document we want to transcribe. For this task, we will use a remarkable source of information: Wikipedia, the free online encyclopedia, the largest and most popular reference work on the internet. Wikipedia contains a huge number of proper names.

Goals and Objectives

We assume that the audio document to transcribe contains missing proper names, i.e. proper names that are pronounced in the audio document but are not in the lexicon of the automatic speech recognition system; these proper names cannot be recognized (out-of-vocabulary proper names, OOV PNs).

The goal of this internship is to find a list of relevant OOV PNs that correspond to an audio document. We will use Wikipedia as a source of potential proper names.

Assuming that we have an approximate transcription of the audio document and a Wikipedia dump, two main points will be addressed:

  • How to represent a Wikipedia page, and in which space? One possibility is to use word embeddings (for instance Mikolov's word2vec).
  • Using the previous representation, how to select relevant pages according to the approximate transcription of the audio document? The automatic speech recognition systems for broadcast news have a 10-20% word error rate. If we project documents into a continuous space, different distances can be studied.

As a first step, we can consider a Wikipedia page as a simple text page. In a second step, the student should use the structure of the Wikipedia pages (page links, tables, headings, infoboxes).
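
A minimal sketch of this first step, with random vectors standing in for trained word2vec embeddings (the page texts and transcript are toy data):

    import numpy as np

    # Toy embedding table; in practice, word vectors trained with word2vec.
    rng = np.random.default_rng(0)
    emb = {w: rng.normal(size=100) for w in
           ["president", "election", "music", "festival"]}

    def doc_vec(text):
        vs = [emb[w] for w in text.lower().split() if w in emb]
        return np.mean(vs, axis=0) if vs else np.zeros(100)

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

    pages = {  # Wikipedia pages reduced to plain text
        "Page_A": "president election president",
        "Page_B": "music festival music",
    }
    transcript = "the president won the election"  # approximate ASR output
    ranked = sorted(pages, reverse=True,
                    key=lambda p: cosine(doc_vec(pages[p]), doc_vec(transcript)))
    print(ranked)  # pages most relevant to the transcript come first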

During the internship, the student will investigate methodologies based on deep neural networks.

The student will validate the proposed approaches using the automatic transcription system of radio broadcast developed in our team.

Required skills: background in statistics and natural language processing, and programming skills (Perl, Python).

Localization and contacts: Loria laboratory, Speech team, Nancy, France

irina.illina@loria.fr dominique.fohr@loria.fr

Candidates should email a detailed CV with diplomas.

===========================================================================

 
--
Dominique Fohr
Research scientist CNRS
LORIA-INRIA MultiSpeech Group
Address: Bâtiment LORIA, BP 239, F54506 Vandoeuvre, France
Email: dominique.fohr@loria.fr    Tel: (+33) 3.83.59.20.27

6-10(2016-12-02) Tenured senior research position at INRIA Bordeaux, France
Flowers Lab at Inria (Bordeaux, France)
Deadline: 16 December
 
We are searching highly qualilified candidates for a tenure senior research position (full time research, no teaching mandatory). 
Candidates should have an outstanding academic track record in one or two of the following domains:
 
1) Computational modelling of cognitive development, including the following research topics and methods:
   - Models of exploration and active learning in humans and animals
   - Models of human reinforcement learning and decision making
   - Bayesian or neuronal models of development
   - Models of autonomous lifelong learning
   - Models of tool learning 
   - Sensorimotor, language and social development
   - Strong experience of collaborations with developmental psychologists or neuroscientists
 
2) Lifelong autonomous machine learning and artificial intelligence, including:
   - Unsupervised deep reinforcement learning
   - Intrinsic motivation 
   - Developmental learning, curriculum learning
   - Contextual bandit algorithms
   - Multitask and transfer learning
   - Hierarchical learning
   - Strong experience in benchmarking with robotic or virtual world setups
 
As the Flowers lab's domains of application are robotics/HRI/HCI and educational technologies, experience in one of these two domains would be a clear asset.
 
Experience in writing successful grant proposals will also be considered positively.
 
Candidates should have strong postdoctoral experience after their PhD, or may already hold a research position in a university or research organization.
 
The Flowers Lab: developmental robotics and lifelong multitask machine learning
===================================================================
 
The Flowers Lab, headed by Pierre-Yves Oudeyer, gathers a team of ~20 members and has been one of the pioneers of developmental robotics and lifelong machine learning and artificial intelligence over the last decade, in particular by developing models of intrinsically motivated learning of repertoires of skills. These models have both advanced the understanding of human curiosity and development, and advanced incremental online multitask machine learning techniques in difficult high-dimensional robotic spaces.
 
Work in the Flowers lab is conducted in the context of large international projects (e.g. an ERC grant, the European projects 3rdHand and DREAM, the HFSP project Neurocuriosity), with interdisciplinary collaborations with other labs in neuroscience, psychology, machine learning and robotics. The successful candidate will be directly involved in these international collaborations.
 
The Flowers Lab is also developing applications of these concepts and techniques in the domain of educational technologies, including adaptive intelligent tutoring systems (using bandit algorithms), educational robotics, and software that stimulates curiosity and learning in humans.
 
The Flowers lab has recently spun off the Pollen Robotics startup company, and is involved in multiple collaborations with industry through Inria's strong support for impacting both science and industry.
 
Inria and EnstaParistech
===================
 
The lab is part of Inria, a prestigious institution that is also the largest public European research institution focused on computer science, mathematics and their applications. Inria's teams and researchers (>2800 employees) have received prestigious awards, coordinate many international projects, and have created strong innovations now used in many parts of industry. The Inria research center in Bordeaux gathers around 300 researchers.
The Flowers Lab is also associated with ENSTA ParisTech, a prestigious French engineering school (university).
 
Bordeaux 
========
 
The Flowers lab in Bordeaux is located in a great building on the edge of one of the world's most famous vineyards, 10 minutes by tram from the Bordeaux town center (and 2 hours from Paris by high-speed train): https://www.inria.fr/en/centre/bordeaux
 
Web
===
 
Flowers web site: https://flowers.inria.fr
Lifelong intrinsically motivated learning in robots: http://www.pyoudeyer.com/active-learning-and-artificial-curiosity-in-robots/
 
How to apply
===========
 
CVs and letters of motivation should be sent to Pierre-Yves Oudeyer (pierre-yves.oudeyer@inria.fr) before 16 December.
A pre-selection will then be made, and a full application with a detailed research statement and program will have to be submitted in early February.
The successful candidate would begin work between September and December 2017.
 
Pierre-Yves Oudeyer 
Research director, Inria
Head of Flowers Lab
Inria and Ensta ParisTech

6-11(2016-12-19) Postdoc at LIMSI, Orsay, France


 LIMSI offers a one-year postdoctoral position on unsupervised identification of characters in TV series.


 --- Keywords ---
Deep learning, speech processing, natural language processing, computer vision

--- Project summary ---
Automatic character identification in multimedia videos is a broad and challenging problem. Person identification serves as a foundation and building block for many higher-level video analysis tasks, for example semantic indexing, search and retrieval, interaction analysis and video summarization.

The goal of this project is to exploit textual, audio and video information to automatically identify characters in TV series and movies without requiring any manual annotation for training character models. A fully automatic and unsupervised approach is especially appealing when considering the huge amount and growth of available multimedia data. Text, audio and video provide complementary cues to the identity of a person, and thus allow a person to be identified more reliably than from any single modality alone.

To this end, we will address three main research questions: unsupervised clustering of speech turns (i.e. speaker diarization) and face tracks in order to group similar tracks of the same person without prior labels or models; unsupervised identification by propagation of automatically generated weak labels from various sources of information (such as subtitles and speech transcripts); and multimodal fusion of acoustic, visual and textual cues at various levels of the identification pipeline.

While many generic approaches to unsupervised clustering exist, they are not adapted to heterogeneous audiovisual data (face tracks vs. speech turns) and do not perform as well on challenging TV series and movie content as they do on more controlled data. Our general approach is therefore to first over-cluster the data and make sure that clusters remain pure, before assigning names to these clusters in a second step. On top of domain-specific improvements for either modality alone, we expect joint multimodal clustering to take advantage of the three modalities and improve clustering performance over each modality alone.
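
As a schematic illustration of the over-clustering step (a sketch only; the embeddings, the threshold and the clustering settings are placeholders, not the project's implementation):

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    # Placeholder speech-turn embeddings: (n_turns, dim) vectors, e.g.
    # neural speaker embeddings. Cutting the dendrogram at a conservative
    # distance threshold yields many small, hopefully pure clusters, to be
    # named and merged in the later identification step.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=c, scale=0.05, size=(20, 32))
                   for c in (0.0, 1.0, 2.0)])  # 3 'speakers', 20 turns each

    clusterer = AgglomerativeClustering(n_clusters=None,
                                        distance_threshold=0.3,
                                        linkage='average')
    labels = clusterer.fit_predict(X)
    print(len(set(labels)), "clusters for", len(X), "speech turns")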

Then, unsupervised identification aims at assigning character names to clusters in a completely automatic manner (i.e. using only information already present in the speech and video). In TV series and movies, character names are usually introduced and reiterated throughout the video. We will detect and use addresser/addressee relationships in both speech transcripts (using named entity detection techniques) and video (using mouth movements, viewing direction and focus of attention of faces). This allows us to assign names to some clusters, learn discriminative models and assign names to the remaining clusters.

For evaluation, we will extend and further annotate a corpus of four TV series (57 episodes) and one movie series (8 movies), a total of about 50 hours of video. This diverse data covers different filming styles, type of stories and challenges contained in both video and audio. We will evaluate the different steps of this project on this corpus, and also make our annotations publicly available for other researchers working in the field.
---


--- Information ---
PhD in machine learning (experience in natural language processing, computer vision and/or speech processing is appreciated)
Location: LIMSI - CNRS, Orsay, France
Duration: 12 months, starting date to be defined with the selected candidate
Contact: Hervé Bredin (bredin@limsi.fr)


6-12(2017-01-12) Internship at INA, Paris: Speech/Music segmentation of multimedia documents using deep neural networks

Speech/Music segmentation of multimedia documents using deep neural networks

Final-year Engineering or Master 2 internship - 2016-2017

Keywords: Deep Learning, Audio Segmentation, Machine Learning, Music Information Retrieval, Open Data

Context

The mission of the Institut national de l'audiovisuel (Ina) is to archive and promote the French audiovisual heritage (radio, television and Web media). To date, more than 15 million hours of TV and radio documents are preserved, of which 1.5 million hours have been digitized. Given the sheer amount of data, a systematic, detailed manual description of the entire archive is technically impossible. It is therefore necessary to use automatic content analysis techniques to make the most of this mass of data.

Internship objectives

Speech/Music segmentation (SMS) consists in segmenting an audio stream into homogeneous speech and music zones. This step is required upstream of higher-level indexing tasks, such as recognition of speech, speakers, musical pieces or musical genres. For these reasons, this task has attracted a lot of interest in the speech processing and music indexing communities.

The use of SMS systems at Ina addresses three main use cases. The first is to quickly locate zones of interest within the media, in order to streamline the archive description process carried out manually by archivists. Manual description of the archives is costly, and is performed at a variable level of detail: TV news programs are described more finely than old radio collections. SMS systems can thus ease navigation in under-documented archive collections. The last use case is segmentation into musical pieces, i.e. detecting the beginning and end of each piece. This makes it possible to measure the duration of the musical excerpts present in the archives, and thus to remunerate the royalty-collecting societies concerned when the archives are commercialized.

To date, a number of situations remain difficult for SMS systems. These notably include distinguishing spoken voice from singing voice, especially in musical styles where the spectral properties of sung and spoken voice are similar. Another difficulty arises when speech is superimposed on music, which happens quite frequently in radio and TV programs. A further limitation of current systems is the granularity of the temporal segmentation, generally on the order of one second.

The objective of the internship is to design systems based on deep neural networks for the speech/music segmentation of audiovisual archives. The proposed methods will have to handle the diversity of Ina's archives (radio archives from the 1930s to the present day). Part of the internship will be devoted to analyzing existing corpora and to building an annotated corpus (performer, piece, genre, speaker, ...) giving maximum control over all the parameters tested during the evaluations. The other part of the internship will be devoted to designing architectures based on deep neural networks for SMS, in continuity with ongoing work using convolutional neural networks.

The programming language used for this internship will be Python. The intern will have access to Ina's computing resources (cluster and GPU servers).
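
As an illustration of the convolutional direction mentioned above (a minimal sketch, not Ina's actual system: the input representation, patch size and architecture are assumptions made for the example), a small Keras network classifying roughly one-second mel-spectrogram patches as speech vs. music:

    import tensorflow as tf

    # Binary speech/music classifier over (64 mel bands x 100 frames x 1)
    # spectrogram patches; all dimensions are arbitrary choices.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64, 100, 1)),
        tf.keras.layers.Conv2D(16, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),  # P(speech)
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])

Frame-level posteriors from such a classifier would then be smoothed (for example with a median filter or an HMM) to produce homogeneous speech/music segments.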

Internship conditions

The internship will last 6 months, within Ina's research team, at the Bry2 site, 18 Avenue des frères Lumière, 94366 Bry-sur-Marne. The intern will be supervised by David Doukhan (ddoukhan@ina.fr) and Jean Carrive (jcarrive@ina.fr), and will receive a monthly allowance of 527.75 euros.

Bibliography

Jimena, R. L., Hennequin, R., & Moussallam, M. (2015). Detection and characterization of singing voice using deep neural networks.

Peeters, G. (2007). A generic system for audio indexing: Application to speech/music segmentation and music genre recognition. In Proc. DAFX (Vol. 7, pp. 205-212).

Pinto, N., Doukhan, D., DiCarlo, J. J., & Cox, D. D. (2009). A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLoS Comput Biol, 5(11), e1000579.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.


6-13(2017-01-15) Postdoctoral Positions in Linguistics/Sciences du Langage, LPL, CNRS, Aix-Marseille, France

 

Call for Postdoctoral Positions in Linguistics/Sciences du Langage

Institution: Laboratoire Parole et Langage (CNRS, Aix-Marseille Université)

Location: Aix-en-Provence, France

No. of positions: 2

Duration: 18-24 months

Application deadline: 1 March 2017

The Laboratoire Parole et Langage (CNRS, Aix-Marseille Université) invites applications for two postdoctoral positions to be supported by a grant from the A*MIDEX Foundation. The funded project explores the relationship between social variables and linguistic representation, and seeks to develop and extend an explicit cognitive model that accounts for the effects of socio-indexical cues on production and perception. The empirical basis for the project involves a combination of experimentation (production, perception, ERP), corpus analysis, and computational modeling. The selected applicants will engage with an interdisciplinary team from the Institute of Language, Communication, and the Brain which includes experts in neuroscience, psychology, computer science, and mathematics, among other areas.

A special emphasis of the project concerns issues in prosody and intonation, so an interest in, or experience with, prosody is highly desirable. In addition, the ideal applicant will have experience in one or more of the following areas:

  • Sociolinguistics (quantitative)
  • Machine learning and statistical modeling (esp. sound structure/phonetics)
  • Design and analysis of speech corpora
  • Prosody and meaning (esp. information structure)

Knowledge of French is not essential. Each postdoctoral appointment is for approximately 18-24 months depending on the starting date. Candidates must have completed all PhD requirements before the starting date. The starting date is flexible, though the position should be filled by 1 June 2017.

Applications should include (i) a cover letter that relates the applicants’ experience and interests to the project, (ii) a comprehensive CV, (iii) a PDF copy of all publications or a list of links where these can be accessed, and (iv) the names and contact information of at least two references.

Applications in French or English may be sent by email to Oriana Reid-Collins at oriana.reid-collins@lpl-aix.fr.

For further inquiries regarding the position or the project, please contact James Sneed German (principal investigator) at james.german@lpl-aix.fr.


6-14(2017-01-20) Postdoc in Mental Health, Affective Computing and Machine Learning at CMU, Pittsburgh, PA, USA

Postdoc in Mental Health, Affective Computing and Machine Learning

              Carnegie Mellon University, School of Computer Science

              University of Pittsburgh, School of Medicine

 

*** US citizenship or a green card is required to be eligible for consideration. ***

 

The Multimodal Communication and Machine Learning Laboratory (MultiComp Lab) at Carnegie Mellon University is seeking creative and energetic applicants for a two-year postdoctoral position. This opportunity is part of an NIMH-funded training program based at the University of Pittsburgh School of Medicine. The position includes a competitive salary with full benefits and travel resources.

 

Using recent progress in machine learning and artificial intelligence, the postdoc will study patients' multimodal behaviors (verbal, visual and vocal) during semi-structured clinical interviews to identify behavior markers of mental health disorders (e.g., depression, schizophrenia, suicidal ideation). The postdoc will work under the supervision of Dr. Louis-Philippe Morency (director of the CMU MultiComp Lab), in collaboration with clinicians and researchers at the University of Pittsburgh's Western Psychiatric Institute and Clinic.

 

The successful applicant will have extensive research experience in automatic multimodal behavior analysis in the mental health domain, including facial and gesture analysis, acoustic signal processing, linguistic computation and multimodal machine learning.

 

Required

  • PhD in computer science or mental-health related field (at the time of hire)
  • Research experience in human behavior analysis, affective computing and machine learning
  • US citizenship or a green card is required to be eligible for consideration

 

Desired

  • Publications in top machine learning, speech processing and/or computer vision conferences and journals.
  • Research involving clinical patients with mental health disorders (e.g., depression, schizophrenia, suicidal ideation)
  • Experience mentoring graduate and undergraduate students

 

Job details

  • Preferred start date: May 1st, 2017 (negotiable)
  • Candidate will work under the supervision of Dr. Louis-Philippe Morency, MultiComp Lab’s director, at CMU Pittsburgh campus.
  • Research will be performed in collaboration with clinicians and researchers at University of Pittsburgh’s Western Psychiatric Institute and Clinic.
  • Competitive salary with full benefits and travel resources.

 

How to apply

  • Email applications should be sent to morency@cs.cmu.edu with the title “Postdoc application – NIMH program”
  • A single PDF file titled FirstNameLastName.pdf should be attached to the email, including:
    • a brief cover letter (with expected date of availability),
    • a CV including list of publications and email addresses of 3 references,
    • two representative publications (including full citation information)

 

=====================

Louis-Philippe Morency

Assistant Professor

School of Computer Science

Carnegie Mellon University

Multimodal Communication and Machine Learning Laboratory

https://www.cs.cmu.edu/~morency/

 


6-15(2017-01-21) Internship at INRIA Bordeaux, France

6-month internship for Master's (M2) students at INRIA Bordeaux

 

Internship topic: Speech analysis for the differential diagnosis between Parkinson's disease and multiple system atrophy

 

Description:

Parkinson's disease (PD) and multiple system atrophy (MSA) are neurodegenerative diseases. The latter belongs to the group of atypical parkinsonian disorders and carries a poor prognosis. In the early stages of the disease, the symptoms of PD and MSA are very similar, especially for MSA-P, in which the parkinsonian syndrome predominates. The differential diagnosis between MSA-P and PD can therefore be very difficult in the early stages, yet early diagnostic certainty is important for the patient because of the diverging prognoses. Indeed, despite recent efforts, no validated objective marker is currently available to guide the clinician in this differential diagnosis. The need for such markers is therefore very high in the neurology community, particularly given the severity of the MSA-P prognosis.

Speech disorders, commonly referred to as dysarthria, are an early symptom common to both diseases, with different underlying origins. Our approach is to use dysarthria, through digital processing of patients' voice recordings, as a vector for distinguishing between PD and MSA-P in the early stages of the disease.

The goal of the internship is to use well-established voice perturbation measures, together with techniques recently developed by Inria's GeoStat team, to carry out a preliminary experimental study of the discriminative power of these different measures. The study will be conducted on existing medical databases.
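
As an illustration of the classical perturbation measures mentioned above, the sketch below computes local jitter and shimmer, two standard cycle-to-cycle measures used in dysarthria analysis. This is a minimal Python sketch assuming pitch periods and peak amplitudes have already been extracted from a sustained vowel; it is not the GeoStat team's method.

    import numpy as np

    def local_jitter(periods):
        # Local jitter (%): mean absolute difference between consecutive
        # pitch periods, relative to the mean period.
        p = np.asarray(periods, dtype=float)
        return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)

    def local_shimmer(amplitudes):
        # Local shimmer (%): the same measure applied to the peak
        # amplitude of each glottal cycle.
        a = np.asarray(amplitudes, dtype=float)
        return 100.0 * np.mean(np.abs(np.diff(a))) / np.mean(a)

    # Toy input: pitch periods (in seconds) from a sustained vowel.
    periods = [0.0102, 0.0104, 0.0101, 0.0105, 0.0103]
    print('local jitter = %.2f%%' % local_jitter(periods))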

The internship is expected to lead to a PhD fellowship funded by the ANR grant supporting this research project. The clinical partners of the project are internationally recognized centres on PD and MSA at CHU-Bordeaux and CHU-Toulouse.

 

Internship supervisor: Dr. Khalid Daoudi, GeoStat team (khalid.daoudi@inria.fr).

Location: INRIA Bordeaux Sud-Ouest (http://www.inria.fr/bordeaux), Bordeaux, France.

Internship duration: 6 months

Stipend: €500/month

 

Required skills: a good background in speech/signal processing and in C++ and Matlab programming is required. Knowledge of statistical learning (machine learning) would be a strong plus.

 

Applications should be sent to khalid.daoudi@inria.fr


6-16(2017-01-22) 6-month internship at LIA Avignon, France
6-month internship at LIA Avignon
 

The proposed internship is part of a collaboration between the company MediaWen International and the Laboratoire d'Informatique d'Avignon (LIA). MediaWen provides online solutions for subtitling, translating, and dubbing video on the web. A collaborative work platform brings together the various technological building blocks and makes it possible to speed up or automate the different processing steps.

In this context, MediaWen and the LIA wish to explore the feasibility and value of a technological component for automatic spoken language identification. Its two main novelties will be the ability to easily add a new language from a small set of audio examples, and the definition of an interactive strategy in which the criterion to be optimized is the operator's working time (at equal production quality).

The goal of the internship is to implement this component on top of the ALIZE platform (developed in C++), which has already served as the basis for several implementations of spoken language recognition systems. A solution based on the i-vector paradigm will be chosen. The approach will first be developed and tested on the laboratory's internal data (notably NIST data), simulating the operator's responses. It will then be integrated into MediaWen's tools and tested on the corresponding data.
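
To make the i-vector back-end concrete, the sketch below scores a test i-vector against per-language average i-vectors with cosine similarity, one of the simplest scoring rules used in i-vector language recognition. Random vectors stand in for real i-vectors, which would be extracted with ALIZE; this is an illustrative sketch, not the system to be built.

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def identify_language(test_ivector, language_means):
        # Score the test i-vector against each language's mean i-vector
        # and return the best-scoring language together with all scores.
        scores = {lang: cosine(test_ivector, m)
                  for lang, m in language_means.items()}
        return max(scores, key=scores.get), scores

    # Toy data: random 400-dimensional vectors stand in for i-vectors
    # estimated from a small set of audio examples per language.
    np.random.seed(0)
    language_means = {'fr': np.random.randn(400), 'en': np.random.randn(400)}
    best, scores = identify_language(np.random.randn(400), language_means)
    print(best, scores)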

Continuation as a PhD thesis is possible, depending on how successful the internship is.

 

The intern will be supervised at the LIA by Driss MATROUF (MCF-HDR) and Jean-François BONASTRE (Professor). He or she will benefit from the support of MediaWen's specialists, who are fully involved in the internship.

 

 

Profile and level:

Master's degree in computer science, mathematics, or signal processing. A good level in software development, including knowledge of C++, is required.

Motivation, scientific curiosity, and rigour are expected.

 

Duration:

5 to 6 months (extension possible)

Stipend:

~€530/month (statutory allowance for a Master's-level internship)

 
Contact:
driss.matrouf@univ-avignon.fr

6-17(2017-01-27) Two R and D engineers at INRIA Nancy France

Our team at Inria Nancy is recruiting two R&D engineers for an ambitious industrial
project on voice processing in audiovisual contents. The mission is to develop, evaluate,
and improve software prototypes based on the latest scientific advances and transfer them
to the partner software development company and the sound creation studio which initiated
the project. The resulting commercial software will be a reference in the professional
audiovisual field.

*R&D engineer A*
Start date: May 2017
Duration: 18 months
Missions:
- speaker recognition based on ALIZE
- speech enhancement by multichannel deep learning (see the sketch below)
Profile:
- MSc or PhD in computer science, signal processing, or machine learning
- operational skills in software engineering (version control, tests, software quality)
and Python 3 language (numpy, scipy, Keras)
- experience in speaker recognition or speech enhancement would be a plus
Salary: €2,048 to €2,509 net/month depending on experience
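
As a rough illustration of the enhancement mission, the sketch below defines a small Keras model that maps stacked multichannel log-magnitude spectra to a time-frequency mask for a reference channel, one common recipe for neural speech enhancement. The number of channels, frequency bins, and the architecture are assumptions for illustration, not the project's actual design.

    from keras.models import Model
    from keras.layers import Input, Dense, LSTM, Bidirectional, TimeDistributed

    C, F = 2, 257  # assumed: 2 microphones, 257 STFT magnitude bins

    # Input: sequences of frames, each stacking the log-magnitude spectra
    # of all channels; output: a [0, 1] mask for a reference channel.
    frames = Input(shape=(None, C * F))
    hidden = Bidirectional(LSTM(256, return_sequences=True))(frames)
    mask = TimeDistributed(Dense(F, activation='sigmoid'))(hidden)

    model = Model(frames, mask)
    model.compile(optimizer='adam', loss='mse')  # regress an ideal mask
    model.summary()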

*R&D engineer B*
Start date: May-June 2017
Duration: 16 months
Missions:
- speech recognition based on Kaldi
- concatenative speech synthesis
Profile:
- MSc or PhD in computer science, signal processing, or machine learning
- operational skills in software engineering (version control, tests, software quality)
and Python 3 (numpy, scipy, Keras) and Java (Java SE 8) languages
- experience in speech recognition or synthesis would be a plus
Salary: €2,048 to €2,509 net/month depending on experience

*Work environment*
Nancy is one of the top cities for young engineers in France with cheap accommodation, a
vibrant cultural scene, and good connections to Paris (1.5h), Luxembourg (1.5h), Belgium,
and Germany. Inria Nancy is a 500-person research institute dedicated to computer
science. The Multispeech team (https://team.inria.fr/multispeech/) is a 30-person group
covering various fields of speech science, with a strong emphasis on machine learning and
signal processing.

*To apply*
Send a CV, a motivation letter and 1 to 3 optional recommendation letters to
emmanuel.vincent@inria.fr. Mention which position(s) you are applying for. Applications
will be assessed on a rolling basis until March 17. Please apply as soon as possible
before that date.


6-18(2017-02-01) Maître de conférences (Assistant Professor) in Computer Science, Ecole Centrale Marseille, France

Ecole Centrale Marseille is opening a Maître de Conférences (Assistant Professor) position
in Computer Science in the 2017 recruitment round; the teaching and research profile is
detailed below (the posting is in the process of being published).

Useful links:
Ecole Centrale Marseille: https://www.centrale-marseille.fr/
Laboratoire d'Informatique Fondamentale: http://www.lif.univ-mrs.fr/

=================================
Position profile: Maître de Conférences in Computer Science at ECM

- Teaching

The recruited Maître de Conférences will be expected to deliver engaging teaching to
generalist engineering students. He/she will join the computer science teaching team to
teach core courses (algorithms, object-oriented modelling, data storage and processing),
contribute to the computer science teaching in the second- and third-year options,
propose projects and supervise groups of students throughout their work, and also take
part in continuing-education and work-study programmes. He/she will play a role in the
coordination, animation, and evolution of the teaching, and will take part in the
cross-disciplinary, multidisciplinary activities of Ecole Centrale Marseille.

Teaching contact: Pascal Préa (pascal.prea@centrale-marseille.fr)

- Research:

The candidate will, as a priority, be expected to develop research within projects
initiated between the Laboratoire d'Informatique Fondamentale de Marseille (LIF) and the
Ecole Centrale Marseille (ECM). These two projects address, on the one hand, the
processing of massive data (classification models, optimization, visualization) and, on
the other hand, deep learning, representation learning, and the associated application
domains. They cover themes related in particular to the ACRO, BDA, QARMA, and TALEP
teams of the LIF.

Beyond this thematic priority, any excellent application within the scope of the LIF is
eligible.

In addition, the candidate's ability to strengthen the technological dimension of the
research carried out within these projects and to take part in industrial partnerships
is an undeniable plus.

Research contact: Thierry Artières (thierry.artieres@centrale-marseille.fr)


6-19(2017-02-13) 18-month postdoctoral position *Multimodal analysis of audiovisual content*, IRISA, Rennes, France

*18-MONTH POSTDOCTORAL POSITION*
*Multimodal analysis of audiovisual content*

The LINKMEDIA team (IRISA & Inria Rennes) works on future technologies for describing and
accessing multimedia content through content analysis. The team's areas of expertise are
computer vision, speech and language processing, audio processing, information retrieval,
and data mining. In particular, the team takes part in the FUI project NexGenTV on the
analysis and enrichment of television content. Television is evolving from the TV screen
towards multi-screen applications in which the viewer watches television while exploring
the web, looking for complementary information, or reacting on social networks. In this
context, NexGenTV seeks to provide solutions for editing enriched multi-screen content
through features such as highlight detection, the enrichment of programmes with
complementary information and, more generally, the optimization of the user experience by
fostering interaction suited to the user's expectations. Within the project, IRISA
focuses on the analysis of audiovisual content, speech, and social networks.

In this context, we wish to recruit a postdoctoral researcher specialized in audiovisual
content analysis to develop, study, and evaluate innovative approaches to the analysis of
persons in television content. In particular, the goal is to design multimodal
(voice+face) approaches enabling both the detection of known persons and the linking of
videos featuring the same speaker. A first line of work builds on the team's recent
results on learning multimodal representations with neural networks. The use of such
networks for representing and comparing voices may also be studied. In a second phase,
the work will address the exploitation of these models to enrich live content with
excerpts from archived documents, combining speaker identification and semantic
relevance.
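
As a toy illustration of the multimodal representation idea, the sketch below projects a voice descriptor and a face descriptor into a shared embedding space and trains the projections with a contrastive loss, so that embeddings of the same person end up close. Descriptor sizes, architecture, and loss are illustrative assumptions, not the team's actual models.

    from keras import backend as K
    from keras.models import Model
    from keras.layers import Input, Dense, Lambda

    # Assumed inputs: a 400-d voice representation (e.g., an i-vector)
    # and a 4096-d face descriptor (e.g., a CNN feature).
    voice_in = Input(shape=(400,))
    face_in = Input(shape=(4096,))

    # Project both modalities into a shared 128-d space, L2-normalized.
    norm = Lambda(lambda t: K.l2_normalize(t, axis=-1))
    voice_emb = norm(Dense(128)(voice_in))
    face_emb = norm(Dense(128)(face_in))

    # Cosine distance between the two projected descriptors.
    dist = Lambda(lambda t: 1.0 - K.sum(t[0] * t[1], axis=-1, keepdims=True))(
        [voice_emb, face_emb])

    def contrastive_loss(y_true, d):
        # y_true is 1 when the voice and the face belong to the same person.
        margin = 0.5
        return K.mean(y_true * K.square(d) +
                      (1.0 - y_true) * K.square(K.maximum(margin - d, 0.0)))

    model = Model([voice_in, face_in], dist)
    model.compile(optimizer='adam', loss=contrastive_loss)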

The planned research will be carried out in the LINKMEDIA team at IRISA (Rennes, France),
in close collaboration with the NexGenTV project partners, notably EURECOM.

The candidate must hold a PhD in a field close to the research topic, preferably in one
of the following areas: multimodal modelling, automatic speech processing, speaker
recognition, or computer vision. The candidate is also expected to strengthen the team's
expertise in neural learning applied to multimedia content analysis.

To apply, please send a CV together with a cover letter.

Employer: Centre National de la Recherche Scientifique
Location: IRISA, Rennes
Contract: 18-month fixed-term contract (CDD), starting as soon as possible from March 2017
Salary: €2,815 gross per month
Contact: Guillaume Gravier (prenom.nom@irisa.fr)


6-20(2017-02-20) Several possitions of Research Engineers at Audio Analytic Labs, Cambridge, UK

Audio Analytic Labs, the research division of Audio Analytic Ltd, has several Research
Engineer positions currently open in the field of Automatic Sound Recognition.

These could be of interest to your PhD students or Post-Docs finishing their contracts in
your teams and looking to follow up with an industrial position.

The complete job specification is copied below.

We are also open to answering questions from people interested in our company but not yet
available for employment.

More generally, we are open to finding concrete and mutually beneficial ways to
collaborate with academic partners on research projects, either through joint projects
supported by specific funding, or via secondments and internships.

For more information about our company, please visit the company's website at
http://www.audioanalytic.com/, or feel free to contact me directly.

I would be very grateful if you could pass the attached job offer on to your
institutions' career services, or forward it directly to people who you think
may be directly interested in applying.

Hoping this will be useful, and of interest to your alumni.

Many thanks, and best regards,

- Sacha K.

Director of AALabs

AudioAnalytic Ltd.

INDUSTRIAL POSITION OPEN:

*Full Time Audio Analytics Research Engineer*

Location: Cambridge, Cambridgeshire, United Kingdom

Full-time, immediate start.

Audio Analytic Ltd. is leading the world of acoustically connected things. Our unique
software is used by smart home companies the world over to make devices aware of sounds
around them. If a smoke alarm goes off or a glass panel is broken by intruders while
no-one is at home, our software will immediately recognise the sound and tell the device
to alert the home owner and the smart home so they can both take appropriate protective
action. We give smart home owners sound peace of mind. More information is available on:

http://www.audioanalytic.com

We are looking for people who thrive as part of a dedicated and innovative team, love
tough challenges, and are passionate about audio/sound, DSP and Machine Learning.

Responsibilities

As part of our R&D team, you will contribute to researching and evaluating new algorithms
to push the limits of our unique sound recognition system. Responsibilities include
developing new algorithms in house, identifying and reporting on state of the art
methods, and evaluating both types of solutions on large scale field data sets.

Technical Skills

Must have either a Master's degree with 2 years of industrial experience or a PhD, in one of
the following topics: Digital Signal Processing of Audio Signals, Machine Learning
applied to Audio Signals, Automatic Speech/Speaker Recognition, Music Information
Retrieval, Acoustic Events Detection, Statistical Speech Synthesis, Thematic Indexing of
Audio tracks (e.g., Speaker Diarization, Acoustic Segmentation of Video Documents etc.).

Experience as a post-doc research engineer, either academic or industrial, will be a
significant plus.

Required:

     Demonstrable skills in Digital Signal Processing and/or Machine Learning applied to
Audio Signals.

     Demonstrable experience dealing with at least one type of Machine Learning algorithm
(e.g., Deep Neural Networks, Hidden Markov Models, Support Vector Machines, Decision
Trees etc.) applied to the processing of Audio Signals.

     Scripting and algorithm prototyping: Python, bash.

     Programming: C/C++ coding and code optimisation. CUDA/GPU programming a plus.

     Development under Linux/Unix mandatory, Windows optional.

Desirable:

     Hardware design knowledge a plus but not a requirement.

     Demonstrable interest in porting DSP/Machine Learning algorithms to either embedded
platforms or high performance computing platforms a plus but not a requirement.

General Skills

     Ability to deliver on research and evaluation methodology.

     Good communication skills.

     Excellent problem-solving skills.

     Track record of academic publications a plus but not a requirement.

     Enjoys working as a member of a team and using their own initiative.

     Self-confident and highly motivated.

     Ability to deal confidently with a variety of people at all levels.

     Able to manage own workload and meet deadlines.

     Good organisational skills.

     Good standard of written and spoken English.

Remuneration

This is a great opportunity to join a successful company with a huge potential for
growth. The successful candidate will be compensated with an attractive package
appropriate to qualifications and experience, to include a competitive salary and stock
options.

How to Apply

To apply for this vacancy, please send a covering letter and copy of a recent CV to
jobs@audioanalytic.com, with reference AA-RES-ENG-2016 in the email title.

Please note that it is company policy not to accept job applications from recruitment
consultants.


6-21(2017-02-21) Acting Assistant Professor, Department of Linguistics, University of Washington, WA, USA

Acting Assistant Professor, Department of Linguistics, University of Washington, Washington, USA, associated with the professional MS program and Ph.D. track in Computational Linguistics.

For additional details and to apply, please go to: http://ap.washington.edu/ahr/academic-jobs/position/aa22332/

Application deadline: May 31, 2017, open until filled. Priority will be given to applications received before March 1, 2017.

6-22(2017-02-21) Several positions at Fluent.ai in Montreal, Canada

Fluent.ai is looking for both permanent full-time employees as well as interns. Please see this page: http://www.fluent.ai/careers/#toggle-id-3.

Fluent.ai is a startup based in Montreal, Canada. We are working on new deep learning and related techniques to enable acoustic-only speech recognition. By associating speech with intent, without requiring a speech-to-text step, Fluent.ai opens up a wide variety of new applications and provides higher accuracy and more robust performance than existing methods. We are looking to expand our technology and research teams and are inviting applications for various permanent and internship-based roles.

Joining Fluent.ai provides you an opportunity to be an early team member leading work on an exciting, disruptive technology poised for rapid growth. The technology has already been validated by many academic experts as well as industrial customers in diverse sectors. Now we are looking for the right people to share our vision and hustle to achieve execution excellence in select sectors. You will be joining a diverse, dedicated, smart and fun team. We work hard, we don't always agree, but we always laugh out loud and we always move forward together.

What we offer: a great working environment and a competitive mix of salary and options. We are keen to interact with talented people and will get back to the selected candidates quickly. We are an equal opportunity employer and value diversity at our company. We do not discriminate based on origin, religion, gender, age, sexual orientation, or disability.

Let me know if you have any questions, and I will be happy to answer those.
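
As a toy illustration of the acoustic-only, speech-to-intent idea described above, the sketch below maps a sequence of acoustic feature frames directly to an intent label with a small Keras recurrent classifier, with no intermediate transcript. The feature size, architecture, and intent inventory are assumptions for illustration; this is not Fluent.ai's technology.

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    N_FEATS = 40    # assumed: 40 filterbank coefficients per frame
    N_INTENTS = 5   # assumed: five device commands to distinguish

    # Variable-length sequences of acoustic frames go straight to an
    # intent posterior, skipping the speech-to-text step entirely.
    model = Sequential([
        LSTM(64, input_shape=(None, N_FEATS)),
        Dense(N_INTENTS, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
    model.summary()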

 

Vikrant Tomar

Fluent.ai


6-23(2017-02-22) Language modeling scientist at Siri team at Apple

 

Title: Language Modeling Scientist – Siri Speech team at Apple

 

 

 

Job Summary

 

Play a part in the next revolution in human-computer interaction. Contribute to a product that is redefining mobile computing. Create groundbreaking technology for large scale systems, spoken language, big data, and artificial intelligence.  And work with the people who created the intelligent assistant that helps millions of people get things done — just by asking. Join the Siri Speech team at Apple.

 

 

 

The Siri team is looking for exceptionally skilled and creative Scientists and Engineers eager to get involved in hands-on work improving the Siri experience.

 

 

 

Key Qualifications

 

  • Experience building, testing, and tuning language models for ASR

  • Ability to implement experiments using scripting languages (Python, Perl, bash) and tools written in C/C++

  • Experience working with standard speech recognition toolkits (such as HTK, Attila, Kaldi, SRILM, OpenFST or equivalent proprietary systems)

  • Large scale data analysis experience using distributed clusters (e.g. MapReduce, Spark)

 

 

 

Description

 

The speech team is seeking a research scientist to participate in the language modeling effort for Siri. In order to estimate language model probabilities, you will make use of very large amounts of training text drawn from diverse sources. You will be part of a group that has responsibility for the entire domain of language modeling in multiple languages including, among other things, text processing, data selection, language model adaptation, neural network modeling, improving language model training infrastructure, and experimenting with new types of language models.
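
As a minimal illustration of estimating language model probabilities from text, the sketch below trains a bigram model and smooths it by interpolating with the unigram distribution. A production system would use vastly larger corpora and stronger techniques (e.g., Kneser-Ney smoothing or neural models); this toy Python example only fixes the idea.

    from collections import Counter

    def train(sentences):
        unigrams, bigrams = Counter(), Counter()
        for words in sentences:
            tokens = ['<s>'] + words + ['</s>']
            unigrams.update(tokens)
            bigrams.update(zip(tokens, tokens[1:]))
        return unigrams, bigrams

    def prob(w1, w2, unigrams, bigrams, lam=0.7):
        # Interpolate the maximum-likelihood bigram estimate with the
        # unigram estimate so unseen bigrams keep non-zero probability.
        p_uni = unigrams[w2] / float(sum(unigrams.values()))
        p_bi = bigrams[(w1, w2)] / float(unigrams[w1]) if unigrams[w1] else 0.0
        return lam * p_bi + (1.0 - lam) * p_uni

    corpus = [['play', 'some', 'music'], ['play', 'my', 'music']]
    uni, bi = train(corpus)
    print(prob('play', 'some', uni, bi))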

 

 

 

Education

 

PhD or Masters in Computer Science or related field

 

3+ years of experience in language modeling for ASR

 

 

 

Apply online at jobs.apple.com

 

Search for: “Language Modeling Scientist”

 


6-24(2017-05-06) CHI 2017 Workshop on Designing Speech, Acoustic, and Multimodal Interactions, Denver, CO, USA

Call for participation and late submissions: CHI 2017 Workshop on
Designing Speech, Acoustic, and Multimodal Interactions

http://www.dgp.toronto.edu/dsli2017/

** Position papers or requests for participation will be accepted until
March 3rd; please contact the organizers at
dsli2017-submissions@cs.toronto.edu

This workshop aims to bring together interaction designers, usability
researchers, and general HCI and speech processing practitioners. Our goal
is to create, through an interdisciplinary dialogue, momentum for
increased research and collaboration in:

* Formally framing the challenges to the widespread adoption of speech,
acoustic, and natural language interaction,
* Taking concrete steps toward developing a framework of user-centric
design guidelines for speech-, acoustic-, and language-based interactive
systems, grounded in good usability practices,
* Establishing directions to take and identifying further research
opportunities in designing more natural interactions that make use of
speech and natural language, and
* Identifying key challenges and opportunities for enabling and designing
multi-input modalities for a wide range of emerging devices such as
wearables, smart home personal assistants, or social robots.

We invite the submission of position papers demonstrating research,
design, practice, or interest in areas related to speech, acoustic,
language, and multimodal interaction that address one or more of the
workshop goals, with an emphasis on, but not limited to, applications such as
mobile, wearable, smart home, social robots, or pervasive computing.

Alternatively, we invite interested participants to submit a brief
position paper outlining their experience and/or interest in the workshop
topics.

Position papers should be 4-6 pages long, in the ACM SIGCHI extended
abstract format and include a brief statement justifying the fit with the
workshop's topic. Summaries of previous research are welcome if they
contribute to the workshop's multidisciplinary goals (e.g. a speech
processing research in clear need of HCI expertise). Submissions will be
reviewed according to:
* Fit with the workshop topic
* Potential to contribute to the workshop goals
* A demonstrated track of research in the workshop area (HCI and/or
speech, acoustic, or multimodal processing).

Submission papers or requests for participation should be sent to:
dsli2017-submissions@cs.toronto.edu




6-25(2017-06-19) CfP CBMI 2017, 15th International Workshop on Content-Based Multimedia Indexing , Firenze, Italy

CBMI 2017, June 19-21, Firenze, Italy
15th International Workshop on Content-Based Multimedia Indexing
Last Call for Papers - Extended deadline
------------------------------------------------------------
http://cbmi2017.micc.unifi.it
https://twitter.com/cbmi2017

CBMI aims at bringing together the various communities involved in all aspects of content-based multimedia indexing for retrieval, browsing, management, visualization and analytics.

The 15th edition of CBMI will be organized in Firenze, Italy, 19-21 June 2017.
The scientific program will include invited keynote talks and regular, special and demo sessions.

Authors are encouraged to submit previously unpublished research papers in the broad field of content-based multimedia indexing and applications. We wish to highlight significant contributions addressing the main problem of search and retrieval but also the related and equally important issues of multimedia content management, user interaction, large-scale search, learning in retrieval, social media indexing and retrieval. Additional special sessions (http://www.micc.unifi.it/cbmi2017/call-for-special-session-papers/) are planned in Deep Learning for Multimedia indexing, Multimedia for Cultural Heritage, Sparse Data Machine Learning for Domains in Multimedia and Synergetic media production architecture.

The CBMI proceedings are traditionally indexed and distributed by IEEE Xplore and ACM DL. In addition, authors of the best papers of the conference will be invited to submit extended versions of their contributions to a special issue of Multimedia Tools and Applications journal (http://www.micc.unifi.it/cbmi2017/mtap-special-issue/) (MTAP).
An additional MTAP special issue will accept extended versions of CBMI 2017 papers (http://static.springer.com/sgw/documents/1600994/application/pdf/CFP_Soft+Computing+Techniques+and+Applications+on+Multimedia+Data+Analyzing+Systems.pdf)

Topics:
Topics of interest include, but are not limited to, the following:
* Audio and visual and multimedia indexing;
* Multimodal and cross-modal indexing;
* Deep learning for multimedia indexing;
* Visual content extraction;
* Audio (speech, music, etc) content extraction;
* Identification and tracking of semantic regions and events;
* Social media analysis;
* Metadata generation, coding and transformation;
* Multimedia information retrieval (image, audio, video, text);
* Mobile media retrieval;
* Event-based media processing and retrieval;
* Affective/emotional interaction or interfaces for multimedia retrieval;
* Multimedia data mining and analytics;
* Multimedia recommendation;
* Large scale multimedia database management;
* Summarization, browsing and organization of multimedia content;
* Personalization and content adaptation;
* User interaction and relevance feedback;
* Multimedia interfaces, presentation and visualization tools;
* Evaluation and benchmarking of multimedia retrieval systems;
* Applications of multimedia retrieval, e.g., medicine, lifelogs, satellite imagery, video surveillance;
* Cultural heritage applications.

Paper submission
Authors are invited to submit full-length and special session papers of 6 pages and short (poster) and demo papers of 4 pages maximum. Submissions are peer-reviewed in a single-blind process. The language of the workshop is English.
Important dates
* Full/short paper submission deadline: extended to March 14, 2017
* Demo paper submission deadline: extended to March 14, 2017
* Special Session paper submission deadline: extended to March 14, 2017
* Notification of acceptance: April 28, 2017
* Camera-ready papers due: May 9, 2017
* MTAP special issue paper submission: October 15, 2017 (tentative) http://www.micc.unifi.it/cbmi2017/mtap-special-issue/

Technical Program Chairs
Rita Cucchiara, Univ. of Modena e Reggio Emilia, Italy
Tao Mei, Microsoft Research Asia, China

Details on the conference are available on the website: http://cbmi2017.micc.unifi.it


6-26(2017-02-28) Postdoctoral Researcher (Speech/Audio Processing), University of Eastern Finland, Joensuu Campus, Finland
Postdoctoral Researcher (Speech/Audio Processing)

 

The University of Eastern Finland, UEF, is one of the largest multidisciplinary universities in Finland. We offer education in nearly one hundred major subjects, and are home to approximately 15,000 students and 2,800 members of staff. We operate on three campuses in Joensuu, Kuopio and Savonlinna. In international rankings, we are ranked among the leading 300 universities in the world.

The Faculty of Science and Forestry operates on the Kuopio and Joensuu campuses of the University of Eastern Finland. The mission of the faculty is to carry out internationally recognised scientific research and to offer research-education in the fields of natural sciences and forest sciences. The faculty invests in all of the strategic research areas of the university. The faculty's environments for research and learning are international, modern and multidisciplinary. The faculty has approximately 3,800 Bachelor's and Master's degree students and some 490 postgraduate students. The number of staff amounts to 560. http://www.uef.fi/en/lumet/etusivu

We are now inviting applications for

a Postdoctoral Researcher (Speech/Audio Processing), School of Computing, Joensuu Campus 

The Machine Learning research group of the School of Computing at the University of Eastern Finland (http://www.uef.fi/en/web/cs) is looking for a highly motivated researcher to work in the group.

The current research topics in the group include speaker and language recognition, voice conversion, spoofing and countermeasures for speaker recognition, robust feature extraction, and analysis of environmental sounds. Prior experience in these topics is a plus, though we invite candidates widely from general speech/audio/language processing, machine learning or signal processing background. We expect the new Postdoctoral Researcher to bring in complementary skills and expertise.

The recruited Postdoctoral Researcher will take a major role in advancing research in one of the above-listed (or closely related) topics. He or she will also have a significant role in the supervision of students and certain administrative duties, and he or she will work closely with Associate Professor Kinnunen and the other members of the group. The position is strongly research-focused.

The School of Computing, located in Joensuu Science Park, provides modern research facilities with access to high-performance computing services. Our research group hosted the Odyssey 2014 conference (http://cs.uef.fi/odyssey2014/), is a partner in the ongoing H2020-funded OCTAVE project (https://www.octave-project.eu/) focused on voice biometrics, is a co-founder of the Automatic Speaker Verification and Countermeasures (ASVspoof) challenge series (http://www.spoofingchallenge.org/), and has hosted international summer schools. We take an active part in international benchmarking and other collaboration activities. We follow a multidisciplinary research perspective that aims at understanding the speech signal, as well as applying the acquired knowledge to new application areas.

A person to be appointed as a postdoctoral researcher shall hold a suitable doctoral degree that has been awarded less than five years ago. The doctoral degree should be in spoken language technology, electrical engineering, computer science, machine learning or a closely related field. He/she should be comfortable with Unix/Linux, Matlab/Octave and/or Python, and with processing large datasets, and should have strong hands-on experience and a creative, out-of-the-box problem-solving attitude.

The position will be filled from May 1, 2017 until December 31, 2018. The continuation of the position will be agreed separately.

The positions of postdoctoral researcher shall always be filled for a fixed term (UEF University Regulations 31 §).

The salary of the position is determined in accordance with the salary system of Finnish universities and is based on level 5 of the job requirement level chart for teaching and research staff (€2,865.30/month). In addition to the job requirement component, the salary includes a personal performance component, which may be a maximum of 46.3% of the job requirement component.

For further information on the position, please contact: Associate Professor Tomi Kinnunen, email: tkinnu@cs.uef.fi, tel. +358 50 442 2647.  For further information on the application procedure, please contact: Executive Head of Administration Arja Hirvonen, tel. +358 44 716 3422, email: arja.hirvonen@uef.fi.

A probationary period is applied to all new members of the staff.

The electronic application should contain the following appendices:

- a cover letter indicating the position to be applied for and a free-worded motivation letter
- a résumé or CV
- a list of publications
- copies of the applicant's academic degree certificates/diplomas, and copies of certificates/diplomas relating to the applicant's language proficiency, if not indicated in the academic degree certificates/diplomas
- the names and contact information of at least two referees

The application needs to be submitted no later than March 24, 2017 (by 24:00 EET) by using the electronic application form:

Apply for the job
 
The job ad and the application form can also be found at http://www.uef.fi/en/uef/en-open-positions (search for the position 'Postdoctoral Researcher (Speech/Audio Processing)').

6-27(2017-02-28) MCF en informatique pour les Sciences Humaines, Sorbonne, Paris, France

An Assistant Professor (Maître de Conférences) position in computer science for the humanities, notably in natural language and/or speech processing, is open at Université Paris-Sorbonne (www.paris-sorbonne.fr/IMG/pdf/27-7_mcf_766.pdf). The successful candidate will teach computer science in the various Bachelor's and Master's programmes of the Department of Computer Science, Mathematics and Applied Linguistics of the Faculty of Sociology and Computer Science for the Humanities. His or her research will have to fit into one or more axes of the computational linguistics team (www.stih.paris-sorbonne.fr/): (1) Semantics, knowledge and corpora; (2) Paralinguistics, cognition and physiology.

Contact: Claude.Montacie@Paris-Sorbonne.fr



