ISCA - International Speech Communication Association


ISCApad #205

Wednesday, July 08, 2015 by Chris Wellekens

4 Academic and Industry Notes
4-1 Carnegie Speech


Carnegie Speech produces systems that teach people to speak another language understandably. Its products include NativeAccent, SpeakIraqi, SpeakRussian, and ClimbLevel4. You can find out more at:
You can also read about its selection as a Best Breakout Idea of 2009 at:


4-2 Research in Interactive Virtual Experiences at USC, CA, USA

REU Site: Research in Interactive Virtual Experiences



The Institute for Creative Technologies (ICT) offers a 10-week summer research program for undergraduates in interactive virtual experiences. A multidisciplinary research institute affiliated with the University of Southern California, the ICT was established in 1999 to combine leading academic researchers in computing with the creative talents of Hollywood and the video game industry. Having grown to a total of 170 faculty, staff, and students in a diverse array of fields, the ICT represents a unique interdisciplinary community brought together by a core unifying mission: advancing the state of the art in creating virtual reality experiences so compelling that people will react as if they were real.


Reflecting the interdisciplinary nature of ICT research, we welcome applications from students in computer science, as well as many other fields, such as psychology, art/animation, interactive media, linguistics, and communications. Undergraduates will join a team of students, research staff, and faculty in one of several labs focusing on different aspects of interactive virtual experiences. In addition to participating in seminars and social events, students will prepare a final written report and present their projects to the rest of the institute at the end-of-summer research fair.


Students will receive $5000 over ten weeks, plus an additional $2800 stipend for housing and living expenses.  Non-local students can also be reimbursed for travel up to $600.  The ICT is located in West Los Angeles, just north of LAX and only 10 minutes from the beach.


This Research Experiences for Undergraduates (REU) site is supported by a grant from the National Science Foundation. The site is expected to begin summer 2013, pending final award issuance.


Students can apply online at:

Application deadline: March 31, 2013


For more information, please contact Evan Suma at


4-3 Announcing the Master of Science in Intelligent Information Systems

Carnegie Mellon University


The Master of Science in Intelligent Information Systems (MIIS) is a degree designed for students who want to rapidly master advanced content-analysis, mining, and intelligent information technologies before beginning or resuming leadership careers in industry and government. Just over half of the curriculum consists of graduate courses. The remainder provides direct, hands-on, project-oriented experience working closely with CMU faculty to build systems and solve problems using state-of-the-art algorithms, techniques, tools, and datasets. A typical MIIS student completes the program in one year (12 months) of full-time study at the Pittsburgh campus. Part-time and distance education options are available to students employed at affiliated companies. The application deadline for the Fall 2013 term is December 14, 2012. For more information about the program, please visit:


4-4 Master in Linguistics (Aix-Marseille), France

Master's in Linguistics (Aix-Marseille Université): Linguistic Theories, Field Linguistics and Experimentation (TheLiTEx) offers advanced training in linguistics. This specialty aims to present, in an original way, the links between corpus linguistics and scientific experimentation on the one hand, and between laboratory and field methodologies on the other. On the basis of a common set of courses offered in the first year, TheLiTEx offers two paths: Experimental Linguistics (LEx) and Language Contact & Typology (LCT).

The goal of LEx is the study of language, speech and discourse on the basis of scientific experimentation and quantitative modeling of linguistic phenomena and behavior. It takes a multidisciplinary approach that borrows its methodologies from the human, physical and biological sciences, and its tools from computer science, clinical approaches, engineering, etc. Courses include semantics, phonetics/phonology, morphology, syntax, pragmatics, prosody and intonation, and the interfaces between these linguistic levels, in their interactions with the real world and the individual, from a biological, cognitive and social perspective. In the second year, more specialized courses are offered, such as Language and the Brain and Laboratory Phonology.

LCT aims at understanding the world's linguistic diversity, focusing on language contact, language change and variation (European, Asian and African languages, Creoles, sign language, etc.). From a linguistic and sociolinguistic perspective, this path focuses on issues of field linguistics, taking into account both the human and the socio-cultural dimensions of language (speakers, communities). It also focuses on documenting rare and endangered languages and on fostering reflection on linguistic minorities. This path also provides expertise and intervention models (language policy and planning) in order to train students in managing contact phenomena and their impact on speakers, languages and societies. More info at:




A new, one-year Master in Brain and Cognition will begin its activities in the 2014-15 academic year in Barcelona, Spain, organized by the Universitat Pompeu Fabra.

The core of the master's programme consists of the research groups at UPF's Center for Brain and Cognition. These groups are directed by renowned scientists in areas such as computational neuroscience, cognitive neuroscience, psycholinguistics, vision, multisensory perception, human development and comparative cognition. Students will be exposed to the ongoing research projects at the Center for Brain and Cognition and will be integrated into one of its main research lines, where they will conduct original research for their final project.

The application period is now open. For further information, please visit the master's web page or contact:


4-6 Master's programmes at the Sorbonne (Paris)

The Paris-Sorbonne master's programmes in language engineering, ILGII (R) and IILGI (P), have now been merged into a single specialty within the Literature, Philosophy, Linguistics (Littérature, Philosophie, Linguistique) degree field.
The two years of the Language and Computer Science (Langue et Informatique) master provide fundamental knowledge about language and its automatic processing, about linguistic interactions and the modelling of paralinguistic phenomena, and about knowledge engineering. The specialized courses also develop knowledge and know-how in: text analysis and understanding; speech recognition and synthesis; affective sciences and dialogue systems; computer-assisted summarization and translation; knowledge extraction and construction; competitive intelligence. The methodological courses in the common core of the degree field connect these specialized courses with the epistemology of literature, philology and linguistics. The master has two tracks: a professional track, « Ingénierie de la Langue pour la Société Numérique (ILSN) » (Language Engineering for the Digital Society), and a research track, « Informatique, Langue et Interactions (ILI) » (Computer Science, Language and Interactions). The two tracks diverge in semester 4.



4-7 The International Standard Language Resource Number (ISLRN)

JRC, the EC's Joint Research Centre, an important LR player: First to adopt the ISLRN initiative


The Joint Research Centre (JRC), the European Commission's in-house science service, is the first organisation to adopt the International Standard Language Resource Number (ISLRN) initiative and has requested 13-digit ISLRN unique identifiers for its Language Resources (LRs).
Thus, anyone who is using JRC LRs may now refer to this number in their own publications.


The current JRC LRs with an ISLRN ID (downloadable from the JRC website) are:




The International Standard Language Resource Number (ISLRN) aims to provide unique identifiers using a standardised nomenclature, thus ensuring that LRs are correctly identified and, consequently, recognised with proper references for their usage in applications within R&D projects, product evaluation and benchmarking, as well as in documents and scientific papers. Moreover, this is a major step in the networked and shared world that Human Language Technologies (HLT) have become: unique resources must be identified as such, and meta-catalogues need a common identification format to manage data correctly.
The ISLRN portal can be accessed at:


*** About the JRC ***

As the Commission's in-house science service, the Joint Research Centre's mission is to provide EU policies with independent, evidence-based scientific and technical support throughout the whole policy cycle.
Within its research in the field of global security and crisis management, the JRC develops open source intelligence and analysis systems that can automatically harvest and analyse large amounts of multilingual information from internet-based sources. In this context, the JRC has developed language technology resources and tools that can be used for highly multilingual text analysis and cross-lingual applications.
To find out more about JRC's research in open source information monitoring, please visit:
To access media monitoring applications directly, go to:


*** About ELRA ***
The European Language Resources Association (ELRA) is a non-profit organisation founded by the European Commission in 1995, with the mission of providing a clearing house for language resources and promoting Human Language Technologies (HLT).
To find out more about ELRA, please visit our web site:

For more information, contact


4-8 New Masters in Machine Learning, Speech and Language Processing at Cambridge University, UK
This is a new twelve-month full-time MPhil programme offered by the Computational and Biological Learning Group (CBL) and the Speech Group in the Cambridge University Department of Engineering, with a unique joint emphasis on both machine learning and speech and language technology. The course aims to teach the state of the art in machine learning, speech and language processing; to give students the skills and expertise necessary to take leading roles in industry; and to equip students with the research skills necessary for doctoral study.
Applications from UK and EU students should be completed by 9 January 2015 for admission in October 2015. A limited number of studentships may be available for exceptional UK and eligible EU applicants.

Self-funding students who do not wish to be considered for support from the Cambridge Trusts have until 30 June 2015 to submit their complete applications.

More information about the course can be found here:


4-9 MediaEval 2015 Multimedia Benchmark


Call for Participation
MediaEval 2015 Multimedia Benchmark Evaluation
Early registration deadline: 1 May 2015

MediaEval is a multimedia benchmark evaluation that offers tasks promoting research and
innovation in areas related to human and social aspects of multimedia. MediaEval 2015
focuses on aspects of multimedia including and going beyond visual content, such as
language, speech, music, and social factors. Participants carry out one or more of the
tasks offered and submit runs to be evaluated. They then write up their results and
present them at the MediaEval 2015 workshop.

For each task, participants receive a task definition, task data and accompanying
resources (dependent on task) such as shot boundaries, keyframes, visual features, speech
transcripts and social metadata. In order to encourage participants to develop techniques
that push forward the state-of-the-art, a 'required reading' list of papers will be
provided for each task.

Participation is open to all interested research groups. To sign up, please click the
"MediaEval 2015 Registration" link at:

The following tasks are available to participants at MediaEval 2015:

*QUESST: Query by Example Search on Speech Task*
The task involves searching FOR audio content WITHIN audio content USING an audio content
query. This task is particularly interesting for speech researchers in the area of spoken
term detection or low-resource/zero-resource speech processing. The primary performance
metric will be the normalized cross entropy cost (Cnxe).
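
For readers new to Cnxe, here is a minimal sketch of one common formulation of a normalized cross-entropy cost, under stated assumptions: each detection score is treated as a natural-log likelihood ratio, the target prior `p_target` is a free parameter (no value is given in this announcement), and the cost is normalized by that of a system that always answers with the prior. The function names are illustrative; the task's own evaluation definition takes precedence.

```python
import math


def _log_sigmoid(x: float) -> float:
    """Numerically stable natural log of 1 / (1 + exp(-x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def normalized_cross_entropy(target_llrs, nontarget_llrs, p_target):
    """Sketch of a Cnxe-style cost.

    Assumptions: scores are natural-log likelihood ratios and p_target is
    the prior probability of a target trial (0 < p_target < 1).
    """
    logit_prior = math.log(p_target / (1.0 - p_target))
    # Mean log posterior of "target" over true (target) trials.
    tar = sum(_log_sigmoid(s + logit_prior) for s in target_llrs) / len(target_llrs)
    # Mean log posterior of "non-target" over false (non-target) trials.
    non = sum(_log_sigmoid(-(s + logit_prior)) for s in nontarget_llrs) / len(nontarget_llrs)
    # Prior-weighted empirical cross-entropy, converted to bits.
    cxe = -(p_target * tar + (1.0 - p_target) * non) / math.log(2)
    # Cost of a system that always outputs the prior: entropy of the prior in bits.
    cxe_prior = -(p_target * math.log2(p_target)
                  + (1.0 - p_target) * math.log2(1.0 - p_target))
    return cxe / cxe_prior


# Uninformative scores (all zero) give a cost of approximately 1.
print(normalized_cross_entropy([0.0, 0.0], [0.0, 0.0, 0.0], p_target=0.01))
```

Well-separated, well-calibrated scores push the cost toward 0, while confidently wrong scores push it above 1.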

*Multimodal Person Discovery in Broadcast TV (New in 2015!)*
Given raw TV broadcasts, each shot must be automatically tagged with the name(s) of
people who can be both seen and heard in the shot. The list of people is not known
a priori and their names must be discovered in an unsupervised way from provided text
overlay or speech transcripts. The task will be evaluated on a new French corpus
(provided by INA) and the AGORA Catalan corpus, using standard information retrieval
metrics based on a posteriori collaborative annotation of the corpus.

*C@merata: Querying Musical Scores*
The input is a natural language phrase referring to a musical feature (e.g., "consecutive
fifths") together with a classical music score, and the required output is a list of
passages in the score which contain that feature. Scores are in the MusicXML format,
which can capture most aspects of Western music notation. Evaluation is via versions of
Precision and Recall relative to a Gold Standard produced by the organisers.
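
The announcement does not say which versions of Precision and Recall are used. As a minimal sketch of the plainest version, the code below treats each answer passage as an exact-match key, here a hypothetical (start bar, end bar) pair, and compares the returned passages against the gold standard; partial overlaps earn no credit in this simplification.

```python
def passage_precision_recall(returned, gold):
    """Exact-match precision and recall over answer passages.

    Assumption: each passage is a hashable key, here a hypothetical
    (start_bar, end_bar) tuple; partial overlaps earn no credit.
    """
    returned, gold = set(returned), set(gold)
    hits = len(returned & gold)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Two of three returned passages are in the gold standard, which has four passages.
p, r = passage_precision_recall(
    returned=[(1, 4), (5, 8), (9, 12)],
    gold=[(1, 4), (9, 12), (13, 16), (17, 20)],
)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.500
```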

*Affective Impact of Movies (including Violent Scenes Detection)*
In this task participating teams are expected to classify short movie scenes by their
affective content according to two use cases: (1) the presence of depicted violence, and
(2) their emotional impact (valence, arousal). The training data consists of short
Creative Commons-licensed movie scenes (both professional and amateur) together with
human annotations of violence and valence-arousal ratings. The results will be evaluated
using standard retrieval and classification metrics.

*Emotion in Music (An Affect Task)*
We aim to detect the emotional dynamics of music from its content. Given a set of songs,
participants are asked to automatically generate continuous emotional representations in
arousal and valence.

*Retrieving Diverse Social Images*
This task requires participants to refine a ranked list of Flickr photos with location
related information using provided visual, textual and user credibility information.
Results are evaluated with respect to their relevance to the query and the diverse
representation of it.
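
The announcement does not name the relevance and diversity metrics. Purely as an illustration of scoring the two separately, the sketch below computes precision at a cutoff k and cluster recall at k (the fraction of a query's ground-truth clusters represented in the top k), assuming every relevant photo has been assigned one hypothetical cluster id.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant (divides by k even if fewer are returned)."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def cluster_recall_at_k(ranked, cluster_of, k):
    """Fraction of ground-truth clusters represented among the top-k results.

    Assumption: cluster_of maps every relevant photo id to a hypothetical cluster id.
    """
    all_clusters = set(cluster_of.values())
    seen = {cluster_of[item] for item in ranked[:k] if item in cluster_of}
    return len(seen) / len(all_clusters) if all_clusters else 0.0


ranked = ["a", "b", "c", "d"]
cluster_of = {"a": 1, "c": 1, "d": 2, "e": 3}  # "b" is irrelevant; "e" was never retrieved
print(precision_at_k(ranked, set(cluster_of), 4), cluster_recall_at_k(ranked, cluster_of, 4))
```

A ranking that fills the top k with near-duplicates of one cluster can keep precision high while cluster recall stays low, which is why the two are reported side by side.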

*Placing: Multimodal Geo-location Prediction*
The Placing Task requires participants to estimate the locations where multimedia items
(photos or videos) were captured solely by inspecting the content and metadata of these
items, and optionally exploiting additional knowledge sources such as gazetteers.
Performance is evaluated using the distance to the ground truth coordinates of the
multimedia items.
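
Distance to ground-truth coordinates is usually measured along the Earth's surface. The sketch below uses the haversine great-circle formula with an assumed mean Earth radius of 6371 km, plus a hypothetical helper that reports the fraction of items placed within a given radius; the error thresholds actually used by the task are not stated in this announcement.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing a slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def fraction_within(predicted, truth, radius_km):
    """Hypothetical summary: share of items whose estimate lies within radius_km of the truth."""
    hits = sum(1 for p, t in zip(predicted, truth) if haversine_km(*p, *t) <= radius_km)
    return hits / len(truth)


# One estimate is about 0.1 km off, the other about 1100 km off.
print(fraction_within([(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.001), (0.0, 10.0)], radius_km=1.0))
```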

*Verifying Multimedia Use (New in 2015!)*
For this task, the input is a tweet about an event that is likely to be of interest
in the international news, together with the accompanying multimedia item (image or video).
Participants must build systems that output a binary decision representing a verification
of whether the multimedia item reflects the reality of the event in the way purported by
the tweet. The task is evaluated using the F1 score. Participants are also requested to
return a short explanation or evidence for the verification decision.
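
As a reminder of how the F1 score combines precision and recall for a binary verification decision, here is a minimal sketch. The label names and the choice of which label counts as the positive class are assumptions for illustration only.

```python
def binary_f1(predicted, actual, positive):
    """F1 = harmonic mean of precision and recall for the `positive` label."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == positive and a == positive)
    fp = sum(1 for p, a in zip(predicted, actual) if p == positive and a != positive)
    fn = sum(1 for p, a in zip(predicted, actual) if p != positive and a == positive)
    # 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN); F1 is defined as 0 when TP is 0.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical labels: "fake" (taken as the positive class here) and "real".
predicted = ["fake", "fake", "real", "real"]
actual = ["fake", "real", "fake", "real"]
print(binary_f1(predicted, actual, positive="fake"))  # 0.5
```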

*Context of Experience: Recommending Videos Suiting a Watching Situation (New in 2015!)*
This task develops multimodal techniques for automatically predicting which multimedia
items suit a specific consumption context. In particular, we focus on predicting movies
that are suitable to watch on airplanes. The inputs to the prediction methods are movie
trailers and metadata from IMDb. Output is evaluated using the Weighted F1 score, with
expert labels as ground truth.
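
The announcement does not define "Weighted F1". A common convention, used for example by scikit-learn's `average="weighted"` option, averages the per-class F1 scores weighted by each class's number of true instances; the sketch below implements that convention with hypothetical labels.

```python
from collections import Counter


def weighted_f1(predicted, actual):
    """Average of per-class F1 scores, weighted by each class's count in `actual`."""
    support = Counter(actual)
    total = len(actual)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for p, a in zip(predicted, actual) if p == label and a == label)
        fp = sum(1 for p, a in zip(predicted, actual) if p == label and a != label)
        fn = count - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        score += f1 * count / total
    return score


# Hypothetical labels for "suitable to watch on an airplane".
actual = ["suitable", "suitable", "suitable", "unsuitable"]
predicted = ["suitable", "suitable", "unsuitable", "unsuitable"]
print(round(weighted_f1(predicted, actual), 4))  # 0.7667
```

Because each class is weighted by its frequency, the majority class dominates the score; an unweighted (macro) average would give a rare class equal say.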

*Reliability of Social Multimedia Annotations (New in 2015!)*
Input is a set of underwater photos with user-generated annotations and additional
social information taken from a scuba divers' social website, and output is a ranked list
of the least reliable user-generated annotations. Systems will be evaluated using a
labeling of fish species created by expert annotators.

*Synchronization of Multi-User Event Media*
This task addresses the challenge of automatically creating a chronologically-ordered
outline of multiple multimedia collections corresponding to the same event. Given N media
collections (galleries) taken by different users/devices at the same event, the goal is
to find the best (relative) time alignment among them and detect the significant
sub-events over the whole gallery. Performance is evaluated using ground truth time codes
and actual event schedules.

*DroneProtect: Mini-drone Video Privacy Task (New in 2015!)*
The recent popularity of mini-drones and their rapidly increasing adoption in various areas,
including photography, news reporting, cinema, mail delivery, cartography, agriculture,
and the military, raise concerns for privacy protection and personal safety. Input to the
task is drone video, and output is a version of the video that protects privacy while
retaining key information about the event or situation recorded.

*Search and Anchoring in Video Archives*
The 2015 Search and Anchoring in Video Archives task consists of two sub-tasks: search
for multimedia content and automatic anchor selection. In the "search for multimedia
content" sub-task, participants use multimodal textual and visual descriptions of content
of interest to retrieve potentially relevant video segments from within a collection. In
the "automatic anchor selection" sub-task, participants automatically predict key
elements of videos as anchor points for the formation of hyperlinks to relevant content
within the collection. The video collection consists of professional broadcasts from the BBC
or semi-professional user-generated content. Participant submissions will be assessed
using professionally created anchors and crowdsourcing-based evaluation.

MediaEval 2015 Timeline
(dates vary slightly from task to task; see the individual task pages for the individual dates)

Mid-March to May: Registration and return of usage agreements.
May-June: Release of development/training data.
June-July: Release of test data.
Mid-Aug.: Participants submit their completed runs, and receive results.
End Aug: Participants submit their 2-page working notes papers.
14-15 September: MediaEval 2015 Workshop, Wurzen, Germany. The workshop is a satellite event
of Interspeech 2015, held nearby in Dresden the previous week.

We ask you to register by 1 May (because of the timing of the first wave of data
releases). After that point, late registration will be possible, but we encourage teams
to register as early as they can.

For questions or additional information please contact Martha Larson or visit

The ISCA SIG SLIM (Speech and Language in Multimedia) is a key
supporter of MediaEval. This year, the MediaEval workshop will be held as a satellite
event of Interspeech.

A large number of organizations and projects contribute to the organization of MediaEval,
including the projects (alphabetical): Camomile, CrowdRec, EONS, PHENICX, Reveal,
VideoSense, Visen.


4-10 AASP TC Challenges

Three years after its launch, the AASP TC Challenges series has achieved its goal of
stimulating ground-breaking approaches to hot topics in Audio and Acoustic Signal
Processing. The challenges run so far have been great successes, leading to unprecedented
participation, new publicly available datasets, and highly attended special sessions. They are:
-    the CHiME Challenge on speech separation and recognition in domestic environments
-    the D-Case Challenge on detection and classification of acoustic scenes and events
-    the REVERB Challenge on single- and multichannel speech dereverberation

The coming year will see the unfolding of the ACE Challenge on acoustic
characterization of environments, and future challenges are now eagerly awaited
by academia and industry.

In order to pursue this endeavor, we are issuing a call for expressions of interest in
organizing new challenges. This is an open call with no deadline. Prospective organizers
should provide a brief description of the challenge, the planned test data and evaluation
methodology, and its value to the community. Challenges at the crossroads with other
communities such as speech processing or machine learning are especially welcome. For
more details, see

The AASP TC Challenges Subcommittee will help organizers run a successful Challenge by
providing scientific and organizational feedback, sharing industrial sponsorship
contacts, and awarding official prizes to the most reproducible challenge entries.

We are looking forward to your proposals!

On behalf of the AASP TC Challenges Subcommittee
Emmanuel Vincent, Chair


4-11 MULTILING evaluation campaign

This year, the Multiling evaluation campaign on automatic summarization
addresses the summarization of spoken conversations through the
CCCS (Call-Center Conversation Summarization) task. Participating
systems will be evaluated on their ability to generate an abstractive
summary of a telephone conversation describing the problems
encountered by the caller and the solutions provided by the agent.

The submission deadline is 24 April, which leaves you just enough
time to participate.

The training data come from the DECODA (French) and LUNA (Italian)
corpora, and from manual English translations of these corpora. 100
conversations from each corpus are annotated with several reference
summaries, and 1000 additional conversations are provided for
unsupervised learning. The corpus contains manual transcriptions of
each conversation.

If you are interested in this task, please contact me and find out
more at:

Benoit Favre.


4-12 Questionnaire on ethics for the speech community

During the Ethics and Natural Language Processing workshop (Ethique et TRaitemeNt Automatique des Langues, ETeRNAL) at TALN, a questionnaire was set up to learn about the community's practices and expectations regarding ethics.

Visit the workshop website, or go directly to the questionnaire, and take 5 minutes to give us your opinion.

The results of this questionnaire will only be meaningful if it is completed by many people, including those who do not feel concerned by the subject a priori and those who think that ethics is not a problem.

The results of the survey will be widely disseminated on the workshop website, and also on the blog created following the workshop.

5 minutes, we promise!


Gilles Adda (LIMSI)



© Copyright 2024 - ISCA International Speech Communication Association - All right reserved.
