ISCA - International Speech
Communication Association


ISCApad #172

Sunday, October 07, 2012 by Chris Wellekens

3-3 Other Events
3-3-1(2012-10-01) Human Activity and Vision Summer School, INRIA, Sophia Antipolis, France
Human Activity and Vision Summer School 
- Monday 1st to Friday 5th of October 2012
- INRIA, Sophia-Antipolis/Nice on the French Riviera
- website:

== Overview
The Human Activity and Vision Summer School will address the broad domains of human activity modeling and human behavior recognition, with an emphasis on vision sensors as capturing modality. Courses will comprise both tutorials and presentations of state-of-the-art methods by active researchers in the field. The goal of the courses will be to cover most of the human activity analysis chain, starting from the low level processing of videos and audio for detection and feature extraction, to medium level (tracking and behavior cue extraction) and higher level modeling and recognition using both supervised and unsupervised techniques. Applications of the different methods to action and activity recognition in different domains ranging from Activities of Daily Living to surveillance (individual behavior recognition, crowd monitoring) will be considered. Presentation of real use cases, market needs, and current bottlenecks in the surveillance domain will also be addressed, with one half day devoted to presentations and panel discussions with professional and industrial presenters. See the list of topics and speakers below.

== Audience
The summer school is open to young researchers (in particular master or Ph.D. students) and researchers from both academia and industry working or interested in the human activity analysis domain or connected fields like surveillance.

== Application/Registration
The registration fee is 300 Euros. This includes all the courses, coffee breaks and lunch. The fee does not include accommodation or dinners. A limited number of low-cost accommodations for students are available.
To apply for a position at the Summer School and find more practical information, please go to:

== List of topics and confirmed speakers
* Object detection and tracking
  - Francois Fleuret (Idiap Research Institute)
  - Alberto del Bimbo and Federico Pernici (Università di Firenze)
  - Cyril Carincotte (Multitel)
  - Jean-Marc Odobez (Idiap Research Institute)
* Crowd analysis and Simulation
  - Mubarak Shah (University of Central Florida)
  - Paola Goatin (INRIA)
  - Cyril Carincotte (Multitel)
* Action and behavior recognition
  - Ivan Laptev (INRIA)
  - Ben Krose (University of Amsterdam)
  - Francois Bremond (INRIA)
* Social Behavior Analysis
  - Elisabeth Oberzaucher (University of Vienna)
  - Hayley Hung (University of Amsterdam)
* Unsupervised activity discovery and active learning
  - Tao Xiang (Queen Mary University of London)
  - Jean-Marc Odobez and Remi Emonet (IDIAP)
* Body and head Pose estimation
  - Cheng Chen (Idiap Research Institute)
  - Guillaume Charpiat (INRIA)
* Audio processing
  - Maurizio Omologo (Foundation Bruno Kessler)
  - Bertrand Ravera (Thales Communication France)

Jean-Marc Odobez, IDIAP Senior Researcher, EPFL Maitre d'Enseignement et de Recherche (MER)
IDIAP Research Institute
Tel: +41 (0)27 721 77 26
Web:

3-3-2(2012-10-22) cfp participation and papers/ 2nd International Audio/Visual Emotion Challenge and Workshop (AVEC 2012)
2nd International Audio/Visual Emotion Challenge and Workshop (AVEC 2012)

in conjunction with ACM ICMI 2012, October 22, Santa Monica, California, USA 

Register and download data and features: 



The Audio/Visual Emotion Challenge and Workshop (AVEC 2012) will be the second competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and audiovisual emotion analysis, with all participants competing under strictly the same conditions. The goal of the Challenge is to provide a common benchmark test set for individual multimodal information processing and to bring together the audio and video emotion recognition communities, to compare the relative merits of the two approaches to emotion recognition under well-defined and strictly comparable conditions and establish to what extent fusion of the approaches is possible and beneficial. A second motivation is the need to advance emotion recognition systems to be able to deal with naturalistic behavior in large volumes of un-segmented, non-prototypical and non-preselected data as this is exactly the type of data that both multimedia retrieval and human-machine/human-robot communication interfaces have to face in the real world.

We are calling for teams to participate in emotion recognition from acoustic audio analysis, linguistic audio analysis, video analysis, or any combination of these. The benchmarking database will be the SEMAINE database of naturalistic video and audio of human-agent interactions, along with labels for four affect dimensions. Emotion will have to be recognized in terms of continuous-time, continuous-valued dimensional affect in the dimensions arousal, expectation, power and valence. Two Sub-Challenges are addressed: The Word-Level Sub-Challenge requires participants to predict the level of affect at word level and only when the user is speaking. The Fully Continuous Sub-Challenge involves fully continuous affect recognition, where the level of affect has to be predicted for every moment of the recording.

Besides participation in the Challenge, we are calling for papers addressing the overall topics of this workshop, in particular works that address the differences between audio and video processing of emotive data, and the issues concerning combined audio-visual emotion recognition.

Topics include, but are not limited to:

Audio/Visual Emotion Recognition:
. Audio-based Emotion Recognition
. Linguistics-based Emotion Recognition
. Video-based Emotion Recognition
. Social Signals in Emotion Recognition
. Multi-task learning of Multiple Dimensions 
. Novel Fusion Techniques as by Prediction 
. Cross-corpus Feature Relevance 
. Agglomeration of Learning Data 
. Semi- and Unsupervised Learning 
. Synthesized Training Material 
. Context in Audio/Visual Emotion Recognition 
. Multiple Rater Ambiguity

. Multimedia Coding and Retrieval
. Usability of Audio/Visual Emotion Recognition 
. Real-time Issues

Important Dates

Paper submission: July 31, 2012
Notification of acceptance: August 14, 2012
Camera ready paper and final challenge result submission: August 18, 2012
Workshop: October 22, 2012


Björn Schuller (Tech. Univ. Munich, Germany) 
Michel Valstar (University of Nottingham, UK) 
Roddy Cowie (Queen's University Belfast, UK) 
Maja Pantic (Imperial College London, UK)

Program Committee

Elisabeth André, Universität Augsburg, Germany
Anton Batliner, Universität Erlangen-Nuremberg, Germany
Felix Burkhardt, Deutsche Telekom, Germany
Rama Chellappa, University of Maryland, USA
Fang Chen, NICTA, Australia
Mohamed Chetouani, Institut des Systèmes Intelligents et de Robotique (ISIR), France
Laurence Devillers, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), France
Julien Epps, University of New South Wales, Australia
Anna Esposito, International Institute for Advanced Scientific Studies, Italy
Raul Fernandez, IBM, USA
Roland Göcke, Australian National University, Australia
Hatice Gunes, Queen Mary University London, UK
Julia Hirschberg, Columbia University, USA
Aleix Martinez, Ohio State University, USA
Marc Méhu, University of Geneva, Switzerland
Marcello Mortillaro, University of Geneva, Switzerland
Matti Pietikainen, University of Oulu, Finland
Ioannis Pitas, University of Thessaloniki, Greece
Peter Robinson, University of Cambridge, UK
Stefan Steidl, Universität Erlangen-Nuremberg, Germany
Jianhua Tao, Chinese Academy of Sciences, China
Fernando de la Torre, Carnegie Mellon University, USA
Mohan Trivedi, University of California San Diego, USA
Matthew Turk, University of California Santa Barbara, USA
Alessandro Vinciarelli, University of Glasgow, UK
Stefanos Zafeiriou, Imperial College London, UK

Please regularly visit our website for more information.

3-3-3(2012-10-26) CfP Interdisciplinary Workshop on Laughter and other Non-Verbal Vocalisations in Speech, Dublin Ireland

Call for Papers for the Interdisciplinary Workshop on Laughter and other Non-Verbal Vocalisations in Speech
26-27 October 2012, Dublin, Ireland

Following the previous workshops on laughter held in Saarbruecken (2007) and Berlin (2009), we have the pleasure to announce a forthcoming workshop in Dublin in October 2012.

The study of non-verbal vocal interaction is proving to be important in many research areas such as phonetics and discourse analysis, and also in more technology-oriented fields such as social signal processing and human behaviour understanding. Previous research has shown that laughter and other non-verbal vocalisations (e.g., breath sounds, yawning, sighing) have important functions in social interaction, for example, giving feedback, signaling engagement, and regulating turn-taking. However, many of the phonetic characteristics of non-verbal vocalisations, and the relationship between social functions and non-verbal vocalisations, are still unknown.

The goal of this workshop is to bring together scientists from diverse research areas and to provide an exchange forum for interdisciplinary discussions in order to gain a better understanding of laughter and other non-verbal vocalisations. The workshop will consist of invited talks, oral presentations of ongoing research and discussion papers.

The keynote speakers are Marc Mehu (Swiss Center for Affective Sciences) and Jens Edlund (KTH Stockholm).

We invite research contributions concerning laughter and other non-verbal vocalisations from the fields of phonetics, linguistics, psychology, conversation analysis, and human-machine interaction. In particular, topics related to the following aspects are very much welcomed:

* Multimodality: visual aspects of non-verbal vocalisations, incl. smiles
* Entrainment and alignment: `timing together' of non-verbal vocalisations
* Emotion/affect and social behaviour: decoding and encoding of emotion/socio-related states in non-verbal vocalisations
* Interjections and grammaticalization: relation between non-verbal vocalisations and grammaticalization
* Computational models: automatic processing of non-verbal vocalisations

The workshop is supported by SSPnet.

Submission procedure
Researchers are invited to submit an abstract of their work, including work in progress. Please send your abstract of max. 2 pages (plain text) in PDF format to trouvain (at) specifying `Dublin workshop' in the subject line and providing
1. For each author: name, title, affiliation in the body of the mail
2. Title of abstract

Attendees are asked to register by email trouvain (at) before 1 October 2012. A registration fee of 30 Euros has to be paid on site (in cash).

Important dates
* Abstract submission deadline: 31 August 2012
* Notification of acceptance/rejection: 7 September 2012
* Registration deadline by email: 1 October 2012
* Workshop date: 26-27 October 2012

Trinity College, Dublin, Ireland


Nick Campbell, Trinity College Dublin
Juergen Trouvain, Saarland University
Khiet Truong, University of Twente

Contact information
Juergen Trouvain
Saarland University
FR 4.7 Computational Linguistics and Phonetics
Campus C7.2


3-3-4(2012-10-26) ICMI-2012 Workshop on Speech and Gesture Production in Virtually and Physically Embodied Conversational Agents, Santa Monica, CA, USA
ICMI-2012 Workshop on Speech and Gesture Production in Virtually and Physically Embodied Conversational Agents
CONFERENCE: 14th ACM International Conference on Multimodal Interaction (ICMI-2012)
LOCATION: Santa Monica, California, USA
  * Submission deadline: Monday, June 4, 2012
  * Notification: Monday, July 30, 2012
  * Camera-ready deadline: Monday, September 10, 2012
  * Workshop: Friday, October 26, 2012
This full day workshop aims to bring together researchers from the embodied conversational agent (ECA) and sociable robotics communities to spark discussion and collaboration between the related fields. The focus of the workshop will be on co-verbal behavior production — specifically, synchronized speech and gesture — for both virtually and physically embodied platforms. It will elucidate the subject in consideration of aspects regarding planning and realization of multimodal behavior production. Topics discussed will highlight common and distinguishing factors of their implementations within each respective field. The workshop will feature a panel discussion with experts from the relevant communities, and a breakout session encouraging participants to identify design and implementation principles common to both virtually and physically embodied sociable agents.
Under the focus of speech-gesture-based multimodal human-agent interaction, the workshop invites submissions describing original work, either completed or still in progress, related to one or more of the following topics:
  * Computational approaches to:
    - Content and behavior planning, e.g., rule-based or probabilistic models
    - Behavior realization for virtual agents or sociable robots
  * From ECAs to physical robots: potential and challenges of cross-platform approaches
  * Behavior specification languages and standards, e.g., FML, BML, MURML
  * Speech-gesture synchronization, e.g., open-loop vs. closed-loop approaches
  * Situatedness within social/environmental contexts
  * Feedback-based user adaptation
  * Cognitive modeling of gesture and speech
Workshop contributions should be submitted via e-mail in the ACM publication style to in one of the following formats:
  * Full paper (5-6 pages, PDF file)
  * Short position paper (2-4 pages, PDF file)
  * Demo video (1-3 minutes, common file formats, e.g., AVI or MP4) including an extended abstract (1-2 pages, PDF file)
If a submission exceeds 10MB, it should be made available online and a URL should be provided instead.
Submitted papers and abstracts should conform to the ACM publication style; for templates and examples, follow the link:
Accepted papers will be included in the workshop proceedings in ACM Digital Library; video submissions and accompanying abstracts will be published on the workshop website. Contributors will be invited to give either an oral or a video presentation at the workshop.
  * Dan Bohus (Microsoft Research)
  * Kerstin Dautenhahn (University of Hertfordshire)
  * Jonathan Gratch (USC Institute for Creative Technologies)
  * Alexis Heloir (German Research Center for Artificial Intelligence)
  * Takayuki Kanda (ATR Intelligent Robotics and Communication Laboratories)
  * Jina Lee (Sandia National Laboratories)
  * Stacy Marsella (USC Institute for Creative Technologies)
  * Maja Matarić (University of Southern California)
  * Louis-Philippe Morency (USC Institute for Creative Technologies)
  * Bilge Mutlu (University of Wisconsin-Madison)
  * Victor Ng-Thow-Hing (Honda Research Institute USA)
  * Catherine Pelachaud (TELECOM ParisTech)
  * Ross Mead (University of Southern California)
  * Maha Salem (Bielefeld University)
  * Workshop Questions and Submissions (
  * Ross Mead (

3-3-5(2012-10-29) Workshop on Audio and Multimedia Methods for Large‐Scale Video Analysis, Nara, Japan

Audio and Multimedia Methods for Large‐Scale Video Analysis

First ACM International Workshop at ACM Multimedia 2012
29 October ‐ 2 November in Nara, Japan

***Extended submission deadline: July 15th  2012 ***

Media sharing sites on the Internet and the one‐click upload
capability of smartphones have led to a deluge of online multimedia
content. Every day, thousands of videos are uploaded to the web,
creating an ever‐growing demand for methods to make them easier
to retrieve, search, and index. While visual information is a
very important part of a video, acoustic information often
complements it. This is especially true for the analysis of
consumer‐produced, unconstrained videos from social media networks,
such as YouTube uploads or Flickr content.

The diversity in content, recording equipment, environment,
quality, etc. poses significant challenges to the current state of
the art in multimedia analytics. The fact that this data is from
non‐professional and consumer sources means that it often has
little or no manual labeling. Large‐scale multi‐modal analysis of
audio‐visual material can help overcome this problem, and provide
training and testing material across modalities for language
understanding, human action recognition, and scene identification
algorithms, with applications in robotics, interactive agents,
etc. Speech and audio provide a natural modality to summarize and
interact with the content of videos. Therefore, speech and audio
processing is critical for multimedia analysis that goes beyond
traditional classification and retrieval applications.

The goal of the 1st ACM International Workshop on Audio and
Multimedia Methods for Large‐Scale Video Analysis (AMVA) is to bring
together researchers and practitioners in this newly emerging
field, and to foster discussion on future directions of the topic
by providing a forum for focused exchanges on new ideas,
developments, and results. The aim is to build a strong community
and a venue that at some point can become its own conference.

Topics include novel acoustic and multimedia methods for
  * video retrieval, search, and organization
  * video navigation and interactive services
  * information extraction and summarization
  * combination, fusion, and integration of the audio,
    visual, and other streams
  * feature extraction and machine learning on 'wild' data

Submissions: Workshop submissions of 4‐6 pages should be formatted
according to the ACM Multimedia author kit. Submission system
link:

Important dates:
Workshop paper submission: July 1st, 2012 
Notification of acceptance: August 7th, 2012
Camera ready submission to Sheridan: August 15, 2012

Gerald Friedland, ICSI Berkeley (USA)
Daniel P. W. Ellis, Columbia University (USA)
Florian Metze, Carnegie Mellon University (USA)

Panel Chair:
Ajay Divakaran, SRI/Sarnoff (USA)



3-3-6(2012-11-01) AMTA Workshop on Translation and Social Media (TSM 2012)
AMTA Workshop on Translation and Social Media (TSM 2012)

Call for Papers

November 1st, 2012
San Diego, CA, USA


--------------- The Workshop ---------------
During the last couple of years, user generated content on the World Wide Web has increased significantly. Users post status updates, comments, news and observations on services like Twitter; they communicate with networks of friends through web pages like Facebook; and they produce and publish audio and audio-visual content, such as comments, lectures or entertainment in the form of videos on platforms such as YouTube, and as Podcasts, e.g., via iTunes.

Nowadays, users no longer publish content mainly in English; instead, they publish in a multitude of languages. This means that due to the language barrier, many users cannot access all available content. The use of machine and speech translation technology can help bridge the language barrier in these situations.

However, in order to automatically translate these new domains, we expect that several obstacles will need to be overcome:

·       Speech recognition and translation systems need to be able to adapt quickly as user generated content shifts in focus and topic.

·       Text and speech in social media will be extremely noisy, ungrammatical and will not adhere to conventional rules, instead following its own, continuously changing conventions.

At the same time, we expect to discover new possibilities to exploit social media content for improving speech recognition and translation systems in an opportunistic way, e.g., by finding and utilizing parallel corpora in multiple languages addressing the same topics, or by utilizing additional meta-information attached to the content, such as tags, comments, and key-word lists. Also, the network structure in social media could provide valuable information for translating its content.

The goal of this workshop is to bring together researchers in the area of machine and speech translation in order to discuss the challenges brought up by the content of social media, such as Facebook, Twitter, YouTube videos and podcasts.

--------------- Call for Papers ---------------

We expect participants to submit discussion papers that argue for new research and techniques necessary for dealing with machine and speech translation in the domain outlined above, as well as papers presenting results of related and potentially preliminary research that is breaking new ground.

--------------- Important Dates ---------------
·       Full Paper submission deadline: July 31st

·       Acceptance/Rejection: August 25th

·       Camera Ready Paper: September 1st

·       Workshop: November 1st

--------------- Organizing Committee ---------------

·       Chairs: Satoshi Nakamura (NAIST, Japan) and Alex Waibel (KIT, Germany)

·       Program Chairs: Graham Neubig (NAIST, Japan), Sebastian Stüker (KIT, Germany), and Joy Ying Zhang (CMU-SV, USA)

·       Publicity Chair: Margit Rödder (KIT, Germany)


3-3-7(2012-11-13) International Conference on Asian Language Processing 2012 (IALP 2012), Hanoi, Vietnam


International Conference on Asian Language Processing 2012 (IALP 2012)
Hanoi, Vietnam, Nov 13-15, 2012
Paper Submission deadline: Jul 1, 2012

The International Conference on Asian Language Processing (IALP) is a series
of conferences with unique focus on Asian Language Processing. The
conference aims to advance the science and technology of all the aspects of
Asian Language Processing by providing a forum for researchers in the
different fields of language study all over the world to meet. The first
meeting of the series was held in Singapore in 1986 and was called the
'International Conference on Chinese Computing (ICCC)' then. This meeting
initiated the study of Chinese and oriental languages processing in
Singapore and resulted in the formation of COLIPS in Singapore in 1988, as
well as the publication of the journal 'Communications of COLIPS' in 1991,
which is now known as 'International Journal on Asian Language Processing'.

Over the years, IALP has developed into one of the important annual events on
natural language processing in Asia. IALP 2008 was held at Chiang Mai
University, Thailand and the proceedings were indexed by ISTP/ISI. IALP 2009
was held in Singapore and was co-organized by COLIPS and IEEE Singapore
Computer Chapter. IALP 2010 was held in Harbin and was co-organized by
COLIPS and IEEE Singapore Computer Chapter, Chinese Information Processing
Society of China and Heilongjiang Institute of Technology (HIT). IALP 2011
was held in Penang, Malaysia and jointly organized by Chinese and Oriental
Languages Information Processing Society (COLIPS) of Singapore, IEEE
Singapore Computer Chapter, and Universiti Sains Malaysia. The proceedings
of IALP 2009, 2010 and 2011 were published by CPS (Conference Publication
Services) and submitted for indexing in EI, ISTP/ISI and Current Contents on

This year, the International Conference on Asian Language Processing 2012
(IALP 2012) will be jointly organized by Chinese and Oriental Languages
Information Processing Society (COLIPS) of Singapore, IEEE Vietnam Computer
Chapter, and Hanoi University of Science and Technology (and MICA
Institute). The conference will be held in Hanoi, Vietnam on Nov 13-15,
2012. The 2012 edition (IALP 2012) will focus on under-resourced language
studies. We will continue to work with CPS to publish the conference
proceedings. They will be included in the IEEE Xplore digital library and
submitted for indexing in INSPEC, EI, ISTP/ISI and Current Contents on

Hanoi (Vietnamese: Hà Nội, 'River Interior') is the capital and
second-largest city of Vietnam. As the capital of Vietnam for almost a
thousand years, Hanoi hosts more cultural sites than any other city in
Vietnam, including over 600 pagodas and temples. Hanoi is the social,
cultural and economic center of the country. The Old Quarter, near Hoan
Kiem lake, has the original street layout and architecture of old Hanoi.
At the beginning of the 20th century the city consisted of only about 36
streets, most of which are now part of the Old Quarter. Each street then
had merchants and households specialized in a particular trade, such as
silk or jewellery. The street names nowadays still reflect these
specializations, although few of them remain exclusively in their original
commerce. The area is famous for its small artisans and merchants, including
many silk shops. Local cuisine specialties as well as several clubs and bars
can also be found here. A night market (near Đồng Xuân market) in the heart
of the district opens for business every Friday, Saturday, and Sunday
evening with a variety of clothing, souvenirs and food.

We welcome you to Vietnam to experience the nature, history, and culture of
one of the best countries in South-East Asia.


Paper submissions are invited on substantial, original and unpublished
research in all aspects of Asian Language Processing, including, but not
limited to:

 - Under-resourced language studies
 - Input and output of large character sets of Asian languages
 - Typesetting and font designs of Asian languages
 - Asian character encoding and compression
 - Multimodal representations and processing
 - Voice input and output
 - Phonology and morphology
 - Lexical semantics and word sense
 - Grammars, syntax, semantics and discourse
 - Word segmentation, chunking, tagging and syntactic parsing
 - Word sense disambiguation, semantic role labeling and semantic parsing
 - Discourse analysis
 - Language, linguistic and speech resource development
 - Evaluation methods and user studies
 - Machine learning for natural language
 - Text analysis, understanding, summarization and generation
 - Text mining and information extraction, summarization and retrieval
 - Text entailment and paraphrasing
 - Text Sentiment analysis, opinion mining and question answering
 - Machine translation and multilingual processing
 - Linguistic, psychological and mathematical models of language,
computational psycholinguistics, computational linguistics and mathematical
 - Language modeling, statistical methods in natural language processing and
speech processing
 - Spoken language processing, understanding, generation and translation
 - Rich transcription and spoken information retrieval
 - Speech recognition and synthesis
 - Natural language applications, tools and resources, system evaluation
 - Asian language learning, teaching and computer-aided language learning
 - NLP in vertical domains, such as biomedical, chemical and legal text
 - NLP on noisy unstructured text, such as email, blogs, and SMS
 - Special hardware and software for Asian language computing


Submissions must describe substantial, original, completed and unpublished
work. Wherever appropriate, concrete evaluation and analysis should be
included. Submissions will be judged on correctness, originality, technical
strength, significance, relevance to the conference, and interest to the
attendees. Each submission will be reviewed by three program committee
members. Accepted papers will be presented in one of the oral sessions or
poster sessions as determined by the program committee.
As the reviewing will be blind, manuscripts must not include the authors'
names and affiliations. Authors should ensure that their identities are not
revealed in any way in the paper. Self-references that reveal the author's
identity, e.g., 'We previously showed (Smith, 1991) ...', must be avoided.
Instead, use citations such as 'Smith previously showed (Smith, 1991) ...'.
Papers that do not conform to these requirements will be rejected without

All submissions must be electronic and in Portable Document Format (PDF)
only. Paper submissions should follow the IEEE Proceedings' two-column
format without exceeding four (4) pages including references. We strongly
recommend the use of the LaTeX style files or Microsoft Word style files
according to IEEE Proceedings' format. Submissions must conform to the
official style guidelines.

The official language of the conference is English. Papers submitted should
be written in English.

Papers may be submitted until July 1, 2012, in PDF format via the START



Submission deadline         Jul 1, 2012
Notification of acceptance  Aug 3, 2012
Final manuscript due        Aug 17, 2012
Earlybird registration due  Aug 19, 2012
Regular registration due    Oct 31, 2012
Conference date             Nov 13-15, 2012


To get other details and the latest information about the conference, please
visit the conference website at

Pham Thi Ngoc Yen and Deyi Xiong
Program Co-chairs, IALP 2012



3-3-8(2012-11-21) Albayzin 2012 Language Recognition Evaluation, Madrid Spain

Albayzin 2012 Language Recognition Evaluation

The Albayzin 2012 Language Recognition Evaluation (Albayzin 2012 LRE) is supported by the Spanish Thematic Network on Speech Technology (RTTH) and organized by the Software Technologies Working Group (GTTS) of the University of the Basque Country, with the key collaboration of Niko Brümmer, from Agnitio Research, South Africa, for defining the evaluation criterion and coding the script used to measure system performance. The evaluation workshop will be part of IberSpeech 2012, to be held in Madrid, Spain from 21 to 23 November 2012. 
As in previous Albayzin LRE editions, the goal of this evaluation is to promote the exchange of ideas, to foster creativity and to encourage collaboration among research groups worldwide working on language recognition technology. To this end, we propose a language recognition evaluation similar to those carried out in 2008 and 2010, but under more difficult conditions. This time the application domain moves from TV broadcast speech to any kind of speech found on the Internet, and no training data will be available for some of the target languages (aiming to reflect a common situation for low-resource languages). 
The change in the application domain pursues two objectives: first, the task should reflect a practical application (in this case, indexing of multimedia content on the Internet); and second, the task should be challenging enough that state-of-the-art systems yield relatively poor performance. 
Audio signals for development and evaluation will be extracted from YouTube videos, which will be heterogeneous regarding duration, number of speakers, ambient noise/music, channel conditions, etc. Besides speech, signals may contain music, noise and any kind of non-human sounds. In any case, each signal will contain a minimum amount of speech. As for previous evaluations, each signal will contain speech in a single language, except for signals corresponding to Out-Of-Set (OOS) languages, which might contain speech in two or more languages, provided that none of them are target languages. 
Overall, the Albayzin 2012 LRE introduces some interesting novelties with regard to previous Albayzin LRE editions and NIST Language Recognition Evaluations. The most remarkable novelties are the type of signals used for development and test and the evaluation criterion. All the details can be found in the Albayzin 2012 LRE Plan.


Deadline: July 16th 2012
Procedure: Submit an e-mail to the organization contact:, with copy to the Chairs of the Albayzin 2012 Evaluations: and, providing the following information:

  • Group name
  • Group ID
  • Institution
  • Contact person
  • Email address
  • Postal address

Data delivery

Starting from June 15th 2012, and once registration data are validated, the training (108 hours of broadcast speech for 6 target languages) and development (around 2000 audio segments including 10 target languages and Out-Of-Set languages) datasets will be released via web (only to registered participants).


  • May 18 2012: The evaluation plan is released and registration is open.
  • June 15 2012: Training and development data are released via web.
  • July 16 2012: Registration deadline.
  • September 3 2012: Evaluation data are released via web and system submission is open.
  • September 24 2012: Deadline for submitting system results and system descriptions.
  • October 15 2012: Preliminary results and evaluation keyfile are released via web.
  • November 21-23 2012: Albayzin 2012 LRE Workshop at IberSpeech 2012, Madrid, Spain.


Luis Javier Rodríguez Fuentes Software Technologies Working Group (GTTS) Department of Electricity and Electronics (ZTF-FCT) University of the Basque Country (UPV/EHU) Barrio Sarriena s/n 48940 Leioa - SPAIN
phone: +34 946012716 fax: +34 946013071


3-3-9(2012-11-28) International Workshop on Spoken Dialog Systems (IWSDS 2012), Paris, France
International Workshop on Spoken Dialog Systems (IWSDS 2012)

Towards a Natural Interaction with Robots, Knowbots and Smartphones.

Paris, France, November 28-30, 2012

Second Announcement

Following the success of IWSDS'2009 (Irsee, Germany), IWSDS'2010
(Gotemba Kogen Resort, Japan) and IWSDS'2011 (Granada, Spain), the
Fourth International Workshop on Spoken Dialog Systems (IWSDS 2012)
will be held in Paris (France) on November 28-30, 2012.

The IWSDS Workshop series provides an international forum for the
presentation of research and applications and for lively discussions
among researchers as well as industrialists, with a special interest
to the practical implementation of Spoken Dialog Systems in everyday
applications. Scientific achievements in language processing now
result in the development of successful applications such as IBM
Watson, Evi, Apple Siri or Google Assistant for access to knowledge
and interaction with smartphones, while the coming of domestic
robots advocates for the development of powerful communication means
with their human users and fellow robots.

We therefore put this year's workshop under the theme
'Towards a Natural Interaction with Robots, Knowbots and Smartphones',
which covers:

-Dialog for robot interaction (including ethics),
-Dialog for Open Domain knowledge access,
-Dialog for interacting with smartphones,
-Mediated dialog (including multilingual dialog involving Speech
-Dialog quality evaluation.

We would also like to encourage the discussion of common issues of
theories, applications, evaluation, limitations, general tools and
techniques, and therefore also invite the submission of original
papers in any related area, including but not limited to:

-Speech recognition and semantic analysis,
-Dialog management, Adaptive dialog modeling,
-Recognition of emotions from speech, gestures, facial expressions
and physiological data,
-Emotional and interactional dynamic profile of the speaker during
dialog, User modeling,
-Planning and reasoning capabilities for coordination and conflict
-Conflict resolution in complex multi-level decisions,
-Multi-modality such as graphics, gesture and speech for input and output,
-Fusion, fission and information management, Learning and adaptability
-Visual processing and recognition for advanced human-computer interaction,
-Spoken Dialog databases and corpora, including methodologies and ethics,
-Objective and subjective Spoken Dialog evaluation methodologies,
strategies and paradigms,
-Spoken Dialog prototypes and products, etc.

We particularly welcome papers that can be illustrated by a
demonstration, and we will organize the conference in order to best
accommodate these papers, whatever their category.


We distinguish between the following categories of submissions:

Long Research Papers are reserved for reports on mature research
results. The expected length of a long paper should be in the range
of 8-12 pages.

Short Research Papers should not exceed 6 pages in total. Authors
may choose this category if they wish to report on smaller case
studies or ongoing but interesting and original research efforts.

Demo - System Papers: Authors who wish to demonstrate their system
may choose this category and provide a description of their system
and demo. System papers should not exceed 6 pages in total.

As usual, it is planned that a selection of accepted papers will be
published in a book by Springer following the conference.


Deadline for submission: July 16, 2012
Notification of acceptance: September 15, 2012
Deadline for final submission of accepted paper: October 8, 2012
Deadline for Early Bird registration: October 8, 2012
Final program available online: November 5, 2012
Workshop: November 28-30, 2012

VENUE: IWSDS 2012 will be held as a two-day residential seminar in
the wonderful Castle of Ermenonville near Paris, France, where all
attendees will be accommodated.

IWSDS Steering Committee: Gary Geunbae Lee (POSTECH, Pohang,
Korea), Ramón López-Cózar (Univ. of Granada, Spain), Joseph Mariani
(LIMSI and IMMI-CNRS, Orsay, France), Wolfgang Minker (Ulm Univ.,
Germany), Satoshi Nakamura (Nara Institute of Science and
Technology, Japan)

IWSDS 2012 Program Committee: Joseph Mariani (LIMSI & IMMI-CNRS,
Chair), Laurence Devillers (LIMSI-CNRS & Univ. Paris-Sorbonne 4),
Martine Garnier-Rizet (IMMI-CNRS), Sophie Rosset (LIMSI-CNRS)

Organization Committee: Martine Garnier-Rizet (Chair), Lynn
Barreteau, Joseph Mariani (IMMI-CNRS)

Supporting organizations (to be completed): IMMI-CNRS and
LIMSI-CNRS (France), Postech (Korea), University of Granada (Spain),
Nara Institute of Science and Technology and NICT (Japan), Ulm
University (Germany)

Scientific Committee: To be announced

Sponsors: To be announced

Please contact the organizers or visit the workshop website
to get more information.

3-3-10(2012-12-02) SLT 2012: 4th IEEE Workshop on Spoken Language Technology, Miami Florida, December 2-5, 2012

SLT 2012: IEEE Workshop on Spoken Language Technology, Miami Florida, December 2-5, 2012


The Fourth IEEE Workshop on Spoken Language Technology (SLT) will be held on December 2-5, 2012 in Miami, FL. The goal of this workshop is to allow the speech/language processing community to share and present recent advances in various areas of spoken language technology. SLT will include oral and poster presentations. In addition, there will be three keynote addresses by well-known experts on topics such as machine learning and speech/language processing. The workshop will also include free pre-workshop tutorials, either introductory or covering recent advances in spoken language technology.

Submission of papers in all areas of spoken language technology is encouraged, with emphasis on the following topics:

  • Speech recognition and synthesis
  • Spoken language understanding
  • Spoken dialog systems
  • Spoken document summarization
  • Machine translation for speech
  • Question answering from speech
  • Speech data mining
  • Spoken document retrieval
  • Spoken language databases
  • Multimodal processing
  • Human/computer interaction
  • Educational and healthcare applications
  • Assistive technologies
  • Natural Language Processing

Important Deadlines

Paper Submission: July 20, 2012
Paper Notification: September 7, 2012
Demo Submission: September 6, 2012
Demo Notification: October 5, 2012
Workshop: December 2-5, 2012

Submission Procedure

Prospective authors are invited to submit full-length, 4-6 page papers, including figures and references, to the SLT 2012 website. All papers will be handled and reviewed electronically. Please note that the submission dates for papers are strict deadlines.



3-3-11(2012-12-03) UNSW Forensic Speech Science Conference, Sydney, 2012
UNSW Forensic Speech Science Conference, Sydney, 2012 

The Forensic Voice Comparison Laboratory at the School of Electrical Engineering & Telecommunications, University of New South Wales will host a Forensic Speech Science Conference on 3 December 2012 as a satellite event to the 14th Australasian International Conference on Speech Science and Technology (SST-12).  

We welcome submissions related to all aspects of forensic speech science.  

Abstract submission deadline: 5 October 2012 

For more information see: 


3-3-12(2012-12-06) 9th International Workshop on Spoken Language Translation, Hong Kong, China

The 9th International Workshop on Spoken Language Translation will take
place in Hong Kong on December 6-7, 2012.

The International Workshop on Spoken Language Translation (IWSLT) is a
yearly scientific workshop, associated with an open evaluation campaign on
spoken language translation, where both scientific papers and system
descriptions are presented. 

Details can be found on the conference website


3-3-13(2012-12-15) CfP 3rd Workshop on 'Cognitive Aspects of the Lexicon' (CogALex), Mumbai, India

2nd Call for Papers

3rd Workshop on 'Cognitive Aspects of the Lexicon' (CogALex)

Post-conference workshop at COLING 2012
(December 15, Mumbai, India)

Submission deadline: October 15, 2012

Invited speaker: Alain Polguère (Université de Lorraine & ATILF CNRS, France)



The aim of this workshop is to bring together researchers involved in the construction and application of electronic dictionaries to discuss modifications of existing resources in line with the users' needs, thereby fully exploiting the advantages of the digital form. Given the breadth of the questions, we welcome reports on work from many perspectives, including but not limited to: computational lexicography, psycholinguistics, cognitive psychology, language learning and ergonomics.


The way we look at dictionaries, their creation and use, has changed dramatically over the past 30 years. (1) While being considered as an appendix to grammar in the past, they have in the meantime moved to centre stage. Indeed, there is hardly any task in NLP which can be conducted without them. (2) Also, many lexicographers work nowadays with huge digital corpora, using language technology to build and to maintain the lexicon. (3) Last, but not least, rather than being static entities (data-base view), dictionaries are now viewed as graphs, whose nodes and links (connection strengths) may change over time. Interestingly, properties concerning topology, clustering and evolution known from other disciplines (society, economy, human brain) also apply to dictionaries: everything is linked, hence accessible, and everything is evolving. Given these similarities, one may wonder what we can learn from these disciplines.

In this 3rd edition of the CogALex workshop we therefore intend to also invite scientists working in these fields, our goals being to broaden the picture, i.e. to gain a better understanding of the mental lexicon and to integrate these findings into our dictionaries in order to support navigation. Given recent advances in neurosciences, it appears timely to seek inspiration from neuroscientists studying the human brain. There is also a lot to be learned from other fields studying graphs and networks, even if their object of study is something other than language, for example biology, economy or society.


This workshop is about possible enhancements of existing electronic dictionaries. To perform the groundwork for the next generation of electronic dictionaries we invite researchers involved in the building of such dictionaries. The idea is to discuss modifications of existing resources by taking the users' needs and knowledge states into account, and to capitalize on the advantages of the digital media. For this workshop we invite papers including but not limited to the following topics which can be considered from various points of view: linguistics, neuro- or psycholinguistics (associations, tip-of-the-tongue problem), network-related sciences (complex graphs, network topology, small-world problem), etc.

1) Analysis of the conceptual input of a dictionary user

- What does a language producer start from (bag of words)?
- What is in the authors' minds when they are generating a message and looking for a word?
- What does it take to bridge the gap between this input and the desired output (target word)?

2) The meaning of words

- Lexical representation (holistic, decomposed)
- Meaning representation (concept based, primitives)
- Revelation of hidden information (vector-based approaches: LSA/HAL)
- Neural models, neurosemantics, neurocomputational theories of content representation.

3) Structure of the lexicon

- Discovering structures in the lexicon: formal and semantic point of view (clustering, topical structure)
- Creative ways of getting access to and using word associations
- Evolution, i.e. dynamic aspects of the lexicon (changes of weights)
- Neural models of the mental lexicon (distribution of information concerning words, organisation of the mental lexicon)

4) Methods for crafting dictionaries or indexes

- Manual, automatic or collaborative building of dictionaries and indexes (distributional semantics, crowd-sourcing, serious games, etc.)
- Impact and use of social networks (Facebook, Twitter) for building dictionaries, for organizing and indexing the data (clustering of words), and for allowing to track navigational strategies, etc.
- (Semi-) automatic induction of the link type (e.g. synonym, hypernym, meronym, association, collocation, ...)
- Use of corpora and patterns (data-mining) for getting access to words, their uses, and combinations (associations)

5) Dictionary access (navigation and search strategies), interface issues

- Semantic-based search
- Search (simple query vs multiple words)
- Context-dependent search (modification of users' goals during search)
- Recovery
- Navigation (frequent navigational patterns or search strategies used by people)
- Interface problems, data-visualisation


- Deadline for paper submissions: October 15, 2012
- Notification of acceptance: November 5, 2012
- Camera-ready papers due: November 15, 2012
- Workshop date: December 15, 2012




Alain Polguère (Université de Lorraine & ATILF CNRS, France)


* Barbu, Eduard (Universidad de Jaén, Spain)
* Barrat, Alain (Centre de physique théorique, CNRS & Aix-Marseille University)
* Bilac, Slaven (Google Tokyo, Japan)
* Bel Enguix, Gemma (LIF, Aix-Marseille University, France)
* Bouillon, Pierrette (TIM, Faculty of Translation and Interpreting, Geneva, Switzerland)
* Cook, Paul (The University of Melbourne, Australia)
* Cristea, Dan (University of Iasi, Romania)
* Fairon, Cedrick (CENTAL, Université catholique de Louvain, Belgium)
* Fazly, Afsaneh (University of Toronto, Canada)
* Fellbaum, Christiane (University of Princeton, USA)
* Ferret, Olivier (CEA LIST, Palaiseau, France)
* Fontenelle, Thierry (Translation Centre for the Bodies of the European Union, Luxembourg)
* Granger, Sylviane (Université Catholique de Louvain, Belgium)
* Grefenstette, Gregory (3DS Exalead, Paris, France)
* Hansen-Schirra, Silvia (University of Mainz, FTSK, Germany)
* Heid, Ulrich (University of Hildesheim, Germany)
* Hirst, Graeme (University of Toronto, Canada)
* Hovy, Ed (ISI, Los Angeles, USA)
* Joyce, Terry (Tama University, Kanagawa-ken, Japan)
* Kwong, Olivia (City University of Hong Kong, China)
* L'Homme, Marie Claude (OLST, University of Montreal, Canada)
* Lapalme, Guy (RALI, University of Montreal, Canada)
* Mititelu, Verginica (RACAI, Bucharest, Romania)
* Pirrelli, Vito (ILC, Pisa, Italy)
* Polguère, Alain (Université de Lorraine & ATILF CNRS, France)
* Rapp, Reinhard (University of Leeds, UK)
* Ruette, Tom (KU Leuven, Belgium)
* Schwab, Didier (LIG, Grenoble, France)
* Serasset, Gilles (IMAG, Grenoble, France)
* Sharoff, Serge (University of Leeds, UK)
* Sinopalnikova, Anna (FIT, BUT, Brno, Czech Republic)
* Sowa, John (VivoMind Research, LLC, USA)
* Tiberius, Carole (Institute for Dutch Lexicology, The Netherlands)
* Tokunaga, Takenobu (TITECH, Tokyo, Japan)
* Tufis, Dan (RACAI, Bucharest, Romania)
* Valitutti, Alessandro (University of Helsinki and HIIT, Finland)
* Vossen, Piek (Vrije Universiteit, Amsterdam, The Netherlands)
* Wehrli, Eric (LATL, University of Geneva, Switzerland)
* Zock, Michael (LIF, CNRS, Aix-Marseille University, France)
* Zweigenbaum, Pierre (LIMSI - CNRS, Orsay & ERTIM - INALCO, Paris, France)


Michael Zock (LIF-CNRS, Marseille, France), michael.zock AT
Reinhard Rapp (University of Leeds, UK), reinhardrapp AT

For more details see:


3-3-14(2013-01-17) Tralogy II: The quest for meaning: where are our weak points and what do we need?, CNRS, Paris


Tralogy II: Human and Machine Translation. The quest for meaning: where are our weak points and what do we need?
Dates and venue of the Conference: January 17-18, 2013 - CNRS Headquarters Auditorium, Paris (France)
Submission Deadline extended to October 15, 2012
The conclusions of the first Tralogy Conference (3-4 March 2011 at the CNRS in Paris) were clear: none of the specialist branches of the language industry can individually hope to offer all the intellectual and professional tools needed to function effectively in the sector. They all need each other: translation has always been interdisciplinary and the translation profession even more so. Accordingly, on the occasion of the second Tralogy Conference, we would like to ask each of our prospective participants not only to present specific contributions from their specialist fields and research into the question of meaning, but also, and in particular, to highlight the limits they face in their specialist fields and research within the wider context of the potential applications of their work. What we would like to find out by the end of Tralogy II is what each of us does not know how to do. We are therefore hoping that, as we map out our respective weak points, these will coincide with the points of contact made at the Conference and with the areas in which there is room for improvement. We will therefore give priority to concise presentations (the published articles will of course be longer) in order to leave time for discussions. And the key question that emerged from Tralogy I will remain at the heart of this analysis: how to measure the quality of a translation with regard to its use.
Canada was the country invited to participate in Tralogy I. This time we would like to honour languages that are very much alive but with lower numbers of users. We have therefore decided to organise this conference under the joint patronage of the Baltic States, Member States of the European Union: Estonia, Latvia and Lithuania.
Call for papers:
To submit a paper:

















3-3-15(2013-02-11) International Conference on Bio-inspired Systems and Signal Processing BIOSIGNALS, Barcelona
International Conference on Bio-inspired Systems and Signal Processing BIOSIGNALS 
February 11 - 14, 2013, Barcelona, Spain
In Collaboration with: UVIC
Sponsored by: INSTICC
INSTICC is Member of: WfMC
IMPORTANT DATES:
Regular Paper Submission: September 3, 2012 (deadline extended)
Authors Notification (regular papers): October 23, 2012
Final Regular Paper Submission and Registration: November 13, 2012
The conference will be sponsored by the Institute for Systems and Technologies of Information, 
Control and Communication (INSTICC) and held In Collaboration with the Universitat 
de Vic (UVIC). INSTICC is Member of the Workflow Management Coalition (WfMC). 
We would like to highlight the presence of the following keynote speakers:
 - Pedro Gomez Vilda, Universidad Politecnica de Madrid, Spain 
- Christian Jutten, GIPSA-lab, France 
- Adam Kampff, Champalimaud Foundation, Portugal 
- Richard Reilly, Trinity College Dublin, Ireland 
- Vladimir Devyatkov, Bauman Moscow State Technical University, Russian Federation 
Details of which can be found on the Keynotes webpage available at: 
Submitted papers will be subject to a double-blind review process. All accepted papers
(full, short and posters) will be published in the conference proceedings, under an ISBN
reference, on paper and on CD-ROM support. A short list of presented papers
will be selected so that revised and extended versions of these papers will be published
by Springer-Verlag in a CCIS Series book. The proceedings will be submitted for indexation
by Thomson Reuters Conference Proceedings Citation Index (ISI), INSPEC, DBLP and
EI (Elsevier Index). All papers presented at the conference venue will be available at the
SciTePress Digital Library.
SciTePress is a member of CrossRef.
We also would like to highlight the possibility to submit to the following Special Session:
- 3rd International Special Session on Multivariable Processing for
Biometric Systems - MPBS
Please check further details at the BIOSIGNALS conference website.

3-3-16(2013-06-01) 2nd CHiME Speech Separation and Recognition Challenge, Vancouver, Canada

 2nd CHiME Speech Separation and Recognition Challenge
          Supported by IEEE Technical Committees

                Deadline: January 15, 2013
        Workshop: June 1, 2013, Vancouver, Canada


Following the success of the 1st PASCAL CHiME Speech Separation and
Recognition Challenge, we are happy to announce a new challenge
dedicated to speech recognition in real-world reverberant, noisy conditions,
that will culminate in a dedicated satellite workshop of ICASSP 2013.

The challenge is supported by several IEEE Technical Committees and by
an Industrial Board.


The challenge consists of recognising distant-microphone speech mixed in
two-channel nonstationary noise recorded over a period of several weeks
in a real family house. Entrants may address either one or both of the
following tracks:

Medium vocabulary track: WSJ 5k sentences uttered by a static speaker

Small vocabulary track: simpler commands but small head movements


You will find everything you need to get started (and even more) on the
challenge website:
- a full description of the challenge,
- clean, reverberated and multi-condition training and development data,
- baseline training, decoding and scoring software tools based on HTK.

Submission consists of a 2- to 8-page paper describing your system and
reporting its performance on the development and the test set. In
addition, you are welcome to submit an earlier paper to ICASSP 2013,
which will tentatively be grouped with other papers into a dedicated

Any approach is welcome, whether emerging or established.

If you are interested in participating, please email us so we can
monitor interest and send you further updates about the challenge.


The best challenge paper will be distinguished by an award from the
Industrial Board.


July 2012          Launch
October 2012       Test set release
January 15, 2013   Challenge & workshop submission deadline
February 18, 2013  Paper notification & release of the challenge results
June 1, 2013       ICASSP satellite workshop


Masami Akamine, Toshiba
Carlos Avendano, Audience
Li Deng, Microsoft
Erik McDermott, Google
Gautham Mysore, Adobe
Atsushi Nakamura, NTT
Peder A. Olsen, IBM
Trausti Thormundsson, Conexant
Daniel Willett, Nuance


Conexant Systems Inc.
Audience Inc.
Mitsubishi Electric Research Laboratories


Emmanuel Vincent, INRIA
Jon Barker, University of Sheffield
Shinji Watanabe & Jonathan Le Roux, MERL
Francesco Nesta & Marco Matassoni, FBK-IRST


3-3-17(2013-06-18) Urgent Cf Participation NTCIR-10 IR for Spoken Documents Task (SpokenDoc-2)
Call for Participation

    NTCIR-10 IR for Spoken Documents Task (SpokenDoc-2)


The growth of the internet and the decrease in storage costs are
resulting in a rapid increase in multimedia content. For
retrieving this content, available text-based tag information is
limited. Spoken Document Retrieval (SDR) is a promising technology for
retrieving it using the speech data it contains.
Following the NTCIR-9 SpokenDoc task, we will continue to evaluate
SDR under a realistic ASR condition, where the target documents are
spontaneous speech data with a high word error rate and a high
out-of-vocabulary rate.


The new speech data, the recordings of the first to sixth annual
Spoken Document Processing Workshop, are going to be used as the
target documents in SpokenDoc-2. The larger speech data, spoken
lectures in the Corpus of Spontaneous Japanese (CSJ), are also used, as in
the previous SpokenDoc-1. The task organizers are going to provide
reference automatic transcriptions for these speech data. These
enable researchers interested in SDR, but without access to their own
ASR system, to participate in the tasks. They also enable comparisons
of the IR methods based on the same underlying ASR performance.

Targeting these documents, two subtasks will be conducted.

Spoken Term Detection: 
  Within spoken documents, find the positions at which a queried
  term occurs. Evaluation will consider both efficiency
  (search time) and effectiveness (precision and recall).
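As an illustration only, the effectiveness measures named above could be computed along the following lines. This is a minimal sketch and not the official NTCIR-10 scoring procedure: the function name, the (document, time) representation of an occurrence, the 0.5-second matching tolerance and the sample data are all assumptions made for the example.

```python
def precision_recall(detected, reference, tolerance=0.5):
    """Score detected term occurrences against reference occurrences.

    Each occurrence is a (document_id, time_in_seconds) pair. A detection
    counts as correct when a not-yet-matched reference occurrence in the
    same document lies within `tolerance` seconds of it. The tolerance is
    an assumption for this sketch, not a value from the task definition.
    """
    unmatched = list(reference)
    hits = 0
    for doc, time in detected:
        for ref in unmatched:
            if ref[0] == doc and abs(ref[1] - time) <= tolerance:
                unmatched.remove(ref)  # each reference is matched at most once
                hits += 1
                break
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical data: three true occurrences and three system detections.
reference = [("lecture1", 12.0), ("lecture1", 48.5), ("lecture2", 7.2)]
detected = [("lecture1", 12.2), ("lecture2", 30.0), ("lecture2", 7.0)]
precision, recall = precision_recall(detected, reference)
```

In this made-up example two of the three detections match a reference occurrence, so precision and recall are both two thirds. Search time, the efficiency measure, would be recorded separately.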

Spoken Content Retrieval: 
  Among spoken documents, find the segments including the relevant
  information related to the query, where a segment is either a
  document (resulting in document retrieval task) or a passage
  (passage retrieval task). This is like an ad-hoc text retrieval
  task, except that the target documents are speech data.

Please visit
A link to the NTCIR-10 task participants registration page
is now available from this page.

Please note that the registration deadline is Jun 30, 2012 (for
all NTCIR-10 tasks).


Kiyoaki Aikawa (Tokyo University of Technology)
Tomoyosi Akiba (Toyohashi University of Technology)
Xinhui Hu (National Institute of Information and Communications Technology)
Yoshiaki Itoh (Iwate Prefectural University)
Tatsuya Kawahara (Kyoto University)
Seiichi Nakagawa (Toyohashi University of Technology)
Hiroaki Nanjo (Ryukoku University)
Hiromitsu Nishizaki (University of Yamanashi)
Yoichi Yamashita (Ritsumeikan University)

If you have any questions, please send e-mails to the task
organizers mailing list:


3-3-18(2013-07-03) CorpORA and Tools in Linguistics, Languages and Speech, Strasbourg, France

CorpORA and Tools in Linguistics, Languages and Speech: Status, Uses and Misuse

Conference organised by the Research Unit 1339 Linguistics, Languages and Speech (LiLPa)
University of Strasbourg – UNISTRA
3 – 5 July 2013
Strasbourg - France


3-3-19(2013-10-23) 5th Journées de Phonétique Clinique (JPhC), Liège (Belgium)

The 5th Journées de Phonétique Clinique (JPhC) will take place in Liège on 23, 24 and 25 October 2013.

These meetings were first held in Paris in 2005. In 2007 they took place in Grenoble, in 2009 in Aix-en-Provence and in 2011 in Strasbourg. They are held every two years. The 2013 edition will be in Liège (Belgium): it will be organised by the Logopédie de la Voix service of the Université de Liège (Department of Psychology: Cognition and Behaviour) in close collaboration with the Laboratoire d'Images, Signaux et Dispositifs de Télécommunications of the Université Libre de Bruxelles.

Phonetics brings together mainly researchers, academic staff, engineers, physicians and speech and language therapists: complementary professions that pursue the same goal, namely a better understanding of the processes of acquisition, development and degeneration of language, speech and voice. This interdisciplinary approach aims to strengthen fundamental knowledge of spoken communication, in order to better understand, assess and remedy speech and voice disorders in patients.

In this context, this series of international conferences on speech production and perception in pathological subjects is an opportunity for professionals, established researchers and young researchers from different backgrounds to present new experimental results and to exchange ideas from diverse perspectives. Contributions will address studies of pathological speech and voice in adults and in children.

We hope to see many of you at these 5th Journées de Phonétique Clinique. You will find more information on the website at the following address:



3-3-20(2014) Speech Prosody 2014 Dublin.
Speech Prosody 2014 in Dublin.

3-3-21 Call for Participation MediaEval 2012 Multimedia Benchmark Evaluation

Call for Participation
MediaEval 2012 Multimedia Benchmark Evaluation
Please register by 31 May 2012

MediaEval is a multimedia benchmark evaluation that offers tasks promoting research and innovation in areas related to human and social aspects of multimedia. MediaEval 2012 focuses on aspects of multimedia including and going beyond visual content, including speech, language, audio and social factors. Participants carry out one or more of the tasks offered and submit runs to be evaluated. They then write up their results and present them at the MediaEval 2012 workshop.

For each task, participants receive a task definition, task data and accompanying resources (dependent on task) such as shot boundaries, keyframes, visual features, speech transcripts and social metadata. In order to encourage participants to develop techniques that push forward the state-of-the-art, a 'required reading' list of papers will be provided for each task. Participation is open to all interested research groups. Please sign up via (regular sign up will remain open until 31 May).

The following tasks are available to participants at MediaEval 2012:

Placing Task
This task involves automatically assigning geo-coordinates to Flickr videos using one or more of: Flickr metadata, visual content, audio content, social information (Data: Creative Commons Flickr data, predominantly English language, extended from the 2011 data set.)

Social Event Detection Task
This task requires participants to discover events and detect media items that are related to either a specific social event or an event-class of interest. By social events we mean that the events are planned by people, attended by people and that the social media are captured by people. (Data: URLs of images and videos available on Flickr and other internet archives together with metadata).

Spoken Web Search Task
This task involves searching FOR audio content WITHIN audio content USING an audio content query. It is particularly interesting for speech researchers in the area of spoken term detection. (Data: Audio from four different Indian languages and four South African languages. Each of the ca. 2000 data items is an 8 kHz audio file, 4-30 seconds in length.)

Tagging Task
Given a set of tags and a video collection, participants are required to automatically assign the tags to each video based on a combination of modalities, i.e., speech, metadata, audio and visual. (Data: Creative Commons internet video, nearly exclusively English, extended from the 2011 collection.)

Affect Task: Violent Scenes Detection
This task requires participants to deploy multimodal features to automatically detect portions of movies containing violent material. Any features automatically extracted from the video, including the subtitles, can be used by participants. (Data: A set of ca. 18 Hollywood movies that must be purchased by the participants.)

Visual Privacy Task
For this task, participants propose methods whereby human faces occurring in digital imagery can be obscured so as to render them unrecognizable. An optimal balance should be struck between obscuring identity and maintaining the quality of the viewing experience from the user perspective. (Data: about 100 high-resolution video files of ca. 1m30s each, containing one or more persons in an indoor environment.)

Brave New Tasks
This year, MediaEval will also run three new tasks in the areas of social media, spoken content search and hyperlinking, and music tagging. These tasks are 'by invitation only' and are not included in the general registration form. In order to receive an invitation, please contact the task organizers.

MediaEval 2012 Timeline (dates vary slightly from task to task; see the individual task pages for the individual deadlines):

31 May: Last day for regular sign up
1 June: Latest day for development data release
1 July: Latest day for test data release
ca. 10 September: Run submission deadline
28 September: Working notes papers due
4-5 October: MediaEval 2012 Workshop, Pisa, Italy*
*The workshop is timed so that it is possible to attend the 12th European Conference on Computer Vision (ECCV 2012), held 7-13 October in Firenze, Italy, in the same trip.

MediaEval 2012 Coordination
Martha Larson, Delft University of Technology
Gareth Jones, Dublin City University

For questions or additional information, please contact Martha Larson or visit

MediaEval 2012 Organization Committee:

Robin Aly, University of Twente, Netherlands
Xavier Anguera, Telefonica, Spain
Atta Badii, University of Reading, UK
Etienne Barnard, CSIR, South Africa
Claire-Helene Demarty, Technicolor, France
Maria Eskevich, Dublin City University, Ireland
Gerald Friedland, ICSI, USA
Isabelle Ferrané, University of Toulouse, France
Guillaume Gravier, IRISA, France
Claudia Hauff, TU Delft, Netherlands
Gareth Jones, Dublin City University, Ireland
Pascal Kelm, Technical University of Berlin, Germany
Christoph Kofler, Delft University of Technology, Netherlands
Chattun Lallah, University of Reading, UK
Martha Larson, TU Delft, Netherlands
Cynthia Liem, TU Delft, Netherlands
Florian Metze, CMU, USA
Vasileios Mezaris, ITI Certh, Greece
Roeland Ordelman, University of Twente and Netherlands Institute for Sound and Vision, Netherlands
Nicola Orio, Università degli Studi di Padova, Italy
Geoffroy Peeters, Institut de Recherche et Coordination Acoustique/Musique Paris, France
Cedric Penet, Technicolor, France
Tomas Piatrik, Queen Mary University of London, UK
Adam Rae, Yahoo! Research, Spain
Nitendra Rajput, IBM Research, India
Markus Schedl, Johannes Kepler Universität Linz, Austria
Sebastian Schmiedeke, Technical University of Berlin, Germany
Mohammad Soleymani, University of Geneva, Switzerland
Robin Sommer, ICSI/LBNL, USA
Raphael Troncy, Eurecom, France

A large number of projects make a contribution to MediaEval organization, including (alphabetically): AXES, Chorus+, CUbRIK, Glocal, IISSCoS, LinkedTV, Promise, Quaero, Sealinc Media, VideoSense and SocialSensor.


3-3-22 Call for Proposals: 42nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017)
Call for Proposals
42nd IEEE International Conference on Acoustics, Speech, and Signal Processing
(ICASSP 2017)

Sponsored By The IEEE Signal Processing Society


This Call for Proposals is distributed on behalf of the IEEE Signal Processing Society Conference Board for the 42nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), to be held in March or April of 2017. ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing theory and applications. The series is sponsored by the IEEE Signal Processing Society and has been held annually since 1976. The conference features world-class speakers, tutorials, exhibits, and over 120 lecture and poster sessions. ICASSP is a cooperative effort of the IEEE Signal Processing Society Technical Committees:

  • Audio and Acoustic Signal Processing
  • Bio Imaging and Signal Processing
  • Design and Implementation of Signal Processing Systems
  • Image, Video, and Multidimensional Signal Processing
  • Industry DSP Technology Standing Committee
  • Information Forensics and Security
  • Machine Learning for Signal Processing
  • Multimedia Signal Processing
  • Sensor Array and Multichannel Systems
  • Signal Processing Education Standing Committee
  • Signal Processing for Communications and Networking
  • Signal Processing Theory and Methods
  • Speech and Language Processing

The conference organizing team is advised to incorporate into their proposal the following items.

  • Proposed Dates (March or April 2017)
  • Organizing Committee Members
    • Name
    • Biographical information
    • Membership in the IEEE Signal Processing Society
  • List of scientific and research groups residing in the local area that are in favor of the proposal and are committed to attending and participating.
  • Proposed budget. (For advice on building an IEEE budget, please contact Kartik Patel.)
  • Support that can be anticipated from the local government, universities and/or corporations
  • Why this location?
    • Airport information
    • Customs and Visa regulations
    • Hotel and convention center information (e.g. space diagrams, maps, etc.)
    • Tourist destinations (e.g. museums, natural wonders, etc.)
    • Average weather conditions for the time of year

Submission of Proposal
Proposals for ICASSP 2017 are currently being accepted. Proposals should be sent no later than 15 August 2012. Notification of acceptance will be made after ICIP 2012 in Orlando, FL. Send the proposal to Lisa Schwarzbek, Manager, Conference Services, IEEE Signal Processing Society.

For additional guidelines for ICASSP, please contact Lisa Schwarzbek, Manager, Conference Services.

Proposal Presentation
Teams whose proposals are of interest to the Conference Board may be asked to present them at the Conference Board meeting to be held in Orlando, Florida, tentatively scheduled for Thursday, 4 October 2012.


© Copyright 2024 - ISCA International Speech Communication Association - All right reserved.
