ISCA - International Speech
Communication Association



ISCApad #167

Sunday, May 13, 2012 by Chris Wellekens

3-2-7 (2012-09-09) Special Session at Interspeech 2012 Speech and Audio Analysis of Consumer and Semi-Professional Multimedia
  

  Special Session at Interspeech 2012

Speech and Audio Analysis of Consumer and Semi-Professional Multimedia

             http://interspeech2012.org/Special.html

**********************************************************************


Consumer-grade and semi-professional multimedia material (video) is becoming abundant on the Internet and in other online archives. It is easier than ever to download material of any kind. With cell phones now featuring video recording capability along with broadband connectivity, multimedia material can be recorded and distributed across the world just as easily as text could just a couple of years ago. The easy availability of vast amounts of text gave a huge boost to the Natural Language Processing and Information Retrieval research communities. The above-mentioned multimedia material is set to do the same for multi-modal audio and video analysis and generation. We argue that the speech and language research community should embrace that trend, as it would profit vastly from the availability of this material, and has significant know-how and experience of its own to contribute, which will help shape this field.

Consumer-created multimedia material (as opposed to “professional style” material such as broadcast news) offers a great opportunity for research on all aspects of human-to-human as well as human-machine interaction. Such material can be processed offline, but on a much larger scale than is possible in online, controlled experiments. Speech is naturally an important part of these interactions, and can link visual objects, people, and other observations across modalities. Research results will inform future research and development directions in interactive settings, e.g. robotics, interactive agents, etc., and give a significant boost to core (offline) analysis techniques such as robust audio and video processing, speech and language understanding, as well as multimodal fusion.

Large-scale multi-modal analysis of audio-visual material is beginning in a number of multi-site research projects across the world, driven by various communities, such as information retrieval, video search, copyright protection, etc. While each of these has slightly different targets, they all face largely the same challenges: how to robustly and efficiently process large amounts of data, how to represent and then fuse information across modalities, how to train classifiers and segmenters on unlabeled data, how to include human feedback, etc. Speech, language and audio researchers have considerable interest and experience in these areas, and should be at the core and forefront of this research. To make progress at a useful rate, researchers must be connected in a focused way, and be aware of each other’s work, in order to discuss algorithmic approaches, ideas for evaluation and comparisons across corpora and modalities, training methods with various degrees of supervision, available data sets, etc. Sharing software, databases, research results and project descriptions is one of the key elements of success, and is at the core of the Speech and Language in Multimedia (SLIM) SIG's objectives.

The special session will serve these goals by bringing together researchers from different fields (speech, but also audio and multimedia) to share experience and resources and to foster new research directions and initiatives. Contributions are expected on all aspects of speech and audio processing for multimedia content: research results, but also presentations of ongoing research projects or software, multimedia databases, benchmarking initiatives, etc. A special session, as opposed to a regular session, offers unique opportunities to emphasize interaction between participants, with the goal of strengthening and growing the SLIM community. The following format will be adopted: a few selected talks targeting a large audience (e.g., project or dataset descriptions, overviews) will open the session, followed by a panel and open discussion on how to develop our community, along with poster presentations.


                                                                                                            
  Assistant Research Professor
  Language Technologies Institute
  School of Computer Science
  Carnegie Mellon University

