ISCA - International Speech
Communication Association

ISCApad #184

Friday, October 11, 2013 by Chris Wellekens

3-3-2 (2013-10-18) The 2013 Similar Segments in Social Speech Task Barcelona Spain

The 2013 Similar Segments in Social Speech Task

With users' growing willingness to share personal activity information, the eventual acceptance of social multimedia, including video and audio recordings of casual interactions, is inevitable. To unlock the potential value, we need to develop methods for searching such recordings, and this task is intended to support research in this area. It is likely to be of interest to researchers in the areas of speech technology, information retrieval, dialog, and topic modeling. The task involves searching in social multimedia, specifically conversations between students in an academic department.
The scenario is this: A new member has joined an organization or social group that has a small archive of conversations among its members. He starts to listen, looking for any information that can help him better understand, participate in, enjoy, find friends in, and succeed in this group. As he listens to the archive (perhaps at random, perhaps based on some social tags, perhaps based on an initial keyword search) he finds something of interest, and wants to find more like it, across the entire archive. He marks what he found as a region of interest and requests more like it. The system comes back with a set of "jump-in" points, places in the archive to which he could jump and start listening/watching with the expectation of finding something similar.
In the task, the input to the systems will be a 1-10 second audio/video region of interest, and the desired output will be an ordered list of regions similar to it, matching as closely as possible the judgments of human searchers. Task participants will receive a 2-hour collection of dyadic conversations, each 5-10 minutes in length, by members of a semi-cohesive group. These will include video, two-microphone stereo audio, speech recognition transcripts, and a small set of prosodic features computed every 10 milliseconds. Metadata will include the native languages of the speakers. The dataset will be supplied under a permissive Creative Commons license. There will be several dozen similarity sets, each containing 5-40 regions of about 3-20 seconds each, with all the regions in a set judged by one member of the user population to be similar in some way.
The test set will be a smaller set of conversations and a set of regions of interest, or seeds. For each seed, a system will return a list of jump-in points for its inferred similar-region set.
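To make the input and output of the task concrete, here is a minimal sketch in Python. It is not the organizers' data format or evaluation code: the `Region` and `JumpInPoint` classes, their field names, and the `rank_jump_in_points` function are all hypothetical, and the similarity measure (word overlap between transcripts) is only a simple stand-in for whatever a real system would use.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Region:
    """A time span in one conversation (hypothetical layout, not the task's format)."""
    conversation_id: str
    start_s: float
    end_s: float
    words: FrozenSet[str]  # words taken from the speech recognition transcript


@dataclass(frozen=True)
class JumpInPoint:
    """A place in the archive where the searcher could start listening."""
    conversation_id: str
    time_s: float


def word_overlap(a: Region, b: Region) -> float:
    """Jaccard overlap of transcript words; an assumed stand-in similarity."""
    union = a.words | b.words
    return len(a.words & b.words) / len(union) if union else 0.0


def rank_jump_in_points(seed: Region, candidates: List[Region],
                        top_k: int = 10) -> List[JumpInPoint]:
    """Return jump-in points for the candidates most similar to the seed."""
    scored = [(word_overlap(seed, c), c) for c in candidates if c != seed]
    # Sort on the score only, most similar first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [JumpInPoint(c.conversation_id, c.start_s)
            for score, c in scored[:top_k] if score > 0]


seed = Region("c1", 10.0, 15.0, frozenset({"exam", "professor", "hard"}))
archive = [
    Region("c2", 30.0, 40.0, frozenset({"exam", "hard", "class"})),
    Region("c3", 5.0, 12.0, frozenset({"pizza", "movie"})),
    Region("c2", 100.0, 110.0, frozenset({"exam", "professor", "hard"})),
]
for point in rank_jump_in_points(seed, archive):
    print(point.conversation_id, point.time_s)
```

A real system would presumably also draw on the prosodic features, audio, and video supplied with the data; this sketch uses transcript words alone to keep the shape of the problem visible.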

Task schedule (tentative)

April 1: Familiarization pack release

May 1: Development data release

July 1: Test set release

September 5: Run submission deadline

October 18-19: Workshop, in Barcelona

This task is organized under the auspices of MediaEval 2013.

Further information is available at http://www.multimediaeval.org/mediaeval2013/socialspeech2013/ and http://www.cs.utep.edu/nigel/ssss/, or from the organizers: Nigel Ward, University of Texas at El Paso, USA; David G. Novick, University of Texas at El Paso, USA; Tatsuya Kawahara, Kyoto University, Japan; Elizabeth Shriberg, Microsoft, USA; Louis-Philippe Morency, University of Southern California, USA; Catharine Oertel, KTH, Sweden.
