ISCA - International Speech
Communication Association


ISCApad #205

Wednesday, July 08, 2015 by Chris Wellekens

4-9 MediaEval 2015 Multimedia Benchmark


Call for Participation
MediaEval 2015 Multimedia Benchmark Evaluation
Early registration deadline: 1 May 2015

MediaEval is a multimedia benchmark evaluation that offers tasks promoting research and
innovation in areas related to human and social aspects of multimedia. MediaEval 2015
focuses on aspects of multimedia including and going beyond visual content, such as
language, speech, music, and social factors. Participants carry out one or more of the
tasks offered and submit runs to be evaluated. They then write up their results and
present them at the MediaEval 2015 workshop.

For each task, participants receive a task definition, task data and accompanying
resources (dependent on task) such as shot boundaries, keyframes, visual features, speech
transcripts and social metadata. In order to encourage participants to develop techniques
that push forward the state-of-the-art, a 'required reading' list of papers will be
provided for each task.

Participation is open to all interested research groups. To sign up, please click the
"MediaEval 2015 Registration" link at:

The following tasks are available to participants at MediaEval 2015:

*QUESST: Query by Example Search on Speech Task*
The task involves searching FOR audio content WITHIN audio content USING an audio content
query. This task is particularly interesting for speech researchers in the area of spoken
term detection or low-resource/zero-resource speech processing. The primary performance
metric will be the normalized cross entropy cost (Cnxe).

*Multimodal Person Discovery in Broadcast TV (New in 2015!)*
Given raw TV broadcasts, each shot must be automatically tagged with the name(s) of
people who can be both seen and heard in the shot. The list of people is not known
a priori and their names must be discovered in an unsupervised way from provided text
overlay or speech transcripts. The task will be evaluated on a new French corpus
(provided by INA) and the AGORA Catalan corpus, using standard information retrieval
metrics based on a posteriori collaborative annotation of the corpus.

*C@merata: Querying Musical Scores*
The input is a natural language phrase referring to a musical feature (e.g., "consecutive
fifths") together with a classical music score, and the required output is a list of
passages in the score which contain that feature. Scores are in the MusicXML format,
which can capture most aspects of Western music notation. Evaluation is via versions of
Precision and Recall relative to a Gold Standard produced by the organisers.

*Affective Impact of Movies (including Violent Scenes Detection)*
In this task participating teams are expected to classify short movie scenes by their
affective content according to two use cases: (1) the presence of depicted violence, and
(2) their emotional impact (valence, arousal). The training data consists of short
Creative Commons-licensed movie scenes (both professional and amateur) together with
human annotations of violence and valence-arousal ratings. The results will be evaluated
using standard retrieval and classification metrics.

*Emotion in Music (An Affect Task)*
We aim at detecting emotional dynamics of music using its content. Given a set of songs,
participants are asked to automatically generate continuous emotional representations in
arousal and valence.

*Retrieving Diverse Social Images*
This task requires participants to refine a ranked list of Flickr photos with
location-related information using provided visual, textual and user credibility
information. Results are evaluated with respect to their relevance to the query and how
diversely they represent it.

*Placing: Multimodal Geo-location Prediction*
The Placing Task requires participants to estimate the locations where multimedia items
(photos or videos) were captured solely by inspecting the content and metadata of these
items, and optionally exploiting additional knowledge sources such as gazetteers.
Performance is evaluated using the distance to the ground truth coordinates of the
multimedia items.
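
As an illustration of a distance-based evaluation, the sketch below computes the great-circle (haversine) distance between a predicted and a ground truth coordinate. It is a minimal sketch, not the task's official scoring code: the function name `haversine_km`, the use of a spherical Earth, and the mean radius of 6371 km are all assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees.

    Hypothetical helper for illustration; not taken from the task definition.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Example: error of a prediction made at (0, 0) for an item captured at (0, 1).
error_km = haversine_km(0.0, 0.0, 0.0, 1.0)
```

A per-item error of this kind could then be aggregated over all test items, for example as a median error; which aggregate the task actually uses is not stated here.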

*Verifying Multimedia Use (New in 2015!)*
For this task, the input is a tweet about an event with the potential to be of interest
in the international news, and the accompanying multimedia item (image or video).
Participants must build systems that output a binary decision representing a verification
of whether the multimedia item reflects the reality of the event in the way purported by
the tweet. The task is evaluated using the F1 score. Participants are also requested to
return a short explanation or evidence for the verification decision.
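
For readers unfamiliar with the metric, the following minimal sketch computes the F1 score for binary decisions. The function name `f1_score`, the boolean label encoding, and the choice of which class counts as "positive" are assumptions of this example and are not specified by the task description.

```python
def f1_score(predicted, truth, positive=True):
    """F1 score (harmonic mean of precision and recall) for binary labels.

    `predicted` and `truth` are equal-length sequences of labels.
    Hypothetical helper for illustration; the official scorer may differ.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p == positive and t == positive)
    fp = sum(1 for p, t in pairs if p == positive and t != positive)
    fn = sum(1 for p, t in pairs if p != positive and t == positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so this sketch reports 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Example: four items, one correct positive, one false alarm, one miss.
score = f1_score([True, True, False, False], [True, False, True, False])
```

In this example precision and recall are both 0.5, so the F1 score is also 0.5.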

*Context of Experience: Recommending Videos Suiting a Watching Situation (New in 2015!)*
This task develops multimodal techniques for automatic prediction of multimedia in a
specific consumption context. In particular, we focus on the context of predicting movies
that are suitable to watch on airplanes. Inputs to the prediction methods are movie
trailers and metadata from IMDb. Output is evaluated using the Weighted F1 score, with
expert labels as ground truth.

*Reliability of Social Multimedia Annotations (New in 2015!)*
Input is a set of underwater photos with user-generated annotations and other additional
social information taken from a social scuba divers website, and output is a ranked list
of the least reliable user-generated annotations. Systems will be evaluated using a
labeling of fish species created by expert annotators.

*Synchronization of Multi-User Event Media*
This task addresses the challenge of automatically creating a chronologically-ordered
outline of multiple multimedia collections corresponding to the same event. Given N media
collections (galleries) taken by different users/devices at the same event, the goal is
to find the best (relative) time alignment among them and detect the significant
sub-events over the whole gallery. Performance is evaluated using ground truth time codes
and actual event schedules.

*DroneProtect: Mini-drone Video Privacy Task (New in 2015!)*
The recent popularity of mini-drones and their rapidly increasing adoption in various
areas, including photography, news reporting, cinema, mail delivery, cartography,
agriculture, and the military, raise concerns for privacy protection and personal safety.
Input to the task is drone video, and output is a version of the video which protects
privacy while retaining key information about the event or situation recorded.

*Search and Anchoring in Video Archives*
The 2015 Search and Anchoring in Video Archives task consists of two sub-tasks: search
for multimedia content and automatic anchor selection. In the "search for multimedia
content" sub-task, participants use multimodal textual and visual descriptions of content
of interest to retrieve potentially relevant video segments from within a collection. In
the "automatic anchor selection" sub-task, participants automatically predict key
elements of videos as anchor points for the formation of hyperlinks to relevant content
within the collection. The video collection consists of professional broadcasts from the
BBC or semi-professional user-generated content. Participant submissions will be assessed
using professionally created anchors and crowdsourcing-based evaluation.

MediaEval 2015 Timeline
(dates vary slightly from task to task, see the individual task pages for the individual

Mid-March to May: Registration and return of usage agreements.
May-June: Release of development/training data.
June-July: Release of test data.
Mid-Aug.: Participants submit their completed runs, and receive results.
End Aug: Participants submit their 2-page working notes papers.
14-15 September: MediaEval 2015 Workshop, Wurzen, Germany. The workshop is a satellite
event of Interspeech 2015, held nearby in Dresden the previous week.

We ask you to register by 1 May (because of the timing of the first wave of data
releases). After that point, late registration will be possible, but we encourage teams
to register as early as they can.

For questions or additional information please contact Martha Larson or visit

The ISCA SIG SLIM: Speech and Language in Multimedia is a key supporter of MediaEval.
This year, the MediaEval workshop will be held as a satellite event of Interspeech.

A large number of organizations and projects contribute to the organization of
MediaEval, including the projects (alphabetical): Camomile, CrowdRec, EONS, PHENICX,
Reveal, VideoSense, Visen

