Call for Participation
MediaEval 2015 Multimedia Benchmark Evaluation
http://www.multimediaeval.org
Early registration deadline: 1 May 2015
--------------------------------------------------
MediaEval is a multimedia benchmark evaluation that offers tasks promoting research and innovation in areas related to the human and social aspects of multimedia. MediaEval 2015 focuses on aspects of multimedia that include and go beyond visual content, such as language, speech, music, and social factors. Participants carry out one or more of the tasks offered and submit runs to be evaluated. They then write up their results and present them at the MediaEval 2015 workshop.
For each task, participants receive a task definition, task data and accompanying resources (dependent on task) such as shot boundaries, keyframes, visual features, speech transcripts and social metadata. In order to encourage participants to develop techniques that push forward the state-of-the-art, a 'required reading' list of papers will be provided for each task.
Participation is open to all interested research groups. To sign up, please click the "MediaEval 2015 Registration" link at: http://www.multimediaeval.org
The following tasks are available to participants at MediaEval 2015:
*QUESST: Query by Example Search on Speech Task* The task involves searching FOR audio content WITHIN audio content USING an audio content query. This task is particularly interesting for speech researchers in the area of spoken term detection or low-resource/zero-resource speech processing. The primary performance metric will be the normalized cross entropy cost (Cnxe).
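For orientation only (this is not the official scoring tool), a minimal Python sketch of Cnxe is given below. It assumes per-trial system scores are expressed as log-likelihood ratios and that a fixed target prior is given; the function name and default prior are illustrative assumptions.

import numpy as np

def cnxe(llr_scores, labels, p_target=0.5):
    """llr_scores: per-trial log-likelihood ratios (natural log).
    labels: 1 for target trials, 0 for non-target trials."""
    llr_scores = np.asarray(llr_scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    logit_prior = np.log(p_target / (1.0 - p_target))
    # Posterior probability of the target hypothesis for each trial.
    post_target = 1.0 / (1.0 + np.exp(-(llr_scores + logit_prior)))
    tar = post_target[labels == 1]
    non = post_target[labels == 0]
    # Empirical cross-entropy of the system, weighted by the prior.
    c_xe = (p_target * np.mean(-np.log2(tar))
            + (1.0 - p_target) * np.mean(-np.log2(1.0 - non)))
    # Cross-entropy of a prior-only system (binary entropy of the prior).
    c_prior = (-p_target * np.log2(p_target)
               - (1.0 - p_target) * np.log2(1.0 - p_target))
    return c_xe / c_prior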
*Multimodal Person Discovery in Broadcast TV (New in 2015!)* Given raw TV broadcasts, each shot must be automatically tagged with the name(s) of people who can be both seen and heard in the shot. The list of people is not known a priori and their names must be discovered in an unsupervised way from provided text overlay or speech transcripts. The task will be evaluated on a new French corpus (provided by INA) and the AGORA Catalan corpus, using standard information retrieval metrics based on a posteriori collaborative annotation of the corpus.
*C@merata: Querying Musical Scores* The input is a natural language phrase referring to a musical feature (e.g., "consecutive fifths") together with a classical music score, and the required output is a list of passages in the score which contain that feature. Scores are in the MusicXML format, which can capture most aspects of Western music notation. Evaluation is via versions of Precision and Recall relative to a Gold Standard produced by the organisers.
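For orientation, a minimal Python sketch of passage-level Precision and Recall is given below; the organisers' actual matching rules (e.g., how partial overlaps between retrieved and gold passages are counted) may differ, so the exact-match assumption here is illustrative only.

def precision_recall(retrieved, gold):
    """retrieved, gold: sets of (start, end) passage locations in the score."""
    retrieved, gold = set(retrieved), set(gold)
    true_positives = len(retrieved & gold)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall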
*Affective Impact of Movies (including Violent Scenes Detection)* In this task participating teams are expected to classify short movie scenes by their affective content according to two use cases: (1) the presence of depicted violence, and (2) their emotional impact (valence, arousal). The training data consists of short Creative Commons-licensed movie scenes (both professional and amateur) together with human annotations of violence and valence-arousal ratings. The results will be evaluated using standard retrieval and classification metrics.
*Emotion in Music (An Affect Task)* This task aims to detect the emotional dynamics of music from its content. Given a set of songs, participants are asked to automatically generate continuous emotional representations along the arousal and valence dimensions.
*Retrieving Diverse Social Images* This task requires participants to refine a ranked list of location-related Flickr photos using provided visual, textual and user credibility information. Results are evaluated with respect to their relevance to the query and the diversity with which they represent it.
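As a hedged illustration of how diversity might be scored, the sketch below computes cluster recall at a cutoff, i.e. the fraction of ground-truth clusters (distinct views of the location) covered by the top N results. The task's official metrics, cutoffs, and cluster definitions are not specified above, so all names in this sketch are assumptions.

def cluster_recall_at_n(ranked_photo_ids, photo_to_cluster, n=20):
    """ranked_photo_ids: refined ranking of photo ids.
    photo_to_cluster: ground-truth cluster id for each relevant photo."""
    covered = {photo_to_cluster[p] for p in ranked_photo_ids[:n]
               if p in photo_to_cluster}
    all_clusters = set(photo_to_cluster.values())
    return len(covered) / len(all_clusters) if all_clusters else 0.0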
*Placing: Multimodal Geo-location Prediction* The Placing Task requires participants to estimate the locations where multimedia items (photos or videos) were captured solely by inspecting the content and metadata of these items, and optionally exploiting additional knowledge sources such as gazetteers. Performance is evaluated using the distance to the ground truth coordinates of the multimedia items.
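As an illustration of the distance-based evaluation, the Python sketch below computes the great-circle (haversine) error in kilometres between a predicted and a ground-truth location; the official evaluation may apply different distance formulas or error thresholds.

import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))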
*Verifying Multimedia Use (New in 2015!)* For this task, the input is a tweet about an event of potential interest to the international news, together with the accompanying multimedia item (image or video). Participants must build systems that output a binary decision representing a verification of whether the multimedia item reflects the reality of the event in the way purported by the tweet. The task is evaluated using the F1 score. Participants are also requested to return a short explanation or evidence for the verification decision.
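For orientation, the sketch below computes the F1 score over binary verification decisions; treating the "misused multimedia" class as the positive class is an assumption of this sketch, not a statement of the official setup.

def f1_score(predictions, labels):
    """predictions, labels: iterables of 0/1 decisions (1 = positive class)."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)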
*Context of Experience: Recommending Videos Suiting a Watching Situation (New in 2015!)* This task develops multimodal techniques for automatically predicting the suitability of multimedia content for a specific consumption context. In particular, we focus on predicting which movies are suitable to watch on airplanes. Inputs to the prediction methods are movie trailers and metadata from IMDb. Output is evaluated using the Weighted F1 score, with expert labels as ground truth.
*Reliability of Social Multimedia Annotations (New in 2015!)* The input is a set of underwater photos with user-generated annotations and additional social information taken from a scuba diving community website; the output is a ranked list of the least reliable user-generated annotations. Systems will be evaluated using a labeling of fish species created by expert annotators.
*Synchronization of Multi-User Event Media* This task addresses the challenge of automatically creating a chronologically ordered outline of multiple multimedia collections corresponding to the same event. Given N media collections (galleries) taken by different users/devices at the same event, the goal is to find the best (relative) time alignment among them and to detect the significant sub-events across the whole gallery. Performance is evaluated using ground truth time codes and actual event schedules.
*DroneProtect: Mini-drone Video Privacy Task (New in 2015!)* The recent popularity of mini-drones and their rapidly increasing adoption in various areas, including photography, news reporting, cinema, mail delivery, cartography, agriculture, and military applications, raise concerns about privacy protection and personal safety. The input to the task is drone video, and the output is a version of the video that protects privacy while retaining key information about the event or situation recorded.
*Search and Anchoring in Video Archives* The 2015 Search and Anchoring in Video Archives task consists of two sub-tasks: search for multimedia content and automatic anchor selection. In the "search for multimedia content" sub-task, participants use multimodal textual and visual descriptions of content of interest to retrieve potentially relevant video segments from within a collection. In the "automatic anchor selection" sub-task, participants automatically predict key elements of videos as anchor points for the formation of hyperlinks to relevant content within the collection. The video collection consists of professional broadcasts from the BBC or semi-professional user-generated content. Participant submissions will be assessed using professionally created anchors and crowdsourcing-based evaluation.
*Schedule*
Mid-March-May: Registration and return of usage agreements.
May-June: Release of development/training data.
June-July: Release of test data.
Mid-August: Participants submit their completed runs and receive results.
End of August: Participants submit their 2-page working notes papers.
14-15 September: MediaEval 2015 Workshop, Wurzen, Germany. The workshop is a satellite event of Interspeech 2015, held nearby in Dresden the previous week.
We ask you to register by 1 May (because of the timing of the first wave of data releases). After that point, late registration will be possible, but we encourage teams to register as early as they can.
The ISCA SIG SLIM: Speech and Language in Multimedia (http://slim-sig.irisa.fr) is a key supporter of MediaEval. This year, the MediaEval workshop will be held as a satellite event of Interspeech (http://interspeech2015.org).