
ISCA - International Speech Communication Association


ISCApad #251

Sunday, May 12, 2019 by Chris Wellekens

3-3-37 (2019-06-14) CfS The How2 Challenge - New Tasks for Vision and Language

Call for Submissions

The How2 Challenge - New Tasks for Vision and Language

Research at the intersection of vision and language has attracted an increasing amount of attention over the last ten years. Current topics include the study of multi-modal representations, translation between modalities, bootstrapping of labels from one modality into another, visually-grounded question answering, embodied question answering, segmentation and storytelling, and grounding the meaning of language in visual data. Still, these tasks may not be sufficient to fully exploit the potential of vision and language data.

To support research in this area, we recently released the How2 dataset,¹ containing 2000 hours of how-to instructional videos, with audio, subtitles, Brazilian Portuguese translations, and textual summaries. This makes it an ideal resource to bring together researchers working on different aspects of multimodal learning. We hope that a common dataset will facilitate comparisons of tools and algorithms, and foster collaboration.

We are organizing a workshop, “The How2 Challenge - New Tasks for Vision and Language”, at ICML 2019, to bring together researchers and foster the exchange of ideas in this area. We seek submissions in the following two categories:

Papers that describe work on the How2 data, either on the “shared challenge tasks” described on the How2 web site² (e.g. multi-modal speech recognition, machine translation, or video summarization; a leader-board will be provided), or on novel, “un-shared” tasks.
Papers that describe other related and relevant work that furthers vision and language ideas by proposing new tasks, or by analyzing the utility of existing tasks and data sets in interesting ways.

The organizers encourage both the publication of novel work that is relevant to the topics of discussion, and late-breaking results on the How2 tasks in a single format. The workshop will also feature a number of invited talks, and a moderated discussion around the challenges and opportunities that current tasks in vision and language present. We aim to stimulate discussion around new tasks that go beyond image captioning and visual question answering, and which could form the basis for future research in this area. We seek to create a venue to encourage collaboration between different sub-fields and help establish new research directions that we believe will sustain multimodal machine learning research for years to come.

The How2 Challenge uses the How2 Corpus (https://srvk.github.io/how2-dataset/)

Invited speakers:

Katerina Fragkiadaki (Carnegie Mellon University)
Lisa Anne Hendricks (UC Berkeley)
Qin Jin (Renmin University)
Angeliki Lazaridou (DeepMind)
Devi Parikh (Georgia Tech)
Kate Saenko (Boston University)
Bernt Schiele (Max Planck Institute for Informatics)

Important dates:

Challenge starts: March 15, 2019
Paper submission: May 15, 2019
Notification: May 22, 2019
Workshop date: June 14 or 15, 2019

For more information, visit https://srvk.github.io/how2-challenge/

Contact us at how2challenge@gmail.com.

¹ https://github.com/srvk/how2-dataset

² https://srvk.github.io/how2-challenge/
