| We have created and made publicly available a dense, person-oriented audio-visual ground-truth annotation of a feature film (100 minutes long): “Hannah and Her Sisters” by Woody Allen.
The annotation includes:
• Face tracks in video (densely annotated, i.e., in each frame, and person-labeled)
• Speech segments in audio (person-labeled)
• Shot boundaries in video
The annotation can be useful for evaluating:
• Person-oriented video-based tasks (e.g., face tracking, automatic character naming)
• Person-oriented audio-based tasks (e.g., speaker diarization or recognition)
• Person-oriented multimodal tasks (e.g., audio-visual character naming)
Details on the Hannah dataset and how to access it are available here:
https://research.technicolor.com/rennes/hannah-home/
https://research.technicolor.com/rennes/hannah-download/
Acknowledgments:
This work is supported by the AXES EU project: http://www.axes-project.eu/
Alexey Ozerov Alexey.Ozerov@technicolor.com
Jean-Ronan Vigouroux
Louis Chevallier
Patrick Pérez
Technicolor Research & Innovation
|