Abstract: Video data is critical for learning systems that can perceive, interact with, and reason about the world. Beyond what images offer, videos provide useful cues about motion, causality, and even geometry, and they often combine multiple modalities of data. Yet, they also heighten the…