Temporal learning for multimedia intelligence

Video QA

Grounding

Diffusion


Subject 1: A S* dog
Subject 2: A V* cat
A S* dog and a V* cat are jumping over a river.
A S* dog and a V* cat are playing chess.

H2V for live sports