Conditional Multi-Event Temporal Grounding in Long-Form Video

Check out the paper here.