Study fork of Meta's V-JEPA 2: self-supervised video representation learning via a joint-embedding predictive architecture, with notes on the video masking strategy, predictor design, and downstream evaluation.
computer-vision deep-learning video-understanding self-supervised-learning predictive-coding joint-embedding jepa video-joint-embedding-predictive-architecture self-supervised-video spatiotemporal-representation
Updated Jun 13, 2025 - Python