VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

Hidir Yesiltepe¹, Jiazhen Hu¹, Tuna Han Salih Meral¹, Adil Kaan Akan², Kaan Oktay², Hoda Eldardiry¹, Pinar Yanardag¹
¹Virginia Tech   ²fal

Teaser

Qualitative Results

5 Seconds

30 Seconds

VideoMLA - LongSANA Comparison

[Eight side-by-side video pairs. Left: VideoMLA (Ours). Right: LongSANA.]

Ablations

NoPE - RoPE Ablations

RoPE Heavy (RoPE=96, NoPE=32)

Balanced (RoPE=64, NoPE=64)

NoPE Heavy (RoPE=32, NoPE=96)

Qualitative Comparison

5 Seconds