Synthesizing high-quality spatial dynamics requires redundant deep network forward passes across hundreds of denoising steps.
Ground Truth
Stale Reuse
Averages hide local motion. Blindly copying old activations destroys temporal coherence before global loss triggers a refresh.
Content-Aware Caching for Accelerated World Models
A training-free framework that predicts skipped computation rather than copying it over blindly.
WorldCache treats caching like a localized prediction. It controls the pace with causal tracking while interpolating the next state.
Dynamically scales caching tolerance based on early layer motion velocity.
Penalizes caching errors in perceptually critical high-frequency regions.
Interpolates skipped cache states using trajectory matching.
Exponentially relaxes caching constraints in later denoising stages.
Over baseline architectures while strictly maintaining visual fidelity, motion dynamics, and prompt adherence across Cosmos, WAN2.1, and DreamDojo.