Why simple feature reuse breaks in world models
Standard training-free caching methods reuse stale activations whenever average drift looks small. That shortcut can fail in dynamic scenes, where local motion and perceptually important objects change long before the global average signals trouble.
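To make the failure mode concrete, here is a minimal sketch of a generic fixed-threshold reuse rule of the kind described above. It is not code from WorldCache or any specific caching method. The function names, the 0.05 threshold, and the flattened list-of-floats feature representation are all hypothetical choices for illustration.

```python
def relative_l1_drift(prev, curr):
    """Total absolute change divided by total absolute magnitude of prev."""
    num = sum(abs(c - p) for p, c in zip(prev, curr))
    den = sum(abs(p) for p in prev)
    return num / den if den > 0 else float("inf")

def should_reuse_global(prev, curr, threshold=0.05):
    # Hypothetical fixed-threshold rule: reuse cached activations
    # whenever the global (averaged) drift is small.
    return relative_l1_drift(prev, curr) < threshold

# 1,000 "tokens": 990 static background tokens and 10 tokens on a moving hand.
prev = [1.0] * 1000
curr = [1.0] * 990 + [3.0] * 10  # hand tokens change by 200%

drift = relative_l1_drift(prev, curr)    # 20 / 1000 = 0.02
reuse = should_reuse_global(prev, curr)  # True: the hand's change is averaged away
```

Even though the hand tokens tripled in value, the global drift of 0.02 stays under the threshold, so this rule would reuse stale features exactly where the scene is changing.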
WorldCache is a training-free caching framework for diffusion-transformer world models. It improves both the decision of when to reuse features and the approximation of skipped computation, using four components: motion-adaptive thresholds, saliency-weighted drift estimation, optimal feature approximation, and adaptive threshold scheduling across denoising steps.
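The idea behind a motion-adaptive threshold can be sketched as follows, assuming that motion is measured as the mean absolute difference between consecutive frames and that higher motion should make reuse harder. This page does not give WorldCache's actual formula; the 1 / (1 + alpha * motion) form, `alpha=10`, and the function names are hypothetical.

```python
def motion_score(frame_a, frame_b):
    """Mean absolute per-element difference between two consecutive frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def motion_adaptive_threshold(base, motion, alpha=10.0):
    """Shrink the reuse threshold as motion grows, so fast scenes skip less often.
    The functional form and alpha are illustrative assumptions."""
    return base / (1.0 + alpha * motion)

# A static scene keeps the base threshold; a moving scene halves it here.
static = motion_adaptive_threshold(0.05, motion_score([0.0] * 4, [0.0] * 4))
moving = motion_adaptive_threshold(0.05, motion_score([0.0] * 4, [0.1] * 4))
```

Any monotonically decreasing function of motion would serve the same purpose; the point is that the skip decision tightens automatically when the scene moves.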
Static averages hide local motion
Large backgrounds mask meaningful changes in hands, agents, or manipulated objects.
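A saliency-weighted variant of the drift measure shows how this masking can be avoided. The sketch below assumes a per-token saliency weight is already available (for example from attention maps or an object detector); how WorldCache obtains its saliency is not described here, so the weights and the function name are hypothetical.

```python
def saliency_weighted_drift(prev, curr, saliency):
    """Relative L1 drift where each token's change is weighted by its saliency."""
    num = sum(w * abs(c - p) for p, c, w in zip(prev, curr, saliency))
    den = sum(w * abs(p) for p, w in zip(prev, saliency))
    return num / den if den > 0 else float("inf")

prev = [1.0] * 1000
curr = [1.0] * 990 + [3.0] * 10  # same scene: only 10 hand tokens change

# Hypothetical saliency map: hand tokens weighted 50x more than background.
saliency = [1.0] * 990 + [50.0] * 10

drift = saliency_weighted_drift(prev, curr, saliency)  # 1000 / 1490, about 0.67
```

With uniform weights this reduces to the plain average drift (0.02 for this scene); emphasizing the hand tokens pushes the drift far above a typical reuse threshold, so the block would be recomputed.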
Copying the past is too crude
Frozen snapshots create ghosting, blur, and motion drift once the rollout diverges.
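The difference between copying and predicting can be seen with a simple first-order extrapolation. This is a plain linear extrapolation used as a stand-in, not the optimal feature approximation WorldCache uses, which is not specified on this page. The function names and toy feature values are hypothetical.

```python
def zero_order_hold(history):
    # "Frozen snapshot": copy the most recent computed features unchanged.
    return list(history[-1])

def linear_extrapolate(history, steps_ahead=1):
    # Predict a skipped step from the last two computed steps,
    # assuming features change roughly linearly over a few steps.
    prev, last = history[-2], history[-1]
    return [l + steps_ahead * (l - p) for p, l in zip(prev, last)]

history = [[0.0, 1.0], [0.5, 1.0]]  # feature 0 is drifting, feature 1 is static
true_next = [1.0, 1.0]

copied = zero_order_hold(history)        # [0.5, 1.0]: lags behind the motion
predicted = linear_extrapolate(history)  # [1.0, 1.0]: follows the trend
```

Copying is exact only for static features; any feature that is moving falls further behind with every skipped step, which is the source of ghosting and drift.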
Late denoising allows more reuse
After the global layout forms, better caching decisions produce the largest gains.
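One way to express a phase-dependent threshold is a schedule that is strict in early steps, while the layout is still forming, and looser in late steps. The linear shape and the 0.02 and 0.10 endpoints below are hypothetical; this page does not state WorldCache's actual schedule.

```python
def scheduled_threshold(step, num_steps, tau_early=0.02, tau_late=0.10):
    """Hypothetical linear schedule: strict early (layout forming), looser late."""
    progress = step / max(num_steps - 1, 1)  # 0.0 at the first step, 1.0 at the last
    return tau_early + (tau_late - tau_early) * progress

thresholds = [round(scheduled_threshold(s, 5), 3) for s in range(5)]
```

For a 5-step trajectory this yields thresholds 0.02, 0.04, 0.06, 0.08, and 0.1, so later steps tolerate more drift before recomputing.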
WorldCache treats caching like a careful prediction, not a blind shortcut.
Video world models spend much of their time repeating similar computation across denoising steps. WorldCache saves time by reusing deep features only when the scene is stable enough, then estimating the skipped features with motion-aware approximation.
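The overall skip-or-compute loop can be sketched as below. Each step's input is compared with the input at the last full computation; if the drift is small, the expensive block is skipped. As a stand-in for WorldCache's motion-aware approximation, the sketch reuses the cached residual (output minus input), a simple technique common in other caching methods. All names and the toy block are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]

def relative_l1(ref: Vector, cur: Vector) -> float:
    """Total absolute change of cur relative to ref, normalized by ref's magnitude."""
    den = sum(abs(r) for r in ref)
    if den == 0:
        return float("inf")
    return sum(abs(c - r) for r, c in zip(ref, cur)) / den

def cached_rollout(
    inputs: List[Vector],
    deep_block: Callable[[Vector], Vector],
    threshold: float,
) -> Tuple[List[Vector], int]:
    """Apply deep_block to each step's input, skipping it when the input has
    drifted less than `threshold` since the last full computation."""
    outputs: List[Vector] = []
    ref_input: Optional[Vector] = None  # input at the last full computation
    cached_residual: Vector = []        # deep_block(x) - x at that step
    full_calls = 0
    for x in inputs:
        if ref_input is not None and relative_l1(ref_input, x) < threshold:
            # Skip: approximate the output as current input + cached residual.
            y = [xi + ri for xi, ri in zip(x, cached_residual)]
        else:
            # Compute fully and refresh the cache.
            y = deep_block(x)
            full_calls += 1
            ref_input = x
            cached_residual = [yi - xi for xi, yi in zip(x, y)]
        outputs.append(y)
    return outputs, full_calls

def toy_block(x: Vector) -> Vector:
    # Stand-in for an expensive stack of transformer blocks.
    return [2.0 * v + 1.0 for v in x]

steps = [[1.0, 1.0], [1.01, 1.0], [1.02, 1.0], [2.0, 1.0]]
outs, calls = cached_rollout(steps, toy_block, threshold=0.05)
```

Here the two small updates are skipped and only the first and last steps run the block, so `calls` is 2. The skipped outputs carry a small approximation error (3.01 instead of 3.02 for the first feature), which is the trade-off that better drift estimates and approximations aim to control.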
The four modules in WorldCache
WorldCache changes the skip rule, the reuse rule, and the threshold schedule so that caching follows motion and denoising phase instead of relying on a single fixed heuristic.
Causal Feature Caching
Results across Cosmos, WAN2.1, and EgoDex-Eval
Results cover the main benchmarks, transfer results, and a robotics evaluation.
[Figures: speed-quality frontier; latency comparison, where lower latency is better and each bar shows the paper's reported runtime and speedup; table of the paper's reported values.]
How the gain changes with denoising step budget
Longer denoising trajectories give caching more opportunities to reuse features. This comparison tracks latency and speedup across the reported step budgets.
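A toy cost model shows why the gain can grow with the step budget. It assumes, hypothetically, that a fixed number of early steps are always computed and that a fixed fraction of the remaining steps are skipped at negligible cost. The numbers are illustrative only and are not WorldCache's reported results.

```python
def ideal_speedup(num_steps: int, warmup_steps: int, skip_ratio: float) -> float:
    """Idealized speedup when the first `warmup_steps` are always computed and a
    fraction `skip_ratio` of the remaining steps reuse cached features.
    Ignores the cost of the skip test and of the approximation itself."""
    computed = warmup_steps + (1.0 - skip_ratio) * (num_steps - warmup_steps)
    return num_steps / computed

for n in (10, 25, 50):
    print(n, round(ideal_speedup(n, warmup_steps=5, skip_ratio=0.5), 3))
# prints:
# 10 1.333
# 25 1.667
# 50 1.818
```

Because the always-computed warmup is a smaller share of a longer trajectory, the same skip ratio buys a larger overall speedup as the step budget grows.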
Qualitative evidence across scenes and scales
WorldCache stays closer to the baseline rollout in dynamic and interaction-heavy regions where simpler caching strategies drift.
Reference
If you find WorldCache useful in your research, please consider citing our work.
@article{nawaz2026worldcache,
title = {WorldCache: Content-Aware Caching for Accelerated Video World Models},
author = {Umair Nawaz and Ahmed Heakl and Ufaq Khan and Abdelrahman Shaker and Salman Khan and Fahad Shahbaz Khan},
eprint = {2603.22286},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2603.22286},
year = {2026}
}