The Generative Bottleneck

Video World Models are Powerful
but Autoregressive Generation is Slow

Synthesizing high-quality spatiotemporal dynamics requires hundreds of denoising steps, each a full forward pass through a deep network, and much of that computation is redundant across steps.

The Flaw in Existing Solutions

Naive Caching Causes Motion Drift

Ground Truth

Stale Reuse

Global error averages hide local motion. Blindly copying old activations destroys temporal coherence long before a global loss metric triggers a refresh.
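The flaw can be shown numerically. In this illustrative sketch (not the actual WorldCache metric), a small fast-moving region produces a tiny global residual, so a naive global threshold never triggers a refresh even though local drift is severe:

```python
import numpy as np

def global_drift(prev, cur):
    """Mean absolute change over the whole feature map."""
    return float(np.abs(cur - prev).mean())

def local_drift(prev, cur):
    """Worst-case change at any single location."""
    return float(np.abs(cur - prev).max())

prev = np.zeros((64, 64))
cur = prev.copy()
cur[30:34, 30:34] = 5.0  # a small, fast-moving object

print(round(global_drift(prev, cur), 4))  # 0.0195: looks "safe" globally
print(local_drift(prev, cur))             # 5.0: severe local motion
```

A threshold tuned to the global average (say 0.05) would reuse this stale cache entry despite the object having moved.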

WorldCache

Content-Aware Caching for Accelerated World Models

A training-free framework that predicts skipped computations instead of blindly copying stale activations.

Smart Architecture

Motion-Aware Caching Logic

WorldCache treats caching as a local prediction problem: causal motion tracking sets the pace of cache refreshes, while skipped states are interpolated rather than copied.
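The control loop can be sketched as follows. All names here are illustrative, not the actual WorldCache API: at each skippable step, drift is compared against a tolerance, and the block is either recomputed or replaced by a cheap prediction of its output.

```python
def cached_step(compute, predict, drift, tolerance):
    """Run the expensive block only when drift exceeds the tolerance;
    otherwise substitute a cheap prediction of its cached output."""
    if drift > tolerance:
        return compute(), "computed"   # refresh: full forward pass
    return predict(), "predicted"      # skip: interpolate from cache

out, mode = cached_step(lambda: "full pass", lambda: "interpolated",
                        drift=0.05, tolerance=0.1)
print(out, mode)  # interpolated predicted
```

The key design choice is the `predict` branch: unlike naive caching, the skipped output is synthesized from cached history rather than reused verbatim.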

Core Components

Driven by Four Key Ideas

Causal Feature Caching

Dynamically scales the caching tolerance based on early-layer motion velocity.
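One way to realize this (the rule and parameter names are assumptions for illustration): faster early-layer motion shrinks the tolerance, forcing more frequent recomputation in dynamic scenes.

```python
def cache_tolerance(base_tol, motion_velocity, sensitivity=2.0):
    """Shrink the caching tolerance as observed motion speeds up."""
    return base_tol / (1.0 + sensitivity * motion_velocity)

print(cache_tolerance(0.10, motion_velocity=0.0))  # 0.1  (static scene)
print(cache_tolerance(0.10, motion_velocity=2.0))  # 0.02 (fast motion)
```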

Saliency Weighted Drift

Penalizes caching errors in perceptually critical high-frequency regions.
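A minimal sketch of such a metric, with the weighting scheme assumed: local gradient magnitude stands in for high-frequency, perceptually critical detail, so the same error costs more on an edge than in a flat region.

```python
import numpy as np

def saliency(x):
    """Gradient magnitude as a proxy for high-frequency detail."""
    gy, gx = np.gradient(x)
    return np.hypot(gx, gy)

def weighted_drift(prev, cur):
    """Drift between cached and current features, weighted by saliency."""
    w = saliency(cur)
    w = w / (w.sum() + 1e-8)  # normalize weights to a distribution
    return float((w * np.abs(cur - prev)).sum())

cur = np.zeros((8, 8)); cur[:, 4:] = 1.0        # a sharp vertical edge
flat_err = cur.copy(); flat_err[0, 0] += 0.5    # error in a flat region
edge_err = cur.copy(); edge_err[0, 4] += 0.5    # same-size error on the edge
print(weighted_drift(flat_err, cur) < weighted_drift(edge_err, cur))  # True
```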

Optimal Feature Approximation

Interpolates skipped cache states using trajectory matching.
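A sketch of trajectory-matched extrapolation, where the coefficient fit is an assumption: scale the last cached step so it best matches the observed transition, then project the skipped state forward instead of copying the stale one.

```python
import numpy as np

def fit_step_scale(f0, f1, f2):
    """Least-squares scale of the previous step that explains the current one."""
    d_prev, d_cur = f1 - f0, f2 - f1
    return float(d_cur @ d_prev) / (float(d_prev @ d_prev) + 1e-8)

def approximate_next(f1, f2, scale):
    """Extrapolate the skipped state along the fitted trajectory."""
    return f2 + scale * (f2 - f1)

f0, f1, f2 = np.array([0., 0.]), np.array([1., 2.]), np.array([2., 4.])
scale = fit_step_scale(f0, f1, f2)
print(approximate_next(f1, f2, scale))  # ≈ [3. 6.] on a linear trajectory
```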

Adaptive Scheduling

Exponentially relaxes caching constraints in later denoising stages.
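One hypothetical form of such a schedule (the rate parameter is assumed): the tolerance grows exponentially with denoising progress, so early steps cache conservatively and late steps cache aggressively.

```python
import math

def scheduled_tolerance(base_tol, step, total_steps, rate=3.0):
    """Exponentially relax the caching tolerance as denoising progresses."""
    progress = step / max(total_steps - 1, 1)
    return base_tol * math.exp(rate * progress)

print(round(scheduled_tolerance(0.1, 0, 50), 3))   # 0.1   (early: strict)
print(round(scheduled_tolerance(0.1, 49, 50), 3))  # 2.009 (late: relaxed)
```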

Empirical Benchmarks
2.30×
Uncompromised Speedup

Over baseline architectures while strictly maintaining visual fidelity, motion dynamics, and prompt adherence across Cosmos, WAN2.1, and DreamDojo.

Technical Rigor

Evaluation & Generalization

World Types
Image2World, Text2World
Model Backbones
Cosmos-Predict2.5 (2B), Cosmos-Predict2.5 (14B), WAN2.1 (1.3B), WAN2.1 (14B), DreamDojo (1.3B)
Benchmarks
PAI-Eval, EgoDex-Eval
Cosmos-Predict2.5 (14B) · WorldCache generation outputs

Flawless Temporal Coherence