Real-Time Chunking (RTC)¶
Why predicted actions go stale during inference, and how to fix it with delay-aware blending.
The Problem¶
Action chunking predicts 16 future actions at once. But inference takes time — ~50ms on Jetson Orin. During those 50ms, the robot keeps moving. By the time the new prediction arrives, the first few actions are stale.
```
Time ──────────────────────────────────────────────►

Chunk 1:  [a₁ a₂ a₃ a₄ a₅ a₆ ... a₁₆]
                 ▲
                 │ Inference starts (50ms @ 50Hz = ~3 steps)
                 │ Robot executes a₃, a₄, a₅ while waiting
                 │
Chunk 2:  [b₁ b₂ b₃ b₄ b₅ b₆ ... b₁₆]
           ▲
           └── b₁, b₂, b₃ are stale — robot already passed these!
```
Naive approach: execute b₁ anyway. The robot jerks backward, then catches up. This is the chunk boundary problem — the most common source of discontinuous motion in VLA policies.
Two Fixes (No Denoiser Required)¶
Real-Time Chunking was introduced by Physical Intelligence for flow-matching policies like π₀. Their full approach uses gradient-based guidance during iterative denoising.
Neon uses a single-pass MLP — no iterative denoising. But two of RTC's three ideas are model-agnostic and apply directly:
1. Skip Stale Actions¶
Measure actual inference latency. Convert to timesteps. Skip them.
```
delay_steps = ⌈latency × control_freq⌉
            = ⌈0.050 × 50⌉
            = 3

Chunk 2: [b₁ b₂ b₃ | b₄ b₅ b₆ ... b₁₆]
          ────────   ──────────────────
          skip 3     execute from here
```
The robot never receives stale commands. No jerk.
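The delay calculation above fits in a few lines. A minimal sketch (the function name is illustrative, not Neon's actual API):

```python
import math

def delay_steps(latency_s: float, control_freq_hz: float) -> int:
    """Control steps that elapse while inference runs, rounded up
    so the robot never receives a command it has already passed."""
    return math.ceil(latency_s * control_freq_hz)

delay_steps(0.050, 50.0)  # 3 → skip b₁, b₂, b₃; execute from b₄
```

Rounding up rather than to nearest is deliberate: overestimating the delay by a fraction of a step is harmless, while underestimating it reintroduces a stale command.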
2. Blend the Overlap¶
When the new chunk arrives, leftover actions from the old chunk are still valid predictions for the near future. Instead of throwing them away, blend old and new with decaying weights:
```
Old leftover: [a₆  a₇  a₈  a₉  a₁₀ a₁₁ ...     ]
New chunk:    [b₄  b₅  b₆  b₇  b₈  b₉  ... b₁₆ ]
Weights:      [0.9 0.7 0.5 0.3 0.1 0.0 ... 0.0 ]
               ─── high trust ───  ── transition ──  ── new only ──
```
Near the overlap start: trust the old prediction (the robot committed to it). Further out: trust the new prediction (it has fresher observations).
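A minimal NumPy sketch of this prefix blend with linear decay (`blend_prefix` is an illustrative name, not Neon's internal `_blend_with_prefix`):

```python
import numpy as np

def blend_prefix(old_tail: np.ndarray, new_chunk: np.ndarray,
                 horizon: int) -> np.ndarray:
    """Blend leftover old actions into the head of the new chunk.

    old_tail, new_chunk: (steps, action_dim) arrays aligned in time.
    The weight on the old prediction decays linearly across the overlap,
    e.g. [0.9, 0.7, 0.5, 0.3, 0.1] for a 5-step horizon.
    """
    out = new_chunk.copy()
    n = min(horizon, len(old_tail), len(new_chunk))
    # midpoint samples of a 1 → 0 ramp, so neither endpoint is exactly 0 or 1
    w = np.linspace(1.0, 0.0, 2 * n + 1)[1::2]
    out[:n] = w[:, None] * old_tail[:n] + (1 - w[:, None]) * new_chunk[:n]
    return out
```

Steps beyond the overlap come straight from the new chunk, so blending never delays reaction to fresh observations past the horizon.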
Blend Schedules¶
Neon supports four blending strategies:

- `linear`: weights decay linearly from 1 → 0 across the overlap region. Good default for most tasks.
- `exp`: faster initial decay, holds longer at the boundary. Smoother for fine manipulation.
- `ema`: uniform weight across all overlapping steps. This is what Neon v1 did — a flat α/β split.
- `latest`: no blending; the new chunk is used as-is. Useful for debugging.
Visual Comparison¶
```
          OLD chunk        NEW chunk
          ──────────       ──────────
EMA:      ░░░░░░░░░░   →   ████████████
          uniform 30%      uniform 70%

Linear:   █▓▓▒▒░░░     →   ░░▒▒▓▓██████
          1.0 → 0.0        0.0 → 1.0

Exp:      █▓░░░░░░     →   ░░░░░▓██████
          fast decay       long low tail

Latest:   ░░░░░░░░░░   →   ████████████
          ignore old       100% new
```
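The four schedules can be sketched as weight vectors over the old chunk. This is a hedged illustration: the exponential decay constant and the 0.3 EMA split are chosen to match the diagram above, not taken from Neon's internals:

```python
import numpy as np

def old_weights(schedule: str, n: int, alpha: float = 0.3) -> np.ndarray:
    """Weight given to the OLD chunk at each of n overlapping steps."""
    if schedule == "linear":
        # midpoints of a 1 → 0 ramp: [0.9, 0.7, 0.5, 0.3, 0.1] for n=5
        return np.linspace(1.0, 0.0, 2 * n + 1)[1::2]
    if schedule == "exp":
        # fast initial decay, long low tail near the boundary
        return np.exp(-4.0 * np.arange(n) / max(n - 1, 1))
    if schedule == "ema":
        # flat α/β split, as in Neon v1
        return np.full(n, alpha)
    if schedule == "latest":
        # ignore the old chunk entirely
        return np.zeros(n)
    raise ValueError(f"unknown blend schedule: {schedule!r}")
```

The weight on the new chunk at each step is simply `1 - old_weights(...)`, so every schedule is a convex combination and stays inside the range of the two predictions.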
Usage¶
Default (RTC enabled)¶
```python
import time

from neon.inference.server import NeonInferenceServer

server = NeonInferenceServer(
    model_path="cagataydev/neon-g1-v1",
    blend_schedule="linear",   # "linear" | "exp" | "ema" | "latest"
    execution_horizon=10,      # blend across 10 steps
    control_freq=50.0,         # robot control rate
)

# Control loop
while running:
    action = server.get_action()
    if action is None:
        # Queue empty — time for a new prediction
        server.predict(
            image=camera.read(),
            instruction="Pick up the red cup",
            proprioception=robot.get_joints(),
            rtc=True,  # ← delay-aware queue
        )
        action = server.get_action()
    robot.send(action)
    time.sleep(1 / 50)  # 50 Hz control
```
Legacy Mode (Neon v1)¶
```python
output = server.predict(
    image=frame,
    instruction="Pick up the cup",
    proprioception=joints,
    smooth=True,  # ← EMA only, no queue
    rtc=False,
)
```
HTTP API¶
```bash
# Start with RTC
python -m neon.inference.server \
    --model cagataydev/neon-g1-v1 \
    --blend linear \
    --horizon 10 \
    --freq 50

# Predict + queue actions
curl -X POST http://localhost:8300/predict \
    -d '{"instruction": "wave", "rtc": true}'

# Pop next action from queue
curl -X POST http://localhost:8300/action
```
Tuning Guide¶
| Parameter | Default | Effect of ↑ | Effect of ↓ |
|---|---|---|---|
| `execution_horizon` | 10 | Smoother transitions, slower to react | More responsive, potential jumps |
| `control_freq` | 50 | More granular delay compensation | Fewer steps skipped |
| `blend_schedule` | `linear` | — | — |
Rules of thumb:

- Fine manipulation (pouring, insertion): `exp` schedule, horizon 12-16
- Fast reaching (pick up objects): `linear` schedule, horizon 6-8
- Locomotion (walking): `linear` schedule, horizon 10
- Debugging: `latest` schedule (see raw predictions, no blending)
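The rules of thumb above can be captured as presets. A hypothetical convenience dict, not part of Neon's API (horizon 14 picks the midpoint of the 12-16 range; the debugging horizon is arbitrary since `latest` ignores the overlap):

```python
# Hypothetical task presets for NeonInferenceServer keyword arguments,
# derived from the rules of thumb above.
RTC_PRESETS = {
    "fine_manipulation": {"blend_schedule": "exp", "execution_horizon": 14},
    "fast_reaching": {"blend_schedule": "linear", "execution_horizon": 8},
    "locomotion": {"blend_schedule": "linear", "execution_horizon": 10},
    "debugging": {"blend_schedule": "latest", "execution_horizon": 10},
}

# server = NeonInferenceServer(model_path="cagataydev/neon-g1-v1",
#                              **RTC_PRESETS["locomotion"])
```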
What We Didn't Port (and Why)¶
RTC's full algorithm includes gradient-based prefix guidance — during each denoising step, it computes ∂x₁/∂xₜ and applies a correction that steers the prediction toward the previous chunk's trajectory.
This requires:

- **Iterative denoising** — a loop of N steps refining noise → actions
- **Differentiable denoiser** — `torch.autograd.grad()` through the model
- **Time parameter τ** — normalized denoising progress for guidance weight
Neon's action decoder is a single-pass MLP: one forward pass, done. No iteration, no τ, no gradient to steer. The prefix guidance is elegant but fundamentally tied to flow-matching architectures.
The two things we did port — delay skipping and prefix blending — give 80% of the benefit with 0% of the complexity.
Future: if Neon adds a diffusion/flow head
If we add a diffusion-based action head (see Action Heads), full RTC guidance becomes possible. The ActionQueue is already designed to support it — the _blend_with_prefix method would be replaced by gradient-guided denoising.
References¶
- Black, K., Galliker, M.Y., Levine, S. (2025). Real-Time Execution of Action Chunking Flow Policies. arXiv:2506.07339
- Physical Intelligence. OpenPI: π₀ and π₀.₅ Implementation. Apache-2.0.
- LeRobot. RTC Module. HuggingFace. Apache-2.0.
→ Next: Data Soup — mixing data sources for robust training