
Real-Time Chunking (RTC)

Why predicted actions go stale during inference, and how to fix it with delay-aware blending.


The Problem

Action chunking predicts 16 future actions at once. But inference takes time — ~50ms on Jetson Orin. During those 50ms, the robot keeps moving. By the time the new prediction arrives, the first few actions are stale.

Time ──────────────────────────────────────────────►

Chunk 1: [a₁ a₂ a₃ a₄ a₅ a₆ ... a₁₆]
              │ Inference starts (50ms @ 50Hz = ~3 steps)
              │ Robot executes a₃, a₄, a₅ while waiting
Chunk 2: [b₁ b₂ b₃ b₄ b₅ b₆ ... b₁₆]
          └── b₁, b₂, b₃ are stale — robot already passed these!

Naive approach: execute b₁ anyway. The robot jerks backward, then catches up. This is the chunk boundary problem — the most common source of discontinuous motion in VLA policies.


Two Fixes (No Denoiser Required)

Real-Time Chunking was introduced by Physical Intelligence for flow-matching policies like π₀. Their full approach uses gradient-based guidance during iterative denoising.

Neon uses a single-pass MLP — no iterative denoising. But two of RTC's three ideas are model-agnostic and apply directly:

1. Skip Stale Actions

Measure actual inference latency. Convert to timesteps. Skip them.

delay_steps = ⌈latency × control_freq⌉
            = ⌈0.050 × 50⌉
            = 3

Chunk 2: [b₁ b₂ b₃ | b₄ b₅ b₆ ... b₁₆]
          ─────────   ───────────────────
            skip 3     execute from here

The robot never receives stale commands. No jerk.
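A minimal sketch of the skip computation; the function name and the list-of-strings chunk are illustrative, not Neon's internals:

    import math

    def delay_steps(latency_s: float, control_freq_hz: float) -> int:
        """Steps of the incoming chunk that are already stale on arrival."""
        return math.ceil(latency_s * control_freq_hz)

    # 50 ms inference at 50 Hz control -> skip 3 steps
    chunk = [f"b{i}" for i in range(1, 17)]   # b1 .. b16
    skip = delay_steps(0.050, 50.0)           # ceil(2.5) = 3
    executable = chunk[skip:]                 # execution starts at b4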

2. Blend the Overlap

When the new chunk arrives, leftover actions from the old chunk are still valid predictions for the near future. Instead of throwing them away, blend old and new with decaying weights:

Old leftover:  [a₆  a₇  a₈  a₉  a₁₀  a₁₁  ...  ]
New chunk:     [b₄  b₅  b₆  b₇  b₈   b₉   ...  b₁₆]
Weights:       [0.9 0.7 0.5 0.3 0.1  0.0   ...  0.0 ]
                ─── high trust ───  ── transition ──  ── new only ──

Near the overlap start: trust the old prediction (the robot committed to it). Further out: trust the new prediction (it has fresher observations).
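A minimal sketch of the blend itself, assuming NumPy arrays of shape (steps, action_dim); the function name and shapes are illustrative rather than the ActionQueue's real signature:

    import numpy as np

    def blend_overlap(old_leftover: np.ndarray, new_chunk: np.ndarray,
                      weights: np.ndarray) -> np.ndarray:
        """Weighted mix over the overlap; weights are the trust in the OLD chunk."""
        blended = new_chunk.copy()
        n = min(len(old_leftover), len(weights), len(new_chunk))
        blended[:n] = (weights[:n, None] * old_leftover[:n]
                       + (1.0 - weights[:n, None]) * new_chunk[:n])
        return blended

    # Hypothetical 7-DoF actions, weights as in the diagram above
    old = np.random.randn(6, 7)          # a6 .. a11 leftover
    new = np.random.randn(13, 7)         # b4 .. b16
    w = np.array([0.9, 0.7, 0.5, 0.3, 0.1, 0.0])
    actions_to_execute = blend_overlap(old, new, w)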


Blend Schedules

Neon supports four blending strategies:

linear: Weights decay linearly from 1 → 0 across the overlap region. Good default for most tasks.

w(t) = 1 - t/horizon

exp: Faster initial decay, holds longer at the boundary. Smoother for fine manipulation.

w(t) = linear(t) × expm1(linear(t)) / (e - 1)

ema: Uniform weight across all overlapping steps. This is what Neon v1 did: a flat α/β split.

w = 0.3 everywhere in overlap

latest: No blending. Always use the newest prediction. Most responsive, but potential discontinuities.

w = 0 everywhere
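The four formulas, collected into one illustrative helper (not the library's actual code), with the 0.3 weight taken from the flat split described above:

    import numpy as np

    def blend_weights(schedule: str, horizon: int, ema_alpha: float = 0.3) -> np.ndarray:
        """Per-step trust in the OLD chunk over the overlap (illustrative only)."""
        t = np.arange(horizon)
        linear = 1.0 - t / horizon
        if schedule == "linear":
            return linear
        if schedule == "exp":
            return linear * np.expm1(linear) / (np.e - 1.0)
        if schedule == "ema":
            return np.full(horizon, ema_alpha)
        if schedule == "latest":
            return np.zeros(horizon)
        raise ValueError(f"unknown schedule: {schedule}")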

Visual Comparison

      OLD chunk          NEW chunk
      ──────────         ──────────

EMA:  ░░░░░░░░░░  →  ████████████
      uniform 30%       uniform 70%

Linear: █▓▓▒▒░░░  →  ░░▒▒▓▓██████
        1.0 → 0.0       0.0 → 1.0

Latest: ░░░░░░░░░░  →  ████████████
        ignore old       100% new

Usage

Default (RTC enabled)

import time

from neon.inference.server import NeonInferenceServer

server = NeonInferenceServer(
    model_path="cagataydev/neon-g1-v1",
    blend_schedule="linear",    # "linear" | "exp" | "ema" | "latest"
    execution_horizon=10,       # blend across 10 steps
    control_freq=50.0,          # robot control rate
)

# Control loop
while running:
    action = server.get_action()

    if action is None:
        # Queue empty — time for a new prediction
        server.predict(
            image=camera.read(),
            instruction="Pick up the red cup",
            proprioception=robot.get_joints(),
            rtc=True,   # ← delay-aware queue
        )
        action = server.get_action()

    robot.send(action)
    time.sleep(1/50)  # 50 Hz control

Legacy Mode (Neon v1)

output = server.predict(
    image=frame,
    instruction="Pick up the cup",
    proprioception=joints,
    smooth=True,  # ← EMA only, no queue
    rtc=False,
)

HTTP API

# Start with RTC
python -m neon.inference.server \
    --model cagataydev/neon-g1-v1 \
    --blend linear \
    --horizon 10 \
    --freq 50

# Predict + queue actions
curl -X POST http://localhost:8300/predict \
    -d '{"instruction": "wave", "rtc": true}'

# Pop next action from queue
curl -X POST http://localhost:8300/action
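A minimal Python client for the same two endpoints, assuming the default port and JSON responses; the exact response fields are whatever the server returns and are not specified here:

    import requests

    BASE = "http://localhost:8300"

    # Queue a prediction with the delay-aware (RTC) queue enabled
    requests.post(f"{BASE}/predict", json={"instruction": "wave", "rtc": True})

    # Pop the next action from the queue inside the control loop
    resp = requests.post(f"{BASE}/action")
    print(resp.json())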

Tuning Guide

Parameter           Default   Effect of ↑                               Effect of ↓
execution_horizon   10        Smoother transitions, slower to react     More responsive, potential jumps
control_freq        50        More granular delay compensation          Fewer steps skipped
blend_schedule      linear    see Blend Schedules above

Rules of thumb (expressed as presets in the sketch after this list):

  • Fine manipulation (pouring, insertion): exp schedule, horizon 12-16
  • Fast reaching (pick up objects): linear schedule, horizon 6-8
  • Locomotion (walking): linear schedule, horizon 10
  • Debugging: latest schedule (see raw predictions, no blending)
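The same rules of thumb, written as illustrative keyword-argument presets for NeonInferenceServer; the preset names and the midpoint horizon values are a convenience for this page, not a built-in registry:

    from neon.inference.server import NeonInferenceServer

    # Illustrative presets derived from the rules of thumb above
    RTC_PRESETS = {
        "fine_manipulation": dict(blend_schedule="exp", execution_horizon=14),
        "fast_reaching":     dict(blend_schedule="linear", execution_horizon=8),
        "locomotion":        dict(blend_schedule="linear", execution_horizon=10),
        "debugging":         dict(blend_schedule="latest", execution_horizon=10),
    }

    server = NeonInferenceServer(
        model_path="cagataydev/neon-g1-v1",
        control_freq=50.0,
        **RTC_PRESETS["fine_manipulation"],
    )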

What We Didn't Port (and Why)

RTC's full algorithm includes gradient-based prefix guidance — during each denoising step, it computes ∂x₁/∂xₜ and applies a correction that steers the prediction toward the previous chunk's trajectory.

This requires:

  1. Iterative denoising — a loop of N steps refining noise → actions
  2. Differentiable denoiser: torch.autograd.grad() through the model
  3. Time parameter τ — normalized denoising progress for guidance weight

Neon's action decoder is a single-pass MLP: one forward pass, done. No iteration, no τ, no gradient to steer. The prefix guidance is elegant but fundamentally tied to flow-matching architectures.
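For intuition only, a rough conceptual sketch of what such a guided denoising loop involves. This is not Neon code: the update rule is simplified Euler integration, and every name here is hypothetical.

    import torch

    def denoise_with_prefix_guidance(model, x, prev_prefix, steps=10, scale=1.0):
        # x: noisy action chunk; prev_prefix: overlapping actions from the old chunk
        dt = 1.0 / steps
        for i in range(steps):
            tau = i / steps                            # denoising progress τ
            x = x.detach().requires_grad_(True)
            v = model(x, tau)                          # predicted velocity field
            x1_hat = x + (1.0 - tau) * v               # current estimate of the clean chunk
            err = ((x1_hat[: len(prev_prefix)] - prev_prefix) ** 2).sum()
            grad = torch.autograd.grad(err, x)[0]      # gradient of prefix error w.r.t. xₜ
            with torch.no_grad():
                x = x + dt * v - scale * grad          # Euler step plus guidance correction
        return x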

The two things we did port — delay skipping and prefix blending — give 80% of the benefit with 0% of the complexity.

Future: if Neon adds a diffusion/flow head

If we add a diffusion-based action head (see Action Heads), full RTC guidance becomes possible. The ActionQueue is already designed to support it — the _blend_with_prefix method would be replaced by gradient-guided denoising.


Next: Data Soup — mixing data sources for robust training