Skip to content

Relative Actions (Cosmos)

The same physical movement should produce the same numbers — regardless of which robot performs it. That's the promise of relative actions, ported from Cosmos-Predict2.5.


The Problem

Different robots have different kinematics. "Move the gripper forward 5cm" requires completely different joint angles on a Franka, an SO-100, and a G1 humanoid. Absolute joint positions are body-specific. They don't transfer.

The Solution: Gripper-Frame Relative Actions

Compute actions as relative displacements in the gripper's own coordinate frame:

graph TD
    subgraph "Previous State"
        P1["Gripper at (x₁, y₁, z₁)<br/>Rotation R₁"]
    end
    subgraph "Current State"
        P2["Gripper at (x₂, y₂, z₂)<br/>Rotation R₂"]
    end
    subgraph "Relative Action"
        RA["Δxyz = R₁ᵀ · (xyz₂ - xyz₁)<br/>Δrot = euler(R₁ᵀ · R₂)<br/>gripper_state"]
    end

    P1 --> RA
    P2 --> RA

    style RA fill:#e65100,color:#fff

The formula:

rel_xyz = prev_rotm.T @ (curr_xyz - prev_xyz)    # Position delta in gripper frame
rel_rot = rotm2euler(prev_rotm.T @ curr_rotm)     # Rotation delta
action  = [rel_xyz(3), rel_rot(3), gripper(1)]     # 7-DoF, any robot

Why this works: A 2cm forward push produces [0.02, 0, 0, 0, 0, 0, grip] on any robot. The same physical movement = the same numbers. Always. Regardless of workspace origin, coordinate convention, or robot geometry.


Implementation

Rotation Math

from neon.data.relative_actions import euler2rotm, rotm2euler, rotm2quat

R = euler2rotm([roll, pitch, yaw])    # (3,) → (3, 3) rotation matrix
angles = rotm2euler(R)                 # (3, 3) → (3,) back to Euler
quat = rotm2quat(R)                   # (3, 3) → (4,) quaternion [w, x, y, z]

ZYX convention throughout. Consistent with Cosmos and the URDF standard.

State-to-Action Conversion

from neon.data.relative_actions import states_to_relative_actions

actions = states_to_relative_actions(
    arm_states=ee_poses,          # (T, 6) [x, y, z, roll, pitch, yaw]
    gripper_states=gripper_widths, # (T,) [0=open, 1=closed]
    action_scaler=20.0,           # Cosmos default normalization
)
# → (T-1, 7) relative actions

Bridge Dataset API

Compatible with Cosmos/Bridge data format out of the box:

from neon.data.relative_actions import get_action_sequence_from_states

actions = get_action_sequence_from_states(
    data={"state": [...], "continuous_gripper_state": [...]},
    fps_downsample_ratio=2,  # 30fps source → 15fps target
    action_scaler=20.0,
)

The Action Scaler

Cosmos uses a scale factor of 20.0 for action deltas. Raw EE displacements are tiny — millimeters. Scaling to a neural-network-friendly range:

raw Δx = 0.002 meters (2mm)
scaled  = 0.002 × 20.0 = 0.04  (network-friendly range)

Configurable via DataSourceConfig(action_scaler=20.0).


Integration with Data Soup

The cosmos_dreamgen source type automatically uses relative actions:

DataSourceConfig(
    name="cosmos-bridge",
    type="cosmos_dreamgen",
    path="/data/bridge",
    weight=2.0,
    use_relative_actions=True,
    action_scaler=20.0,
    camera_ids=["head", "hand_0", "hand_1"],
)

Supports local data (videos/ + annotations/*.json) and HuggingFace datasets (nvidia/GR1-100, NVIDIA/GR00T-Dreams-GR1).