Relative Actions (Cosmos)¶
The same physical movement should produce the same numbers — regardless of which robot performs it. That's the promise of relative actions, ported from Cosmos-Predict2.5.
The Problem¶
Different robots have different kinematics. "Move the gripper forward 5cm" requires completely different joint angles on a Franka, an SO-100, and a G1 humanoid. Absolute joint positions are body-specific. They don't transfer.
The Solution: Gripper-Frame Relative Actions¶
Compute actions as relative displacements in the gripper's own coordinate frame:
graph TD
subgraph "Previous State"
P1["Gripper at (x₁, y₁, z₁)<br/>Rotation R₁"]
end
subgraph "Current State"
P2["Gripper at (x₂, y₂, z₂)<br/>Rotation R₂"]
end
subgraph "Relative Action"
RA["Δxyz = R₁ᵀ · (xyz₂ - xyz₁)<br/>Δrot = euler(R₁ᵀ · R₂)<br/>gripper_state"]
end
P1 --> RA
P2 --> RA
style RA fill:#e65100,color:#fff
The formula:
rel_xyz = prev_rotm.T @ (curr_xyz - prev_xyz)  # Position delta in gripper frame (3,)
rel_rot = rotm2euler(prev_rotm.T @ curr_rotm)  # Rotation delta (3,)
action = [*rel_xyz, *rel_rot, gripper]         # 7-DoF: Δxyz (3) + Δrot (3) + gripper (1), any robot
Why this works: A 2cm forward push produces [0.02, 0, 0, 0, 0, 0, grip] on any robot. The same physical movement = the same numbers. Always. Regardless of workspace origin, coordinate convention, or robot geometry.
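To see the invariance concretely, here is a quick NumPy check, using SciPy's Rotation in place of the library's helpers; the pose numbers are arbitrary:
import numpy as np
from scipy.spatial.transform import Rotation

# An arbitrary gripper pose in the world frame.
prev_rotm = Rotation.from_euler("ZYX", [0.7, 0.1, -0.2]).as_matrix()
prev_xyz = np.array([0.31, -0.12, 0.45])

# Push 2 cm along the gripper's own x-axis; orientation unchanged.
curr_xyz = prev_xyz + prev_rotm @ np.array([0.02, 0.0, 0.0])
curr_rotm = prev_rotm

rel_xyz = prev_rotm.T @ (curr_xyz - prev_xyz)  # → [0.02, 0.0, 0.0]
rel_rot = Rotation.from_matrix(prev_rotm.T @ curr_rotm).as_euler("ZYX")  # → [0, 0, 0]
# Change prev_xyz or the Euler angles above: rel_xyz stays [0.02, 0, 0].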
Implementation¶
Rotation Math¶
from neon.data.relative_actions import euler2rotm, rotm2euler, rotm2quat
R = euler2rotm([roll, pitch, yaw]) # (3,) → (3, 3) rotation matrix
angles = rotm2euler(R) # (3, 3) → (3,) back to Euler
quat = rotm2quat(R) # (3, 3) → (4,) quaternion [w, x, y, z]
ZYX convention throughout. Consistent with Cosmos and the URDF standard.
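For reference, here is what the ZYX convention implies for euler2rotm. This is a minimal NumPy sketch, not the library's exact implementation; gimbal-lock handling in the inverse is omitted:
import numpy as np

def euler2rotm_sketch(rpy):
    # ZYX convention: R = Rz(yaw) @ Ry(pitch) @ Rx(roll).
    roll, pitch, yaw = rpy
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx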
State-to-Action Conversion¶
from neon.data.relative_actions import states_to_relative_actions
actions = states_to_relative_actions(
arm_states=ee_poses, # (T, 6) [x, y, z, roll, pitch, yaw]
gripper_states=gripper_widths, # (T,) [0=open, 1=closed]
action_scaler=20.0, # Cosmos default normalization
)
# → (T-1, 7) relative actions
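Under the hood, the conversion is a per-timestep application of the formula above. A sketch of the loop, assuming the scaler multiplies both the position and rotation deltas and the gripper value is passed through unscaled (the exact normalization is an implementation detail):
import numpy as np
from neon.data.relative_actions import euler2rotm, rotm2euler

def states_to_relative_actions_sketch(arm_states, gripper_states, action_scaler=20.0):
    T = arm_states.shape[0]
    actions = np.zeros((T - 1, 7))
    for t in range(T - 1):
        prev_rotm = euler2rotm(arm_states[t, 3:6])
        curr_rotm = euler2rotm(arm_states[t + 1, 3:6])
        # Deltas expressed in the previous gripper frame, then scaled.
        actions[t, :3] = action_scaler * (prev_rotm.T @ (arm_states[t + 1, :3] - arm_states[t, :3]))
        actions[t, 3:6] = action_scaler * rotm2euler(prev_rotm.T @ curr_rotm)
        actions[t, 6] = gripper_states[t + 1]  # gripper target, not a delta
    return actions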
Bridge Dataset API¶
Compatible with Cosmos/Bridge data format out of the box:
from neon.data.relative_actions import get_action_sequence_from_states
actions = get_action_sequence_from_states(
data={"state": [...], "continuous_gripper_state": [...]},
fps_downsample_ratio=2, # 30fps source → 15fps target
action_scaler=20.0,
)
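A hypothetical call with synthetic data, assuming state is a (T, 6) array of EE poses and continuous_gripper_state is (T,), matching the shapes documented above (whether lists or arrays are accepted is an assumption):
import numpy as np
from neon.data.relative_actions import get_action_sequence_from_states

T = 8  # frames in the source clip
data = {
    "state": np.random.randn(T, 6) * 0.01,             # (T, 6) EE poses
    "continuous_gripper_state": np.linspace(0, 1, T),  # (T,) gripper signal
}
actions = get_action_sequence_from_states(
    data, fps_downsample_ratio=2, action_scaler=20.0
)
# Downsampling by 2 halves the frame count before deltas are taken,
# so expect roughly T // 2 - 1 actions, each 7-DoF.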
The Action Scaler¶
Cosmos uses a scale factor of 20.0 for action deltas. Raw EE displacements are tiny, on the order of millimeters per step, so multiplying by 20.0 brings them into a range that neural networks train on comfortably.
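A back-of-envelope check, with illustrative numbers:
delta_xyz = 0.002           # 2 mm end-effector displacement in one step (metres)
scaled = delta_xyz * 20.0   # → 0.04, a magnitude normalized action heads handle well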
Configurable via DataSourceConfig(action_scaler=20.0).
Integration with Data Soup¶
The cosmos_dreamgen source type automatically uses relative actions:
DataSourceConfig(
name="cosmos-bridge",
type="cosmos_dreamgen",
path="/data/bridge",
weight=2.0,
use_relative_actions=True,
action_scaler=20.0,
camera_ids=["head", "hand_0", "hand_1"],
)
Supports local data (videos/ + annotations/*.json) and HuggingFace datasets (nvidia/GR1-100, NVIDIA/GR00T-Dreams-GR1).