
G1 EDU+ Data Collection Guide — VR Teleoperation with Meta Quest 3

Goal: Collect the richest humanoid manipulation dataset on Earth. Camera + YOLO + SAM + Audio + EEF + Joints → LeRobot v3 → Train Neon.


Table of Contents

  1. Hardware Checklist
  2. G1 EDU+ Power-On Sequence
  3. Network Setup
  4. Meta Quest 3 VR Teleoperation Setup
  5. Neon Collector Installation
  6. Running Data Collection
  7. Training on Collected Data
  8. Troubleshooting

1. Hardware Checklist

Required:

| Item | Model | Purpose |
|---|---|---|
| Robot | Unitree G1 EDU+ (29-DOF body + 14-DOF hands = 43-DOF) | The robot |
| VR Headset | Meta Quest 3 | Hand tracking → G1 arm teleoperation |
| Companion PC | NVIDIA GPU (RTX 3090+ or Jetson Orin) | YOLO + SAM + Neon inference |
| Router | Gigabit Ethernet + WiFi 6 | Low-latency G1 ↔ PC ↔ Quest 3 |
| USB Microphone | Any 16kHz+ capable | Audio capture (Sound4D) |
| Ethernet Cable | Cat6 | G1 to router (wired, not WiFi) |
| Scene | Table + objects (cups, bottles, tools) | Manipulation tasks |

Optional:

| Item | Purpose |
|---|---|
| External webcam (RealSense D435i) | Better head camera + depth |
| Safety harness / gantry | Catch robot if balance fails |
| Second monitor | Live visualization dashboard |

2. G1 EDU+ Power-On Sequence

⚠️ SAFETY FIRST

  1. Clear the area — 2m radius around the robot, no obstacles
  2. Have a spotter — someone ready to hit E-STOP
  3. Gantry/harness on if available — G1 can fall during init
  4. Keep the RC controller powered on — it's your kill switch

Step-by-Step Power-On

┌─────────────────────────────────────────────────────┐
│  G1 EDU+ Boot Sequence (~90 seconds total)          │
│                                                     │
│  1. [0s]   Plug in battery (rear panel)             │
│  2. [0s]   Press power button (chest, hold 3s)      │
│  3. [10s]  LEDs cycle → self-test begins            │
│  4. [30s]  Joints lock → standing position          │
│  5. [45s]  WiFi/Ethernet interface comes up          │
│  6. [60s]  SDK API available on 192.168.123.10      │
│  7. [90s]  Ready (LED solid green)                  │
│                                                     │
│  RC Controller:                                     │
│  - Power on BEFORE robot                            │
│  - Channel 1: E-STOP (instant motor kill)           │
│  - Channel 2: Mode switch (damping/stand/walk)      │
│  - Default mode after boot: DAMPING (safe)          │
│                                                     │
│  Mode Transitions (via RC or SDK):                  │
│  DAMPING → STAND → WALK → SDK_CONTROL              │
│                                                     │
│  To enter SDK control:                              │
│  1. Switch to STAND mode (RC switch up)             │
│  2. Wait for robot to stand (~5s)                   │
│  3. Switch to WALK mode (RC switch middle)           │
│  4. Send SDK enable command (see below)             │
└─────────────────────────────────────────────────────┘

SDK Enable (from companion PC)

# Verify connection
ping 192.168.123.10

# Enable SDK control (Python)
python3 -c "
from unitree_sdk2py.core.channel import ChannelFactoryInitialize
from unitree_sdk2py.go2.sport.sport_client import SportClient

# Second argument is the LOCAL network interface wired to the G1, not the robot's IP.
# Note: this example uses the Go2-style SportClient; if your G1 firmware ships its own
# high-level client, adapt the import accordingly.
ChannelFactoryInitialize(0, 'eth0')
client = SportClient()
client.Init()
print('SDK connected!')

# Switch to SDK control mode
client.SwitchGait(0)  # 0=idle stand
print('Robot in SDK control mode')
"

Shutdown Sequence

1. Send zero velocity: client.Move(0, 0, 0) (see the sketch below)
2. Switch RC to DAMPING mode
3. Hold power button (3s) → graceful shutdown
4. Wait for LEDs off
5. Disconnect battery
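
A minimal sketch of step 1 as a standalone script, reusing the same client as the SDK-enable example above (the RC and power-button steps stay manual; verify the calls against your SDK version):

# safe_stop.py — sketch: zero commanded velocity before the manual shutdown steps
from unitree_sdk2py.core.channel import ChannelFactoryInitialize
from unitree_sdk2py.go2.sport.sport_client import SportClient

def safe_stop(nic: str = "eth0") -> None:
    ChannelFactoryInitialize(0, nic)   # local interface wired to the G1
    client = SportClient()
    client.Init()
    client.Move(0, 0, 0)               # step 1: send zero velocity
    print("Velocity zeroed. Now: RC to DAMPING, hold power 3s, wait for LEDs off, disconnect battery.")

if __name__ == "__main__":
    safe_stop()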

3. Network Setup

┌───────────────┐     Ethernet       ┌──────────────┐
│    G1 EDU+    │◄──────────────────►│    Router    │
│ 192.168.123.10│     (1Gbps,<1ms)   │              │
└───────────────┘                    │    WiFi 6    │
                                     │  5GHz band   │
┌───────────────┐     Ethernet       │              │
│ Companion PC  │◄──────────────────►│              │
│ 192.168.123.X │       (1Gbps)      │              │
└───────────────┘                    └──────┬───────┘
                                            │ WiFi 6
                                     ┌──────┴───────┐
                                     │ Meta Quest 3 │
                                     │ 192.168.123.Y│
                                     └──────────────┘

Router Configuration

# Recommended: dedicated 5GHz WiFi network for Quest 3
# SSID: "neon-teleop" (or whatever)
# Channel: 36 or 149 (least congested)
# Bandwidth: 80MHz or 160MHz
# Band: 5GHz ONLY (2.4GHz adds latency)

# Static IPs (recommended for stability):
# G1:           192.168.123.10  (factory default)
# Companion PC: 192.168.123.100
# Quest 3:      192.168.123.200 (DHCP reservation by MAC)

Companion PC Network

# Set static IP on the Ethernet interface to G1
sudo ip addr add 192.168.123.100/24 dev eth0
# or via NetworkManager:
nmcli con add type ethernet ifname eth0 ip4 192.168.123.100/24

# Verify
ping 192.168.123.10   # G1
ping 192.168.123.200  # Quest 3 (after WiFi connect)
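
For a one-shot check of all three hosts with average round-trip times, here is a small stdlib-only sketch (the host list mirrors the static IP plan above; adjust to your own addresses):

# check_network.py — ping each host and report average RTT
import re
import subprocess

HOSTS = {
    "G1":           "192.168.123.10",
    "Companion PC": "192.168.123.100",
    "Quest 3":      "192.168.123.200",
}

def avg_rtt_ms(ip: str, count: int = 5):
    """Run the system ping and parse 'time=X ms'; returns None if unreachable."""
    out = subprocess.run(
        ["ping", "-c", str(count), "-W", "1", ip],
        capture_output=True, text=True,
    ).stdout
    times = [float(t) for t in re.findall(r"time=([\d.]+) ms", out)]
    return sum(times) / len(times) if times else None

if __name__ == "__main__":
    for name, ip in HOSTS.items():
        rtt = avg_rtt_ms(ip)
        print(f"{name:13s} {ip:16s} " + (f"{rtt:.1f} ms" if rtt is not None else "UNREACHABLE"))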

4. Meta Quest 3 VR Teleoperation Setup

Architecture

┌───────────────────────────────────────────────────────────────┐
│                   VR Teleoperation Pipeline                   │
│                                                               │
│  Quest 3                   Companion PC               G1      │
│  ┌─────────┐  WebSocket   ┌─────────────┐  SDK      ┌──────┐  │
│  │ Hand    │─────────────►│ TeleOp      │──────────►│ Arms │  │
│  │ Tracking│   (~10ms)    │ Server      │  (50Hz)   │ 14DOF│  │
│  │ 6DOF    │              │             │           │      │  │
│  │ per hand│              │ IK Solver   │           │ Loco │  │
│  │         │              │ retarget    │           │ 3DOF │  │
│  │ Thumbs- │              │ Quest→G1    │           │      │  │
│  │ tick    │              │             │           │      │  │
│  │ buttons │              │ Neon        │           │ Cam  │  │
│  │ = grip  │              │ Collector   │           │ Mic  │  │
│  └─────────┘              └─────────────┘           └──────┘  │
└───────────────────────────────────────────────────────────────┘

Option A: Open-TeleVision (Recommended)

Open-TeleVision is purpose-built for humanoid teleoperation with Quest headsets.

# On companion PC:
git clone https://github.com/OpenTeleVision/TeleVision.git
cd TeleVision
pip install -e .

# Install dependencies
pip install vuer==0.0.30 opencv-python mediapipe

# Generate HTTPS certs (Quest requires HTTPS for hand tracking API)
cd cert
bash generate_cert.sh
# This creates server.crt and server.key
# You'll need to install server.crt on the Quest 3 (see below)

Install cert on Quest 3:

  1. Copy server.crt to Quest 3 via USB or adb push server.crt /sdcard/
  2. On Quest 3: Settings → Security → Install certificate → select the file
  3. Or use adb shell am start -a android.credentials.INSTALL -t application/x-x509-ca-cert -d file:///sdcard/server.crt

Configure for G1 arm retargeting:

# teleop_g1.py — Quest 3 → G1 arms via Open-TeleVision
import numpy as np
from TeleVision import OpenTeleVision
from neon.collect import G1DataCollector, G1CollectorConfig
from neon.inference.g1_controller import G1Controller, G1Config
from neon.data.action_space import G1ActionSpace, ControlMode

# --- TeleVision server (receives Quest 3 hand poses) ---
tv = OpenTeleVision(
    host="0.0.0.0",
    port=8012,
    cert_file="./cert/server.crt",
    key_file="./cert/server.key",
    # Stream G1's head camera back to Quest 3 for immersive view
    stream_camera=True,
    camera_resolution=(640, 480),
)

# --- G1 controller ---
g1 = G1Controller(G1Config(robot_ip="192.168.123.10", mode="arms_only"))
g1.connect()

# --- Neon data collector ---
collector = G1DataCollector(G1CollectorConfig(
    robot_ip="192.168.123.10",
    hub_id="cagataydev/neon-g1-real-v1",
    enable_yolo=True,
    enable_sam=True,
    enable_audio=True,
    fps=20,
))
collector.start()

# --- IK Retargeting: Quest hand 6DOF → G1 arm 7DOF ---
action_space = G1ActionSpace(mode=ControlMode.ARMS_ONLY, include_locomotion=True)

def quest_to_g1_arms(left_wrist_pose, right_wrist_pose, left_grip, right_grip):
    """Convert Quest 3 hand tracking to G1 joint targets.

    Quest gives us:
    - left_wrist_pose: (4, 4) homogeneous transform
    - right_wrist_pose: (4, 4) homogeneous transform
    - left_grip / right_grip: float 0-1 (pinch strength)

    We need G1 joint angles (14 DoF: 7 per arm).

    Method: Analytical IK using G1 URDF kinematics.
    For quick start: use position-only IK (ignore wrist orientation).
    """
    target_joints = action_space.default_position()

    # Scale Quest workspace to G1 workspace
    # Quest tracking volume: ~1m³, G1 arm reach: ~0.6m
    WORKSPACE_SCALE = 0.6

    for side, (pose, grip) in enumerate([
        (left_wrist_pose, left_grip),
        (right_wrist_pose, right_grip),
    ]):
        if pose is None:
            continue

        # Extract position from 4x4 transform
        pos = pose[:3, 3] * WORKSPACE_SCALE

        # Simple analytical IK for 7-DOF arm
        # (shoulder_pitch, shoulder_roll, shoulder_yaw, elbow,
        #  wrist_roll, wrist_pitch, wrist_yaw)

        # Offset: left arm starts at index 0, right at index 7
        # in the 14-DoF arms_only action vector
        offset = side * 7

        # --- Shoulder pitch: vertical angle to target ---
        dist_xz = np.sqrt(pos[0]**2 + pos[2]**2)
        target_joints[offset + 0] = np.arctan2(pos[0], -pos[2])

        # --- Shoulder roll: lateral deviation ---
        target_joints[offset + 1] = np.arctan2(pos[1], dist_xz) * (1 if side == 0 else -1)

        # --- Shoulder yaw: from wrist orientation ---
        if pose is not None:
            target_joints[offset + 2] = np.arctan2(pose[1, 0], pose[0, 0]) * 0.5

        # --- Elbow: distance-based (farther = straighter) ---
        UPPER_ARM = 0.2725
        FOREARM = 0.2595
        reach = np.linalg.norm(pos)
        max_reach = UPPER_ARM + FOREARM
        cos_elbow = np.clip((reach**2 - UPPER_ARM**2 - FOREARM**2) /
                           (2 * UPPER_ARM * FOREARM), -1, 1)
        target_joints[offset + 3] = np.pi - np.arccos(cos_elbow)

        # --- Wrist: from Quest hand orientation ---
        rot = pose[:3, :3]
        target_joints[offset + 4] = np.arctan2(rot[2, 1], rot[2, 2]) * 0.3  # roll
        target_joints[offset + 5] = np.arctan2(-rot[2, 0], np.sqrt(rot[2,1]**2 + rot[2,2]**2)) * 0.3  # pitch
        target_joints[offset + 6] = grip * 1.5  # map grip to wrist yaw (simplified)

    return target_joints


# --- Main teleoperation + collection loop ---
print("\n" + "="*60)
print("VR TELEOPERATION ACTIVE")
print("="*60)
print("On Quest 3: open browser → https://192.168.123.100:8012")
print("Allow hand tracking when prompted")
print("Pinch to grip. Thumbstick to walk.")
print("Press B to start episode, A to end episode.")
print("="*60 + "\n")

episode_active = False
task = ""

try:
    while True:
        # Get Quest 3 hand data from TeleVision server
        tv_data = tv.get_latest()
        if tv_data is None:
            continue

        left_pose = tv_data.get("left_wrist")    # (4,4) or None
        right_pose = tv_data.get("right_wrist")   # (4,4) or None
        left_grip = tv_data.get("left_pinch", 0)  # 0-1
        right_grip = tv_data.get("right_pinch", 0)
        buttons = tv_data.get("buttons", {})

        # Button B: start new episode
        if buttons.get("b_pressed") and not episode_active:
            task = input("Task description: ") if not task else task
            collector.new_episode(task)
            episode_active = True
            print(f"▶ Recording: {task}")

        # Button A: end episode
        if buttons.get("a_pressed") and episode_active:
            collector.end_episode()
            episode_active = False
            print("⏹ Episode saved")
            task = ""

        # Retarget Quest → G1 joints
        target = quest_to_g1_arms(left_pose, right_pose, left_grip, right_grip)

        # Build the full 46-dim action vector (arm joint targets first, locomotion channels below)
        full_action = np.zeros(46, dtype=np.float32)
        full_action[:len(target)] = target

        # Locomotion from thumbsticks
        full_action[14] = tv_data.get("left_thumbstick_y", 0) * 0.5   # vx
        full_action[15] = tv_data.get("left_thumbstick_x", 0) * 0.3   # vy
        full_action[16] = tv_data.get("right_thumbstick_x", 0) * 0.5  # omega

        # Send to robot
        g1.send_actions(action_space.denormalize(
            action_space.normalize(full_action)
        ))

        # Record frame (collector handles YOLO+SAM+Audio+EEF in background)
        collector.set_action(full_action)
        if episode_active:
            collector.collect_frame()

except KeyboardInterrupt:
    print("\nStopping...")
finally:
    if episode_active:
        collector.end_episode()
    collector.stop()
    g1.disconnect()
    tv.stop()
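
The elbow term in quest_to_g1_arms is just the law of cosines on the two arm links. Here is that step in isolation (constants copied from the script), showing how the commanded elbow angle opens up as the hand target moves farther out:

# elbow_ik_check.py — the law-of-cosines elbow step from quest_to_g1_arms, standalone
import numpy as np

UPPER_ARM = 0.2725            # meters, same constants as above
FOREARM = 0.2595
MAX_REACH = UPPER_ARM + FOREARM

for reach in [0.20, 0.30, 0.40, 0.50, MAX_REACH]:
    cos_elbow = np.clip(
        (reach**2 - UPPER_ARM**2 - FOREARM**2) / (2 * UPPER_ARM * FOREARM),
        -1, 1,
    )
    elbow = np.pi - np.arccos(cos_elbow)
    print(f"reach={reach:.3f} m -> elbow={np.degrees(elbow):6.1f} deg")

# Expected trend: about 44 deg with the hand close in (arm folded),
# rising to 180 deg at full reach (arm straight).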

Launch on Quest 3:

  1. Put on Quest 3
  2. Open Meta Quest Browser
  3. Navigate to https://192.168.123.100:8012
  4. Accept the self-signed certificate warning
  5. Allow hand tracking permission when prompted
  6. You should see the G1's camera feed in VR + your hands tracked

Option B: Bunny-VisionPro / AnyTeleop (Alternative)

If Open-TeleVision doesn't work well, these are alternatives:

# AnyTeleop — supports Quest 3 via WebXR
pip install anyteleop
anyteleop-server --robot g1 --ip 192.168.123.10

# Bunny-VisionPro (originally for Apple Vision Pro, has Quest adapter)
git clone https://github.com/Dingry/BunnyVisionPro.git
# Requires more setup but has better retargeting

Option C: DIY WebXR (Minimal, No Teleop Framework)

# Minimal Quest 3 hand tracking → WebSocket → Python
# See: https://github.com/nicrusso7/quest-hand-tracking-ws

# Python side:
pip install websockets

# Start a simple retargeting server:
python3 -c "
import asyncio, json
import numpy as np
import websockets

async def handler(ws):
    async for msg in ws:
        data = json.loads(msg)
        # data contains Quest 3 hand joint positions
        left = np.array(data.get('leftHand', {}).get('wrist', [0, 0, 0]))
        right = np.array(data.get('rightHand', {}).get('wrist', [0, 0, 0]))
        print(f'L:{left} R:{right}')

async def main():
    # Keep the server alive until interrupted (Ctrl+C)
    async with websockets.serve(handler, '0.0.0.0', 9090):
        await asyncio.Future()

asyncio.run(main())
"

5. Neon Collector Installation

# On companion PC (with NVIDIA GPU):
cd /path/to/neon
pip install -e ".[collect]"

# This installs:
# - ultralytics (YOLO11)
# - segment-anything (SAM)
# - sounddevice (audio)
# - lerobot (dataset format)
# - pyarrow (parquet)

# Verify GPU for perception:
python3 -c "import torch; print(f'CUDA: {torch.cuda.is_available()}, GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"none\"}')"

# Download models (first run caches them):
python3 -c "from ultralytics import YOLO; YOLO('yolo11n.pt')"
python3 -c "
from segment_anything import sam_model_registry
import urllib.request
from pathlib import Path
d = Path.home() / '.cache/neon/sam'
d.mkdir(parents=True, exist_ok=True)
p = d / 'sam_vit_b.pth'
if not p.exists():
    print('Downloading SAM vit_b...')
    urllib.request.urlretrieve('https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth', str(p))
print('SAM ready')
"

# Login to HuggingFace (for dataset upload):
huggingface-cli login
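
Before a session it is worth one end-to-end smoke test of the perception stack (a sketch; the checkpoint path matches the download step above, and the dummy frame is a black image, so YOLO finding zero boxes is expected; the point is that both models load and run on the GPU):

# perception_smoke_test.py — load YOLO + SAM and run them once on a dummy frame
from pathlib import Path

import numpy as np
import torch
from segment_anything import SamPredictor, sam_model_registry
from ultralytics import YOLO

assert torch.cuda.is_available(), "No CUDA GPU visible"

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # dummy head-camera frame

yolo = YOLO("yolo11n.pt")
det = yolo(frame, verbose=False)[0]
print(f"YOLO OK: {len(det.boxes)} boxes on dummy frame")

ckpt = Path.home() / ".cache/neon/sam/sam_vit_b.pth"
sam = sam_model_registry["vit_b"](checkpoint=str(ckpt)).to("cuda")
predictor = SamPredictor(sam)
predictor.set_image(frame)
masks, scores, _ = predictor.predict(box=np.array([100, 100, 300, 300]))
print(f"SAM OK: {masks.shape[0]} masks, best score {scores.max():.2f}")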

6. Running Data Collection

Quick Start (CLI)

# Single timed episode (60 seconds):
neon-collect \
    --robot-ip 192.168.123.10 \
    --hub-id cagataydev/neon-g1-real-v1 \
    --task "pick up the red cup and place it on the left side" \
    --duration 60 \
    --fps 20 \
    --mode arms_only

# Interactive mode (control episode boundaries manually):
neon-collect \
    --robot-ip 192.168.123.10 \
    --hub-id cagataydev/neon-g1-real-v1

# Then in the interactive prompt:
# collect> n pick up the red cup        ← start episode
# collect> e                             ← end episode
# collect> n place cup on shelf          ← start new episode
# collect> e
# collect> s                             ← print stats
# collect> q                             ← quit + export to HuggingFace
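
The prompt commands map onto the same G1DataCollector API used in teleop_g1.py, so a fixed-length episode can also be scripted without the CLI (a sketch assuming that API; task text and durations are placeholders, and in a real session something would also be commanding the robot):

# scripted_episode.py — one 30 s episode via the collector API (no CLI, no VR)
import time

from neon.collect import G1DataCollector, G1CollectorConfig

collector = G1DataCollector(G1CollectorConfig(
    robot_ip="192.168.123.10",
    hub_id="cagataydev/neon-g1-real-v1",
    enable_yolo=True,
    enable_sam=True,
    enable_audio=True,
    fps=20,
))
collector.start()

collector.new_episode("pick up the red cup")     # same as `n pick up the red cup`
t_end = time.time() + 30
while time.time() < t_end:
    # collector.set_action(...) would go here if a controller is driving the robot
    collector.collect_frame()                    # YOLO/SAM/audio run in the background
    time.sleep(1 / 20)                           # pace at the configured fps
collector.end_episode()                          # same as `e`

collector.stop()                                 # same as `q` (export / upload)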

Full VR Teleop + Collection

# Terminal 1: Start TeleVision server (Quest 3 ↔ PC)
cd TeleVision
python3 teleop_g1.py

# Terminal 2: Monitor (optional)
# Open http://localhost:8766 for the Neon dashboard
# Shows: camera feed, YOLO boxes, joint states, audio waveform

# On Quest 3:
# 1. Open browser → https://192.168.123.100:8012
# 2. Start teleoperating
# 3. Press B to start recording, A to stop

What Gets Recorded Per Frame

| Modality | Key | Shape | Source |
|---|---|---|---|
| Head camera | observation.image.camera_head | (480, 640, 3) | G1 camera |
| Joint state | observation.state | (43,) | Motor encoders |
| EEF state | observation.eef_state | (14,) | FK from joints |
| Segmentation | observation.segmentation.camera_head | (480, 640) | YOLO→SAM |
| Audio | observation.audio | (32000,) | USB mic @16kHz |
| Action | action | (46,) | Teleop targets |
| Language | language_instruction | string | Task description |
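
You can sanity-check these keys straight from the exported parquet on the Hub (a sketch; the chunk path follows the appendix layout, and image/audio columns may be stored as references to the video files rather than raw arrays, depending on the LeRobot v3 export):

# inspect_dataset.py — pull one data chunk and list the per-frame columns
import pandas as pd
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="cagataydev/neon-g1-real-v1",
    repo_type="dataset",
    filename="data/chunk-000/000000.parquet",
)
df = pd.read_parquet(path)

print(f"{len(df)} frames")
for col in df.columns:
    print(f"  {col}")

# Spot-check the joint-state length (expected 43)
if "observation.state" in df.columns:
    print("observation.state length:", len(df["observation.state"].iloc[0]))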

Data Quality Tips

  1. Vary viewpoints — approach objects from different angles
  2. Vary speeds — some fast grabs, some slow careful placements
  3. Vary objects — cups, bottles, tools, food items, soft things
  4. Include failures — reach and miss, drop objects (model learns recovery)
  5. Narrate tasks — speak while teleoperating (captured in audio)
  6. 50+ episodes per task — minimum for fine-tuning to generalize
  7. 10-30 second episodes — short, focused manipulation tasks
  8. Consistent lighting — or deliberately vary it for robustness

7. Training on Collected Data

from neon.training.config import TrainConfig
from neon.training.train import train
from neon.model.neon_vla import NeonConfig
from neon.model.video_backbone import BackboneConfig
from neon.data.data_soup import DataSoupConfig, DataSourceConfig

config = TrainConfig(
    model=NeonConfig(
        backbone=BackboneConfig(
            model_id="Qwen/Qwen2.5-Omni-7B",
            load_in_4bit=True,
            freeze_backbone=True,
        ),
        control_mode="arms_only",
        action_head_type="flow",         # Multi-modal action distribution
        num_action_steps=16,
        use_eef=True,                    # YOUR EEF data
        use_segmentation=True,           # YOUR YOLO+SAM masks
        audio={"enabled": True, "encoder_type": "whisper"},  # YOUR audio
    ),
    data=DataSoupConfig(
        sources=[
            # YOUR real G1 data (primary)
            DataSourceConfig(
                name="g1_real",
                type="neon_v3",
                path="cagataydev/neon-g1-real-v1",
                weight=3.0,  # 3× weight — your data is gold
            ),
            # Mix with existing datasets for generalization
            DataSourceConfig(
                name="agibot",
                type="agibot",
                path="lerobot/xvla-agibot-world",
                weight=1.0,
            ),
        ],
    ),
    epochs=5,
    batch_size=2,
    learning_rate=1e-4,
    hub_model_id="cagataydev/neon-g1-real-flow-v1",
    push_to_hub=True,
)

train(config)

Training on HuggingFace Jobs (if no local GPU)

# Launch training remotely as a HuggingFace Job (A100, 8h timeout)
hf jobs uv run \
    --flavor a100-large \
    --secrets HF_TOKEN \
    --timeout 8h \
    -- python -m neon.training.train \
        --backbone Qwen/Qwen2.5-Omni-7B \
        --mode arms_only

8. Troubleshooting

G1 Won't Enter SDK Mode

# Check connection
ping 192.168.123.10

# Check if robot is in STAND mode (required before SDK)
# Use RC controller: switch to STAND first, wait 5s, then WALK

# Check Unitree SDK version
pip show unitree_sdk2py
# Need >= 1.0.0 for G1 EDU+

Quest 3 Can't Connect

# Verify Quest is on same network
# On Quest 3: Settings → WiFi → check IP is 192.168.123.X

# Check HTTPS cert is installed
# Quest requires HTTPS for WebXR hand tracking API

# Test from Quest browser: https://192.168.123.100:8012
# If "connection refused": firewall blocking port 8012
sudo ufw allow 8012

# Latency test (should be <20ms)
ping 192.168.123.200  # Quest IP

YOLO/SAM Too Slow

# Check GPU usage
nvidia-smi

# Use smaller YOLO model
neon-collect --yolo-model yolo11n.pt   # nano (fastest)
# vs yolo11s.pt (small) or yolo11m.pt (medium)

# Disable SAM (use YOLO box masks instead — rough but fast)
neon-collect --no-sam

# SAM runs at ~3Hz on RTX 3090, ~8Hz on RTX 4090
# YOLO runs at ~30Hz even on CPU
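
To pick a model size for your own GPU, time the candidates directly on a camera-sized frame (a quick benchmark sketch; 5 warm-up and 50 timed calls are arbitrary choices):

# yolo_benchmark.py — measure per-frame YOLO latency for each candidate model
import time

import numpy as np
from ultralytics import YOLO

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # camera-sized dummy frame

for weights in ["yolo11n.pt", "yolo11s.pt", "yolo11m.pt"]:
    model = YOLO(weights)
    for _ in range(5):                             # warm-up (CUDA init, first-call overhead)
        model(frame, verbose=False)
    n = 50
    t0 = time.perf_counter()
    for _ in range(n):
        model(frame, verbose=False)
    dt = (time.perf_counter() - t0) / n
    print(f"{weights}: {dt * 1000:.1f} ms/frame (~{1 / dt:.0f} Hz)")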

Audio Not Capturing

# List audio devices
python3 -c "import sounddevice; print(sounddevice.query_devices())"

# Specify device index
neon-collect --audio-device 2

# Test capture
python3 -c "
import sounddevice as sd
import numpy as np
audio = sd.rec(16000, samplerate=16000, channels=1, dtype='float32')
sd.wait()
print(f'Captured {len(audio)} samples, max={np.abs(audio).max():.3f}')
"

High Latency in Teleop

Typical latency budget:
  Quest 3 hand tracking:    ~10ms
  WebSocket transfer:       ~5ms
  IK retargeting:           ~1ms
  G1 SDK command:           ~2ms
  Physics response:         ~20ms (50Hz control)
  ─────────────────────────────
  Total:                    ~38ms (target: <50ms)

If >50ms:
  - Use 5GHz WiFi for Quest (not 2.4GHz)
  - Use Ethernet for PC↔G1 (not WiFi)
  - Reduce camera stream resolution to Quest
  - Check for other network traffic

Appendix: File Outputs

After a collection session, your HuggingFace dataset will contain:

cagataydev/neon-g1-real-v1/
├── data/
│   └── chunk-000/
│       └── 000000.parquet    # All frame data
├── meta/
│   ├── info.json             # Dataset metadata
│   ├── episodes.parquet      # Episode boundaries
│   └── tasks.parquet         # Task descriptions
└── videos/
    ├── observation.image.camera_head/
    │   └── chunk-000/
    │       └── 000000.mp4    # Head camera video
    └── observation.image.camera_hand/
        └── ...

Then train with:

neon-train --config configs/g1_real_flow.yaml


"The best training data comes from the robot that will use it."