G1 EDU+ Data Collection Guide — VR Teleoperation with Meta Quest 3¶
Goal: Collect the richest humanoid manipulation dataset on Earth. Camera + YOLO + SAM + Audio + EEF + Joints → LeRobot v3 → Train Neon.
Table of Contents¶
- Hardware Checklist
- G1 EDU+ Power-On Sequence
- Network Setup
- Meta Quest 3 VR Teleoperation Setup
- Neon Collector Installation
- Running Data Collection
- Training on Collected Data
- Troubleshooting
1. Hardware Checklist¶
| Item | Model | Purpose |
|---|---|---|
| Robot | Unitree G1 EDU+ (29-DOF body + 14-DOF hands = 43-DOF) | The robot |
| VR Headset | Meta Quest 3 | Hand tracking → G1 arm teleoperation |
| Companion PC | NVIDIA GPU (RTX 3090+ or Jetson Orin) | YOLO + SAM + Neon inference |
| Router | Gigabit Ethernet + WiFi 6 | Low-latency G1 ↔ PC ↔ Quest 3 |
| USB Microphone | Any 16kHz+ capable | Audio capture (Sound4D) |
| Ethernet Cable | Cat6 | G1 to router (wired, not WiFi) |
| Scene | Table + objects (cups, bottles, tools) | Manipulation tasks |
Optional but Recommended¶
| Item | Purpose |
|---|---|
| External webcam (Realsense D435i) | Better head camera + depth |
| Safety harness / gantry | Catch robot if balance fails |
| Second monitor | Live visualization dashboard |
2. G1 EDU+ Power-On Sequence¶
⚠️ SAFETY FIRST¶
- Clear the area — 2m radius around the robot, no obstacles
- Have a spotter — someone ready to hit E-STOP
- Gantry/harness on if available — G1 can fall during init
- Keep the RC controller powered on — it's your kill switch
Step-by-Step Power-On¶
┌─────────────────────────────────────────────────────┐
│  G1 EDU+ Boot Sequence (~90 seconds total)          │
│                                                     │
│  1. [0s]   Plug in battery (rear panel)             │
│  2. [0s]   Press power button (chest, hold 3s)      │
│  3. [10s]  LEDs cycle → self-test begins            │
│  4. [30s]  Joints lock → standing position          │
│  5. [45s]  WiFi/Ethernet interface comes up         │
│  6. [60s]  SDK API available on 192.168.123.10      │
│  7. [90s]  Ready (LED solid green)                  │
│                                                     │
│  RC Controller:                                     │
│  - Power on BEFORE robot                            │
│  - Channel 1: E-STOP (instant motor kill)           │
│  - Channel 2: Mode switch (damping/stand/walk)      │
│  - Default mode after boot: DAMPING (safe)          │
│                                                     │
│  Mode Transitions (via RC or SDK):                  │
│  DAMPING → STAND → WALK → SDK_CONTROL               │
│                                                     │
│  To enter SDK control:                              │
│  1. Switch to STAND mode (RC switch up)             │
│  2. Wait for robot to stand (~5s)                   │
│  3. Switch to WALK mode (RC switch middle)          │
│  4. Send SDK enable command (see below)             │
└─────────────────────────────────────────────────────┘
SDK Enable (from companion PC)¶
# Verify connection
ping 192.168.123.10
# Enable SDK control (Python)
python3 -c "
from unitree_sdk2py.core.channel import ChannelFactoryInitialize
from unitree_sdk2py.go2.sport.sport_client import SportClient

# Second argument is the NIC wired to the robot (e.g. eth0), not the robot's IP
ChannelFactoryInitialize(0, 'eth0')

client = SportClient()
client.Init()
print('SDK connected!')

# Switch to SDK control mode
client.SwitchGait(0)  # 0 = idle stand
print('Robot in SDK control mode')
"
Shutdown Sequence¶
1. Send zero velocity: client.Move(0, 0, 0) (see the script below)
2. Switch RC to DAMPING mode
3. Hold power button (3s) → graceful shutdown
4. Wait for LEDs off
5. Disconnect battery
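Steps 1-2 can be scripted from the companion PC before you touch the power button. A minimal sketch reusing the unitree_sdk2py client from above; StopMove() and Damp() are Go2-style SportClient calls, so verify they behave the same on your G1 firmware before trusting them:
# graceful_stop.py: zero velocity, then damping, before power-off
from unitree_sdk2py.core.channel import ChannelFactoryInitialize
from unitree_sdk2py.go2.sport.sport_client import SportClient

ChannelFactoryInitialize(0, 'eth0')  # NIC wired to the G1

client = SportClient()
client.Init()

client.Move(0, 0, 0)  # step 1: zero velocity
client.StopMove()     # cancel any queued motion
client.Damp()         # step 2: damping (joints go compliant)
print('Robot damped, safe to power off')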
3. Network Setup¶
┌────────────────┐     Ethernet     ┌──────────────┐
│    G1 EDU+     │◄────────────────►│    Router    │
│ 192.168.123.10 │   (1Gbps, <1ms)  │              │
└────────────────┘                  │    WiFi 6    │
                                    │  5GHz band   │
┌────────────────┐     Ethernet     │              │
│  Companion PC  │◄────────────────►│              │
│ 192.168.123.X  │     (1Gbps)      │              │
└────────────────┘                  └──────┬───────┘
                                           │ WiFi 6
                                    ┌──────┴───────┐
                                    │ Meta Quest 3 │
                                    │ 192.168.123.Y│
                                    └──────────────┘
Router Configuration¶
# Recommended: dedicated 5GHz WiFi network for Quest 3
# SSID: "neon-teleop" (or whatever)
# Channel: 36 or 149 (non-DFS channels, usually least congested)
# Bandwidth: 80MHz or 160MHz
# Band: 5GHz ONLY (2.4GHz adds latency)
# Static IPs (recommended for stability):
# G1: 192.168.123.10 (factory default)
# Companion PC: 192.168.123.100
# Quest 3: 192.168.123.200 (DHCP reservation by MAC)
Companion PC Network¶
# Set static IP on the Ethernet interface to G1
sudo ip addr add 192.168.123.100/24 dev eth0
# or via NetworkManager:
nmcli con add type ethernet ifname eth0 ip4 192.168.123.100/24
# Verify
ping 192.168.123.10 # G1
ping 192.168.123.200 # Quest 3 (after WiFi connect)
4. Meta Quest 3 VR Teleoperation Setup¶
Architecture¶
┌───────────────────────────────────────────────────────────────┐
│                   VR Teleoperation Pipeline                   │
│                                                               │
│   Quest 3                  Companion PC               G1      │
│  ┌─────────┐  WebSocket   ┌─────────────┐   SDK    ┌──────┐   │
│  │ Hand    │─────────────►│ TeleOp      │─────────►│ Arms │   │
│  │ Tracking│   (~10ms)    │ Server      │  (50Hz)  │ 14DOF│   │
│  │ 6DOF    │              │             │          │      │   │
│  │ per hand│              │ IK Solver   │          │ Loco │   │
│  │         │              │ retarget    │          │ 3DOF │   │
│  │ Thumb-  │              │ Quest→G1    │          │      │   │
│  │ stick   │              │             │          │      │   │
│  │ buttons │              │ Neon        │          │ Cam  │   │
│  │ = grip  │              │ Collector   │          │ Mic  │   │
│  └─────────┘              └─────────────┘          └──────┘   │
└───────────────────────────────────────────────────────────────┘
Option A: Open-TeleVision (Recommended — Best Quest 3 Support)¶
Open-TeleVision is purpose-built for humanoid teleoperation with Quest headsets.
# On companion PC:
git clone https://github.com/OpenTeleVision/TeleVision.git
cd TeleVision
pip install -e .
# Install dependencies
pip install vuer==0.0.30 opencv-python mediapipe
# Generate HTTPS certs (Quest requires HTTPS for hand tracking API)
cd cert
bash generate_cert.sh
# This creates server.crt and server.key
# You'll need to install server.crt on the Quest 3 (see below)
Install cert on Quest 3:
1. Copy server.crt to Quest 3 via USB or adb push server.crt /sdcard/
2. On Quest 3: Settings → Security → Install certificate → select the file
3. Or use adb shell am start -a android.credentials.INSTALL -t application/x-x509-ca-cert -d file:///sdcard/server.crt
Configure for G1 arm retargeting:
# teleop_g1.py — Quest 3 → G1 arms via Open-TeleVision
import numpy as np

from TeleVision import OpenTeleVision
from neon.collect import G1DataCollector, G1CollectorConfig
from neon.inference.g1_controller import G1Controller, G1Config
from neon.data.action_space import G1ActionSpace, ControlMode

# --- TeleVision server (receives Quest 3 hand poses) ---
tv = OpenTeleVision(
    host="0.0.0.0",
    port=8012,
    cert_file="./cert/server.crt",
    key_file="./cert/server.key",
    # Stream G1's head camera back to Quest 3 for immersive view
    stream_camera=True,
    camera_resolution=(640, 480),
)

# --- G1 controller ---
g1 = G1Controller(G1Config(robot_ip="192.168.123.10", mode="arms_only"))
g1.connect()

# --- Neon data collector ---
collector = G1DataCollector(G1CollectorConfig(
    robot_ip="192.168.123.10",
    hub_id="cagataydev/neon-g1-real-v1",
    enable_yolo=True,
    enable_sam=True,
    enable_audio=True,
    fps=20,
))
collector.start()

# --- IK Retargeting: Quest hand 6-DOF pose → G1 arm joint targets ---
action_space = G1ActionSpace(mode=ControlMode.ARMS_ONLY, include_locomotion=True)


def quest_to_g1_arms(left_wrist_pose, right_wrist_pose, left_grip, right_grip):
    """Convert Quest 3 hand tracking to G1 joint targets.

    Quest gives us:
    - left_wrist_pose: (4, 4) homogeneous transform, or None if untracked
    - right_wrist_pose: (4, 4) homogeneous transform, or None if untracked
    - left_grip / right_grip: float 0-1 (pinch strength)

    We need G1 joint angles (14 DoF: 7 per arm).

    Method: approximate analytical IK from G1 URDF link lengths.
    For a quick start this is position-only IK plus a heavily
    scaled-down wrist-orientation term.
    """
    target_joints = action_space.default_position()

    # Scale Quest workspace to G1 workspace
    # Quest tracking volume: ~1m³, G1 arm reach: ~0.6m
    WORKSPACE_SCALE = 0.6

    for side, (pose, grip) in enumerate([
        (left_wrist_pose, left_grip),
        (right_wrist_pose, right_grip),
    ]):
        if pose is None:
            continue  # hand not tracked this frame

        # Extract position from the 4x4 transform
        pos = pose[:3, 3] * WORKSPACE_SCALE

        # Simple analytical IK for a 7-DOF arm
        # (shoulder_pitch, shoulder_roll, shoulder_yaw, elbow,
        #  wrist_roll, wrist_pitch, wrist_yaw)
        # Offset: left arm starts at index 0, right at index 7
        # in the 14-DoF arms_only action vector
        offset = side * 7

        # --- Shoulder pitch: vertical angle to target ---
        dist_xz = np.sqrt(pos[0]**2 + pos[2]**2)
        target_joints[offset + 0] = np.arctan2(pos[0], -pos[2])

        # --- Shoulder roll: lateral deviation (mirrored for the right arm) ---
        target_joints[offset + 1] = np.arctan2(pos[1], dist_xz) * (1 if side == 0 else -1)

        # --- Shoulder yaw: from wrist orientation, damped by half ---
        target_joints[offset + 2] = np.arctan2(pose[1, 0], pose[0, 0]) * 0.5

        # --- Elbow: law of cosines on shoulder-to-wrist distance ---
        # (farther target = straighter arm)
        UPPER_ARM = 0.2725  # G1 upper-arm length (m)
        FOREARM = 0.2595    # G1 forearm length (m)
        reach = np.linalg.norm(pos)
        cos_elbow = np.clip((reach**2 - UPPER_ARM**2 - FOREARM**2) /
                            (2 * UPPER_ARM * FOREARM), -1, 1)
        target_joints[offset + 3] = np.pi - np.arccos(cos_elbow)

        # --- Wrist: from Quest hand orientation, scaled down for safety ---
        rot = pose[:3, :3]
        target_joints[offset + 4] = np.arctan2(rot[2, 1], rot[2, 2]) * 0.3  # roll
        target_joints[offset + 5] = np.arctan2(-rot[2, 0], np.sqrt(rot[2, 1]**2 + rot[2, 2]**2)) * 0.3  # pitch
        target_joints[offset + 6] = grip * 1.5  # map grip to wrist yaw (simplified)

    return target_joints


# --- Main teleoperation + collection loop ---
print("\n" + "="*60)
print("VR TELEOPERATION ACTIVE")
print("="*60)
print("On Quest 3: open browser → https://192.168.123.100:8012")
print("Allow hand tracking when prompted")
print("Pinch to grip. Thumbstick to walk.")
print("Press B to start episode, A to end episode.")
print("="*60 + "\n")

episode_active = False
task = ""

try:
    while True:
        # Get Quest 3 hand data from TeleVision server
        tv_data = tv.get_latest()
        if tv_data is None:
            continue

        left_pose = tv_data.get("left_wrist")     # (4,4) or None
        right_pose = tv_data.get("right_wrist")   # (4,4) or None
        left_grip = tv_data.get("left_pinch", 0)  # 0-1
        right_grip = tv_data.get("right_pinch", 0)
        buttons = tv_data.get("buttons", {})

        # Button B: start new episode
        if buttons.get("b_pressed") and not episode_active:
            task = input("Task description: ") if not task else task
            collector.new_episode(task)
            episode_active = True
            print(f"▶ Recording: {task}")

        # Button A: end episode
        if buttons.get("a_pressed") and episode_active:
            collector.end_episode()
            episode_active = False
            print("⏹ Episode saved")
            task = ""

        # Retarget Quest → G1 joints
        target = quest_to_g1_arms(left_pose, right_pose, left_grip, right_grip)

        # Build the full 46-dim action vector; in the arms_only layout the
        # 14 arm joints sit at indices 0-13 and locomotion at 14-16, with
        # the remaining dimensions left at zero
        full_action = np.zeros(46, dtype=np.float32)
        full_action[:len(target)] = target

        # Locomotion from thumbsticks
        full_action[14] = tv_data.get("left_thumbstick_y", 0) * 0.5   # vx
        full_action[15] = tv_data.get("left_thumbstick_x", 0) * 0.3   # vy
        full_action[16] = tv_data.get("right_thumbstick_x", 0) * 0.5  # omega

        # Send to robot (the normalize → denormalize round trip clamps
        # the action into the action space's configured limits)
        g1.send_actions(action_space.denormalize(
            action_space.normalize(full_action)
        ))

        # Record frame (collector handles YOLO+SAM+Audio+EEF in background)
        collector.set_action(full_action)
        if episode_active:
            collector.collect_frame()

except KeyboardInterrupt:
    print("\nStopping...")
finally:
    if episode_active:
        collector.end_episode()
    collector.stop()
    g1.disconnect()
    tv.stop()
Launch on Quest 3:
1. Put on Quest 3
2. Open Meta Quest Browser
3. Navigate to https://192.168.123.100:8012
4. Accept the self-signed certificate warning
5. Allow hand tracking permission when prompted
6. You should see the G1's camera feed in VR + your hands tracked
Option B: Bunny-VisionPro / AnyTeleop (Alternative)¶
If Open-TeleVision doesn't work well, these are alternatives:
# AnyTeleop — supports Quest 3 via WebXR
pip install anyteleop
anyteleop-server --robot g1 --ip 192.168.123.10
# Bunny-VisionPro (originally for Apple Vision Pro, has Quest adapter)
git clone https://github.com/Dingry/BunnyVisionPro.git
# Requires more setup but has better retargeting
Option C: DIY WebXR (Minimal, No Dependencies)¶
# Minimal Quest 3 hand tracking → WebSocket → Python
# See: https://github.com/nicrusso7/quest-hand-tracking-ws
# Python side:
pip install websockets
# Start a simple retargeting server:
python3 -c "
import asyncio, json
import numpy as np
import websockets

async def handler(ws):
    async for msg in ws:
        data = json.loads(msg)
        # data contains Quest 3 hand joint positions
        left = np.array(data.get('leftHand', {}).get('wrist', [0, 0, 0]))
        right = np.array(data.get('rightHand', {}).get('wrist', [0, 0, 0]))
        print(f'L:{left} R:{right}')

async def main():
    # websockets.serve() is an async context manager; hold it open forever
    async with websockets.serve(handler, '0.0.0.0', 9090):
        await asyncio.Future()

asyncio.run(main())
"
5. Neon Collector Installation¶
# On companion PC (with NVIDIA GPU):
cd /path/to/neon
pip install -e ".[collect]"
# This installs:
# - ultralytics (YOLO11)
# - segment-anything (SAM)
# - sounddevice (audio)
# - lerobot (dataset format)
# - pyarrow (parquet)
# Verify GPU for perception:
python3 -c "import torch; print(f'CUDA: {torch.cuda.is_available()}, GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"none\"}')"
# Download models (first run caches them):
python3 -c "from ultralytics import YOLO; YOLO('yolo11n.pt')"
python3 -c "
import urllib.request
from pathlib import Path

# Import just to verify segment_anything is installed
from segment_anything import sam_model_registry

d = Path.home() / '.cache/neon/sam'
d.mkdir(parents=True, exist_ok=True)
p = d / 'sam_vit_b.pth'
if not p.exists():
    print('Downloading SAM vit_b...')
    url = 'https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth'
    urllib.request.urlretrieve(url, str(p))
print('SAM ready')
"
# Login to HuggingFace (for dataset upload):
huggingface-cli login
6. Running Data Collection¶
Quick Start (CLI)¶
# Single timed episode (60 seconds):
neon-collect \
--robot-ip 192.168.123.10 \
--hub-id cagataydev/neon-g1-real-v1 \
--task "pick up the red cup and place it on the left side" \
--duration 60 \
--fps 20 \
--mode arms_only
# Interactive mode (control episode boundaries manually):
neon-collect \
--robot-ip 192.168.123.10 \
--hub-id cagataydev/neon-g1-real-v1
# Then in the interactive prompt:
# collect> n pick up the red cup ← start episode
# collect> e ← end episode
# collect> n place cup on shelf ← start new episode
# collect> e
# collect> s ← print stats
# collect> q ← quit + export to HuggingFace
Full VR Teleop + Collection¶
# Terminal 1: Start TeleVision server (Quest 3 ↔ PC)
cd TeleVision
python3 teleop_g1.py
# Terminal 2: Monitor (optional)
# Open http://localhost:8766 for the Neon dashboard
# Shows: camera feed, YOLO boxes, joint states, audio waveform
# On Quest 3:
# 1. Open browser → https://192.168.123.100:8012
# 2. Start teleoperating
# 3. Press B to start recording, A to stop
What Gets Recorded Per Frame¶
| Modality | Key | Shape | Source |
|---|---|---|---|
| Head camera | observation.image.camera_head | (480, 640, 3) | G1 camera |
| Joint state | observation.state | (43,) | Motor encoders |
| EEF state | observation.eef_state | (14,) | FK from joints |
| Segmentation | observation.segmentation.camera_head | (480, 640) | YOLO→SAM |
| Audio | observation.audio | (32000,) | USB mic @ 16kHz |
| Action | action | (46,) | Teleop targets |
| Language | language_instruction | string | Task description |
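To sanity-check a session, open the exported dataset and inspect one frame. A sketch using lerobot's dataset class; note the import path has moved between lerobot releases, so adjust for your version:
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

ds = LeRobotDataset("cagataydev/neon-g1-real-v1")
print(f"episodes: {ds.num_episodes}, frames: {ds.num_frames}")

# Each item is a dict keyed by the feature names in the table above
frame = ds[0]
for key, value in frame.items():
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(key, shape)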
Data Quality Tips¶
- Vary viewpoints — approach objects from different angles
- Vary speeds — some fast grabs, some slow careful placements
- Vary objects — cups, bottles, tools, food items, soft things
- Include failures — reach and miss, drop objects (model learns recovery)
- Narrate tasks — speak while teleoperating (captured in audio)
- 50+ episodes per task — minimum for fine-tuning to generalize (see the count check after this list)
- 10-30 second episodes — short, focused manipulation tasks
- Consistent lighting — or deliberately vary it for robustness
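To check the 50-episodes-per-task bar, count tasks in the exported metadata. A sketch assuming the meta/episodes.parquet layout from the appendix; the "tasks" column name is an assumption, so inspect df.columns on your own export first:
import pandas as pd
from huggingface_hub import hf_hub_download

# meta/episodes.parquet path matches the appendix layout
path = hf_hub_download(
    "cagataydev/neon-g1-real-v1",
    "meta/episodes.parquet",
    repo_type="dataset",
)
df = pd.read_parquet(path)
print(df.columns.tolist())                          # verify the schema first
print(df.explode("tasks")["tasks"].value_counts())  # episodes per task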
7. Training on Collected Data¶
from neon.training.config import TrainConfig
from neon.training.train import train
from neon.model.neon_vla import NeonConfig
from neon.model.video_backbone import BackboneConfig
from neon.data.data_soup import DataSoupConfig, DataSourceConfig
config = TrainConfig(
    model=NeonConfig(
        backbone=BackboneConfig(
            model_id="Qwen/Qwen2.5-Omni-7B",
            load_in_4bit=True,
            freeze_backbone=True,
        ),
        control_mode="arms_only",
        action_head_type="flow",  # Multi-modal action distribution
        num_action_steps=16,
        use_eef=True,             # YOUR EEF data
        use_segmentation=True,    # YOUR YOLO+SAM masks
        audio={"enabled": True, "encoder_type": "whisper"},  # YOUR audio
    ),
    data=DataSoupConfig(
        sources=[
            # YOUR real G1 data (primary)
            DataSourceConfig(
                name="g1_real",
                type="neon_v3",
                path="cagataydev/neon-g1-real-v1",
                weight=3.0,  # 3× weight — your data is gold
            ),
            # Mix with existing datasets for generalization
            DataSourceConfig(
                name="agibot",
                type="agibot",
                path="lerobot/xvla-agibot-world",
                weight=1.0,
            ),
        ],
    ),
    epochs=5,
    batch_size=2,
    learning_rate=1e-4,
    hub_model_id="cagataydev/neon-g1-real-flow-v1",
    push_to_hub=True,
)
train(config)
Training on HuggingFace Jobs (if no local GPU)¶
# Launch the training script on a cloud GPU via HF Jobs
hf jobs uv run \
--flavor a100-large \
--secrets HF_TOKEN \
--timeout 8h \
-- python -m neon.training.train \
--backbone Qwen/Qwen2.5-Omni-7B \
--mode arms_only
8. Troubleshooting¶
G1 Won't Enter SDK Mode¶
# Check connection
ping 192.168.123.10
# Check if robot is in STAND mode (required before SDK)
# Use RC controller: switch to STAND first, wait 5s, then WALK
# Check Unitree SDK version
pip show unitree_sdk2py
# Need >= 1.0.0 for G1 EDU+
Quest 3 Can't Connect¶
# Verify Quest is on same network
# On Quest 3: Settings → WiFi → check IP is 192.168.123.X
# Check HTTPS cert is installed
# Quest requires HTTPS for WebXR hand tracking API
# Test from Quest browser: https://192.168.123.100:8012
# If "connection refused": firewall blocking port 8012
sudo ufw allow 8012
# Latency test (should be <20ms)
ping 192.168.123.200 # Quest IP
YOLO/SAM Too Slow¶
# Check GPU usage
nvidia-smi
# Use smaller YOLO model
neon-collect --yolo-model yolo11n.pt # nano (fastest)
# vs yolo11s.pt (small) or yolo11m.pt (medium)
# Disable SAM (use YOLO box masks instead — rough but fast)
neon-collect --no-sam
# SAM runs at ~3Hz on RTX 3090, ~8Hz on RTX 4090
# YOLO runs at ~30Hz even on CPU
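To get a concrete per-frame number on your own GPU, time the detector directly. A minimal benchmark on a dummy 480p frame (the first call is excluded as warm-up):
import time
import numpy as np
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a camera frame

model(frame, verbose=False)  # warm-up: loads weights, builds kernels
t0 = time.perf_counter()
N = 50
for _ in range(N):
    model(frame, verbose=False)
dt = (time.perf_counter() - t0) / N
print(f"YOLO: {dt * 1000:.1f} ms/frame ({1 / dt:.1f} Hz)")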
Audio Not Capturing¶
# List audio devices
python3 -c "import sounddevice; print(sounddevice.query_devices())"
# Specify device index
neon-collect --audio-device 2
# Test capture
python3 -c "
import sounddevice as sd
import numpy as np
audio = sd.rec(16000, samplerate=16000, channels=1, dtype='float32')
sd.wait()
print(f'Captured {len(audio)} samples, max={np.abs(audio).max():.3f}')
"
High Latency in Teleop¶
Typical latency budget:
Quest 3 hand tracking:  ~10ms
WebSocket transfer:      ~5ms
IK retargeting:          ~1ms
G1 SDK command:          ~2ms
Physics response:       ~20ms (50Hz control)
──────────────────────────────
Total:                  ~38ms (target: <50ms)
If >50ms:
- Use 5GHz WiFi for Quest (not 2.4GHz)
- Use Ethernet for PC↔G1 (not WiFi)
- Reduce camera stream resolution to Quest
- Check for other network traffic
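Before blaming the network, confirm the teleop loop itself holds its rate. A generic instrumentation sketch; in teleop_g1.py the timed region would wrap the while-loop body, and the time.sleep here is just a stand-in for that work:
import time
import numpy as np

periods = []
last = time.perf_counter()
for _ in range(500):
    time.sleep(0.02)  # placeholder for: get hands, retarget, send_actions
    now = time.perf_counter()
    periods.append(now - last)
    last = now

dts = np.array(periods) * 1000  # ms
print(f"mean {dts.mean():.1f} ms, p95 {np.percentile(dts, 95):.1f} ms, "
      f"max {dts.max():.1f} ms (budget: 20 ms per tick at 50 Hz)")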
Appendix: File Outputs¶
After a collection session, your HuggingFace dataset will contain:
cagataydev/neon-g1-real-v1/
├── data/
│   └── chunk-000/
│       └── 000000.parquet        # All frame data
├── meta/
│   ├── info.json                 # Dataset metadata
│   ├── episodes.parquet          # Episode boundaries
│   └── tasks.parquet             # Task descriptions
└── videos/
    ├── observation.image.camera_head/
    │   └── chunk-000/
    │       └── 000000.mp4        # Head camera video
    └── observation.image.camera_hand/
        └── ...
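To confirm an upload matches this layout, list the repo contents from Python:
from huggingface_hub import HfApi

files = HfApi().list_repo_files("cagataydev/neon-g1-real-v1", repo_type="dataset")
for f in sorted(files):
    print(f)  # expect the data/, meta/, and videos/ entries shown above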
Then point the Section 7 training config at this dataset and train.
"The best training data comes from the robot that will use it."