Installation

Ten seconds to a robot that sees time.


Quick Install

pip install neon-vla

That's it. One line. The backbone downloads lazily when you first call load_backbone() — nothing heavy happens at import time.

From Source

For development, contributing, or if you want to live on the edge:

git clone https://github.com/cagataycali/neon.git
cd neon
pip install -e ".[dev]"

The [dev] extra installs test runners, linters, and type checkers. Everything you need to hack on Neon and know immediately if you broke something.


What Gets Installed

Neon keeps its dependency tree tight. Heavy imports (torch, transformers) are lazy — they load on use, not on import.
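The lazy-loading idea looks roughly like this. This is a minimal sketch of the general pattern, not Neon's actual source; the heavy module is only imported on first real use:

```python
import importlib

# Sketch of lazy importing (illustrative, not Neon's actual code):
# the wrapped module is resolved on first attribute access, so merely
# constructing the wrapper costs nothing.
class LazyModule:
    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        if self._module is None:          # first real use triggers the import
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# "json" stands in for a heavy dependency like torch
torch = LazyModule("json")               # instant -- nothing imported yet
print(torch.dumps({"ok": True}))         # the import happens here
# → {"ok": true}
```

The same effect shows up at package level via module `__getattr__` (PEP 562), which is the common way libraries defer `import torch` until someone actually touches it.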

| Package | Version | Why It's Here |
| --- | --- | --- |
| torch | ≥ 2.2.0 | Tensor operations, model execution |
| transformers | ≥ 4.48.0, < 5.3.0 | Backbone loading (Qwen2.5-VL, Cosmos) |
| datasets | ≥ 3.0.0 | HuggingFace data soup loading |
| huggingface-hub | ≥ 0.23.0 | Model push/pull |
| numpy | ≥ 1.24.0 | The bedrock of everything |
| pillow | ≥ 10.0.0 | Image I/O |
| einops | ≥ 0.7.0 | Tensor reshaping without headaches |
| pyyaml | ≥ 6.0 | Config parsing |
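Want to confirm what actually landed in your environment? `importlib.util.find_spec` checks each dependency without paying the import cost (a quick sketch; note that pillow imports as `PIL` and pyyaml as `yaml`):

```python
import importlib.util

# Check which dependencies are importable without actually importing them;
# find_spec only reads package metadata, so torch stays unloaded.
for name in ("torch", "transformers", "datasets", "huggingface_hub",
             "numpy", "PIL", "einops", "yaml"):
    found = importlib.util.find_spec(name) is not None
    print(f"{name:16s} {'ok' if found else 'MISSING'}")
```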

Optional Extras

pip install neon-vla[train]
# Adds: peft, trl, bitsandbytes, accelerate
QLoRA, 4-bit quantization, gradient accumulation — everything for training action heads on consumer hardware.

pip install neon-vla[audio]
# Adds: openai-whisper, torchaudio
Give the robot ears. Whisper encoder for understanding spoken commands.

pip install unitree_sdk2py
# Unitree SDK for real G1 hardware control
The bridge between predicted actions and a physical 35kg humanoid.


Hardware Requirements

| What You Want To Do | GPU | VRAM | Notes |
| --- | --- | --- | --- |
| Inference (3B) | Jetson Orin / RTX 3060 | 8 GB | 4-bit quantized, ~50ms latency |
| Inference (7B) | RTX 4090 / A100 | 16 GB | 4-bit quantized |
| Train action heads | RTX 4090 / L4 | 24 GB | Backbone frozen, only 6M params train |
| Train with LoRA | A100 / L40S | 40+ GB | Fine-tune backbone attention layers |

No GPU? No problem.

Everything runs on CPU for testing and development. It'll be slow (~5s per prediction) but completely functional. All 168 tests pass on CPU without any GPU or backbone weights.


Verify It Works

import neon
from neon.model.neon_vla import NeonConfig, NeonVLA
from neon.data.action_space import G1ActionSpace

# Create the model (backbone hasn't loaded yet — this is fast)
config = NeonConfig(control_mode="arms_only")
model = NeonVLA(config)

# Check the action space
print(model.action_space)
# → G1ActionSpace(mode=arms_only, joints=14, action_dim=17)

print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")
# → Trainable parameters: ~2,345,678 (action heads + fusion only)

If you see that output, you're ready. The video backbone will download when you need it — when you call model.load_backbone() for the first time.
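When you're ready for real inference, the first `load_backbone()` call triggers the weight download. A hypothetical next step might look like this: `load_backbone()` comes from these docs, but the `predict()` call and its arguments are illustrative assumptions, not Neon's documented API (see the Quickstart for the real interface):

```python
# Hypothetical follow-up sketch. load_backbone() is mentioned in these docs;
# predict() and its arguments are illustrative assumptions, not a documented
# API -- check the Quickstart for the real calling convention.
def first_inference(model, frames, instruction):
    model.load_backbone()                  # downloads weights on first call
    return model.predict(frames, instruction)
```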


Next

Quickstart — create a model, predict actions, control a robot