Installation

Ten seconds to a robot that sees time.


Quick Install

pip install neon-vla

That's it. One line. The backbone downloads lazily when you first call load_backbone() — nothing heavy happens at import time.

From Source

For development, contributing, or if you want to live on the edge:

git clone https://github.com/cagataycali/neon.git
cd neon
pip install -e ".[dev]"

The [dev] extra installs test runners, linters, and type checkers. Everything you need to hack on Neon and know immediately if you broke something.


What Gets Installed

Neon keeps its dependency tree tight. Heavy imports (torch, transformers) are lazy — they load on use, not on import.
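The lazy-loading idea looks roughly like this. This is a minimal sketch of the general pattern, not Neon's actual source; the heavy module is only imported on first real use:

```python
import importlib

# Sketch of lazy importing (illustrative, not Neon's actual code):
# the wrapped module is resolved on first attribute access, so merely
# constructing the wrapper costs nothing.
class LazyModule:
    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        if self._module is None:          # first real use triggers the import
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# "json" stands in for a heavy dependency like torch
torch = LazyModule("json")               # instant -- nothing imported yet
print(torch.dumps({"ok": True}))         # the import happens here
# → {"ok": true}
```

The same effect shows up at package level via module `__getattr__` (PEP 562), which is the common way libraries defer `import torch` until someone actually touches it.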

| Package | Version | Why It's Here |
| --- | --- | --- |
| torch | ≥ 2.2.0 | Tensor operations, model execution |
| transformers | ≥ 4.48.0, < 5.3.0 | Backbone loading (Qwen2.5-VL, Cosmos) |
| datasets | ≥ 3.0.0 | HuggingFace data soup loading |
| huggingface-hub | ≥ 0.23.0 | Model push/pull |
| numpy | ≥ 1.24.0 | The bedrock of everything |
| pillow | ≥ 10.0.0 | Image I/O |
| einops | ≥ 0.7.0 | Tensor reshaping without headaches |
| pyyaml | ≥ 6.0 | Config parsing |
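Want to confirm what actually landed in your environment? `importlib.util.find_spec` checks each dependency without paying the import cost (a quick sketch; note that pillow imports as `PIL` and pyyaml as `yaml`):

```python
import importlib.util

# Check which dependencies are importable without actually importing them;
# find_spec only reads package metadata, so torch stays unloaded.
for name in ("torch", "transformers", "datasets", "huggingface_hub",
             "numpy", "PIL", "einops", "yaml"):
    found = importlib.util.find_spec(name) is not None
    print(f"{name:16s} {'ok' if found else 'MISSING'}")
```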

Optional Extras

pip install neon-vla[train]
# Adds: peft, trl, bitsandbytes, accelerate
QLoRA, 4-bit quantization, gradient accumulation — everything for training action heads on consumer hardware.

pip install neon-vla[audio]
# Adds: openai-whisper, torchaudio
Give the robot ears. Whisper encoder for understanding spoken commands.

pip install unitree_sdk2py
# Unitree SDK for real G1 hardware control
The bridge between predicted actions and a physical 35kg humanoid.


Hardware Requirements

| What You Want To Do | GPU | VRAM | Notes |
| --- | --- | --- | --- |
| Inference (3B) | Jetson Orin / RTX 3060 | 8 GB | 4-bit quantized, ~50ms latency |
| Inference (7B) | RTX 4090 / A100 | 16 GB | 4-bit quantized |
| Train action heads | RTX 4090 / L4 | 24 GB | Backbone frozen, only 6M params train |
| Train with LoRA | A100 / L40S | 40+ GB | Fine-tune backbone attention layers |

No GPU? No problem.

Everything runs on CPU for testing and development. It'll be slow (~5s per prediction) but completely functional. All 168 tests pass on CPU without any GPU or backbone weights.


Verify It Works

import neon
from neon.model.neon_vla import NeonConfig, NeonVLA
from neon.data.action_space import G1ActionSpace

# Create the model (backbone hasn't loaded yet — this is fast)
config = NeonConfig(control_mode="arms_only")
model = NeonVLA(config)

# Check the action space
print(model.action_space)
# → G1ActionSpace(mode=arms_only, joints=14, action_dim=17)

print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")
# → Trainable parameters: ~2,345,678 (action heads + fusion only)

If you see that output, you're ready. The video backbone will download when you need it — when you call model.load_backbone() for the first time.
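When you're ready for real inference, the first `load_backbone()` call triggers the weight download. A hypothetical next step might look like this: `load_backbone()` comes from these docs, but the `predict()` call and its arguments are illustrative assumptions, not Neon's documented API (see the Quickstart for the real interface):

```python
# Hypothetical follow-up sketch. load_backbone() is mentioned in these docs;
# predict() and its arguments are illustrative assumptions, not a documented
# API -- check the Quickstart for the real calling convention.
def first_inference(model, frames, instruction):
    model.load_backbone()                  # downloads weights on first call
    return model.predict(frames, instruction)
```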


Next

Quickstart — create a model, predict actions, control a robot