# thor-cosmos

→ Visit the standalone landing page for the marketing overview.

NVIDIA Cosmos on Jetson AGX Thor: one justfile, one Strands agent, full lifecycle.
## What is this?
A Strands agent and a justfile that together orchestrate the full NVIDIA Cosmos ecosystem, namely Reason2 (VLM), Predict2.5 (world model), Transfer2.5 (ControlNet), and Xenna (data curation), and deploy it on Jetson AGX Thor for real-time robot perception.
```mermaid
graph LR
  A["Operator"] -->|just <recipe>| J["justfile"]
  B["Strands Agent"] -->|just_run| J
  J --> C["hf · trt-edgellm"]
  J --> D["torchrun · cosmos-cli"]
  J --> E["curl · gstreamer · nats"]
  C --> F["Cosmos models"]
  D --> F
```
## Why a justfile as the command surface?
- Every Cosmos upstream repo (`cosmos-predict2.5`, `cosmos-transfer2.5`, `cosmos-reason2`, `cosmos-cookbook`) already ships a `justfile`. Ours blends in.
- One source of truth: agents shell out to `just <recipe>`; operators run `just <recipe>` directly. Zero duplication.
- Thin Python tools (~30 lines each) invoke a recipe and normalize its output to a Strands `ToolResult`.
- Discoverable: `just --list` prints every pipeline step.
- Composable: meta-recipes (`pipeline-edge-deploy`) chain atomic recipes.
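A minimal sketch of what one of those thin tools might look like, assuming `just` is on `PATH`. Only the name `just_run` comes from the diagram; `_result` and `list_recipes` are hypothetical helpers, and the `{status, content: [{text}]}` shape is a simplified stand-in for a full Strands `ToolResult`. In the real project such a function would presumably be registered with Strands' `@tool` decorator; it is left undecorated here so the sketch runs with the standard library alone. Recipe discovery uses `just --summary`, which prints recipe names separated by spaces and is easier to parse than the human-oriented `just --list`.

```python
import shutil
import subprocess
from typing import Any, Dict, List, Optional


def _result(status: str, text: str) -> Dict[str, Any]:
    """Build a ToolResult-shaped dict (hypothetical simplified shape)."""
    return {"status": status, "content": [{"text": text}]}


def just_run(recipe: str, *args: str, just_bin: str = "just",
             timeout: Optional[float] = None) -> Dict[str, Any]:
    """Run `just <recipe> <args...>` and normalize the outcome to {status, content}."""
    exe = shutil.which(just_bin)
    if exe is None:
        return _result("error", f"{just_bin!r} not found on PATH")
    try:
        proc = subprocess.run([exe, recipe, *args],
                              capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return _result("error", f"just {recipe} timed out after {timeout}s")
    if proc.returncode == 0:
        return _result("success", proc.stdout.strip())
    # Failed recipes usually explain themselves on stderr; fall back to stdout.
    detail = (proc.stderr or proc.stdout).strip()
    return _result("error", f"exit {proc.returncode}: {detail}")


def list_recipes(just_bin: str = "just") -> List[str]:
    """Return recipe names from `just --summary`, or [] if just is unavailable."""
    exe = shutil.which(just_bin)
    if exe is None:
        return []
    proc = subprocess.run([exe, "--summary"], capture_output=True, text=True)
    return proc.stdout.split() if proc.returncode == 0 else []
```

Keeping the wrapper this thin is the point of the design: all pipeline logic stays in the justfile, so an operator running `just <recipe>` and the agent calling the tool exercise exactly the same commands.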
## Features
- **Cosmos-Reason2 (VLM)**: Real-time vision-language inference on Jetson Thor (FP8, TRT-EdgeLLM). Quantize, ONNX-export, engine-build, serve, infer.
- **Cosmos-Predict2.5**: World model / video generation: text→world, video→world, action-conditioned, multiview.
- **Cosmos-Transfer2.5**: ControlNet-style video transfer with edge, depth, seg, vis, and multi-control inputs.
- **Cosmos-Xenna**: 7-stage data curation: split → transcode → crop → filter → caption → dedup → shard.
- **Training & Distillation**: Post-train Reason2 (SFT/RL), Predict2.5, and Transfer2.5 via `torchrun`. Step-distill with KD / DMD2.
- **12 Evaluation Metrics**: FID, FVD, TSE, CSE, Sampson, blur-SSIM, Canny-F1, depth-RMSE, seg-mIoU, DOVER, Reason-critic, Reason-reward.
- **Edge Deployment**: GStreamer RTP capture (HW-accelerated), TRT-EdgeLLM HTTP server, NATS event publishing.
- **19 Strands Tools**: All tool calls are normalized to `{status, content: [text, json, image]}`. Tool calls run in parallel by default.
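The `{status, content: [text, json, image]}` normalization and the parallel-by-default behavior can be illustrated with a small sketch. Everything here is an assumption for illustration: `to_content_blocks`, `run_parallel`, and `IMAGE_SUFFIXES` are hypothetical names, the convention that an image-producing recipe prints the file path on its own line is assumed, and the image block is a placeholder holding only a path (a real Strands image block carries the encoded image). The thread pool shows that independent recipe calls can run side by side; it is not necessarily how the project or Strands schedules tool calls.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Assumption: a recipe that produces an image prints its path on its own line.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def to_content_blocks(stdout: str) -> List[Dict[str, Any]]:
    """Classify each non-empty stdout line as a json, image, or text block."""
    blocks: List[Dict[str, Any]] = []
    for raw in stdout.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            parsed: Any = json.loads(line)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, (dict, list)):
            # Structured output (counts, detections) stays machine-readable.
            blocks.append({"json": parsed})
        elif Path(line).suffix.lower() in IMAGE_SUFFIXES:
            # Placeholder image block: carries only the path, not encoded bytes.
            blocks.append({"image": {"path": line}})
        else:
            # Bare scalars such as "42" are kept as text, not json.
            blocks.append({"text": line})
    return blocks


def run_parallel(calls: Sequence[Tuple[str, Tuple[str, ...]]],
                 runner: Callable[..., Dict[str, Any]],
                 max_workers: int = 4) -> List[Dict[str, Any]]:
    """Run independent (recipe, args) calls concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(runner, recipe, *args) for recipe, args in calls]
        return [f.result() for f in futures]
```

Collecting results in submission order (rather than completion order) keeps each result aligned with the call that produced it, which matters when an agent issues several `infer` calls at once.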
## The flagship pipeline
The `intbot_edge_vlm` cookbook recipe deploys Cosmos-Reason2 to Thor for real-time robot perception:
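```bash
# On the x86 GPU host: prepare the FP8 model and its ONNX export, then copy the ONNX to Thor
just prep-edge-model reason2-2b ./models/R2-fp8
scp -r ./models/R2-fp8/onnx thor:~/R2-fp8-onnx

# On Thor: build TRT-EdgeLLM engines from the ONNX, start the server, run one inference
just build-engines ~/R2-fp8-onnx ~/R2-fp8-engines
just serve-start ~/R2-fp8-engines/llm ~/R2-fp8-engines/visual
just infer /tmp/frame.jpg "count people in the scene"

# Real-time loop: continuous VLM perception, published as NATS events on perception.vlm
just perception-loop perception.vlm "describe the scene, count people"
```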
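Conceptually, the real-time loop repeats single-frame inference and publishes each answer as an event. The sketch below shows that shape under stated assumptions: `perception_step`, `perception_loop`, `just_infer`, and the event fields are hypothetical, and `perception.vlm` is assumed to be the NATS subject. Frame capture and publishing are injected as plain callables so the sketch runs without a camera or a NATS server; in a deployment, frames would come from the GStreamer capture and `publish` might wrap nats-py's async `nc.publish(subject, payload)`.

```python
import json
import subprocess
import time
from typing import Any, Callable, Dict, Iterable, Optional


def just_infer(frame_path: str, prompt: str) -> str:
    """Single-frame inference through the `just infer` recipe (hypothetical wrapper)."""
    proc = subprocess.run(["just", "infer", frame_path, prompt],
                          capture_output=True, text=True, check=True)
    return proc.stdout.strip()


def perception_step(frame_path: str, prompt: str, subject: str,
                    infer: Callable[[str, str], str],
                    publish: Callable[[str, bytes], None],
                    clock: Callable[[], float] = time.time) -> Dict[str, Any]:
    """Infer on one frame, wrap the answer in a JSON event, publish it to `subject`."""
    answer = infer(frame_path, prompt)
    # Hypothetical event schema; the real recipe's payload may differ.
    event = {"ts": clock(), "frame": frame_path, "prompt": prompt, "answer": answer}
    publish(subject, json.dumps(event).encode("utf-8"))
    return event


def perception_loop(frames: Iterable[str], prompt: str, subject: str,
                    publish: Callable[[str, bytes], None],
                    infer: Callable[[str, str], str] = just_infer,
                    max_frames: Optional[int] = None) -> int:
    """Process frames in order until exhausted or `max_frames` is hit; return the count."""
    count = 0
    for frame in frames:
        if max_frames is not None and count >= max_frames:
            break
        perception_step(frame, prompt, subject, infer, publish)
        count += 1
    return count
```

Injecting `infer` and `publish` keeps the loop testable and lets the same logic target either the `just infer` recipe or a direct client for the TRT-EdgeLLM server.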
## Quick links
- Install & Quickstart: 2-minute setup
- Architecture: how tools, recipes, and repos fit together
- API Reference: all 19 agent tools
- Cosmos Cookbook: upstream recipes