Strands Cosmos
Give your AI agent eyes that understand physics.
NVIDIA Cosmos Reason VLM provider for Strands Agents: physical AI reasoning, video understanding, and embodied intelligence.
See It In Action
- Driving Analysis with Chain-of-Thought
- Robot Embodied Reasoning
- Video Captioning
- Physics Reasoning (Text-Only)
What is Strands Cosmos?
Strands Cosmos is the full-lifecycle NVIDIA Cosmos toolkit for Strands Agents. It provides Cosmos-Reason2 as a model provider plus 21 tools covering the entire ecosystem: VLM reasoning, world-model generation (Predict2.5), video-to-video (Transfer2.5), data curation (Xenna), post-training, quantization, edge deployment, and evaluation.
2 models · 21 tools · Video + Image + Text · Chain-of-Thought · Jetson-native · Full pipeline automation
graph LR
    A["Strands Agent"] --> B{"CosmosVisionModel"}
    B -->|Video| C["Driving Analysis"]
    B -->|Image| D["Robot Planning"]
    B -->|Text| E["Physics Reasoning"]
    B -->|CoT| F["Chain-of-Thought"]
Get Started in 2 Minutes
from strands import Agent
from strands_cosmos import CosmosVisionModel
model = CosmosVisionModel(model_id="nvidia/Cosmos-Reason2-2B")
agent = Agent(model=model)
# Analyze a dashcam video
agent("Caption in detail: <video>dashcam.mp4</video>")
# Reason about a robot's view
agent("<image>robot_view.jpg</image> What should the robot do next?")
# Physics understanding (text-only)
agent("What happens when you push a ball off the edge of a table?")
→ Full Quickstart | Installation
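The prompts above embed media by wrapping a file path in `<video>` or `<image>` tags. A minimal sketch of a helper that builds such prompts is shown below. `media_prompt` and the suffix lists are hypothetical and not part of `strands_cosmos`; the helper only assembles the string and does not check that the file exists or that the model accepts the format.

```python
from pathlib import Path

# Hypothetical helper for illustration; not part of strands_cosmos.
# The suffix sets are assumptions, not a list of formats the model supports.
VIDEO_SUFFIXES = {".mp4", ".mov", ".mkv", ".webm"}
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def media_prompt(question: str, media_path: str) -> str:
    """Wrap a media path in the <video>/<image> tag style used in the quickstart."""
    suffix = Path(media_path).suffix.lower()
    if suffix in VIDEO_SUFFIXES:
        tag = "video"
    elif suffix in IMAGE_SUFFIXES:
        tag = "image"
    else:
        raise ValueError(f"Unsupported media type: {suffix!r}")
    return f"<{tag}>{media_path}</{tag}> {question}"


print(media_prompt("What should the robot do next?", "robot_view.jpg"))
# prints: <image>robot_view.jpg</image> What should the robot do next?
```

The resulting string can be passed to the agent in place of a hand-written prompt.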
Capabilities
- Driving Analysis
  Traffic, hazards, and navigation from dashcam video
  → Driving example
- Robot Planning
  Next-action prediction, 2D trajectory planning
- Video Captioning
  Detailed temporal-spatial descriptions
  → Video captioning
- Physics Reasoning
  Object permanence, causality, plausibility
  → Text reasoning
- 2D Grounding
  Bounding box localization in images
- Chain-of-Thought
  `<think>` reasoning before answers
  → CoT guide
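For the Chain-of-Thought capability, it can help to separate the reasoning from the final answer before showing output to a user. The sketch below assumes the response contains a single `<think>...</think>` block followed by the answer; `split_reasoning` is a hypothetical helper, and the sample string is illustrative rather than real model output.

```python
import re

# Assumption: reasoning is wrapped in one <think>...</think> block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block is found."""
    match = THINK_RE.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Everything outside the <think> block is treated as the answer.
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer


sample = "<think>The light is red and a pedestrian is crossing.</think>\nStop and wait."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```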
Models
| Model | GPU Memory | Architecture | Best For |
|---|---|---|---|
| Cosmos-Reason2-2B | 24 GB | Qwen3-VL | Edge / Jetson |
| Cosmos-Reason2-8B | 32 GB | Qwen3-VL | Desktop / Cloud |
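As a rough illustration of how the table might drive model selection, the sketch below picks the largest listed model that fits a given memory budget. `pick_model` is a hypothetical helper, and treating the table's GPU Memory column as a minimum requirement is an assumption.

```python
from typing import Optional

# Figures copied from the Models table (GB of GPU memory).
# Assumption: each figure is the minimum memory needed to run the model.
MODEL_MEMORY_GB = {
    "Cosmos-Reason2-2B": 24,
    "Cosmos-Reason2-8B": 32,
}


def pick_model(available_gb: float) -> Optional[str]:
    """Return the largest listed model that fits in available_gb, or None."""
    fitting = [
        (need, name) for name, need in MODEL_MEMORY_GB.items() if need <= available_gb
    ]
    return max(fitting)[1] if fitting else None


print(pick_model(24))   # prints: Cosmos-Reason2-2B
print(pick_model(132))  # prints: Cosmos-Reason2-8B
print(pick_model(16))   # prints: None
```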
Verified Platforms
| Platform | GPU | Status |
|---|---|---|
| Jetson AGX Thor | Thor 132 GB | ✅ (with CUBLAS fix) |
| Desktop | A100 / H100 / RTX 4090 | ✅ |
| Jetson Orin | Orin 32/64 GB | ✅ (may need CUBLAS fix) |
Two Ways to Use
from strands import Agent
from strands_cosmos import cosmos_reason_hf, video_probe, cosmos_sysinfo
# 21 tools available; use any combination
agent = Agent(tools=[cosmos_reason_hf, video_probe, cosmos_sysinfo])
agent("Check GPU status, probe the video, then describe what you see in /tmp/scene.mp4")

from strands import Agent
from strands_cosmos import (
    cosmos_model_download, cosmos_quantize, cosmos_export_onnx,
    cosmos_build_engine, cosmos_serve, cosmos_inference,
)
# Agent orchestrates the full edge-deployment pipeline
agent = Agent(tools=[
    cosmos_model_download, cosmos_quantize, cosmos_export_onnx,
    cosmos_build_engine, cosmos_serve, cosmos_inference,
])
agent("Download Reason2-2B, quantize to FP8, export ONNX, build TRT engine, start server, and run a test query")
Performance on Jetson AGX Thor
Benchmarks with Cosmos-Reason2-2B on 132 GB unified memory:
| Example | Task | Time | Recording |
|---|---|---|---|
| 01 | Text-only physics | ~11s | cast |
| 02 | Video caption (10s @ 4fps) | ~15s | cast |
| 03 | Driving analysis + CoT | ~16s | cast |
| 04 | Embodied reasoning + CoT | ~43s | cast |
| 05 | Tool invocation | ~9s | cast |
Developer Setup (Full Cosmos Ecosystem)
git clone https://github.com/cagataycali/strands-cosmos && cd strands-cosmos
just setup-full # Installs apt deps, Python deps, clones 6 Cosmos repos
just doctor # Platform diagnostics: what works on THIS machine
`just doctor` checks repos, core tools, Python packages, media tools, TRT binaries, and GPU/CUDA, and gives platform-aware guidance (workstation vs. Jetson vs. Docker).
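To give a feel for this kind of diagnostic, here is a minimal sketch of a presence check for binaries and Python packages. It is not the actual `just doctor` recipe: `run_checks` is hypothetical, and the binary and package names in the lists are assumptions chosen for illustration.

```python
import importlib.util
import shutil

# Assumed check lists; the real `just doctor` recipe may look for different names.
BINARIES = ("git", "ffmpeg", "trtexec", "nvidia-smi")
PACKAGES = ("strands", "strands_cosmos")


def run_checks(binaries=BINARIES, packages=PACKAGES) -> dict[str, bool]:
    """Report whether each binary is on PATH and each top-level package is importable."""
    report = {}
    for name in binaries:
        report[f"bin:{name}"] = shutil.which(name) is not None
    for name in packages:
        report[f"py:{name}"] = importlib.util.find_spec(name) is not None
    return report


for item, ok in run_checks().items():
    print(f"{'ok' if ok else 'missing':8} {item}")
```

The output depends on the machine, so no expected result is shown here.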