Strands Cosmos
Give your AI agent eyes that understand physics.
NVIDIA Cosmos Reason VLM provider for Strands Agents โ physical AI reasoning, video understanding, and embodied intelligence.
See It In Action¶
-
๐ Driving Analysis with Chain-of-Thought

-
๐ค Robot Embodied Reasoning

-
๐ฌ Video Captioning

-
โ๏ธ Physics Reasoning (Text-Only)

What is Strands Cosmos?¶
Strands Cosmos connects Strands Agents to NVIDIA Cosmos-Reason2 โ a family of vision-language models purpose-built for physical world understanding.
2 models ยท Video + Image + Text ยท Chain-of-Thought reasoning ยท Tool integration ยท Jetson-native
graph LR
A["๐ฃ๏ธ Strands Agent"] --> B{"CosmosVisionModel"}
B -->|Video| C["๐ Driving Analysis"]
B -->|Image| D["๐ค Robot Planning"]
B -->|Text| E["โ๏ธ Physics Reasoning"]
B -->|CoT| F["๐ง Chain-of-Thought"]
Get Started in 2 Minutes¶
from strands import Agent
from strands_cosmos import CosmosVisionModel
model = CosmosVisionModel(model_id="nvidia/Cosmos-Reason2-2B")
agent = Agent(model=model)
# Analyze a dashcam video
agent("Caption in detail: <video>dashcam.mp4</video>")
# Reason about a robot's view
agent("<image>robot_view.jpg</image> What should the robot do next?")
# Physics understanding (text-only)
agent("What happens when you push a ball off the edge of a table?")
โ Full Quickstart | Installation
Capabilities¶
-
๐ Driving Analysis
Traffic, hazards, navigation from dashcam video
โ Driving example
-
๐ค Robot Planning
Next-action prediction, 2D trajectory planning
-
๐ฌ Video Captioning
Detailed temporal-spatial descriptions
โ Video captioning
-
โ๏ธ Physics Reasoning
Object permanence, causality, plausibility
โ Text reasoning
-
๐ 2D Grounding
Bounding box localization in images
-
๐ง Chain-of-Thought
<think>reasoning before answersโ CoT guide
Models¶
| Model | GPU Memory | Architecture | Best For |
|---|---|---|---|
| Cosmos-Reason2-2B | 24 GB | Qwen3-VL | Edge / Jetson |
| Cosmos-Reason2-8B | 32 GB | Qwen3-VL | Desktop / Cloud |
Verified Platforms¶
| Platform | GPU | Status |
|---|---|---|
| Jetson AGX Thor | Thor 132 GB | โ (with CUBLAS fix) |
| Desktop | A100 / H100 / RTX 4090 | โ |
| Jetson Orin | Orin 32/64 GB | โ (may need CUBLAS fix) |
Two Ways to Use¶
Performance on Jetson AGX Thor¶
Benchmarks with Cosmos-Reason2-2B on 132GB unified memory:
| Example | Task | Time | Recording |
|---|---|---|---|
| 01 | Text-only physics | ~11s | cast |
| 02 | Video caption (10s @ 4fps) | ~15s | cast |
| 03 | Driving analysis + CoT | ~16s | cast |
| 04 | Embodied reasoning + CoT | ~43s | cast |
| 05 | Tool invocation | ~9s | cast |