Strands Cosmos

Give your AI agent eyes that understand physics.

A Strands Agents model provider for NVIDIA Cosmos Reason vision-language models (VLMs), covering physical AI reasoning, video understanding, and embodied intelligence.

See It In Action

🚗 Driving Analysis with Chain-of-Thought

→ Full example + code

🤖 Robot Embodied Reasoning

→ Full example + code

🎬 Video Captioning

→ Full example + code

⚛️ Physics Reasoning (Text-Only)

→ Full example + code

Play locally

pip install asciinema
asciinema play docs/assets/casts/03_driving_analysis.cast

What is Strands Cosmos?

Strands Cosmos connects Strands Agents to NVIDIA Cosmos-Reason2, a family of vision-language models purpose-built for understanding the physical world.

2 models · Video + Image + Text · Chain-of-Thought reasoning · Tool integration · Jetson-native

graph LR
    A["🗣️ Strands Agent"] --> B{"CosmosVisionModel"}
    B -->|Video| C["🚗 Driving Analysis"]
    B -->|Image| D["🤖 Robot Planning"]
    B -->|Text| E["⚛️ Physics Reasoning"]
    B -->|CoT| F["🧠 Chain-of-Thought"]

Get Started in 2 Minutes

pip install strands-cosmos strands-agents

from strands import Agent
from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(model_id="nvidia/Cosmos-Reason2-2B")
agent = Agent(model=model)

# Analyze a dashcam video
agent("Caption in detail: <video>dashcam.mp4</video>")

# Reason about a robot's view
agent("<image>robot_view.jpg</image> What should the robot do next?")

# Physics understanding (text-only)
agent("What happens when you push a ball off the edge of a table?")

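The prompts above embed media by wrapping a file path in `<video>…</video>` or `<image>…</image>` tags. As a minimal sketch, assuming that tag convention is all that is needed on the prompt side, a small helper can build such prompts and optionally catch a missing file before the model is called. The name `media_prompt` and its parameters are hypothetical and not part of strands-cosmos.

```python
from pathlib import Path

# Tag names are taken from the quickstart prompts; the helper itself is a
# hypothetical convenience and not part of strands-cosmos.
SUPPORTED_KINDS = ("video", "image")


def media_prompt(text: str, path: str, kind: str, check_exists: bool = False) -> str:
    """Return `text` followed by `path` wrapped in a <video> or <image> tag."""
    if kind not in SUPPORTED_KINDS:
        raise ValueError(f"kind must be one of {SUPPORTED_KINDS}, got {kind!r}")
    # Optionally fail early instead of sending a bad path to the model.
    if check_exists and not Path(path).is_file():
        raise FileNotFoundError(path)
    return f"{text} <{kind}>{path}</{kind}>"


prompt = media_prompt("Caption in detail:", "dashcam.mp4", "video")
print(prompt)
```

The resulting string can be passed to `agent(...)` exactly like the hand-written prompts in the quickstart.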
→ Full Quickstart | Installation

Capabilities

🚗 Driving Analysis

Traffic, hazards, navigation from dashcam video

→ Driving example

🤖 Robot Planning

Next-action prediction, 2D trajectory planning

→ Embodied reasoning

🎬 Video Captioning

Detailed temporal-spatial descriptions

→ Video captioning

⚛️ Physics Reasoning

Object permanence, causality, plausibility

→ Text reasoning

🔍 2D Grounding

Bounding box localization in images

🧠 Chain-of-Thought

<think> reasoning before answers

→ CoT guide
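For the Chain-of-Thought capability, downstream code often wants the reasoning and the final answer separately. The sketch below assumes the response is a plain string containing a single `<think>…</think>` block followed by the answer. That output shape is an assumption based on the description above, and `split_reasoning` is a hypothetical helper, not a strands-cosmos function.

```python
import re

# Assumption: reasoning is wrapped in one <think>...</think> block and the
# final answer is whatever text follows it.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when no <think> block exists."""
    match = _THINK_RE.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer


reasoning, answer = split_reasoning(
    "<think>The ball keeps its horizontal velocity while gravity pulls it down.</think>\n"
    "It follows a parabolic arc to the floor."
)
print(answer)
```

Keeping the reasoning separate makes it easy to log it for debugging while showing only the answer to the user.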

Models

Model	GPU Memory	Architecture	Best For
Cosmos-Reason2-2B	24 GB	Qwen3-VL	Edge / Jetson
Cosmos-Reason2-8B	32 GB	Qwen3-VL	Desktop / Cloud
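The memory figures in the table can drive a simple model choice at startup. This is a minimal sketch, not part of strands-cosmos: `pick_model_id` is a hypothetical helper, the 2B model ID comes from the quickstart, and the 8B ID is assumed to follow the same `nvidia/…` naming pattern.

```python
# GPU memory requirements copied from the Models table. The 8B model ID is
# an assumption based on the naming pattern of the 2B ID.
MODEL_REQUIREMENTS_GB = {
    "nvidia/Cosmos-Reason2-2B": 24,
    "nvidia/Cosmos-Reason2-8B": 32,
}


def pick_model_id(available_gb: float) -> str:
    """Return the most demanding listed model that fits in `available_gb`."""
    # Check the largest requirement first so the bigger model wins when it fits.
    candidates = sorted(
        MODEL_REQUIREMENTS_GB.items(), key=lambda item: item[1], reverse=True
    )
    for model_id, required_gb in candidates:
        if available_gb >= required_gb:
            return model_id
    raise ValueError(f"{available_gb} GB is below the smallest listed requirement")


print(pick_model_id(24))
```

This rule considers memory only. On edge devices the 2B model may still be the better choice for latency, as the "Best For" column suggests.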

Verified Platforms

Platform	GPU	Status
Jetson AGX Thor	Thor 132 GB	✅ (with CUBLAS fix)
Desktop	A100 / H100 / RTX 4090	✅
Jetson Orin	Orin 32/64 GB	✅ (may need CUBLAS fix)

Two Ways to Use

As the Agent's Model

from strands import Agent
from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(model_id="nvidia/Cosmos-Reason2-2B")
agent = Agent(model=model)
agent("Describe this scene: <video>scene.mp4</video>")

As a Tool (in any Agent)

from strands import Agent
from strands_cosmos import cosmos_vision_invoke

# Use Cosmos as a tool inside a Bedrock/OpenAI agent
agent = Agent(tools=[cosmos_vision_invoke])
agent("Analyze this dashcam video for safety: /path/to/video.mp4")

Performance on Jetson AGX Thor

Approximate per-example timings for Cosmos-Reason2-2B on 132 GB of unified memory:

Example	Task	Time	Recording
01	Text-only physics	~11s	cast
02	Video caption (10s @ 4fps)	~15s	cast
03	Driving analysis + CoT	~16s	cast
04	Embodied reasoning + CoT	~43s	cast
05	Tool invocation	~9s	cast
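To collect comparable numbers on other hardware, a plain wall-clock wrapper is usually enough. The sketch below is not the harness used for the table above, which the page does not describe. `timed` is a hypothetical helper, and the workload is a stand-in to be replaced with a real agent call.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Call `fn` once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Stand-in workload; swap the lambda for something like `lambda: agent(prompt)`.
result, seconds = timed(lambda: sum(range(1_000_000)))
print(f"{seconds:.2f}s")
```

Model loading and the first inference call typically take longer than later calls, so it helps to time a warm-up run separately from the measured runs.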