Strands Cosmos

Give your AI agent eyes that understand physics.

NVIDIA Cosmos Reason VLM provider for Strands Agents: physical AI reasoning, video understanding, and embodied intelligence.


See It In Action

Play locally

pip install asciinema
asciinema play docs/assets/casts/03_driving_analysis.cast

What is Strands Cosmos?

Strands Cosmos connects Strands Agents to NVIDIA Cosmos-Reason2, a family of vision-language models purpose-built for physical world understanding.

2 models · Video + Image + Text · Chain-of-Thought reasoning · Tool integration · Jetson-native

graph LR
    A["🗣️ Strands Agent"] --> B{"CosmosVisionModel"}
    B -->|Video| C["🚗 Driving Analysis"]
    B -->|Image| D["🤖 Robot Planning"]
    B -->|Text| E["⚛️ Physics Reasoning"]
    B -->|CoT| F["🧠 Chain-of-Thought"]

Get Started in 2 Minutes

pip install strands-cosmos strands-agents

from strands import Agent
from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(model_id="nvidia/Cosmos-Reason2-2B")
agent = Agent(model=model)

# Analyze a dashcam video
agent("Caption in detail: <video>dashcam.mp4</video>")

# Reason about a robot's view
agent("<image>robot_view.jpg</image> What should the robot do next?")

# Physics understanding (text-only)
agent("What happens when you push a ball off the edge of a table?")

→ Full Quickstart | Installation
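The quickstart prompts embed media by wrapping a file path in `<video>` or `<image>` tags. As a minimal sketch, the prompt string could be assembled with a small helper. `tag_media`, `build_prompt`, and the extension lists are hypothetical names introduced here for illustration and are not part of `strands_cosmos`; the source only shows `.mp4` and `.jpg` files.

```python
from pathlib import Path

# Assumed extension lists; the quickstart itself only shows .mp4 and .jpg.
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".webm"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def tag_media(path: str) -> str:
    """Wrap a media path in the <video> or <image> tag seen in the quickstart prompts."""
    ext = Path(path).suffix.lower()
    if ext in VIDEO_EXTS:
        return f"<video>{path}</video>"
    if ext in IMAGE_EXTS:
        return f"<image>{path}</image>"
    raise ValueError(f"Unsupported media type: {path!r}")


def build_prompt(question: str, *media: str) -> str:
    """Put the tagged media first, followed by the question (hypothetical helper)."""
    tags = " ".join(tag_media(m) for m in media)
    return f"{tags} {question}".strip()


prompt = build_prompt("What should the robot do next?", "robot_view.jpg")
print(prompt)
```

The resulting string can be passed to `agent(...)` exactly like the hand-written prompts in the quickstart.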


Capabilities

  • 🚗 Driving Analysis

    Traffic, hazards, navigation from dashcam video

    → Driving example

  • 🤖 Robot Planning

    Next-action prediction, 2D trajectory planning

    → Embodied reasoning

  • 🎬 Video Captioning

    Detailed temporal-spatial descriptions

    → Video captioning

  • ⚛️ Physics Reasoning

    Object permanence, causality, plausibility

    → Text reasoning

  • 🔍 2D Grounding

    Bounding box localization in images

  • 🧠 Chain-of-Thought

    <think> reasoning before answers

    → CoT guide
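With Chain-of-Thought enabled, the model emits `<think>` reasoning before its answer. The sketch below shows one way to separate the two in plain Python. It assumes the reasoning is wrapped in a single `<think>...</think>` pair followed by the answer; the closing tag and exact layout are assumptions, and `split_reasoning` is a hypothetical helper, not part of the library.

```python
import re

# Assumption: reasoning appears once, wrapped in <think>...</think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty when no <think> block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the <think> block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = (
    "<think>The ball leaves the table and gravity acts on it.</think>\n"
    "It falls to the floor."
)
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Keeping the reasoning separate makes it easy to log it for debugging while showing only the answer to the user.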


Models

| Model | GPU Memory | Architecture | Best For |
|---|---|---|---|
| Cosmos-Reason2-2B | 24 GB | Qwen3-VL | Edge / Jetson |
| Cosmos-Reason2-8B | 32 GB | Qwen3-VL | Desktop / Cloud |
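The GPU memory figures above suggest a simple selection rule: pick the largest model that fits the available memory. The following is a rough sketch of that rule, not library functionality. `pick_model` is a hypothetical helper, and the 8B model id is assumed by analogy with the 2B id used elsewhere on this page.

```python
# (model_id, GPU memory in GB) taken from the Models table, smallest first.
# Assumption: the 8B id mirrors the "nvidia/Cosmos-Reason2-2B" naming.
MODELS = [
    ("nvidia/Cosmos-Reason2-2B", 24),
    ("nvidia/Cosmos-Reason2-8B", 32),
]


def pick_model(available_gb: float) -> str:
    """Return the largest listed model whose memory figure fits in available_gb."""
    fitting = [model_id for model_id, needed_gb in MODELS if needed_gb <= available_gb]
    if not fitting:
        raise ValueError(f"No listed model fits in {available_gb} GB of GPU memory")
    return fitting[-1]


print(pick_model(24))
print(pick_model(80))
```

The returned id can then be passed as `model_id` to `CosmosVisionModel`.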

Verified Platforms

| Platform | GPU | Status |
|---|---|---|
| Jetson AGX Thor | Thor 132 GB | ✅ (with CUBLAS fix) |
| Desktop | A100 / H100 / RTX 4090 | ✅ |
| Jetson Orin | Orin 32/64 GB | ✅ (may need CUBLAS fix) |

Two Ways to Use

1. As the agent's model:

from strands import Agent
from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(model_id="nvidia/Cosmos-Reason2-2B")
agent = Agent(model=model)
agent("Describe this scene: <video>scene.mp4</video>")

2. As a tool inside another agent:

from strands import Agent
from strands_cosmos import cosmos_vision_invoke

# Use Cosmos as a tool inside a Bedrock/OpenAI agent
agent = Agent(tools=[cosmos_vision_invoke])
agent("Analyze this dashcam video for safety: /path/to/video.mp4")

Performance on Jetson AGX Thor

Benchmarks with Cosmos-Reason2-2B on 132 GB unified memory:

| Example | Task | Time | Recording |
|---|---|---|---|
| 01 | Text-only physics | ~11s | cast |
| 02 | Video caption (10s @ 4fps) | ~15s | cast |
| 03 | Driving analysis + CoT | ~16s | cast |
| 04 | Embodied reasoning + CoT | ~43s | cast |
| 05 | Tool invocation | ~9s | cast |
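To collect comparable wall-clock numbers on your own hardware, a call can be wrapped in a small timer. This is only a sketch: the page does not state how the timings above were measured, `timed` is a hypothetical helper, and `fake_agent` is a stand-in so the example runs without a GPU or model download.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_agent(prompt: str) -> str:
    """Stand-in for a real agent call; replace with your own Agent instance."""
    return f"echo: {prompt}"


result, seconds = timed(
    fake_agent, "What happens when you push a ball off the edge of a table?"
)
print(f"Completed in {seconds:.3f}s")
```

For a real measurement, pass the agent itself, for example `timed(agent, "Caption in detail: <video>dashcam.mp4</video>")`. The first call may include model loading time, so a warm-up call may be worth excluding.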


Resources