Strands Cosmos

Give your AI agent eyes that understand physics.

NVIDIA Cosmos Reason VLM provider for Strands Agents: physical AI reasoning, video understanding, and embodied intelligence.

Pipeline Overview


See It In Action

Play locally

pip install asciinema
asciinema play docs/assets/casts/03_driving_analysis.cast

What is Strands Cosmos?

Strands Cosmos is the full-lifecycle NVIDIA Cosmos toolkit for Strands Agents. It provides Cosmos-Reason2 as a model provider plus 21 tools covering the entire ecosystem: VLM reasoning, world-model generation (Predict2.5), video-to-video (Transfer2.5), data curation (Xenna), post-training, quantization, edge deployment, and evaluation.

2 models · 21 tools · Video + Image + Text · Chain-of-Thought · Jetson-native · Full pipeline automation

graph LR
    A["🗣️ Strands Agent"] --> B{"CosmosVisionModel"}
    B -->|Video| C["🚗 Driving Analysis"]
    B -->|Image| D["🤖 Robot Planning"]
    B -->|Text| E["⚛️ Physics Reasoning"]
    B -->|CoT| F["🧠 Chain-of-Thought"]

Get Started in 2 Minutes

pip install strands-cosmos

from strands import Agent
from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(model_id="nvidia/Cosmos-Reason2-2B")
agent = Agent(model=model)

# Analyze a dashcam video
agent("Caption in detail: <video>dashcam.mp4</video>")

# Reason about a robot's view
agent("<image>robot_view.jpg</image> What should the robot do next?")

# Physics understanding (text-only)
agent("What happens when you push a ball off the edge of a table?")

→ Full Quickstart | Installation
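The prompts above embed media by wrapping a file path in `<video>` or `<image>` tags. A minimal sketch of a helper that builds such prompts follows. The function name `media_prompt` and the extension lists are assumptions made for illustration; they are not part of strands-cosmos, and the library may accept other formats.

```python
from pathlib import Path

# Hypothetical extension lists (assumptions, not taken from strands-cosmos).
VIDEO_SUFFIXES = {".mp4", ".mov", ".mkv", ".webm"}
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def media_prompt(instruction: str, media_path: str) -> str:
    """Append a media path to an instruction using the <video>/<image> tag style."""
    suffix = Path(media_path).suffix.lower()
    if suffix in VIDEO_SUFFIXES:
        tag = "video"
    elif suffix in IMAGE_SUFFIXES:
        tag = "image"
    else:
        raise ValueError(f"Unsupported media type: {suffix!r}")
    return f"{instruction} <{tag}>{media_path}</{tag}>"
```

The returned string can be passed straight to the agent, for example `agent(media_prompt("Caption in detail:", "dashcam.mp4"))`.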


Capabilities

  • 🚗 Driving Analysis

    Traffic, hazards, navigation from dashcam video

    → Driving example

  • 🤖 Robot Planning

    Next-action prediction, 2D trajectory planning

    → Embodied reasoning

  • 🎬 Video Captioning

    Detailed temporal-spatial descriptions

    → Video captioning

  • ⚛️ Physics Reasoning

    Object permanence, causality, plausibility

    → Text reasoning

  • 🔍 2D Grounding

    Bounding box localization in images

  • 🧠 Chain-of-Thought

    <think> reasoning before answers

    → CoT guide
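With Chain-of-Thought enabled, the reasoning arrives inside a `<think>` block ahead of the answer. The sketch below separates the two, assuming the response is available as a plain string and that the block is closed with `</think>`. The helper name `split_reasoning` is hypothetical and not part of strands-cosmos.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block exists."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # The answer is everything outside the matched block.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

Keeping the reasoning separate makes it easy to log the model's chain of thought while showing only the final answer to users.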


Models

| Model | GPU Memory | Architecture | Best For |
|---|---|---|---|
| Cosmos-Reason2-2B | 24 GB | Qwen3-VL | Edge / Jetson |
| Cosmos-Reason2-8B | 32 GB | Qwen3-VL | Desktop / Cloud |
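As a rough illustration of how the memory figures above could drive model choice, the sketch below picks the largest model that fits a given GPU memory budget. The `pick_model` helper is hypothetical, and the 8B model ID is assumed to follow the same naming pattern as the 2B ID used elsewhere on this page.

```python
# GPU memory requirements in GB, taken from the Models table.
# The 8B model ID is an assumption based on the 2B naming pattern.
MODEL_MEMORY_GB = {
    "nvidia/Cosmos-Reason2-2B": 24,
    "nvidia/Cosmos-Reason2-8B": 32,
}


def pick_model(available_gb: float) -> str:
    """Return the largest listed model whose requirement fits in available_gb."""
    fitting = [
        (required, model_id)
        for model_id, required in MODEL_MEMORY_GB.items()
        if required <= available_gb
    ]
    if not fitting:
        raise ValueError(f"No listed model fits in {available_gb} GB")
    # max() compares on the memory requirement first, so the largest model wins.
    return max(fitting)[1]
```

The result can be passed as `model_id` when constructing `CosmosVisionModel`.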

Verified Platforms

| Platform | GPU | Status |
|---|---|---|
| Jetson AGX Thor | Thor 132 GB | ✅ (with CUBLAS fix) |
| Desktop | A100 / H100 / RTX 4090 | ✅ |
| Jetson Orin | Orin 32/64 GB | ✅ (may need CUBLAS fix) |

Two Ways to Use

# Model provider: CosmosVisionModel backs the agent directly
from strands import Agent
from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(model_id="nvidia/Cosmos-Reason2-2B")
agent = Agent(model=model)
agent("Describe this scene: <video>scene.mp4</video>")

# Tools: the agent calls Cosmos tools as needed
from strands import Agent
from strands_cosmos import cosmos_reason_hf, video_probe, cosmos_sysinfo

# 21 tools available; use any combination
agent = Agent(tools=[cosmos_reason_hf, video_probe, cosmos_sysinfo])
agent("Check GPU status, probe the video, then describe what you see in /tmp/scene.mp4")

# Tools, chained: the agent orchestrates the full edge-deployment pipeline
from strands import Agent
from strands_cosmos import (
    cosmos_model_download, cosmos_quantize, cosmos_export_onnx,
    cosmos_build_engine, cosmos_serve, cosmos_inference,
)

agent = Agent(tools=[
    cosmos_model_download, cosmos_quantize, cosmos_export_onnx,
    cosmos_build_engine, cosmos_serve, cosmos_inference,
])
agent("Download Reason2-2B, quantize to FP8, export ONNX, build TRT engine, start server, and run a test query")

Performance on Jetson AGX Thor

Benchmarks with Cosmos-Reason2-2B on 132 GB of unified memory:

| Example | Task | Time | Recording |
|---|---|---|---|
| 01 | Text-only physics | ~11s | cast |
| 02 | Video caption (10s @ 4fps) | ~15s | cast |
| 03 | Driving analysis + CoT | ~16s | cast |
| 04 | Embodied reasoning + CoT | ~43s | cast |
| 05 | Tool invocation | ~9s | cast |
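To collect comparable wall-clock numbers on other hardware, a small timing wrapper is usually enough. This is a generic sketch using only the standard library; it is not how the figures above were measured, and the `timed` helper is a hypothetical name.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> tuple[T, float]:
    """Call fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Example (requires a configured agent, so it is left commented out):
# _, seconds = timed(lambda: agent("What happens when you push a ball off a table?"))
# print(f"Text-only physics: {seconds:.1f}s")
```

A single call includes any one-time model loading, so running the same prompt twice and reporting the second measurement gives a fairer steady-state figure.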


Developer Setup (Full Cosmos Ecosystem)

git clone https://github.com/cagataycali/strands-cosmos && cd strands-cosmos
just setup-full    # Installs apt deps, Python deps, clones 6 Cosmos repos
just doctor        # Platform diagnostics: what works on THIS machine

just doctor checks repos, core tools, Python packages, media tools, TRT binaries, and GPU/CUDA, and gives platform-aware guidance (workstation vs Jetson vs Docker).
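For a sense of what one of these checks could look like, the sketch below tests whether a few command-line tools are on the PATH. It is an illustration only and does not reproduce the actual `just doctor` recipe. The tool names (`git`, `ffmpeg`, `trtexec`) are assumed examples of core, media, and TensorRT binaries.

```python
import shutil


def check_tools(names: list[str]) -> dict[str, bool]:
    """Map each executable name to whether it can be found on the PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed example tools; the real diagnostics may look for different ones.
    for tool, found in check_tools(["git", "ffmpeg", "trtexec"]).items():
        print(f"{tool:10s} {'ok' if found else 'missing'}")
```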


Resources