Examples

New here? Start with the notebooks

These scripts are the runnable distillation of the interactive notebooks — learn the concept step by step in a notebook, then ship it from examples/.

Runnable examples tested on NVIDIA Jetson AGX Thor (132GB unified memory).


Demo Video

Demo — Driving analysis on Jetson AGX Thor


All Examples

  • 01 — Basic Text (Physics Reasoning)

    Text-only physics reasoning — no video or image needed. ~11s on Thor.

    Full example + code

  • 02 — Video Captioning

    Detailed temporal-spatial descriptions from video. ~15s on Thor.

    Full example + code

  • 03 — Driving Analysis (CoT)

    Dashcam safety analysis with chain-of-thought reasoning. ~16s on Thor.

    Full example + code

  • 04 — Embodied Reasoning

    Robot next-action prediction from workspace images. ~43s on Thor.

    Full example + code

  • 05 — Tool Usage

    Cosmos as a callable tool inside any Strands agent. ~9s on Thor.

    Full example + code

  • 06 — Cosmos 3 Reasoner

    Omnimodal video/image understanding via local vLLM — caption, temporal, embodied, grounding.

    examples/06_cosmos3_reason.py · Cosmos 3 Guide

  • 07 — Cosmos 3 Generator

    Text → image / video / video + sound (in-process Diffusers).

    examples/07_cosmos3_generate.py

  • 08 — Cosmos 3 Action

    World-model rollouts: forward/inverse dynamics, policy (Cosmos Framework).

    examples/08_cosmos3_action.py

  • 09 — Cosmos 3 Showcase (Reason → Generate)

    Reason about a real video, then generate similar videos (incl. audio) from the description.

    examples/09_cosmos3_showcase.py · demo/cosmos3_showcase/


Quick Reference

#  Example                              Time (Thor)    Recording
1  Basic Text                           ~11s           cast
2  Video Caption                        ~15s           cast
3  Driving Analysis                     ~16s           cast
4  Embodied Reasoning                   ~43s           cast
5  Tool Usage                           ~9s            cast
6  Cosmos 3 Reasoner (vLLM)             caption ~5s
7  Cosmos 3 Generator (Diffusers)       t2v ~20–55s
8  Cosmos 3 Action (Framework)          rollout ~30s
9  Cosmos 3 Showcase (reason→generate)  full loop

Running Locally

git clone https://github.com/cagataycali/strands-cosmos.git
cd strands-cosmos
pip install -e .

# Jetson devices: fix CUBLAS first
strands-cosmos-fix-cublas

# Run any example
python examples/01_basic_text.py
python examples/02_video_caption.py
python examples/03_driving_analysis.py
python examples/04_embodied_reasoning.py
python examples/05_tool_usage.py

# Cosmos 3 (see the Cosmos 3 Guide for env setup: just c3-setup-reason / c3-setup-gen)
python examples/06_cosmos3_reason.py       # needs `just c3-serve-reason` running
python examples/07_cosmos3_generate.py     # needs `just c3-setup-gen`
python examples/09_cosmos3_showcase.py     # reason -> generate showcase
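
To compare your own hardware against the Thor timings in the Quick Reference, the first five examples can be run back to back with a small wrapper. This is a minimal sketch, not part of the repository: the `run_timed` and `main` helpers are hypothetical names, and treating exit code 0 as a pass is an assumption about how the example scripts report failure. The measured time is wall-clock for the whole process, so it includes model loading.

```python
import subprocess
import sys
import time
from pathlib import Path

# Script paths taken from the commands above; run from the repository root.
EXAMPLES = [
    "examples/01_basic_text.py",
    "examples/02_video_caption.py",
    "examples/03_driving_analysis.py",
    "examples/04_embodied_reasoning.py",
    "examples/05_tool_usage.py",
]


def run_timed(script: str) -> tuple[int, float]:
    """Run one script with the current interpreter; return (exit code, seconds)."""
    start = time.perf_counter()
    result = subprocess.run([sys.executable, script], check=False)
    return result.returncode, time.perf_counter() - start


def main(scripts: list[str] = EXAMPLES) -> None:
    for script in scripts:
        if not Path(script).is_file():
            print(f"SKIP  {script} (not found; run from the repo root)")
            continue
        code, seconds = run_timed(script)
        # Assumption: a zero exit code means the example succeeded.
        status = "PASS" if code == 0 else f"FAIL (exit {code})"
        print(f"{status}  {script}  {seconds:.1f}s")


if __name__ == "__main__":
    main()
```

Save it as a file in the repository root (any name works) and run it with the same Python environment used for `pip install -e .`.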

Sample media

Examples 02–05 expect a sample.mp4 (video) and/or sample.png (image) in the project root. To use files stored elsewhere, set the paths via environment variables:

export SAMPLE_VIDEO=/path/to/your/video.mp4
export SAMPLE_IMAGE=/path/to/your/image.png
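
Before launching a long-running example, it can help to confirm which media files will actually be picked up. The sketch below mirrors the rule described above (environment variable first, then the default file in the project root). The `resolve_sample` helper is a hypothetical name for illustration, and the example scripts may resolve paths differently.

```python
import os
from pathlib import Path


def resolve_sample(env_var: str, default_name: str, root: Path = Path(".")) -> Path:
    """Return the path from env_var if it is set, else root/default_name."""
    override = os.environ.get(env_var)
    return Path(override).expanduser() if override else root / default_name


# Variable and file names are the ones documented above.
video = resolve_sample("SAMPLE_VIDEO", "sample.mp4")
image = resolve_sample("SAMPLE_IMAGE", "sample.png")

for label, path in (("video", video), ("image", image)):
    status = "found" if path.is_file() else "missing"
    print(f"{label}: {path} ({status})")
```

Run it from the project root so the default paths resolve against the same directory the examples use.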

Playing Terminal Recordings

Examples 01–05 have asciinema .cast recordings (see the Recording column in the Quick Reference):

pip install asciinema

# Play any recording
asciinema play docs/assets/casts/01_basic_text.cast
asciinema play docs/assets/casts/03_driving_analysis.cast

Execution Flow

graph TD
    START["Run Example"] --> MODEL["Load Model<br/>~3s (cached)"]
    MODEL --> MEDIA{"Has media?"}
    MEDIA -->|"Video"| DECODE["Decode frames<br/>@ configured FPS"]
    MEDIA -->|"Image"| PROCESS["Process image<br/>visual tokens"]
    MEDIA -->|"Text only"| TOKENIZE["Tokenize text"]
    DECODE --> INFER["GPU Inference<br/>token-by-token streaming"]
    PROCESS --> INFER
    TOKENIZE --> INFER
    INFER --> OUTPUT["Stream output<br/>to terminal"]
    OUTPUT --> DONE["✅ PASS"]

    style MODEL fill:#264653,color:#fff
    style INFER fill:#76b900,color:#fff
    style DONE fill:#2d6a4f,color:#fff
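
The "Has media?" branch in the diagram can be read as a simple dispatch on input type. The following is an illustrative sketch only: `choose_preprocessing`, the step names, and the extension sets are hypothetical and do not reflect the library's actual API or its supported formats.

```python
from pathlib import Path
from typing import Optional

# Assumed extension sets, for illustration only.
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".webm"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def choose_preprocessing(media: Optional[str]) -> str:
    """Pick the pre-inference step for an input, following the diagram's branches."""
    if media is None:
        return "tokenize_text"      # "Text only" branch
    suffix = Path(media).suffix.lower()
    if suffix in VIDEO_EXTS:
        return "decode_frames"      # "Video" branch: frames at the configured FPS
    if suffix in IMAGE_EXTS:
        return "process_image"      # "Image" branch: visual tokens
    raise ValueError(f"Unsupported media type: {media}")


for item in (None, "sample.mp4", "sample.png"):
    print(item, "->", choose_preprocessing(item))
```

All three branches then converge on the same inference step, which is why the examples differ mainly in how they prepare their input.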