Examples
New here? Start with the notebooks
These scripts are the runnable distillation of the
interactive notebooks — learn the concept step by step
in a notebook, then ship it from examples/.
Runnable examples tested on NVIDIA Jetson AGX Thor (132GB unified memory).
Demo Video
Click to watch the full demo video
All Examples
- 01 — Basic Text (Physics Reasoning)
  Text-only physics reasoning; no video or image needed. ~11s on Thor.
- 02 — Video Captioning
  Detailed temporal-spatial descriptions from video. ~15s on Thor.
- 03 — Driving Analysis (CoT)
  Dashcam safety analysis with chain-of-thought reasoning. ~16s on Thor.
- 04 — Embodied Reasoning
  Robot next-action prediction from workspace images. ~43s on Thor.
- 05 — Tool Usage
  Cosmos as a callable tool inside any Strands agent. ~9s on Thor.
- 06 — Cosmos 3 Reasoner
  Omnimodal video/image understanding via local vLLM: caption, temporal, embodied, grounding.
  → examples/06_cosmos3_reason.py · Cosmos 3 Guide
- 07 — Cosmos 3 Generator
  Text → image / video / video + sound (in-process Diffusers).
  → examples/07_cosmos3_generate.py
- 08 — Cosmos 3 Action
  World-model rollouts: forward/inverse dynamics, policy (Cosmos Framework).
  → examples/08_cosmos3_action.py
- 09 — Cosmos 3 Showcase (Reason → Generate)
  Reason about a real video, then generate similar videos (incl. audio) from the description.
  → examples/09_cosmos3_showcase.py · demo/cosmos3_showcase/
Quick Reference
| # | Example | Time (Thor) | Recording |
|---|---|---|---|
| 1 | Basic Text | ~11s | cast |
| 2 | Video Caption | ~15s | cast |
| 3 | Driving Analysis | ~16s | cast |
| 4 | Embodied Reasoning | ~43s | cast |
| 5 | Tool Usage | ~9s | cast |
| 6 | Cosmos 3 Reasoner (vLLM) | caption ~5s | — |
| 7 | Cosmos 3 Generator (Diffusers) | t2v ~20–55s | — |
| 8 | Cosmos 3 Action (Framework) | rollout ~30s | — |
| 9 | Cosmos 3 Showcase (reason→generate) | full loop | — |
Running Locally
```bash
git clone https://github.com/cagataycali/strands-cosmos.git
cd strands-cosmos
pip install -e .

# Jetson devices: fix CUBLAS first
strands-cosmos-fix-cublas

# Run any example
python examples/01_basic_text.py
python examples/02_video_caption.py
python examples/03_driving_analysis.py
python examples/04_embodied_reasoning.py
python examples/05_tool_usage.py

# Cosmos 3 (see the Cosmos 3 Guide for env setup: just c3-setup-reason / c3-setup-gen)
python examples/06_cosmos3_reason.py    # needs `just c3-serve-reason` running
python examples/07_cosmos3_generate.py  # needs `just c3-setup-gen`
python examples/09_cosmos3_showcase.py  # reason -> generate showcase
```
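To run the first five examples back to back and compare wall-clock times against the Quick Reference table, a small driver script can help. The following is a minimal sketch, not part of the repository: `build_command` and `run_all` are hypothetical helper names, and the script list is copied from the commands above.

```python
import subprocess
import sys
import time
from pathlib import Path

# Script names copied from the commands above.
EXAMPLES = [
    "01_basic_text.py",
    "02_video_caption.py",
    "03_driving_analysis.py",
    "04_embodied_reasoning.py",
    "05_tool_usage.py",
]


def build_command(script, examples_dir="examples"):
    """Return the argv list that runs one example with the current interpreter."""
    return [sys.executable, str(Path(examples_dir) / script)]


def run_all(scripts=EXAMPLES, examples_dir="examples"):
    """Run each example in order; return (script, exit_code, seconds) tuples."""
    results = []
    for script in scripts:
        start = time.perf_counter()
        proc = subprocess.run(build_command(script, examples_dir))
        results.append((script, proc.returncode, time.perf_counter() - start))
    return results
```

Calling `run_all()` from the repository root returns one tuple per script. The measured time includes model loading, so it may differ from the figures quoted above.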
Sample media
Examples 02–05 need a sample.mp4 (video) and/or sample.png (image) in the project root. Set paths via environment variables:
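The variable names themselves are not shown on this page. As a rough sketch of the usual pattern (an environment variable overrides a default file in the project root), something like the following could be used. `SAMPLE_VIDEO`, `SAMPLE_IMAGE`, and `resolve_media` are hypothetical names chosen for illustration; check the example scripts for the names they actually read.

```python
import os
from pathlib import Path

# Hypothetical variable names for illustration only; the example scripts
# may read different ones.
VIDEO_ENV = "SAMPLE_VIDEO"
IMAGE_ENV = "SAMPLE_IMAGE"


def resolve_media(env_name, default_name, root=".", environ=None):
    """Return the path from the environment variable, else root/default_name."""
    environ = os.environ if environ is None else environ
    override = environ.get(env_name)
    return Path(override) if override else Path(root) / default_name
```

For example, `resolve_media(VIDEO_ENV, "sample.mp4")` falls back to `sample.mp4` in the current directory when the variable is unset.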
Playing Terminal Recordings
Examples 01–05 have asciinema .cast recordings (see the Recording column in Quick Reference):
```bash
pip install asciinema

# Play any recording
asciinema play docs/assets/casts/01_basic_text.cast
asciinema play docs/assets/casts/03_driving_analysis.cast
```
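A recording's length can also be read without playing it. The sketch below assumes the files use the asciicast v2 format (a JSON header on the first line, followed by one `[time, type, data]` JSON array per event). That is an assumption about these particular files, and `cast_duration` is a hypothetical helper name.

```python
import json


def cast_duration(lines):
    """Return (header, seconds) for an asciicast v2 recording given its lines."""
    rows = [line for line in lines if line.strip()]
    header = json.loads(rows[0])      # first line: JSON object with metadata
    last = 0.0
    for row in rows[1:]:
        event = json.loads(row)       # later lines: [time, type, data]
        last = max(last, float(event[0]))
    return header, last
```

With a file on disk, pass the open file object, for example `cast_duration(open("docs/assets/casts/01_basic_text.cast"))`. The second return value is the timestamp of the last event in seconds.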
Execution Flow
```mermaid
graph TD
    START["Run Example"] --> MODEL["Load Model<br/>~3s (cached)"]
    MODEL --> MEDIA{"Has media?"}
    MEDIA -->|"Video"| DECODE["Decode frames<br/>@ configured FPS"]
    MEDIA -->|"Image"| PROCESS["Process image<br/>visual tokens"]
    MEDIA -->|"Text only"| TOKENIZE["Tokenize text"]
    DECODE --> INFER["GPU Inference<br/>token-by-token streaming"]
    PROCESS --> INFER
    TOKENIZE --> INFER
    INFER --> OUTPUT["Stream output<br/>to terminal"]
    OUTPUT --> DONE["✅ PASS"]
    style MODEL fill:#264653,color:#fff
    style INFER fill:#76b900,color:#fff
    style DONE fill:#2d6a4f,color:#fff
```
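The branching in the graph above can be mirrored in a few lines of Python. This is an illustrative sketch only: `prepare_inputs`, `fake_infer`, and `run_example` are hypothetical names, the default `fps=4` is an arbitrary placeholder, and `fake_infer` stands in for the real GPU inference step, which this page does not show.

```python
def prepare_inputs(prompt, video=None, image=None, fps=4):
    """Pick the preprocessing branch, mirroring the 'Has media?' decision."""
    if video is not None:
        return {"kind": "video", "prompt": prompt, "source": video, "fps": fps}
    if image is not None:
        return {"kind": "image", "prompt": prompt, "source": image}
    return {"kind": "text", "prompt": prompt}


def fake_infer(inputs):
    """Stand-in for GPU inference: yields tokens one at a time."""
    for token in f"[{inputs['kind']}] {inputs['prompt']}".split():
        yield token


def run_example(prompt, video=None, image=None, infer=fake_infer, write=print):
    """Prepare inputs, stream tokens to `write`, and return the full text."""
    tokens = []
    for token in infer(prepare_inputs(prompt, video=video, image=image)):
        write(token)          # stream each token as it arrives
        tokens.append(token)
    return " ".join(tokens)
```

Passing a different `infer` callable is how a real model call would slot in; the generator interface keeps the token-by-token streaming step explicit.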