Changelog
All notable changes to strands-cosmos are documented here.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
[0.4.4] - 2026-06-04
Changed
- cosmos3_video2video now exposes the conditioning controls that actually drive how much the prompt transforms the video: condition_frames (latent frame indexes kept clean) and condition_keep ("first" / "last"). New restyle-friendly defaults (condition_frames="0", condition_keep="last", guidance=8, steps=35) make a real transform (day→night, recolor, change the scene) the easy path. With the old defaults the prompt barely changed the video (it reconstructed the input).
- c3-v2v justfile recipe gains matching cond_frames / cond_keep params.
- Added demo/cosmos3_v2v/ showcase (recolor / rain / crowd / night) + tuning guide.
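As a rough illustration of the new defaults, the sketch below merges caller overrides onto the values listed in this entry. Only the parameter names and values (condition_frames, condition_keep, guidance, steps) come from the changelog; the helper name v2v_params and the dictionary layout are hypothetical and are not the tool's actual code.

```python
# Hypothetical sketch: the restyle-friendly defaults from this release,
# merged with caller overrides. Not the tool's actual implementation.
RESTYLE_DEFAULTS = {
    "condition_frames": "0",   # latent frame indexes kept clean
    "condition_keep": "last",  # "first" or "last"
    "guidance": 8,
    "steps": 35,
}


def v2v_params(**overrides):
    """Return the documented defaults updated with any overrides."""
    unknown = set(overrides) - set(RESTYLE_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**RESTYLE_DEFAULTS, **overrides}
    if params["condition_keep"] not in ("first", "last"):
        raise ValueError('condition_keep must be "first" or "last"')
    return params
```

Calling v2v_params() with no arguments yields the defaults above; passing condition_keep="first" changes only that key.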
[0.4.3] - 2026-06-04
Added: native video-to-video transfer
- cosmos3_video2video tool + c3-omni-docker / c3-v2v justfile recipes: structure-preserving video transfer (e.g. day→night, recolor buildings, restyle) by re-rendering an input video with a new prompt via the Cosmos 3 vLLM-Omni server (/v1/videos/sync, input_reference).
- vLLM-Omni runs from the official vllm/vllm-omni:cosmos3 Docker image (the only build with all modalities incl. video2video). c3-omni-docker launches it with --gpus all and the shared HF cache.
- Validated live: input construction-site video transformed to night / recolor / rain (832×480, structure preserved, ~12s/clip).
[0.4.2] - 2026-06-04
Fixed
- All justfile-backed tools failed on pip install: the justfile (which every cosmos_* / cosmos3_* tool shells out to) was never packaged, so _find_justfile() returned None in a pip install and tools errored with exit 127. The justfile is now bundled into the wheel/sdist at strands_cosmos/justfile (copied from the repo root at build time) and the lookup resolves it. Verified from a clean wheel install. Added a regression test.
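One way such a lookup could work is to try the packaged copy first and then fall back to a source checkout. The body of _find_justfile() is not shown in this changelog, so the function names and the search order below are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_justfile(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate path that is an existing file, else None."""
    for path in candidates:
        if path.is_file():
            return path
    return None


def default_candidates(package_dir: Path) -> list[Path]:
    # Assumed order: packaged copy first (wheel/sdist install), then the
    # directory one level up (source checkout with the justfile at repo root).
    return [package_dir / "justfile", package_dir.parent / "justfile"]
```

Returning None instead of raising mirrors the behaviour described above, where a missing justfile surfaced only when a tool tried to shell out.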
[0.4.1] - 2026-06-04
Fixed
- PyPI publish failure (0.3.2 / 0.4.0 never reached PyPI): the cosmos3-gen and all extras declared a direct-URL dependency (diffusers @ git+https://...), which PyPI rejects on upload. The extras now require plain diffusers>=0.36; the Cosmos3OmniPipeline dev build is installed separately via just c3-setup-gen or pip install -U "git+https://github.com/huggingface/diffusers.git".
- Release workflow now fails fast if built metadata contains any direct-URL dependency (guard step before PyPI publish).
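A guard of this kind can be approximated by scanning the built metadata for Requires-Dist entries in the "name @ url" form. The release workflow's real guard step is not shown here; the function name below is hypothetical, and the sketch assumes one unfolded header per line in the METADATA / PKG-INFO text.

```python
def direct_url_requirements(metadata_text: str) -> list[str]:
    """Return Requires-Dist entries that use the 'name @ url' direct-reference form."""
    offenders = []
    for line in metadata_text.splitlines():
        if not line.startswith("Requires-Dist:"):
            continue
        requirement = line.split(":", 1)[1].strip()
        # Ignore any environment marker (after ';') before looking for '@':
        # names, extras, and version specifiers never contain '@'.
        spec = requirement.split(";", 1)[0]
        if "@" in spec:
            offenders.append(requirement)
    return offenders
```

A CI step could fail the build whenever this returns a non-empty list, before anything is uploaded.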
[0.4.0] - 2026-06-04
Added: Cosmos 3 post-training (SFT)
- 7 cosmos3_train_* tools + c3-train-* justfile recipes wrapping the NVIDIA Cosmos Framework supervised fine-tuning stack: cosmos3_train_recipes, cosmos3_train_show, cosmos3_train_convert, cosmos3_train_convert_vlm, cosmos3_train_prep_dataset, cosmos3_train, cosmos3_train_export.
- Full 4-step flow: checkpoint → DCP convert, dataset prep, SFT run (paired launch shell / torchrun -m cosmos_framework.scripts.train), and DCP → HF safetensors export. Recipes: vision_sft_nano/super, llava_ov, videophy2_nano.
- New Cosmos 3 Training guide + examples/10_cosmos3_finetune.py; c3-doctor reports the training env.
- Validated locally (no 8× H100 needed): c3-train-recipes lists recipes and c3-train-show resolves/validates the full SFT config via train.py --dryrun.
[0.3.2] - 2026-06-04
Added
- cosmos3-gen optional extra: pip install "strands-cosmos[cosmos3-gen]" installs the Cosmos 3 generator backend in one step (Diffusers dev build, cosmos_guardrail, soundfile, imageio), making text/image → image/video/sound generation frictionless without the justfile.
Fixed
- just c3-setup-gen now also installs soundfile (required for the video-with-sound audio mux path).
[0.3.1] - 2026-06-04
Fixed
- Video decoding on fresh installs: transformers 5.x decodes video via torchcodec and, when torchcodec is not installed, silently falls back to torchvision, whose io.read_video was removed in torchvision>=0.27 (AttributeError when captioning a video). torchcodec is now a core dependency so video works out of the box.
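To check an environment for this situation, a small diagnostic can report which decoder package is importable. This is only an importability check using the standard library; it does not reproduce how transformers selects its backend, and the function name is hypothetical.

```python
import importlib.util


def video_backend(find_spec=importlib.util.find_spec) -> str:
    """Report the first importable decoder: 'torchcodec', 'torchvision', or 'none'."""
    # Preference order follows the entry above: torchcodec first, then torchvision.
    for name in ("torchcodec", "torchvision"):
        if find_spec(name) is not None:
            return name
    return "none"
```

The find_spec parameter exists only so the check can be exercised without installing either package.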
0.3.0 - 2026-06-04
Added: Cosmos 3 omnimodal world models 🌌
First-class support for NVIDIA's newest Cosmos 3 omnimodal model family — reasoning and generation across text, image, video, audio, and action. Runs on local compute (vLLM / Diffusers / Cosmos Framework).
- Model providers (implement strands.models.Model):
  - Cosmos3ReasonerModel: omnimodal reasoning (text + vision → text) via a local vLLM server (Cosmos3ReasonerForConditionalGeneration). Captioning, temporal localization, embodied next-action, 2D grounding, physical plausibility, situation understanding, action chain-of-thought. Supports explicit <think> reasoning.
  - Cosmos3GeneratorModel: generation (text/image → image/video/sound) in-process via HuggingFace Diffusers Cosmos3OmniPipeline. Generates and muxes stereo AAC audio @ 48kHz for video-with-sound.
- 16 cosmos3_* tools (thin justfile wrappers):
  - Reasoner: cosmos3_reason, cosmos3_caption, cosmos3_temporal, cosmos3_embodied, cosmos3_ground, cosmos3_plausibility, cosmos3_situation, cosmos3_action_cot
  - Generator: cosmos3_text2image, cosmos3_text2video, cosmos3_image2video, cosmos3_text2video_sound
  - Action / world-model: cosmos3_forward_dynamics, cosmos3_inverse_dynamics, cosmos3_policy
  - Servers: cosmos3_serve
- 13 c3-* justfile recipes: c3-doctor, c3-setup-{reason,gen,omni,framework}, c3-serve-{reason,omni,status,stop-reason,stop-omni}, c3-reason, c3-gen, c3-action.
- cosmos3 optional extra (pip install "strands-cosmos[cosmos3]") for the reasoner OpenAI client.
- Examples: 06_cosmos3_reason.py, 07_cosmos3_generate.py, 08_cosmos3_action.py, 09_cosmos3_showcase.py (reason → generate showcase).
- Showcase: demo/cosmos3_showcase/ (Cosmos 3 reasons about a real video, then generates similar videos, incl. one with synchronized audio, from its own description).
- Docs: new Cosmos 3 Guide; README, index, quickstart, installation, architecture, API reference, and capability guides all lead with Cosmos 3.
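The recipe list above uses shell-style brace shorthand. A minimal sketch that expands it into concrete recipe names, assuming non-nested braces as written in this entry (the helper is illustrative and not part of the package):

```python
import itertools
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand shell-style {a,b,c} groups (no nesting) into concrete names."""
    parts = re.split(r"\{([^{}]*)\}", pattern)
    # Even indexes are literal text, odd indexes are comma-separated options.
    choices = [p.split(",") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


RECIPES = [
    "c3-doctor",
    "c3-setup-{reason,gen,omni,framework}",
    "c3-serve-{reason,omni,status,stop-reason,stop-omni}",
    "c3-reason", "c3-gen", "c3-action",
]
names = [n for pattern in RECIPES for n in expand_braces(pattern)]
print(len(names))  # 13
```

The expanded count matches the 13 recipes stated in the entry.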
Changed
- README and docs now present 4 model providers + 37 tools and lead with Cosmos 3 as the flagship; Cosmos-Reason2 repositioned as the lightweight edge/Jetson VLM.
- c3-doctor recognizes cached Hugging Face tokens (not just HF_TOKEN/env).
- c3-serve-status (and cosmos3_serve) detect directly-launched servers via an HTTP /health probe, not only the PID file.
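An HTTP health probe of this kind can be sketched with the standard library alone. The function name, the 2-second timeout, and treating only HTTP 200 as healthy are assumptions; the recipe's actual probe is not shown in this changelog.

```python
import urllib.error
import urllib.request


def server_is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: treat as not running.
        return False
```

Probing the endpoint finds a server regardless of how it was started, which a PID-file check alone cannot do.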
Verified (single NVIDIA L40S, 46 GB, no NIM)
- Reasoner: caption 6.6s, plus temporal / embodied / plausibility / situation.
- Generator: text→image, text→video, image→video, and text→video + sound (H264 + AAC stereo 48kHz).
- Action: forward-dynamics rollout (832×480, 61 frames) via the Cosmos Framework.
Notes
- CUDA pairing: match the torch backend to your driver. CUDA 13 → cu130 + vllm==0.21.0; CUDA 12.8 → cu128 + vllm==0.19.1. just c3-doctor reports it.
- Single-GPU: the reasoner (vLLM) and generator (Diffusers) each load a 16B model and will not fit on one ~46 GB GPU simultaneously, so stop one before running the other.
- The Cosmos 3 reasoner caps --max-model-len at 32768 to avoid KV-cache OOM (the model's 262K default exceeds a single 46 GB GPU).
- Removed the internal COSMOS3_INTEGRATION.md planning doc (superseded by shipped code + docs).
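The CUDA pairing note can be read as a small lookup table. The sketch below encodes only the two pairings listed above; the function name and the fallback from a full version such as "13.0" to its major version are assumptions, and any other CUDA version is reported as unknown.

```python
# Pairings copied from the note above; anything else is unknown to this sketch.
PAIRINGS = {
    "13": ("cu130", "vllm==0.21.0"),
    "12.8": ("cu128", "vllm==0.19.1"),
}


def pairing_for(cuda_version: str):
    """Return (torch_backend, vllm_pin) for a CUDA version string, or None."""
    if cuda_version in PAIRINGS:
        return PAIRINGS[cuda_version]
    # Assumed fallback: match on the major version, e.g. "13.0" -> "13".
    return PAIRINGS.get(cuda_version.split(".", 1)[0])
```

A None result signals a driver version this table does not cover, so it should not be treated as a recommendation.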
0.2.0 - 2026-05-08
Added
- 21 tools covering the full Cosmos pipeline (Reason2 inference, Predict2.5, Transfer2.5, model lifecycle, training, distillation, Xenna curation, evaluation, I/O, system diagnostics), all thin wrappers over a 50+ recipe justfile.
- MkDocs Material documentation site with examples, guides, and API reference.
- Notebooks and runnable examples (01–05).
- AGENTS.md development contract.
0.1.2 - 2026-03-xx
Fixed
- asyncio event-loop thread-safety in cosmos_vision_invoke.
0.1.1 - 2026-03-09
Added
- Initial public release: CosmosVisionModel (video + image + text) and CosmosModel (text-only) Strands model providers for Cosmos-Reason2.
- Jetson CUBLAS compatibility fix (strands-cosmos-fix-cublas).
- Dashcam safety-analysis demo.