API Reference¶

Reference for all 19 Strands tools and the 42 `just` recipes.

Edge inference (Thor)¶

`cosmos_inference`¶

Real-time VLM call against the TRT-EdgeLLM HTTP server.

Arg	Type	Default	Notes
`prompt`	str	required	User instruction
`image_path`	str	""	Mutually exclusive with `image_b64`
`image_b64`	str	""	Base64 image alternative
`server_url`	str	`$COSMOS_VLM_URL`	Override endpoint
`max_tokens`	int	256	Keep low for latency
`temperature`	float	0.2	0.0-0.2 for perception
`system_prompt`	str	""	Optional system message
`return_image`	bool	False	Embed input image in response

→ Recipe: just infer <image> <prompt> [max_tokens] [temperature] [url]
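
The mutual exclusivity of `image_path` and `image_b64` can be checked before the tool is called. The following is a minimal sketch, not the tool's implementation: `build_inference_args` and `encode_image` are hypothetical helpers, and only the argument names and defaults are taken from the table above.

```python
import base64
from pathlib import Path


def build_inference_args(prompt, image_path="", image_b64="",
                         max_tokens=256, temperature=0.2):
    """Hypothetical helper: assemble keyword arguments for cosmos_inference."""
    if not prompt:
        raise ValueError("prompt is required")
    # The reference lists image_path and image_b64 as mutually exclusive.
    if image_path and image_b64:
        raise ValueError("image_path and image_b64 are mutually exclusive")
    args = {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
    if image_path:
        args["image_path"] = image_path
    elif image_b64:
        args["image_b64"] = image_b64
    return args


def encode_image(path):
    """Read an image file and return its Base64 text, suitable for image_b64."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

Passing both image arguments raises `ValueError`, so the conflict surfaces before any request reaches the server.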

`cosmos_serve`¶

Manage the TRT-EdgeLLM server lifecycle.

Arg	Values
`action`	`start` / `stop` / `restart` / `status` / `logs`
`llm_engine_dir`, `visual_engine_dir`	required for start/restart
`port`, `host`	bind address
`lines`	number of log lines to return (`action="logs"`)

→ Recipes: serve-start / serve-stop / serve-restart / serve-status / serve-logs

`cosmos_build_engine`¶

Build a TensorRT engine from ONNX on Thor.

Arg	Default / accepted values
`which_part`	`"llm"` or `"visual"`
`min_image_tokens`	4
`max_image_tokens`	10240
`max_input_len`	1024

→ Recipes: build-llm-engine, build-visual-engine, build-engines (both)

`rtp_capture_frame`¶

Capture one JPEG from RTP/H.264 (GStreamer, HW-accel).

Arg	Default
`bind_ip`	`"0.0.0.0"`
`port`	5600
`width`	800
`height`	600
`timeout_s`	5
`return_image`	True (embeds bytes)

→ Recipe: just rtp-capture <port> <output> <w> <h> <timeout>

`nats_publish`¶

Publish a JSON payload to a NATS subject.

Arg	Default
`subject`	required
`payload`	dict
`servers`	`$NATS_URL`

→ Recipe: just nats-publish <subject> <payload_json>
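
Because the recipe takes the payload as a single JSON argument, the dict has to be serialized and shell-quoted first. A minimal sketch of that step, assuming the recipe signature shown above; `nats_publish_command` is a hypothetical helper and the subject `robot.perception` is an illustrative value only.

```python
import json
import shlex


def nats_publish_command(subject, payload):
    """Hypothetical helper: build the shell command for the nats-publish recipe."""
    # Compact JSON keeps the argument short; shlex.quote protects braces and quotes.
    payload_json = json.dumps(payload, separators=(",", ":"))
    return "just nats-publish " + shlex.quote(subject) + " " + shlex.quote(payload_json)


print(nats_publish_command("robot.perception", {"label": "person"}))
# prints: just nats-publish robot.perception '{"label":"person"}'
```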

`system_info`¶

Host / Jetson / GPU summary.

→ Recipe: just sysinfo

x86 model prep¶

`cosmos_quantize`¶

FP8 / INT8 / INT4 quantization.

Arg	Default
`model_dir`	`"nvidia/Cosmos-Reason2-2B"`
`output_dir`	`"./quantized/R2-fp8"`
`dtype`	`"fp16"`
`quantization`	`"fp8"`

→ Recipe: just quantize <model_dir> <output_dir> <dtype> <quantization>

`cosmos_export_onnx`¶

Export LLM or visual encoder to ONNX.

Arg	Notes
`which_part`	`"llm"` or `"visual"`
`dtype`, `quantization`	visual only

→ Recipes: export-llm, export-visual

`cosmos_model_download`¶

Download a model or dataset from Hugging Face (HF); `name` accepts a known shortcut or a full HF repo ID.

Arg	Default / accepted values
`name`	required (shortcut or HF repo)
`local_dir`	`""` (default `./checkpoints/<name>`)
`kind`	`"model"` or `"dataset"`

→ Recipes: download, download-dataset

`cosmos_reason_hf`¶

HF Transformers inference (full-precision reference).

Arg	Default
`image_path`	"" (or `video_path`)
`model_id`	`"nvidia/Cosmos-Reason2-2B"`
`max_new_tokens`	256
`temperature`	0.2
`device`	`"auto"`

→ No recipe (direct Transformers call).

Generation¶

`cosmos_predict_generate`¶

World model video generation.

Arg	Default
`prompt`	required
`model_variant`	`"video2world"` (also `text2world`, `action_conditioned`, `multiview`)
`num_frames`, `height`, `width`, `fps`	121, 720, 1280, 24
`guidance_scale`	7.0
`num_steps`	35
`seed`	0
`checkpoint`	"" (override)
`repo_dir`	"" (override `$COSMOS_PREDICT_REPO`)

→ Recipe: just predict-generate <input_json>

`cosmos_transfer_generate`¶

ControlNet-style video transfer.

Arg	Default
`prompt`	required
`control`	`"edge"` (also `depth`, `seg`, `vis`, `multi`)
`control_video`	"" (optional)
`style_image`	"" (optional)
`control_weights`	dict (required for `control="multi"`)
`guidance_scale`, `num_steps`, `seed`	3.0, 35, 0

→ Recipe: just transfer-generate <input_json> <control>
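
The rule that `control_weights` is required when `control="multi"` can be enforced with a small pre-flight check. This is a sketch under assumptions: `check_transfer_args` is a hypothetical helper, and the weight keys used in the usage line are illustrative, since the reference does not specify the dict's schema.

```python
VALID_CONTROLS = ("edge", "depth", "seg", "vis", "multi")


def check_transfer_args(prompt, control="edge", control_weights=None):
    """Hypothetical helper: validate arguments for cosmos_transfer_generate."""
    if not prompt:
        raise ValueError("prompt is required")
    if control not in VALID_CONTROLS:
        raise ValueError(f"control must be one of {VALID_CONTROLS}, got {control!r}")
    # Per the table above, control_weights is required for control="multi".
    if control == "multi" and not control_weights:
        raise ValueError('control_weights is required when control="multi"')
    args = {"prompt": prompt, "control": control}
    if control_weights:
        args["control_weights"] = dict(control_weights)
    return args


# Illustrative weights only; the actual key names are an assumption.
args = check_transfer_args("rainy street", control="multi",
                           control_weights={"edge": 0.5, "depth": 0.5})
```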

Training¶

`cosmos_post_train`¶

Post-train Reason2 / Predict2.5 / Transfer2.5.

Arg	Values
`config_path`	required YAML
`model_family`	`reason2` / `predict2_5` / `transfer2_5`
`strategy`	`full` / `lora` / `rl` (reason2 only for `rl`)
`num_gpus`	1 (predict/transfer)
`dry_run`	False

→ Recipes: post-train-reason2, post-train-reason2-rl, post-train-predict, post-train-transfer
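
The constraint that `rl` applies only to `reason2` can be combined with recipe selection. The sketch below is not the tool's code: `post_train_recipe` is a hypothetical helper, and the family-to-recipe mapping is inferred from the recipe names listed above rather than confirmed by the justfile.

```python
FAMILIES = ("reason2", "predict2_5", "transfer2_5")
STRATEGIES = ("full", "lora", "rl")


def post_train_recipe(model_family, strategy):
    """Hypothetical helper: validate the pair and return the assumed recipe name."""
    if model_family not in FAMILIES:
        raise ValueError(f"unknown model_family: {model_family!r}")
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # The reference restricts the rl strategy to the reason2 family.
    if strategy == "rl" and model_family != "reason2":
        raise ValueError("strategy 'rl' is only available for reason2")
    if model_family == "reason2":
        return "post-train-reason2-rl" if strategy == "rl" else "post-train-reason2"
    # Assumed mapping, based on the recipe names only.
    return {"predict2_5": "post-train-predict",
            "transfer2_5": "post-train-transfer"}[model_family]
```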

`cosmos_distill`¶

Step distillation (KD / DMD2).

Arg	Values
`teacher_checkpoint`	required
`student_output`	required
`method`	`"kd"` or `"dmd2"`
`model_family`	`"transfer2_5"` or `"predict2_5"`
`num_gpus`	8

→ Recipe: just distill <teacher> <student> <method> <family>

Data + Eval¶

`cosmos_curate`¶

Cosmos-Xenna curation pipeline.

Arg	Default / accepted values
`input_dir`	required
`output_dir`	`"./outputs/curated"`
`stages`	`"all"` or comma-separated
`num_workers`	8

→ Recipe: just curate <input> <output> <stages> <workers>

`cosmos_evaluate`¶

Compute one of 12 evaluation metrics.

Arg	Valid metrics
`metric`	`fid fvd tse cse sampson blur_ssim canny_f1 depth_rmse seg_miou dover reason_critic reason_reward`
`pred_path`	required
`gt_path`	required for most
`output_dir`	`"./outputs/eval"`

→ Recipe: just evaluate <metric> <pred> <gt>
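
A metric name can be validated against the 12 listed values before the recipe is invoked. This is a minimal sketch: `evaluate_command` is a hypothetical helper, and omitting `<gt>` for metrics that do not need ground truth is an assumption about the recipe, not documented behavior.

```python
import shlex

VALID_METRICS = (
    "fid", "fvd", "tse", "cse", "sampson", "blur_ssim",
    "canny_f1", "depth_rmse", "seg_miou", "dover",
    "reason_critic", "reason_reward",
)


def evaluate_command(metric, pred_path, gt_path=""):
    """Hypothetical helper: build the shell command for the evaluate recipe."""
    if metric not in VALID_METRICS:
        raise ValueError(f"unknown metric {metric!r}; expected one of {VALID_METRICS}")
    parts = ["just", "evaluate", metric, pred_path]
    # Assumption: the recipe accepts a missing <gt> for metrics that do not need it.
    if gt_path:
        parts.append(gt_path)
    return shlex.join(parts)
```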

Utilities¶

`image_read`¶

Load an image and embed it in the response (Converse API compatible).

`video_probe` / `video_extract_frames`¶

`video_probe` returns ffprobe metadata as JSON; `video_extract_frames` extracts frames at a specified FPS.

→ Recipes: video-probe, video-frames

Meta-recipes (pipelines)¶

Recipe	Chain
`prep-edge-model`	download → quantize → export-llm → export-visual
`pipeline-edge-deploy`	prep-edge-model + Thor hand-off hints
`pipeline-gr00t-dreams`	download-dataset + post-train-predict
`perception-loop`	rtp-capture + infer + nats-publish (∞ loop)
`deploy-thor`	rsync + remote `just install`
`smoke`	env + sysinfo + serve-status
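
The `perception-loop` chain (capture, infer, publish, repeat) can be pictured as a plain loop over three callables. This is a sketch of the control flow only, not the recipe's implementation: `perception_loop`, its parameters, and the published payload shape are all assumptions, and `max_iterations` is added here so that the loop can be bounded during testing.

```python
def perception_loop(capture, infer, publish, subject, prompt, max_iterations=None):
    """Run capture -> infer -> publish repeatedly; None means loop forever."""
    count = 0
    while max_iterations is None or count < max_iterations:
        image_path = capture()               # e.g. wraps rtp_capture_frame
        result = infer(image_path, prompt)   # e.g. wraps cosmos_inference
        # The payload shape below is an assumption made for illustration.
        publish(subject, {"prompt": prompt, "result": result})
        count += 1
    return count


# Usage with stand-in callables in place of the real tools.
published = []
n = perception_loop(
    capture=lambda: "frame.jpg",
    infer=lambda path, prompt: f"{prompt} -> {path}",
    publish=lambda subject, payload: published.append((subject, payload)),
    subject="robot.perception",
    prompt="Describe the scene",
    max_iterations=3,
)
```

Injecting the three steps as callables keeps the loop independent of how each tool is actually invoked.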

Env vars (quick reference)¶

See Installation → Configure .env.
