
Quickstart

A two-minute tour. Prerequisite: finish the installation first.

1. Start the agent

```shell
thor-cosmos
# 🤖🌌 thor-cosmos agent — ready
#     model = global.anthropic.claude-opus-4-6-v1
#     tools = 19
#     type 'exit' or Ctrl-C to quit
# 🌌 ▸
```

2. Talk to it

```
🌌 ▸ what's the state of the VLM server?
```

The agent calls `cosmos_serve(action="status")`, which runs `just serve-status` and returns `🔴 not running`.
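For a sense of what a status check like this can involve, here is a minimal sketch. It assumes the server records its PID in a file; the `serve_status` function, its pid-file argument and the default URL are illustrative assumptions, not the project's actual `serve-status` recipe. Only the two output formats are taken from this page.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a serve-status check (not the project's recipe).
# Assumption: the server writes its PID to a file when it starts.
serve_status() {
  local pidfile="$1"
  local url="${2:-http://127.0.0.1:8080}"   # assumed default URL
  local pid=""
  [ -r "$pidfile" ] && pid=$(cat "$pidfile")
  # kill -0 sends no signal; it only checks that the PID exists and that
  # the current user is allowed to signal it.
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo "🟢 running pid=$pid  $url"
  else
    echo "🔴 not running"
    return 1
  fi
}
```

A live PID alone does not prove the HTTP endpoint answers, so a fuller check might also probe the URL.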

```
🌌 ▸ download Cosmos-Reason2-2B and tell me what it is
```

The agent chains `cosmos_model_download(name="reason2-2b")` with a knowledge lookup to describe the model.

3. Or use `just` directly

Every capability is a shell recipe that the agent and the operator share:

```shell
just --list
#   default
#   env
#   install
#   run
#   deploy-thor
#   download
#   download-dataset
#   quantize
#   export-llm
#   export-visual
#   ...
#   smoke
```
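One way an agent tool could share these recipes with the operator is to shell out to `just` and hand back both the output and the exit status. The sketch below is an assumption about that mechanism, not the project's code; `run_recipe` is a hypothetical helper, and the `JUST` variable exists only so the function can be exercised without `just` installed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a just recipe and report its exit status plus
# combined stdout/stderr, the way an agent tool might relay it.
run_recipe() {
  local runner="${JUST:-just}"   # override for dry runs, e.g. JUST=echo
  local out rc
  out=$("$runner" "$@" 2>&1)
  rc=$?
  printf 'exit=%s\n%s\n' "$rc" "$out"
  return "$rc"
}
```

For example, `run_recipe serve-status` prints an `exit=` line followed by whatever the recipe printed, so the caller sees failures as well as output.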

4. Run a real pipeline

Prep a model on your x86 host

```shell
just prep-edge-model reason2-2b ./models/R2-fp8
# → download from HF
# → quantize to FP8
# → export LLM to ONNX
# → export visual encoder to ONNX
```
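The four stages run in order, and each one consumes the previous stage's output. A minimal sketch of that fail-fast chaining, reusing the sub-recipe names shown by `just --list`. Passing the same `"$model" "$out"` pair to every step is an assumption (the real recipes' arguments are not documented here), and `prep_edge_model` and the `JUST` override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of how prep-edge-model could chain its sub-recipes.
# Assumption: every step takes the same two arguments (model, out dir).
prep_edge_model() {
  local model="$1" out="$2" step
  for step in download quantize export-llm export-visual; do
    echo "→ $step"
    # Stop at the first failing step so later steps never see partial output.
    if ! "${JUST:-just}" "$step" "$model" "$out"; then
      echo "✗ $step failed" >&2
      return 1
    fi
  done
}
```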

Deploy to Thor

```shell
just deploy-thor cagatay@thor.local ~/thor-cosmos
scp -r ./models/R2-fp8/onnx cagatay@thor.local:~/R2-fp8-onnx
```
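Copying a missing or empty export wastes a round trip to Thor. A small pre-flight check, offered as a sketch; `require_nonempty_dir` is a hypothetical helper, not part of the project.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: refuse to copy unless the prep step
# produced a non-empty directory.
require_nonempty_dir() {
  local dir="$1"
  if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "missing or empty: $dir" >&2
    return 1
  fi
}
# Example:
#   require_nonempty_dir ./models/R2-fp8/onnx &&
#     scp -r ./models/R2-fp8/onnx cagatay@thor.local:~/R2-fp8-onnx
```

Note that `scp -r` behaves like `cp -r`: if `~/R2-fp8-onnx` already exists on Thor from an earlier run, the files land in `~/R2-fp8-onnx/onnx` instead, so remove the old copy before re-deploying.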

On Thor: build engines + serve

```shell
ssh cagatay@thor.local
cd ~/thor-cosmos
just build-engines ~/R2-fp8-onnx ~/R2-fp8-engines
just serve-start ~/R2-fp8-engines/llm ~/R2-fp8-engines/visual
just serve-status      # 🟢 running pid=1234  http://127.0.0.1:8080
```

Inference

```shell
just infer assets/test.jpg "count people and describe their clothing"
```
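The same recipe can run in a loop when one prompt should be applied to many images. A minimal sketch; `infer_dir` is a hypothetical helper, and setting `JUST=echo` turns it into a dry run that prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical batch loop: ask the same question about every JPEG in a
# directory, one `just infer` call per image.
infer_dir() {
  local dir="$1" prompt="$2" img
  for img in "$dir"/*.jpg; do
    [ -e "$img" ] || continue   # no matches: the glob stays literal
    echo "== $img"
    # Quote the prompt so it reaches the recipe as a single argument.
    "${JUST:-just}" infer "$img" "$prompt"
  done
}
```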

Or through the agent:

```
🌌 ▸ capture a frame from RTP and tell me what you see
```

The agent calls `rtp_capture_frame(...)` and then `cosmos_inference(...)` in a single turn. Because the frame bytes are embedded in the first tool's result, the second tool sees the image directly.

Next