
cagataycali/thor-cosmos

Training & Distillation

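thor-cosmos wraps four post-training entry points (Reason 2 supervised fine-tuning, Reason 2 RL, Predict 2.5 fine-tuning and Transfer 2.5 fine-tuning) plus step distillation.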

Post-training tool

`cosmos_post_train` launches a post-training run from a YAML config:

cosmos_post_train(
    config_path="configs/my-sft.yaml",
    model_family="reason2",            # reason2 | predict2_5 | transfer2_5
    strategy="full",                   # full | lora | rl (reason2 only)
    num_gpus=8,
    dry_run=False,
)

Per-family dispatch

`model_family` and `strategy` select the `just` recipe and the backend that performs the training:

Family	Strategy	Recipe	Backend
`reason2`	`full` or `lora`	`post-train-reason2`	`cosmos-cli train`
`reason2`	`rl`	`post-train-reason2-rl`	`cosmos-rl`
`predict2_5`	n/a	`post-train-predict`	`torchrun -m cosmos_predict2.train`
`transfer2_5`	n/a	`post-train-transfer`	`torchrun -m cosmos_transfer2.train`
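The dispatch rules above can be expressed as a lookup table. This is a minimal sketch that assumes the table is exhaustive; `Dispatch` and `resolve` are hypothetical names for illustration, not thor-cosmos's actual internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dispatch:
    recipe: str   # `just` recipe name from the table
    backend: str  # backend listed in the table


# Hypothetical mirror of the per-family dispatch table.
_TABLE = {
    ("reason2", "full"): Dispatch("post-train-reason2", "cosmos-cli train"),
    ("reason2", "lora"): Dispatch("post-train-reason2", "cosmos-cli train"),
    ("reason2", "rl"): Dispatch("post-train-reason2-rl", "cosmos-rl"),
    ("predict2_5", None): Dispatch("post-train-predict", "torchrun -m cosmos_predict2.train"),
    ("transfer2_5", None): Dispatch("post-train-transfer", "torchrun -m cosmos_transfer2.train"),
}


def resolve(model_family: str, strategy: Optional[str] = None) -> Dispatch:
    """Return the recipe and backend for a model family and strategy."""
    # Strategy is n/a for Predict 2.5 and Transfer 2.5, so drop it for those families.
    key = (model_family, strategy if model_family == "reason2" else None)
    if key not in _TABLE:
        raise ValueError(f"unsupported combination: {model_family!r}, {strategy!r}")
    return _TABLE[key]
```

For example, `resolve("reason2", "lora").recipe` is `"post-train-reason2"`, while any `strategy` passed with `predict2_5` is ignored.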

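CLI examples

The same recipes can be run directly with `just`, passing the config path first. The trailing argument is the strategy for Reason 2 SFT and appears to be the GPU count for Predict 2.5 and Transfer 2.5.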

Reason 2 SFT (LoRA)

just post-train-reason2 configs/reason2-sft.yaml lora

Reason 2 RLHF / DPO

just post-train-reason2-rl configs/reason2-dpo.yaml

Predict 2.5 fine-tune

just post-train-predict configs/gr00t-dreams.yaml 8

Transfer 2.5 fine-tune

just post-train-transfer configs/indoor-restyle.yaml 8
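These recipes can also be scripted from Python. A minimal sketch, assuming `just` is on `PATH` and each recipe takes the config path followed by optional positional arguments as in the examples above. `just_command` and `run_recipe` are hypothetical helpers; their `dry_run` flag only prints the command and is not necessarily how `cosmos_post_train(dry_run=True)` behaves.

```python
import shlex
import shutil
import subprocess


def just_command(recipe: str, config: str, *extra: str) -> list[str]:
    """Build argv for a `just` recipe: recipe name, config path, then extra positional args."""
    return ["just", recipe, config, *extra]


def run_recipe(recipe: str, config: str, *extra: str, dry_run: bool = False) -> int:
    """Run a recipe and return its exit code; with dry_run=True only print the command."""
    argv = just_command(recipe, config, *extra)
    if dry_run:
        print(shlex.join(argv))
        return 0
    if shutil.which("just") is None:
        raise FileNotFoundError("`just` was not found on PATH")
    # check=True raises CalledProcessError if the recipe exits non-zero.
    return subprocess.run(argv, check=True).returncode


if __name__ == "__main__":
    run_recipe("post-train-predict", "configs/gr00t-dreams.yaml", "8", dry_run=True)
```

Building the argv as a list rather than a shell string avoids quoting problems when config paths contain spaces.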

Distillation

Step distillation trains a Student model to reproduce a Teacher model's outputs using fewer denoising steps, which makes sampling faster.

cosmos_distill(
    teacher_checkpoint="./ckpts/transfer-teacher",
    student_output="./ckpts/transfer-4step",
    method="dmd2",                     # kd | dmd2
    model_family="transfer2_5",        # transfer2_5 | predict2_5
    num_gpus=8,
)
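To make the idea concrete, here is a toy step distillation in plain Python, unrelated to the real Cosmos models. As an illustrative assumption, the Teacher integrates the ODE dx/dt = -x with 32 small Euler steps, and the Student takes 4 steps, each a learnable scaling x ← a·x. Fitting `a` by gradient descent on the mean squared error between Student and Teacher outputs is a `kd`-style objective; all function names and numbers here are hypothetical.

```python
def teacher(x0: float, steps: int = 32) -> float:
    """Teacher sampler: many small Euler steps of dx/dt = -x over t in [0, 1]."""
    h = 1.0 / steps
    x = x0
    for _ in range(steps):
        x -= h * x
    return x


def student(x0: float, a: float, steps: int = 4) -> float:
    """Student sampler: few steps, each a learnable scaling x <- a * x."""
    return x0 * a ** steps


def distill(xs: list[float], steps: int = 4, lr: float = 0.01, iters: int = 500) -> float:
    """Fit `a` so the few-step Student matches the Teacher (mean squared error, kd-style)."""
    targets = [teacher(x) for x in xs]
    a = 1.0 - 1.0 / steps  # start from plain Euler with a large step size
    for _ in range(iters):
        # d/da (a^k x - t)^2 = 2 (a^k x - t) * k a^(k-1) x, averaged over samples
        grad = sum(
            2 * (student(x, a, steps) - t) * steps * a ** (steps - 1) * x
            for x, t in zip(xs, targets)
        ) / len(xs)
        a -= lr * grad
    return a


if __name__ == "__main__":
    a = distill([1.0, 2.0, 3.0])
    print(f"fitted a = {a:.4f}")
    print("teacher:", teacher(2.0), "naive 4-step:", student(2.0, 0.75), "distilled:", student(2.0, a))
```

The naive 4-step Euler update (a = 0.75) undershoots the Teacher, while the fitted a ≈ 0.776 reproduces its output. In a real model the single scalar is replaced by the Student network's weights.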

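`kd` vs `dmd2`

kd (knowledge distillation): the Student is trained to reproduce the Teacher's outputs sample by sample. Stable, but converges more slowly.
dmd2 (Distribution Matching Distillation v2): the Student is trained so that the distribution of its outputs matches the Teacher's, rather than matching individual samples. Converges faster, but needs more hyperparameter tuning.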
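The difference between per-sample and distribution-level matching shows up in a toy 1-D example: shuffling the Teacher's samples leaves their distribution unchanged, so a distribution-level loss stays at zero while a per-sample MSE does not. This is only an illustration: the Gaussian KL below is a stand-in for distribution matching and is not DMD2's actual objective, which is computed from diffusion score estimates. `kd_loss` and `gaussian_kl` are hypothetical names.

```python
import math
import random
from statistics import fmean, pstdev


def kd_loss(student: list[float], teacher: list[float]) -> float:
    """Per-sample matching: mean squared error between paired Student/Teacher outputs."""
    return fmean((s - t) ** 2 for s, t in zip(student, teacher))


def gaussian_kl(student: list[float], teacher: list[float]) -> float:
    """Distribution-level matching (toy stand-in): KL(student || teacher) between
    1-D Gaussians fitted to each set of samples. Pairing is ignored."""
    mu_s, sd_s = fmean(student), pstdev(student)
    mu_t, sd_t = fmean(teacher), pstdev(teacher)
    return math.log(sd_t / sd_s) + (sd_s**2 + (mu_s - mu_t) ** 2) / (2 * sd_t**2) - 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    teacher_out = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    shuffled = teacher_out[:]
    rng.shuffle(shuffled)  # same distribution, different pairing
    print("kd loss:", kd_loss(shuffled, teacher_out))
    print("distribution loss:", gaussian_kl(shuffled, teacher_out))
```

Running the script prints a clearly nonzero `kd` loss and a distribution loss of essentially zero, which is why distribution matching leaves the Student more freedom in how it maps each input.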

Config discovery

Training configs usually live under `configs/` in the relevant upstream repository. The cookbook recipes publish ready-to-use configs for GR00T-Dreams, indoor restyle and other workflows.
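If you have cloned an upstream repository, its configs can be listed with a short `pathlib` sketch. It assumes configs are YAML files somewhere under `<repo>/configs/`, as described above; `find_configs` is a hypothetical helper.

```python
from pathlib import Path


def find_configs(repo_root: str, pattern: str = "*.yaml") -> list[Path]:
    """Recursively list config files under <repo_root>/configs/, sorted for stable output."""
    config_dir = Path(repo_root) / "configs"
    if not config_dir.is_dir():
        return []  # no configs/ directory in this checkout
    return sorted(p for p in config_dir.rglob(pattern) if p.is_file())
```

Call it as `find_configs("path/to/upstream-repo")`; pass `pattern="*.yml"` if a repository uses the shorter extension.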

Monitoring

stdout: the tool streams the last 8 KB of stdout/stderr into the Strands tool result.
TensorBoard: most upstream configs write logs to `./logs/tb/`.
W&B: set the `WANDB_PROJECT` environment variable; upstream configs pick it up.
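The same 8 KB tail and W&B setting can be mimicked when launching a run yourself. This sketch merges stderr into stdout, keeps only the newest `tail_bytes` of output, and passes extra environment variables such as `WANDB_PROJECT` to the child process. `run_with_tail` is a hypothetical helper, not the tool's implementation.

```python
import os
import subprocess
from typing import Optional

TAIL_BYTES = 8 * 1024  # keep the last 8 KB, as described above


def run_with_tail(
    argv: list[str],
    extra_env: Optional[dict[str, str]] = None,
    tail_bytes: int = TAIL_BYTES,
) -> tuple[int, str]:
    """Run argv with stderr merged into stdout; return (exit code, last `tail_bytes` of output)."""
    env = {**os.environ, **(extra_env or {})}  # e.g. {"WANDB_PROJECT": "..."}
    tail = bytearray()
    with subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=env) as proc:
        assert proc.stdout is not None
        # Read in chunks so memory stays bounded no matter how much the process prints.
        for chunk in iter(lambda: proc.stdout.read(4096), b""):
            tail += chunk
            if len(tail) > tail_bytes:
                del tail[:-tail_bytes]  # drop everything but the newest bytes
    # Leaving the `with` block waits for the process, so returncode is set.
    return proc.returncode, tail.decode("utf-8", errors="replace")
```

For example, `run_with_tail(["just", "post-train-predict", "configs/gr00t-dreams.yaml", "8"], {"WANDB_PROJECT": "my-project"})`, where `my-project` is a placeholder.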