Tool Usage: Cosmos Inside Another Agent

Use Cosmos as a callable tool inside any Strands agent (Bedrock, Anthropic, OpenAI, Ollama, etc.).

Terminal Recording

Tool usage demo

View full output

$ python examples/05_tool_usage.py
=== 05: Tool Usage (direct invoke) ===
Loading cosmos_vision_invoke tool... ✅ loaded
Processing video: sample.mp4

Status: success
Response: The scene shows a vehicle driving through a quiet
residential neighborhood. Parked cars line both sides...

Time: 8.7s
=== PASS ===

Play locally: asciinema play docs/assets/casts/05_tool_usage.cast

Architecture

graph LR
    U["🧑 User"] --> A["🤖 Cloud Agent<br/>Bedrock / Anthropic /<br/>OpenAI / Ollama"]
    A -->|"cosmos_vision_invoke()"| C["🌌 Cosmos<br/>(local GPU)"]
    C -->|"vision result"| A
    A -->|"other tools"| T["🔧 Shell / Editor /<br/>File / etc."]
    A --> U

    style A fill:#264653,color:#fff
    style C fill:#76b900,color:#fff

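The orchestrating agent, which can be cloud-hosted or local, decides when to call Cosmos. Cosmos runs vision inference locally on your GPU and returns the result to the orchestrator, which uses it alongside its other tools to answer the user.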

Direct Tool Invocation

examples/05_tool_usage.py

from strands_cosmos import cosmos_vision_invoke

# Direct call — no outer agent needed
result = cosmos_vision_invoke(
    prompt="Describe the scene briefly.",
    video_path="sample.mp4",
    max_tokens=512,
)

print(result["status"])        # "success"
print(result["content"][0]["text"])  # Scene description
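
The direct call above reads the result on the happy path only. As a minimal sketch of a more defensive reader, assuming nothing beyond the result shape shown above (a `status` string plus a `content` list of `{"text": ...}` items): the helper name `extract_text` and its error behavior are illustrative and not part of `strands_cosmos`.

```python
from typing import Any


def extract_text(result: dict[str, Any]) -> str:
    """Join the text blocks of a tool result, raising if the call failed.

    Assumes the shape shown above: {"status": ..., "content": [{"text": ...}]}.
    """
    blocks = result.get("content", [])
    text = "\n".join(b["text"] for b in blocks if isinstance(b, dict) and "text" in b)
    if result.get("status") != "success":
        # Assumption: a failed call reports its error message in the same content list.
        raise RuntimeError(f"Cosmos call failed: {text or 'no details returned'}")
    return text


# Stand-in result so the sketch runs without a GPU or model weights.
sample = {"status": "success", "content": [{"text": "A car drives down a quiet street."}]}
print(extract_text(sample))
```

In real use, `sample` would be replaced by the return value of `cosmos_vision_invoke(...)`.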

As a Tool Inside Another Agent

from strands import Agent
from strands_cosmos import cosmos_vision_invoke

# Cosmos becomes a tool inside a Claude / GPT-4 agent
agent = Agent(tools=[cosmos_vision_invoke])
agent("Analyze this dashcam video for safety: /path/to/video.mp4")

Both Tools Together

from strands import Agent
from strands_cosmos import cosmos_invoke, cosmos_vision_invoke

agent = Agent(tools=[cosmos_invoke, cosmos_vision_invoke])

# The agent automatically picks the right tool
agent("What happens in this video? /path/to/clip.mp4")  # → vision tool
agent("Explain Newton's third law")                       # → text tool

Tool Parameters

`cosmos_vision_invoke`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | required | Question about the media |
| `video_path` | `str` | `""` | Path to video file |
| `image_path` | `str` | `""` | Path to image file |
| `model_id` | `str` | `nvidia/Cosmos-Reason2-2B` | Model to use |
| `reasoning` | `bool` | `False` | Enable chain-of-thought |
| `task` | `str` | `""` | Built-in task prompt key |
| `fps` | `float` | `4.0` | Video frame rate |
| `max_tokens` | `int` | `4096` | Maximum output tokens |
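
To make the parameters above concrete, here is a hedged sketch that overrides a few defaults for one call. The defaults are copied from the table; the helpers `vision_kwargs` and `run` are hypothetical conveniences, not part of `strands_cosmos`, and the import is deferred so the first part runs without the package or a GPU.

```python
# Defaults copied from the cosmos_vision_invoke parameter table above.
VISION_DEFAULTS = {
    "video_path": "",
    "image_path": "",
    "model_id": "nvidia/Cosmos-Reason2-2B",
    "reasoning": False,
    "task": "",
    "fps": 4.0,
    "max_tokens": 4096,
}


def vision_kwargs(prompt: str, **overrides) -> dict:
    """Hypothetical helper: merge overrides onto the documented defaults."""
    unknown = set(overrides) - set(VISION_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown parameter(s): {sorted(unknown)}")
    return {"prompt": prompt, **VISION_DEFAULTS, **overrides}


kwargs = vision_kwargs(
    "Is the pedestrian crossing safely?",
    video_path="sample.mp4",
    reasoning=True,   # enable chain-of-thought
    fps=2.0,          # lower than the default of 4.0
    max_tokens=1024,
)
print(kwargs)


def run(call_kwargs: dict) -> dict:
    """Invoke the tool; needs strands_cosmos installed and a local GPU."""
    # Imported lazily so the lines above run without the package.
    from strands_cosmos import cosmos_vision_invoke
    return cosmos_vision_invoke(**call_kwargs)
```

Calling `run(kwargs)` performs the actual inference; rejecting unknown keys up front catches typos such as `max_token` before the model is touched.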

`cosmos_invoke`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | required | Text prompt |
| `model_id` | `str` | `nvidia/Cosmos-Reason2-2B` | Model to use |
| `reasoning` | `bool` | `False` | Enable chain-of-thought |
| `max_tokens` | `int` | `4096` | Maximum output tokens |

Multi-Agent Architecture

graph TD
    USER["User Query"] --> ORCH["Orchestrator<br/>(Claude Sonnet 4)"]

    ORCH -->|"Analyze video"| COSMOS["🌌 Cosmos Vision<br/>(local GPU)"]
    ORCH -->|"Run code"| SHELL["🔧 Shell"]
    ORCH -->|"Read files"| FILE["📄 File Read"]
    ORCH -->|"Search web"| SEARCH["🔍 Search"]

    COSMOS --> ORCH
    SHELL --> ORCH
    FILE --> ORCH
    SEARCH --> ORCH
    ORCH --> RESULT["Final Answer"]

    style COSMOS fill:#76b900,color:#fff
    style ORCH fill:#264653,color:#fff

Model caching

The Cosmos model is loaded once and cached globally, so the loading cost is paid only once. Subsequent tool calls reuse the loaded model instead of reloading it.
