Architecture
How the singleton loader, lazy imports, and tools-only design fit together.
┌──────────────────────────────────────────────────────────┐
│ strands.Agent (your LLM)                                 │
└──────────────────────────────────────────────────────────┘
                             │ calls @tool functions in parallel
                             ▼
┌──────────────────────────────────────────────────────────┐
│ strands_omnivoice.tools/ (13 thin wrappers)              │
│ ├─ tts.py             ├─ batch.py                        │
│ ├─ clone.py           ├─ transcribe.py                   │
│ ├─ design.py          ├─ model_lifecycle.py              │
│ ├─ info.py            ├─ audio_utils.py                  │
│ └─ demo_server.py                                        │
└──────────────────────────────────────────────────────────┘
                             │ all tools call get_model()
                             ▼
┌──────────────────────────────────────────────────────────┐
│ strands_omnivoice._loader (singleton)                    │
│ ├─ Threading lock                                        │
│ ├─ (model_id, device) cache key                          │
│ └─ Lazy import of `omnivoice` + torch                    │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│ omnivoice.OmniVoice (k2-fsa upstream, installed via pip) │
│ ├─ Diffusion-LM TTS architecture                         │
│ ├─ 600+ languages                                        │
│ ├─ Whisper ASR built-in                                  │
│ └─ HF Transformers backbone                              │
└──────────────────────────────────────────────────────────┘
Why Tools, Not a Model Provider?
Strands Model providers are designed for chat LLMs that produce token streams.
OmniVoice is a TTS model that produces audio waveforms. Wrapping it as a Model would force an awkward chat-message API. Instead, we expose the three generation modes (tts/clone/design) plus utilities as @tool functions, and you bring your own LLM as the agent's brain.
Singleton Loader Rationale
OmniVoice's checkpoint is a few hundred MB; a cold load can take 5-10 seconds, plus the Hugging Face download on first use. If omnivoice_clone and omnivoice_design each loaded their own copy, an agent workflow that uses both would pay that load time twice and hold two copies of the model in memory.
_loader.get_model() caches by (model_id, device). Multiple tools share the same instance; concurrent calls are serialized by a threading.Lock. To swap models, pass force=True or call unload_model() first.
Lazy Imports
Top-level imports of torch and omnivoice are heavy. Tool bodies import them only when a tool is called. This means:
- `from strands_omnivoice import omnivoice_tts` is fast.
- `Agent(tools=[omnivoice_tts])` doesn't trigger any heavy import.
- The first `omnivoice_tts(...)` call is what loads the model.
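The same idea in miniature: a hypothetical tool body that imports its heavy dependency inside the function. `hypothetical_tts_backend` is a placeholder for `torch`/`omnivoice` and is deliberately not installed, so defining the function succeeds and only calling it runs the import. Returning an error-shaped result instead of raising is an assumption of this sketch.

```python
from typing import Any, Dict


def omnivoice_tts_sketch(text: str) -> Dict[str, Any]:
    """Hypothetical tool whose heavy dependency is imported on first call."""
    try:
        # Executed when the tool runs, not when this module is imported.
        # `hypothetical_tts_backend` is a placeholder for torch / omnivoice.
        import hypothetical_tts_backend
    except ImportError as exc:
        return {"status": "error", "content": [{"text": f"missing dependency: {exc.name}"}]}
    path = hypothetical_tts_backend.synthesize(text)
    return {"status": "success", "content": [{"text": f"Wrote {path}"}, {"json": {"output": path}}]}


# Reaching this line shows the definition above did not run the import.
result = omnivoice_tts_sketch("hello")  # the import runs (and fails) here
print(result["status"])  # error
```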
Tool Result Shape
Every tool returns the standard Strands tool-result dict:
{
"status": "success" | "error",
"content": [
{"text": "human-readable summary"},
{"json": {"output": "/tmp/h.wav", ...}},
],
}
Pairing a readable text summary with a structured json payload makes the tools easy to consume from an agent, over MCP, or by a human.
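A small sketch of building and reading this shape. `tool_result`, `summarize`, and `output_path` are hypothetical helper names, not part of strands_omnivoice; which keys besides `output` appear in the json block would depend on the tool.

```python
from typing import Any, Dict, List, Literal, Optional

ToolResult = Dict[str, Any]


def tool_result(
    status: Literal["success", "error"],
    summary: str,
    payload: Optional[Dict[str, Any]] = None,
) -> ToolResult:
    """Build a {"status", "content"} dict: a text summary plus an optional json block."""
    content: List[Dict[str, Any]] = [{"text": summary}]
    if payload is not None:
        content.append({"json": payload})
    return {"status": status, "content": content}


def summarize(result: ToolResult) -> str:
    """What a text-only reader sees: the text blocks, joined."""
    return " ".join(block["text"] for block in result["content"] if "text" in block)


def output_path(result: ToolResult) -> Optional[str]:
    """What a program reads: the "output" field of the first json block, if any."""
    for block in result["content"]:
        if "json" in block:
            return block["json"].get("output")
    return None


ok = tool_result("success", "Synthesized speech to /tmp/h.wav", {"output": "/tmp/h.wav"})
err = tool_result("error", "Reference audio file not found")
print(summarize(ok))     # Synthesized speech to /tmp/h.wav
print(output_path(err))  # None
```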