Quickstart

A two-minute tour of every voice mode — auto, design, and clone.

A minimal end-to-end example.

from strands import Agent
from strands_omnivoice import (
    omnivoice_tts, omnivoice_clone, omnivoice_design,
    omnivoice_sysinfo, audio_play,
)

agent = Agent(tools=[
    omnivoice_tts, omnivoice_clone, omnivoice_design,
    omnivoice_sysinfo, audio_play,
])

# 1. Sanity check
agent("omnivoice_sysinfo")

# 2. Auto voice
agent("omnivoice_tts text='Hello world' output=/tmp/hello.wav, then audio_play it")

Direct Tool Calls (No Agent)

Each @tool is an ordinary Python function, so you can call it directly without an Agent:

from strands_omnivoice import omnivoice_tts

result = omnivoice_tts(text="Hello", output="/tmp/h.wav")
print(result["content"][0]["text"])
# → "🔊 wrote /tmp/h.wav (1.23s @ 24000 Hz)"
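The call above reads only the first content block of the result. If you want to post-process that summary line, for example to log the output path and duration, a small helper can join every text block and pull the fields out. This is a minimal sketch that assumes the result dict has a `content` list of `{"text": ...}` blocks and that the summary keeps the format shown in the sample output. `tool_text` and `parse_summary` are hypothetical helpers and are not part of strands_omnivoice.

```python
from __future__ import annotations

import re
from typing import Any

def tool_text(result: dict[str, Any]) -> str:
    """Join every text block in a tool result (assumes a 'content' list of {'text': ...} dicts)."""
    return "\n".join(
        block["text"] for block in result.get("content", []) if "text" in block
    )

# Assumed summary format, taken from the sample output: "🔊 wrote <path> (<seconds>s @ <rate> Hz)".
# \S+ means paths containing spaces will not match.
_SUMMARY = re.compile(r"wrote (?P<path>\S+) \((?P<secs>[\d.]+)s @ (?P<rate>\d+) Hz\)")

def parse_summary(text: str) -> tuple[str, float, int] | None:
    """Return (path, duration_s, sample_rate_hz), or None if the text doesn't match."""
    m = _SUMMARY.search(text)
    if m is None:
        return None
    return m["path"], float(m["secs"]), int(m["rate"])

# Stand-in dict shaped like the documented result (not a real tool call).
fake = {"content": [{"text": "🔊 wrote /tmp/h.wav (1.23s @ 24000 Hz)"}]}
print(parse_summary(tool_text(fake)))
# → ('/tmp/h.wav', 1.23, 24000)
```

Returning `None` instead of raising keeps the helper safe to call on error messages, whose wording this page does not document.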

Three Generation Modes: All Real Samples

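One call per mode (Auto, Design, Clone):

from strands_omnivoice import omnivoice_clone, omnivoice_design, omnivoice_tts

# Auto: no voice description and no reference audio
omnivoice_tts(
    text="Hello world.",
    output="/tmp/auto.wav",
    language="English",
)

# Design: describe the target voice in `instruct`
omnivoice_design(
    text="Once upon a time, in a land far far away.",
    instruct="female, elderly, low pitch, british accent",
    output="/tmp/story.wav",
)

# Clone: reuse the voice in `ref_audio` (/tmp/hello.wav is written by the quickstart agent above)
omnivoice_clone(
    text="This is the cloned voice speaking different words.",
    ref_audio="/tmp/hello.wav",
    output="/tmp/cloned.wav",
)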
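To sanity-check what each mode wrote, you can read the WAV headers with the standard-library `wave` module. This sketch assumes the outputs are integer PCM WAV files. `wave` cannot read IEEE-float WAV, so such files are reported and skipped rather than crashing. `wav_info` is a hypothetical helper, not part of strands_omnivoice.

```python
import os
import wave

def wav_info(path: str) -> dict:
    """Read channel count, sample rate, sample width and duration from a PCM WAV header."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": w.getsampwidth(),
            "duration_s": w.getnframes() / rate,
        }

for path in ("/tmp/auto.wav", "/tmp/story.wav", "/tmp/cloned.wav"):
    if not os.path.exists(path):
        continue  # that mode hasn't been run yet
    try:
        info = wav_info(path)
    except wave.Error as exc:  # e.g. a float WAV that the wave module can't parse
        print(f"{path}: unreadable by wave ({exc})")
        continue
    print(f"{path}: {info['duration_s']:.2f}s @ {info['sample_rate']} Hz")
```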

Model Pre-warming

To avoid paying the model-load latency on the first synthesis, pre-warm the model:

from strands_omnivoice import omnivoice_load_model

omnivoice_load_model(device="mps")  # or "cuda", or leave empty for auto

Subsequent calls reuse the cached weights, so the model is loaded only once, even when chaining omnivoice_clone → omnivoice_design → omnivoice_tts.
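The load-once behaviour described above follows a common caching pattern. The sketch below illustrates that pattern with `functools.lru_cache`. It is not strands_omnivoice's actual implementation, which this page does not show. `_load_weights` and `get_model` are hypothetical names, and the "weights" are a placeholder dict.

```python
from functools import lru_cache

LOADS = []  # records every real (slow) load so the caching is observable

def _load_weights(device: str) -> dict:
    """Hypothetical stand-in for the slow step: reading weights and moving them to `device`."""
    LOADS.append(device)
    return {"device": device}

@lru_cache(maxsize=None)
def get_model(device: str) -> dict:
    """Load once per device; every later call returns the same cached object."""
    return _load_weights(device)

get_model("mps")  # pre-warm: the only real load
# Each step in a clone -> design -> tts chain fetches the model through the same cache.
chained = [get_model("mps") for _step in ("clone", "design", "tts")]
print(len(LOADS), all(m is chained[0] for m in chained))
# → 1 True
```

Note that `lru_cache` keys on the exact arguments passed, so a cache like this should always be called with the device spelled the same way.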

Running the Smoke-test Agent

python agent.py "Show sysinfo, then synth 'привет мир' to /tmp/ru.wav and play it"

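The repo's agent.py script loads all 13 tools and lets the LLM decide which ones to call and in what order. The prompt above also asks it to synthesize Russian text ("привет мир", meaning "hello world").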

→ Voice Cloning Guide · Voice Design Guide