Audio¶

Every clip below is real stabilityai/stable-audio-open-1.0 output (fp16/cuda) — one use_diffusers call each, 44.1 kHz stereo, written straight to .wav. The sample rate comes from the model, never assumed.

[Image: an audio waveform drawing itself]

“A dog barking in a backyard”
“Heavy rain and rolling thunder”
“Birds chirping in a forest at dawn”
“Mechanical keyboard typing quickly”

Music, too¶

The same model does music. Here's a 10-second lo-fi beat, generated in one call, in stereo.

One call¶

from strands_diffusers import use_diffusers

use_diffusers(
    action="run",
    pipeline="StableAudioPipeline",
    model="stabilityai/stable-audio-open-1.0",
    parameters={"prompt": "A dog barking in a backyard",
                "negative_prompt": "low quality, muffled",
                "audio_end_in_s": 5.0,
                "num_inference_steps": 120},
    dtype="float16", device="cuda",
)
# -> artifacts: ['/tmp/strands_diffusers/audio_*.wav']  (44.1 kHz stereo)
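
To confirm what was actually written, the header of the resulting file can be inspected. The sketch below uses only Python's standard `wave` module; `wav_info` is a hypothetical helper name, not part of strands_diffusers, and it assumes the artifact is a PCM-encoded .wav (the `wave` module cannot read float-encoded files).

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_s) for a PCM .wav file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
    # One frame holds one sample per channel, so frames / rate is seconds.
    return rate, channels, frames / rate


# Hypothetical usage, with a path taken from the returned artifacts list:
# rate, channels, seconds = wav_info(artifact_path)
```

For the call above, this should report 44100 Hz and 2 channels if the file matches the description on this page.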

Scheduler is handled for you

stable-audio-open-1.0 ships with CosineDPMSolverMultistepScheduler, which depends on torchsde, and torchsde hits a RecursionError on some torch builds. use_diffusers automatically swaps it for the equivalent non-SDE DPMSolverMultistepScheduler, so the call above just works. Want the original? Set STRANDS_DIFFUSERS_KEEP_SDE_SCHEDULER=1.
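
The opt-out can also be set from Python rather than the shell. This is a minimal sketch: the variable name comes from the paragraph above, but when the library reads it is an assumption here, so the safest place to set it is before strands_diffusers is imported.

```python
import os

# Keep the model's original SDE scheduler instead of the automatic swap.
# Assumption: setting this before importing strands_diffusers is early enough.
os.environ["STRANDS_DIFFUSERS_KEEP_SDE_SCHEDULER"] = "1"

# from strands_diffusers import use_diffusers
# ...then make the same use_diffusers call as in "One call".
```

With the original scheduler kept, the torchsde RecursionError described above can resurface on affected torch builds.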

Stereo is preserved end-to-end: output is written as [N, C] (N samples by C channels) whether the model returns channels-first [C, N] or channels-last [N, C], and the time axis is always kept.
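
One way to picture the [N, C] normalization is the NumPy sketch below. It is not the library's code: `to_samples_by_channels` is a hypothetical helper, and the rule it uses (the shorter axis is the channel axis) is an assumption that holds for ordinary clips, where samples far outnumber channels.

```python
import numpy as np


def to_samples_by_channels(audio):
    """Return audio as a 2-D array shaped [N, C] (samples, channels)."""
    a = np.asarray(audio)
    if a.ndim == 1:
        # Mono signal: add a channel axis so the result is [N, 1].
        return a[:, None]
    if a.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got {a.ndim}-D")
    # Assumption: the shorter axis holds channels, the longer one is time.
    if a.shape[0] < a.shape[1]:
        return a.T  # channels-first [C, N] -> [N, C]
    return a  # already [N, C]


# Hypothetical usage:
# frames = to_samples_by_channels(model_output)  # frames.shape == (N, C)
```

The heuristic is ambiguous for clips with no more samples than channels, which is why a real implementation might prefer an explicit layout flag.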

Find an audio pipeline¶

use_diffusers(action="modalities")["data"]["audio"]
# ['AceStepPipeline', 'AudioLDMPipeline', 'AudioLDM2Pipeline',
#  'MusicLDMPipeline', 'StableAudioPipeline', 'DanceDiffusionPipeline', ...]

Model	Output	Best for
stable-audio-open-1.0	44.1 kHz stereo	sound effects, ambience, short music
ACE-Step-v1-3.5B	44.1 kHz stereo	full songs with vocals
musicgen	32 kHz	instrumental music
audioldm2	16 kHz	speech / SFX