2 Models Available
A video editor lays whooshes, impacts, ambience, and UI sounds over an AI-generated cut so it stops sounding like a silent draft. On Martini's canvas, route the locked picture into Hunyuan Foley for video-to-audio generation, or feed concrete prompts into ElevenLabs Sound Effects v2 ("close metallic door slam in narrow concrete hallway, short reverb"). Lay a Minimax Music atmospheric bed underneath, then mix everything on the timeline before exporting to your NLE. Pick a model below to walk through the final sonic pass on a 30-60 second product or narrative video.
ElevenLabs
ElevenLabs Sound Effects v2 generates royalty-free sound effects from text prompts: whoosh transitions, impact stingers, ambient room tones, UI feedback sounds, footsteps, door slams, rain, machinery. Each prompt returns 4 variant clips so you can pick the take that fits the frame, then snap it to the video timeline. The node runs the official `eleven_text_to_sound_v2` model, which responds far better to concrete prompts ("close metallic door slam in narrow concrete hallway, short reverb") than to vague ones ("door sound"). On Martini, SFX nodes attach directly to a video timeline segment, so a video editor can lay foley over an AI-generated cut without leaving the canvas. The 4-variant grid is the core of the workflow: generate, listen, pick, snap to frame.
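One way to keep prompts concrete is to assemble them from named parts (distance, material, event, space, tail) instead of typing free text. The sketch below is illustrative only: `SfxCue`, `build_prompt`, and `snap_to_frame` are hypothetical helpers, not part of Martini or the ElevenLabs API, and the frame-snapping rule (round to the nearest frame boundary) is an assumption about what "snap to frame" means.

```python
from dataclasses import dataclass


@dataclass
class SfxCue:
    """Hypothetical structured description of one sound-effect cue."""
    event: str          # what happens, e.g. "door slam"
    distance: str = ""  # perceived distance, e.g. "close"
    material: str = ""  # e.g. "metallic"
    space: str = ""     # e.g. "narrow concrete hallway"
    tail: str = ""      # e.g. "short reverb"


def build_prompt(cue: SfxCue) -> str:
    """Join the non-empty parts into a single text prompt."""
    head = " ".join(p for p in (cue.distance, cue.material, cue.event) if p)
    if cue.space:
        head = f"{head} in {cue.space}"
    return f"{head}, {cue.tail}" if cue.tail else head


def snap_to_frame(seconds: float, fps: float) -> float:
    """Return the time of the frame boundary nearest to `seconds`."""
    return round(seconds * fps) / fps


cue = SfxCue(event="door slam", distance="close", material="metallic",
             space="narrow concrete hallway", tail="short reverb")
print(build_prompt(cue))        # the concrete prompt quoted above
print(snap_to_frame(1.26, 24))  # placement time aligned to a 24 fps grid
```

Leaving every field except `event` empty reproduces the vague case ("door sound"), which makes it easy to spot under-specified cues before generating variants.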
Fish Audio
Fish Audio S2-Pro is a text-to-speech model, not a dedicated sound-effects generator; its core job is expressive voice synthesis with bracket cues and multi-speaker dialogue. For pure foley (whoosh transitions, impact stingers, ambient room tone, UI feedback), ElevenLabs Sound Effects v2 is the right tool because it is built for exactly that job. Fish Audio S2-Pro plays a complementary role on the same canvas: it handles voice-driven sound design, meaning character vocalizations such as grunts, sighs, gasps, breathing, laughter, and crying, triggered by bracket cues like [exhausted sigh], [sharp gasp], [nervous chuckle], [exhausted breathing]. For a video that needs both conventional foley (door slams, ambient beds) and human-vocal SFX (a character's gasp, a runner's breathing), use ElevenLabs SFX v2 for the foley cues and Fish Audio S2-Pro for the vocal cues, both attached to the same canvas timeline.
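The split between foley cues and vocal cues can be sketched as a small routing step over a cue sheet. This is a minimal illustration under stated assumptions: the model labels are placeholder strings rather than real API identifiers, and treating any cue wrapped in square brackets as a vocal cue is a convention chosen here for the example, not a documented Martini rule.

```python
# Placeholder labels for the two nodes; not real API model identifiers.
VOCAL_MODEL = "fish-audio-s2-pro"
FOLEY_MODEL = "elevenlabs-sfx-v2"


def is_vocal_cue(cue: str) -> bool:
    """Assumed convention: a cue fully wrapped in [brackets] is a vocal cue."""
    text = cue.strip()
    return len(text) > 2 and text.startswith("[") and text.endswith("]")


def route_cues(cues: list[str]) -> dict[str, list[str]]:
    """Group cues by the model that should render them, preserving order."""
    routed: dict[str, list[str]] = {VOCAL_MODEL: [], FOLEY_MODEL: []}
    for cue in cues:
        target = VOCAL_MODEL if is_vocal_cue(cue) else FOLEY_MODEL
        routed[target].append(cue.strip())
    return routed


cue_sheet = [
    "close metallic door slam in narrow concrete hallway, short reverb",
    "[sharp gasp]",
    "ambient room tone",
    "[exhausted breathing]",
]
for model, cues in route_cues(cue_sheet).items():
    print(model, cues)
```

Keeping both lists keyed by model makes it straightforward to send each batch to its own node while the resulting clips still land on the same timeline.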