ElevenLabs
ElevenLabs Sound Effects v2 generates royalty-free sound effects from text prompts: whoosh transitions, impact stingers, ambient room tones, UI feedback sounds, footsteps, door slams, rain, machinery. Each prompt returns 4 variant clips so you can pick the take that fits the frame, then snap it to the video timeline. The node runs the official `eleven_text_to_sound_v2` model, and output quality tracks prompt detail: concrete prompts ("close metallic door slam in narrow concrete hallway, short reverb") beat vague ones ("door sound") by a wide margin. On Martini, SFX nodes attach directly to a video timeline segment, so a video editor can lay foley over an AI-generated cut without leaving the canvas. The 4-variant grid drives the workflow: generate, listen, pick, snap to frame.
Before prompting, decide where each SFX lives on the video timeline. A 30-second product video typically needs: 1 whoosh transition between the open and the product reveal, 1 impact stinger when a key feature lands on screen, 1 ambient room tone running underneath dialogue, and 1 UI sound (click, ding, swipe) per interactive element shown. Mark these spots on the canvas timeline first. Each SFX attaches to a specific frame range — typically 0.5-3 seconds long for transitions and stingers, 5-30 seconds for ambient beds. Spotting upfront makes prompting easier because you already know what the SFX needs to do.
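A spotting pass like this can be captured as a small cue list before any prompt is written. The sketch below is illustrative only: `Cue`, `DURATION_RANGES`, and `out_of_range` are hypothetical names, not part of Martini or ElevenLabs, and the ranges are just the typical lengths quoted above.

```python
from dataclasses import dataclass

# Typical lengths in seconds, taken from the guidance above.
# UI sounds have no stated range, so they are not checked.
DURATION_RANGES = {
    "transition": (0.5, 3.0),
    "stinger": (0.5, 3.0),
    "ambient": (5.0, 30.0),
}

@dataclass
class Cue:
    name: str
    kind: str       # "transition", "stinger", "ambient", or "ui"
    start_s: float  # where the cue begins on the video timeline
    end_s: float    # where the cue ends

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s

def out_of_range(cues):
    """Return the names of cues whose length is outside the typical range for their kind."""
    flagged = []
    for cue in cues:
        bounds = DURATION_RANGES.get(cue.kind)
        if bounds is None:
            continue  # no typical range stated for this kind
        low, high = bounds
        if not (low <= cue.duration_s <= high):
            flagged.append(cue.name)
    return flagged

# Spot list for a hypothetical 30-second product video.
spot_list = [
    Cue("whoosh into product reveal", "transition", 4.0, 5.0),
    Cue("feature impact", "stinger", 12.0, 12.8),
    Cue("room tone under dialogue", "ambient", 0.0, 30.0),
    Cue("button click", "ui", 18.0, 18.2),
]
```

Running `out_of_range(spot_list)` on a plan like this flags any cue that is unusually long or short for its role, which is a cheap check to make before generating anything.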
The single biggest quality lever for SFX v2 is prompt specificity. A 5-element template works: source (what makes the sound), action (what it's doing), space (the environment), distance (close vs. far), intensity (soft vs. loud). Examples: "close metallic door slam in narrow concrete hallway, short reverb" (door SFX); "distant thunder rumble in open countryside, long decay, low frequency" (ambient transition); "soft mechanical click, dry, no reverb, intimate ASMR distance" (UI tap). Vague prompts like "door sound" or "scary ambient" produce generic results because the model cannot disambiguate which kind of door, which kind of scary. Write the prompt the way a foley artist would describe the cue.
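One way to keep every cue on the 5-element template is to assemble prompts from named fields instead of typing them freehand. This is a minimal sketch, assuming a hypothetical helper `foley_prompt`; the word order it produces is one reasonable arrangement, not a format the model requires.

```python
def foley_prompt(source, action, space, distance, intensity, extras=()):
    """Build a prompt from the five elements; extras hold details such as reverb or decay."""
    elements = {
        "source": source,
        "action": action,
        "space": space,
        "distance": distance,
        "intensity": intensity,
    }
    # Refuse to build a prompt if any of the five elements is blank.
    missing = [name for name, value in elements.items() if not value.strip()]
    if missing:
        raise ValueError(f"missing prompt elements: {', '.join(missing)}")
    head = f"{distance.strip()} {source.strip()} {action.strip()} in {space.strip()}"
    tail = [intensity.strip(), *(e.strip() for e in extras if e.strip())]
    return ", ".join([head, *tail])

prompt = foley_prompt(
    source="metallic door",
    action="slam",
    space="narrow concrete hallway",
    distance="close",
    intensity="sharp impact",
    extras=["short reverb"],
)
# prompt == "close metallic door slam in narrow concrete hallway, sharp impact, short reverb"
```

Raising on a blank element forces the prompt to answer "which door, in which space, from how far away" before anything is generated.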
Each SFX v2 prompt returns 4 variant clips. Listen to all four back to back at the actual frame placement (not in isolation) — a whoosh that sounds great solo can clash with the music bed underneath. Pick the take that supports the visual without competing for attention. Then snap the chosen variant to the exact frame the cue should hit. Martini's canvas lets you drag the SFX clip on the timeline; an impact stinger usually wants its peak amplitude on the frame the visual lands, not 0.1s before or after. For ambient beds, the loop point matters more than the start — fade in over 2-3 seconds, hold under dialogue, fade out at the cut.
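The arithmetic behind both placements is simple enough to sketch. The helpers below are hypothetical and independent of Martini's canvas. They assume the clip is available as a list of float samples, and the 2.5-second fade-out is an assumption, since the text above only gives a fade-in length.

```python
def peak_offset_s(samples, sample_rate):
    """Seconds from the start of the clip to its largest absolute sample."""
    peak_index = max(range(len(samples)), key=lambda i: abs(samples[i]))
    return peak_index / sample_rate

def clip_start_for_hit(hit_frame, fps, samples, sample_rate):
    """Timeline position (seconds) where the clip must start so its peak lands on hit_frame."""
    hit_time = hit_frame / fps
    return hit_time - peak_offset_s(samples, sample_rate)

def bed_gain(t, length_s, fade_in_s=2.5, fade_out_s=2.5):
    """Linear gain (0 to 1) for an ambient bed at t seconds into the clip."""
    if t < 0 or t > length_s:
        return 0.0
    gain_in = min(1.0, t / fade_in_s)                 # ramp up from the start
    gain_out = min(1.0, (length_s - t) / fade_out_s)  # ramp down into the cut
    return min(gain_in, gain_out)

# Toy clip: 4 samples at 4 Hz, peak on the third sample (0.5 s in).
toy_clip = [0.0, 0.1, -0.9, 0.3]
start = clip_start_for_hit(hit_frame=48, fps=24, samples=toy_clip, sample_rate=4)
```

In this toy case the visual lands at frame 48 of a 24 fps edit (2.0 s) and the clip peaks 0.5 s in, so the clip has to start at 1.5 s. A real stinger works the same way, with a real sample rate and a longer buffer.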
A polished 30-60 second video typically has 4-8 SFX cues layered, not 1-2. Standard layering: ambient bed (room tone or atmospheric drone) running underneath, transitional SFX (whooshes, risers) at cuts, impact SFX (stingers, slams) on key visual moments, and UI SFX (clicks, dings) on element-specific frames. Place each SFX on its own Audio node and attach it to its target frame range. The Martini canvas timeline lets you stack and align all cues without an external NLE. For final delivery, the canvas exports either a single audio mix or separate stems. Choose the single mix if you're shipping the video as-is, and stems if you're handing off to a sound designer for a final mix.
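The difference between stems and a single mix can be shown with plain lists of samples. This is a conceptual sketch, not how Martini renders audio: `place` and `mix` are hypothetical helpers, and the hard clamp stands in for whatever limiting a real export applies.

```python
def place(samples, start_index, total_len):
    """Return a stem of total_len samples with the clip placed at start_index."""
    stem = [0.0] * total_len
    for i, sample in enumerate(samples):
        j = start_index + i
        if 0 <= j < total_len:  # drop anything that falls outside the timeline
            stem[j] = sample
    return stem

def mix(stems):
    """Sum aligned stems sample by sample and clamp the result to [-1, 1]."""
    return [max(-1.0, min(1.0, sum(column))) for column in zip(*stems)]

# Two layers on a 6-sample timeline: an ambient bed and a stinger starting at index 2.
ambient_stem = place([0.25] * 6, 0, 6)
stinger_stem = place([0.5, 0.25], 2, 6)

stems = [ambient_stem, stinger_stem]  # separate stems for a handoff
single_mix = mix(stems)               # [0.25, 0.25, 0.75, 0.5, 0.25, 0.25]
```

Keeping one stem per cue mirrors the one-SFX-per-Audio-node layout: each layer can be moved or re-levelled on its own, and the single mix is only the sum at the end.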
5-element foley prompt — source (metallic door), action (slam), space (narrow concrete hallway), distance (close), intensity/character (sharp, mid frequency). Beats "door sound" by a wide margin. Use for a hard-cut transition or a scene-end stinger.
close metallic door slam in narrow concrete hallway, short reverb, sharp impact, mid frequency
Ambient bed prompt — describes space and texture explicitly, requests a seamless loop length so the SFX layers under a 30s scene without obvious repeat. Place this on its own Audio node running underneath dialogue.
low rumbling room tone, abandoned warehouse at night, distant air conditioner hum, sparse occasional creak, 30 seconds seamless loop
Concrete prompts beat vague ones by a wide margin. Use the 5-element template (source, action, space, distance, intensity) for every cue. "Door sound" produces a generic result; "close metallic door slam in narrow concrete hallway, short reverb" produces a usable one.
4 variants per prompt is the load-bearing UX. Listen at the actual frame placement (not solo) — what sounds great in isolation may clash with music underneath.
For ambient beds, request a seamless loop length explicitly ("30 seconds seamless loop") so the model produces a clip that doesn't obviously repeat. Layer this under dialogue/SFX cues.
Standard cue density for a polished 30-60s video: 4-8 SFX cues layered. Ambient bed + transitional SFX + impact stingers + UI sounds. Anything sparser feels like a silent draft.
SFX v2 output is royalty-free, so commercial use is fine without further licensing. This is a real differentiator vs. stock libraries that charge per-use fees.
ElevenLabs Sound Effects v2 produces royalty-free, prompt-driven foley whose quality scales with prompt specificity. The 4-variant grid lets you pick the take that fits the frame; the canvas timeline lets you snap each cue to the exact moment the visual needs support. Trade-off vs. dedicated foley libraries: SFX v2 is faster to iterate and free of per-use licensing, but a hand-crafted recording from a foley artist still has subtle character that prompt-driven generation can't fully match for marquee content. For most social, product, and educational video work, where speed and licensing flexibility matter more than artisanal foley, SFX v2 is the right tool. The Martini canvas keeps SFX adjacent to the video timeline, so a video editor can lay 4-8 cues across a 30-60s edit without leaving the workspace.
Connect ElevenLabs Sound Effects v2 with other AI models on Martini's infinite canvas. No GPU required — start free.