Fish Audio

How to Create AI Sound Effects for Video with Fish Audio S2-Pro

Fish Audio S2-Pro is a text-to-speech model, not a dedicated sound-effects generator. Its core job is expressive voice synthesis with bracket cues and multi-speaker dialogue. For pure foley (whoosh transitions, impact stingers, ambient room tone, UI feedback), ElevenLabs Sound Effects v2 is the right tool because it is built specifically for non-vocal sound effects. Fish Audio S2-Pro plays a complementary role on the same canvas: it handles voice-driven sound design, meaning character vocalizations such as grunts, sighs, gasps, breathing, laughter, and crying, driven by bracket cues like [exhausted sigh], [sharp gasp], [nervous chuckle], and [exhausted breathing]. For a video that needs both real foley (door slams, ambient beds) and human-vocal SFX (a character's gasp, a runner's breathing), use ElevenLabs SFX v2 for the foley cues and Fish Audio S2-Pro for the vocal cues, and attach both to the same canvas timeline.


Step-by-Step Guide

Decide which SFX category each cue belongs to

Sort the SFX cues for the video into two buckets before picking the model. Bucket 1 (foley): door slams, glass breaks, footsteps, machinery, weather, ambient beds, UI clicks — these are non-vocal environmental sounds and route to ElevenLabs Sound Effects v2. Bucket 2 (vocal SFX): a character's gasp, a runner's heavy breathing, a frustrated sigh, a startled scream, a tired exhale — these are human-vocal cues and route to Fish Audio S2-Pro using bracket-only prompts inside an Audio node. The split matters because each model is built for its bucket; using SFX v2 for a "frustrated sigh" produces a generic sigh, while Fish Audio with [frustrated exhausted sigh] produces a sigh tied to a specific character voice.
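The two-bucket split can be sketched as a tiny routing helper. This is a hypothetical illustration, not a Martini or vendor API: the word list, the `route_cue` name, and the model labels are assumptions chosen for the example. A keyword heuristic like this will misroute some cues, so treat its output as a first pass that you review by hand.

```python
import re

# Hypothetical word list for human-vocal cues; extend it for your project.
VOCAL_WORDS = {
    "gasp", "gasps", "breath", "breathing", "panting", "sigh", "sighs",
    "scream", "screams", "exhale", "chuckle", "laugh", "laughter",
    "grunt", "sob", "crying",
}

# Labels only: this sketch does not call either model.
FOLEY_MODEL = "ElevenLabs Sound Effects v2"
VOCAL_MODEL = "Fish Audio S2-Pro"


def route_cue(description: str) -> str:
    """Return the model bucket for one SFX cue description."""
    # Match whole words so "crystal" or "pants" do not count as vocal cues.
    words = re.findall(r"[a-z]+", description.lower())
    if any(word in VOCAL_WORDS for word in words):
        return VOCAL_MODEL
    # Everything else (doors, weather, machinery, UI, ambience) is foley.
    return FOLEY_MODEL


if __name__ == "__main__":
    for cue in ["door slam in hallway", "runner's heavy breathing",
                "frustrated sigh", "crystal glass break"]:
        print(f"{cue!r} -> {route_cue(cue)}")
```

Matching whole words rather than substrings is the main design choice here: it keeps obviously non-vocal cues such as "crystal glass break" in the foley bucket.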

Pick the character voice for the vocal SFX

Fish Audio S2-Pro vocal SFX inherit the character of the selected voice. A gasp in a deep male voice (for example, a cloned narrator) sounds different from a gasp in a young female voice (for example, a prebuilt expressive voice). Pick the voice first; typically you're reusing a character voice already cast for dialogue in the same scene. For diegetic vocal SFX (a character on screen reacting), use that character's established voice. For off-screen vocal SFX (a generic crowd reaction, an unseen scream), use a different voice or a cloned background voice so it doesn't pull focus from the on-screen character. If any of these voices are cloned, the same voice-consent requirements apply to vocal SFX as to dialogue.

Prompt with bracket-only cues

Vocal SFX prompts are bracket-only, with no spoken words. Examples: `[sharp gasp]`, `[exhausted breathing for 5 seconds]`, `[nervous chuckle]`, `[startled scream then silence]`, `[panting after running, slow recovery]`. The model interprets the bracket as the entire vocal performance, so no surrounding sentence is needed. This differs from Fish Audio's normal dialogue use, where brackets direct the delivery of a spoken line. For a panting-after-running cue, place the bracket-only prompt on its own Audio node, generate it, listen, then attach it to the chase scene's post-action recovery beat on the canvas timeline.
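A bracket-only prompt is easy to check before you spend a generation on it. The helpers here are a hypothetical sketch (the names `is_bracket_only` and `vocal_sfx_prompt` are illustrative, not part of Fish Audio or Martini). They assume a valid vocal SFX prompt is exactly one non-empty bracket group with nothing outside it, matching the single-bracket examples in this guide.

```python
import re

# One bracket group, nothing outside it except surrounding whitespace.
_BRACKET_ONLY = re.compile(r"\s*\[([^\[\]]+)\]\s*")


def is_bracket_only(prompt: str) -> bool:
    """True if the prompt is a single non-empty [cue] with no spoken words."""
    match = _BRACKET_ONLY.fullmatch(prompt)
    # Reject whitespace-only cues such as "[ ]".
    return bool(match) and bool(match.group(1).strip())


def vocal_sfx_prompt(cue: str) -> str:
    """Wrap a plain cue description as a bracket-only prompt."""
    cue = cue.strip()
    if not cue or "[" in cue or "]" in cue:
        raise ValueError("cue must be non-empty text without brackets")
    return f"[{cue}]"
```

For instance, `is_bracket_only("[sharp gasp]")` is true, while `is_bracket_only("No way! [sharp gasp]")` is false because of the spoken words outside the bracket.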

Layer vocal SFX with foley and dialogue on the canvas

A scene with a chase moment typically needs three SFX layers plus dialogue: an ambient bed (alleyway echo from ElevenLabs SFX v2), foley (running footsteps from SFX v2), vocal SFX (panting recovery from Fish Audio S2-Pro), and the spoken dialogue that follows (Fish Audio S2-Pro Dialogue mode or ElevenLabs Dialogue v3). Place each on its own Audio node and align it to the timeline. The Martini canvas handles the layering; for final delivery you can export a single audio mix or separate stems for handoff to a mixer. Note: Fish Audio availability in Martini is not guaranteed, and whether it runs in production depends on your workspace configuration. If Fish Audio isn't wired up for vocal SFX, ElevenLabs Eleven v3 with bracket-only prompts (e.g., `[gasp]` on its own) works as a fallback, though its tag coverage is narrower.
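To make the layering concrete, here is a minimal model of the chase scene as timeline layers. Martini does this in the canvas UI, so the `AudioLayer` class, the `layers_at` and `scene_length` helpers, and every timing value are hypothetical; the sketch only shows how the four layers overlap in time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioLayer:
    name: str
    model: str
    start_s: float     # offset from the start of the scene, in seconds
    duration_s: float

    @property
    def end_s(self) -> float:
        return self.start_s + self.duration_s


# Illustrative chase scene; all timings are made up for the example.
chase_scene = [
    AudioLayer("alleyway ambient bed", "ElevenLabs Sound Effects v2", 0.0, 12.0),
    AudioLayer("running footsteps", "ElevenLabs Sound Effects v2", 0.0, 6.0),
    AudioLayer("panting recovery", "Fish Audio S2-Pro", 6.0, 5.0),
    AudioLayer("follow-up dialogue", "Fish Audio S2-Pro Dialogue", 11.0, 4.0),
]


def layers_at(layers: list[AudioLayer], t: float) -> list[str]:
    """Names of layers audible at time t (start inclusive, end exclusive)."""
    return [layer.name for layer in layers if layer.start_s <= t < layer.end_s]


def scene_length(layers: list[AudioLayer]) -> float:
    """Time at which the last layer finishes."""
    return max(layer.end_s for layer in layers)
```

At 6 seconds, for example, the footsteps have just ended and the panting recovery takes over under the ambient bed.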

Prompt Examples

Vocal SFX prompt for a chase-scene recovery beat — describes timing and intensity inside the bracket. Use the same character voice as that scene's dialogue for diegetic continuity.

[panting after running, heavy chest, slow recovery over 5 seconds]

Reaction SFX for a horror or thriller jump cut: a bracket-only prompt with no surrounding spoken words. Place it on the frame where the visual reveal lands.

[sharp startled gasp then sudden silence, female voice]

Parameter Tips

Fish Audio S2-Pro is a TTS model, not a foley generator. Use it only for vocal SFX (gasps, sighs, breathing, laughter, screams), and route door slams, ambient beds, and UI sounds to ElevenLabs Sound Effects v2.

Vocal SFX prompts are bracket-only — no spoken words around the cue. The bracket is the entire performance: [sharp gasp], [exhausted breathing 5 seconds], [nervous chuckle].

Pick the voice before the cue. Diegetic vocal SFX inherit the character voice; off-screen reactions should use a different voice so they don't pull focus.

Pair Fish Audio vocal SFX with ElevenLabs Sound Effects v2 foley on the same canvas timeline. Each model handles the bucket it's built for — foley vs. vocal — and the canvas keeps all cues aligned.

Voice consent matters for vocal SFX too. If you cloned a voice for character dialogue, the same consent applies to vocal SFX generated with that voice.

What to Expect

Fish Audio S2-Pro is the vocal-SFX complement to ElevenLabs Sound Effects v2 on Martini. Use it for character gasps, breathing, sighs, laughs, and similar human-vocal cues that inherit a chosen character voice; route foley (doors, ambience, footsteps, UI) to SFX v2 on the same canvas. The Martini canvas timeline accepts both models' outputs and aligns them to the video, so a 30-60s edit can layer ambient bed (SFX v2) + foley (SFX v2) + vocal SFX (Fish Audio) + dialogue (Dialogue v3 or Fish Audio dialogue) without leaving the workspace. For pure foley work or for English-only projects where polish matters most, use ElevenLabs end-to-end. For multilingual scenes or when vocal SFX should match a previously-cloned character voice, Fish Audio is the right node for the vocal cues specifically.

Use Fish Audio S2-Pro on Martini

Connect Fish Audio S2-Pro with other AI models on Martini's infinite canvas. No GPU required — start free.


Related features

Docs

nodes/audio

Try Other Models for This Task

ElevenLabs

ElevenLabs Sound Effects v2

ElevenLabs Sound Effects v2 generates royalty-free sound effects from text prompts: whoosh transitions, impact stingers, ambient room tones, UI feedback sounds, footsteps, door slams, rain, machinery. Each prompt returns 4 variant clips so you can pick the take that fits the frame, then snap it to the video timeline. The model uses the official `eleven_text_to_sound_v2` endpoint. Concrete prompts ("close metallic door slam in narrow concrete hallway, short reverb") beat vague ones ("door sound") by a wide margin. On Martini, SFX nodes attach directly to a video timeline segment, so a video editor can lay foley over an AI-generated cut without leaving the canvas. The 4-variant grid is the core of the workflow: generate, listen, pick, snap to frame.


How to Create AI Sound Effects for Video
