Fish Audio
Fish Audio S2-Pro is the multilingual, open-source choice for the host voice tag of a podcast intro — especially valuable for shows with international audiences or co-host duets. The S2-Pro model handles 80+ languages with automatic detection, takes natural-language bracket cues like [confidently] or [warmly] for delivery direction, and supports multi-speaker dialogue inside a single Audio node. On Martini, you build the same three-element intro architecture — music bed, voice tag, SFX — but use Fish Audio for the voice element when the show needs language flexibility or self-hostable infrastructure. Voice consent: if you're cloning a host's voice for the tag rather than picking a prebuilt voice, get explicit written permission first; Fish Audio is open-source, so consent enforcement sits with you.
Fish Audio S2-Pro is the right pick over ElevenLabs in three specific cases: (1) the show ships in multiple languages, since Fish Audio handles 80+ languages on the same voice ID; (2) you need to clone a co-host's voice from a non-English reference sample, where Fish Audio's phoneme alignment is more flexible; (3) you self-host audio infrastructure outside Martini. For an English-only show with a single host voice, ElevenLabs Eleven v3 is more polished and the safer default. Make this decision before you build the canvas: the bed and SFX architecture stays the same, and only the voice node changes.
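The three cases can be written down as a small decision helper. This is only a sketch of the rule as stated in this guide: `ShowProfile`, its fields, and the returned labels are hypothetical names chosen for illustration, not Martini or Fish Audio settings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShowProfile:
    # All fields are hypothetical and exist only for this illustration.
    languages: List[str]                            # language editions the show ships in
    clone_reference_language: Optional[str] = None  # language of the voice-clone sample, if any
    self_hosted_audio: bool = False                 # True if audio infrastructure runs outside Martini


def pick_voice_model(show: ShowProfile) -> str:
    """Return the voice model suggested by the three cases in this guide."""
    multilingual = len(set(show.languages)) > 1
    non_english_clone = (
        show.clone_reference_language is not None
        and show.clone_reference_language != "en"
    )
    if multilingual or non_english_clone or show.self_hosted_audio:
        return "Fish Audio S2-Pro"
    # English-only, single host voice, no self-hosting: the safer default.
    return "ElevenLabs Eleven v3"


print(pick_voice_model(ShowProfile(languages=["en"])))
print(pick_voice_model(ShowProfile(languages=["en", "ja", "es"])))
```

Running the check before building the canvas keeps the choice explicit; the bed and SFX nodes are unaffected either way.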
Fish Audio S2-Pro takes natural-language bracket tags inside the script: [confidently], [warmly], [pause two seconds], [conspiratorial whisper]. Use these to direct the voice tag delivery: a daily news show wants [confidently] before the show name; an interview show wants [warmly] before the host introduction; a true-crime show wants [conspiratorial whisper] for the cold open. Keep the bracket cues local — placed at the exact word where delivery should change. Compare against ElevenLabs' fixed inline tag set: Fish Audio's open-ended brackets give wider expressive range at the cost of slightly less predictable interpretation, so test 2-3 deliveries before locking in the intro.
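To make "local" cue placement and the 2-3 test deliveries concrete, here is a minimal sketch that only manipulates script text. It does not call Fish Audio or Martini; the helper names and the show name "The Daily Brief" are hypothetical.

```python
from typing import List


def add_cue(script: str, phrase: str, cue: str) -> str:
    """Insert `[cue]` immediately before the first occurrence of `phrase`."""
    index = script.find(phrase)
    if index == -1:
        raise ValueError(f"phrase not found in script: {phrase!r}")
    return f"{script[:index]}[{cue}] {script[index:]}"


def delivery_variants(script: str, phrase: str, cues: List[str]) -> List[str]:
    """Produce one script per candidate cue for side-by-side listening tests."""
    return [add_cue(script, phrase, cue) for cue in cues]


# Hypothetical daily-news intro; the cue lands right before the show name.
base = "This is The Daily Brief. Here is what you need to know today."
candidates = ["confidently", "warmly", "conspiratorial whisper"]
for variant in delivery_variants(base, "The Daily Brief", candidates):
    print(variant)
```

Each printed variant can be pasted into the voice node as one test delivery, which makes it easy to compare how the model interprets each cue on the same words.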
Add three nodes: (1) Music bed Audio node: generate or upload 12-30 seconds of theme music. (2) Fish Audio S2-Pro Audio node: host voice tag with bracket cues, 5-8 seconds of show identity. (3) Sound Effects v2 (or another SFX provider) Audio node: a single transition at the cut between the intro and the episode. Align all three to the same canvas timeline. The voice tag plays over the music bed, the SFX hits as the voice ends, and the bed continues underneath the first 3-5 seconds of the episode. Standard total runtime: 12-30 seconds. Note: Fish Audio is not guaranteed to be enabled in every Martini workspace; production runtime depends on workspace configuration, so if Fish Audio isn't wired up in your workspace, fall back to ElevenLabs Eleven v3 for the voice node.
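The alignment can be checked with a little arithmetic before touching the canvas. The sketch below is one reading of the layout described here: the voice tag ends exactly at the cut, the SFX starts at the cut, and the bed runs on for a short tail. `Clip`, `plan_intro`, and the one-second SFX default are assumptions for illustration, not Martini features.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Clip:
    name: str
    start: float     # seconds from the start of the intro
    duration: float  # seconds

    @property
    def end(self) -> float:
        return self.start + self.duration


def plan_intro(bed_seconds: float, voice_seconds: float,
               bed_tail: float = 4.0, sfx_seconds: float = 1.0) -> Dict[str, Clip]:
    """Place bed, voice tag, and SFX on one timeline (all times in seconds)."""
    if not 12 <= bed_seconds <= 30:
        raise ValueError("music bed is expected to run 12-30 seconds")
    if not 5 <= voice_seconds <= 8:
        raise ValueError("voice tag is expected to run 5-8 seconds")
    if not 3 <= bed_tail <= 5:
        raise ValueError("bed should continue 3-5 seconds under the episode")

    cut = bed_seconds - bed_tail       # point where the episode begins
    voice_start = cut - voice_seconds  # voice tag ends exactly at the cut
    if voice_start < 0:
        raise ValueError("bed is too short for this voice tag and tail")

    return {
        "music_bed": Clip("music_bed", 0.0, bed_seconds),
        "voice_tag": Clip("voice_tag", voice_start, voice_seconds),
        "sfx_transition": Clip("sfx_transition", cut, sfx_seconds),  # hits as the voice ends
    }


for clip in plan_intro(bed_seconds=12, voice_seconds=6).values():
    print(f"{clip.name}: {clip.start:.1f}s to {clip.end:.1f}s")
```

Working backward from the cut means a longer bed simply gives a longer music-only lead-in, while the voice-to-SFX handoff stays fixed.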
Fish Audio's 80+ language support shines when the same podcast ships in multiple languages. Build the intro canvas once with an English script, then duplicate the canvas and swap the voice tag script to Mandarin, Japanese, or Spanish. Fish Audio uses the same cloned (or prebuilt) voice across all language editions, so the show's sonic identity stays consistent while the language changes. The music bed and SFX stay identical across editions. ElevenLabs Multilingual v2 supports the same multilingual production workflow, but Fish Audio's wider language coverage and bracket-cue flexibility make it the stronger pick for shows with non-Western language editions.
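The duplicate-and-swap workflow can be pictured as copying one template and changing only the script. The dictionary layout, file names, and voice ID below are hypothetical stand-ins, not a Martini canvas schema, and the translated lines are illustrative only.

```python
from copy import deepcopy
from typing import Dict

# Hypothetical template: keys and values are illustrative, not a Martini schema.
TEMPLATE = {
    "music_bed": "theme_v1.wav",
    "sfx": "transition_v1.wav",
    "voice": {"voice_id": "host-voice-001", "script": ""},
}

# One voice tag script per language edition (illustrative translations).
SCRIPTS = {
    "en": "[warmly] You're listening to The Founder Diaries.",
    "es": "[warmly] Estás escuchando The Founder Diaries.",
    "zh": "[warmly] 您正在收听 The Founder Diaries。",
}


def build_editions(template: dict, scripts: Dict[str, str]) -> Dict[str, dict]:
    """Duplicate the template once per language, swapping only the voice script."""
    editions = {}
    for language, script in scripts.items():
        canvas = deepcopy(template)  # deep copy so editions do not share nested state
        canvas["voice"]["script"] = script
        canvas["language"] = language
        editions[language] = canvas
    return editions


for language, canvas in build_editions(TEMPLATE, SCRIPTS).items():
    print(language, canvas["voice"]["voice_id"], canvas["music_bed"], canvas["sfx"])
```

Because the voice ID, bed, and SFX are never touched in the loop, every edition inherits the same sonic identity and only the spoken script differs.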
Interview-show intro with Fish Audio S2-Pro bracket cues. The [warmly] cue sets the opening tone, [pause] creates the standard radio handoff, and [confidently] lands the host introduction. Works equally well in English, Mandarin, or any of Fish Audio's 80+ supported languages.
[warmly] You're listening to The Founder Diaries. [pause] Stories from the people building the future, in their own words. [confidently] I'm your host, Mei.
Two-host podcast intro using Fish Audio S2-Pro's multi-speaker tags. Each speaker can be a different cloned or prebuilt voice, all in one Audio node. The [excited] cue inside HostB's turn directs delivery only on that line.
[Speaker:HostA] Hey everyone, welcome back to Two Founders. [Speaker:HostB] Yep — and today, [excited] we've got a guest you've been waiting for.
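Before sending a two-host script to the voice node, it can help to confirm that every turn has a speaker and every speaker has a voice assigned. The sketch below parses the `[Speaker:Name]` convention used in the example above with a regular expression. It is a local pre-flight check, not Fish Audio's own parser, and the `voices` mapping is hypothetical.

```python
import re
from typing import Dict, List, Tuple

SPEAKER_TAG = re.compile(r"\[Speaker:(\w+)\]")


def split_turns(script: str) -> List[Tuple[str, str]]:
    """Split a `[Speaker:Name]`-tagged script into (speaker, line) pairs."""
    # With one capture group, re.split yields [preamble, name1, text1, name2, text2, ...].
    parts = SPEAKER_TAG.split(script)
    if parts[0].strip():
        raise ValueError("text before the first speaker tag has no speaker")
    names, texts = parts[1::2], parts[2::2]
    return [(name, text.strip()) for name, text in zip(names, texts)]


def missing_voices(turns: List[Tuple[str, str]], voices: Dict[str, str]) -> List[str]:
    """Return speakers that appear in the script but have no voice assigned."""
    return sorted({name for name, _ in turns if name not in voices})


script = (
    "[Speaker:HostA] Hey everyone, welcome back to Two Founders. "
    "[Speaker:HostB] Yep, and today, [excited] we've got a guest you've been waiting for."
)
voices = {"HostA": "host-a-voice"}  # hypothetical voice IDs; HostB is deliberately missing

turns = split_turns(script)
for speaker, line in turns:
    print(f"{speaker}: {line}")
print("unassigned speakers:", missing_voices(turns, voices))
```

Delivery cues such as [excited] are left untouched inside each turn, since only tags starting with `Speaker:` match the pattern.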
Fish Audio S2-Pro is strongest for multilingual shows or shows where the host voice ships in non-English languages. For English-only shows, ElevenLabs Eleven v3 is more polished and safer.
Multi-speaker tags ([Speaker:HostA], [Speaker:HostB]) let you build a two-host intro inside a single Audio node — useful for podcasts where two co-hosts trade lines in the cold open.
Open-ended bracket cues such as [warmly], [confidently], and [conspiratorial whisper] give wider expressive range than ElevenLabs' fixed tags, but interpretation can be slightly less predictable. Test 2-3 deliveries before locking in.
Self-hosted serving is an option if you don't want podcast intros leaving your infrastructure — useful for sensitive or pre-release content.
Save the canvas as a template and duplicate per language edition. Fish Audio uses the same voice ID across languages, so the show's sonic identity stays consistent while the spoken script changes.
Fish Audio S2-Pro is the multilingual, open-source pick for podcast intro voice tags — especially when the show ships in multiple languages, has co-hosts, or runs on self-hosted infrastructure. Trade-off vs. ElevenLabs Eleven v3: less polished English emotional delivery, slightly less predictable interpretation of bracket cues, and the consent burden sits with you. For pure English-language podcasts where the host voice tag is the make-or-break moment, ElevenLabs is the safer pick. For multilingual or two-host shows where flexibility matters more than maximum English polish, Fish Audio S2-Pro is worth the trade. The full intro pipeline — voice + music + SFX — runs entirely on the Martini canvas regardless of which voice model you pick, so producers can swap models per episode without rebuilding the canvas.