ElevenLabs
A podcast intro is three audio elements layered on a 12-30 second timeline: a music bed, a host voice tag, and an SFX transition (whoosh, riser, or stinger). On Martini, ElevenLabs Eleven v3 handles the host voice tag and Sound Effects v2 handles the transition. Both run inside Audio nodes on the same canvas, so you can swap voices, re-prompt SFX, and re-time the bed in one place. Eleven v3 produces the broadcast-quality narrator delivery podcast listeners expect; the 21-voice library covers warm narrator (Rachel, Sarah), authoritative male (Brian, Daniel), and energetic show-host (Aria, Charlie) styles. Voice consent: if you're cloning a co-host's voice for the tag rather than picking from the library, get explicit written permission first. The same rules apply as for any other voice clone.
A podcast intro voice tag is 5-8 seconds of show identity. Pick the voice before you write the script: a daily news show wants Brian or Daniel (authoritative, paced); an interview show wants Sarah or Charlie (warm, conversational); a true-crime show wants Roger or Aria (gravelly or expressive). Generate the same 8-second test sentence with 3 voices on the canvas, listen to all three back to back, then commit. Voice-to-show fit arguably shapes how professional a show sounds more than any other production decision. Output quality is comparable across all 21 voices, so the choice comes down to tonal match.
A 12-second podcast intro typically holds 18-25 spoken words. That's short enough that every word matters. Write conversationally: "Welcome to The Builder's Hour — your weekly look at the people shipping the future. I'm your host, [name]." Avoid stiff formal text ("This podcast covers..."). Use ellipses to set up the cadence — "Welcome to The Builder's Hour... your weekly look..." reads with natural beats that land before the music swells. ElevenLabs v3 inline tags help: [excited] before the show name bumps energy at the brand moment; [pause] before "your host" creates the standard radio handoff beat.
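The word budget above is easy to check before generating audio. The following is a minimal sketch, not part of any ElevenLabs or Martini tooling: the helper names `spoken_word_count` and `fits_word_budget` are hypothetical, and it assumes inline tags are always written in square brackets and should not count as spoken words.

```python
import re

# Assumption: inline delivery tags look like [excited] or [pause].
TAG_PATTERN = re.compile(r"\[[^\]]*\]")
# A word is a run of letters/digits, optionally joined by apostrophes (I'm, Builder's).
WORD_PATTERN = re.compile(r"[A-Za-z0-9]+(?:['’][A-Za-z0-9]+)*")


def spoken_word_count(script: str) -> int:
    """Count the words a voice would actually speak, ignoring bracketed tags."""
    without_tags = TAG_PATTERN.sub(" ", script)
    return len(WORD_PATTERN.findall(without_tags))


def fits_word_budget(script: str, low: int = 18, high: int = 25) -> bool:
    """Check the script against the 18-25 word budget for a 12-second intro."""
    return low <= spoken_word_count(script) <= high


# Sample script modeled on the interview-show example in this guide.
script = (
    "[excited] Welcome to The Builder's Hour. Your weekly look at the "
    "people shipping the future. [pause] I'm your host, Sam Patel."
)

print(spoken_word_count(script), fits_word_budget(script))
```

If a draft falls outside the range, trimming the tagline is usually less disruptive than cutting the show name or the host handoff.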
Build the intro as three Audio nodes on the canvas: (1) Music bed: generate or upload 12-30 seconds of theme music at low volume. (2) ElevenLabs Eleven v3: the host voice tag, 5-8 seconds of show identity, played over the bed. (3) Sound Effects v2: a single transition (whoosh, riser, or stinger) at the cut between the intro and the episode. The Martini canvas lets you align all three to the same timeline. Standard arrangement: music starts, the voice enters at +1s riding over the bed, the SFX hits as the voice ends, and the music continues underneath the first 3-5 seconds of the episode before ducking out. A total of 12-30 seconds is the industry sweet spot; shorter feels rushed, and longer makes listeners skip.
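The standard arrangement can be worked out on paper before placing nodes. The sketch below is plain arithmetic, not a Martini API: the `Clip` class and `build_intro_timeline` function are hypothetical, and the 1-second SFX length and 4-second music tail are assumed values chosen from within the ranges described above.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    start: float  # seconds from the start of the intro
    end: float


def build_intro_timeline(voice_len, voice_offset=1.0, sfx_len=1.0, music_tail=4.0):
    """Lay out music, voice, and SFX following the standard arrangement.

    voice_offset: the voice enters this long after the music starts (+1s).
    sfx_len:      assumed length of the transition sound.
    music_tail:   how long the bed keeps playing after the voice ends (3-5s).
    """
    voice = Clip("voice", voice_offset, voice_offset + voice_len)
    sfx = Clip("sfx", voice.end, voice.end + sfx_len)  # hits as the voice ends
    music = Clip("music", 0.0, voice.end + music_tail)  # bed runs under everything
    return {clip.name: clip for clip in (music, voice, sfx)}


timeline = build_intro_timeline(voice_len=7.0)
for clip in timeline.values():
    print(f"{clip.name:<6} {clip.start:>5.1f}s -> {clip.end:>5.1f}s")
```

With a 7-second voice tag, the music bed in this layout ends at the 12-second mark, the low end of the 12-30 second range.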
A podcast intro should be identical across every episode of the show: same voice tag, same music bed, same SFX, same timing. Save the intro canvas as a Martini template once it's tuned, then duplicate the template for each new episode. Update only the host's spoken episode-specific tag (e.g., "And today, we're talking to...") and keep the rest locked. A fixed voice ID keeps the voice itself consistent, but generative speech output is not guaranteed to be identical from one run to the next, so reuse the rendered audio for the locked intro rather than regenerating it. That keeps every episode's intro matched to the original, the kind of consistency listeners notice subconsciously.
Standard interview-show intro for ElevenLabs Eleven v3 with the Brian or Sarah voice. The [excited] tag bumps energy on the show name, and [pause] creates the standard radio handoff before the host introduction. Total runtime: ~10 seconds.
[excited] Welcome to The Builder's Hour. Your weekly look at the people shipping the future. [pause] I'm your host, Sam Patel.
News-show cold open style — situational opener that grounds the listener before the show name reveal. Pair with Daniel or Roger for authoritative delivery; Aria for sharper energy.
It's Tuesday morning. Coffee's hot, the news is heavy, and I'm here to make sense of it. [confidently] This is The Daily Brief.
Total intro length: 12-30 seconds is the industry sweet spot. Voice tag inside that should be 5-8 seconds — enough to land the brand, short enough that the music bed carries the rest.
Music bed volume: keep the bed at -12 dB to -18 dB under the voice during the tag, then return it to -6 dB after the voice ends. The Martini canvas timeline lets you set this without an external mixer.
Voice tag script density: aim for 25-30 words per 10 seconds of delivery. Anything denser sounds rushed; anything sparser feels like the script ran out.
For multilingual podcasts, render the same intro structure with ElevenLabs Multilingual v2 — same script, swap the language, keep music bed identical so listeners recognize the show across language editions.
Save the canvas as a template once tuned. Subsequent episodes only change the host's episode-specific tag line; the intro proper stays locked for show consistency.
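The music bed levels in the tips above are given in decibels, while many audio tools take a linear gain multiplier instead. The sketch below shows the standard amplitude conversion, gain = 10^(dB/20), plus a simple ducking rule. The function names are hypothetical, the -15 dB ducked level is an assumed midpoint of the -12 dB to -18 dB range, and treating the figures as gain relative to full volume is also an assumption.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel level to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def bed_gain(t: float, voice_start: float, voice_end: float,
             ducked_db: float = -15.0, open_db: float = -6.0) -> float:
    """Return the music bed gain at time t (seconds).

    The bed is ducked while the voice tag plays and sits at open_db otherwise.
    """
    voice_active = voice_start <= t < voice_end
    return db_to_gain(ducked_db if voice_active else open_db)


for level in (-6.0, -12.0, -18.0):
    print(f"{level:>6.1f} dB -> gain {db_to_gain(level):.3f}")
```

A real mix would ramp between the two levels over a few hundred milliseconds rather than switching instantly; the hard switch here only keeps the example short.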
ElevenLabs Eleven v3 produces the broadcast-quality narrator voice tag that anchors a podcast intro. The 21-voice library covers every show archetype, the 70+ language support handles localized editions, and the inline tags ([excited], [pause], [confidently]) give the host voice the energy curve listeners expect from a polished show. Trade-off vs. Fish Audio S2-Pro: ElevenLabs is more polished and confident in English emotional delivery; Fish Audio offers wider language coverage and natural-language bracket cues. For an English-language podcast where the host voice tag is the make-or-break moment, ElevenLabs is the safer pick. The full intro pipeline — voice + music + SFX — runs entirely on the Martini canvas, so a podcast producer can iterate on the intro without leaving the workspace.