2 Models Available
A podcast host can build a 12-second branded intro (a voice tag, a 6-second music bed, and a whoosh transition) entirely on the canvas, without hiring an audio producer. On Martini, drop a script into an ElevenLabs Eleven v3 voice node, generate a matching theme with Suno V5 or Minimax Music, then chain Sound Effects v2 for transitions and route everything into the audio mixer. The output is a show intro and outro you can reuse every week: a TTS host name, theme music in the right genre, and SFX transitions. Pick a model below to walk through the show-bumper workflow.
ElevenLabs
A podcast intro is three audio elements layered on a 12 to 30 second timeline: a music bed, a host voice tag, and an SFX transition (whoosh, riser, or stinger). On Martini, ElevenLabs Eleven v3 handles the host voice tag and Sound Effects v2 handles the transition. Both run inside Audio nodes on the same canvas, so you can swap voices, re-prompt SFX, and re-time the bed in one place. Eleven v3 produces the broadcast-quality narrator delivery podcast listeners expect; the 21-voice library covers warm narrator (Rachel, Sarah), authoritative male (Brian, Daniel), and energetic show-host (Aria, Charlie). Voice consent: if you're cloning a co-host's voice for the tag rather than picking from the library, get explicit written permission first. The same rules apply as for any other voice clone.
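To make the three-element layering concrete, here is a minimal Python sketch of an intro timeline. It does not call Martini or ElevenLabs; the `Clip` class, the helper names, and every start time and duration are hypothetical values chosen for illustration. Only the three element types and the 12 to 30 second range come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """One audio element on the intro timeline (hypothetical model)."""
    name: str
    start: float     # seconds from the start of the intro
    duration: float  # seconds

    @property
    def end(self) -> float:
        return self.start + self.duration


def intro_length(clips: list[Clip]) -> float:
    """Total intro length: the latest end time across all clips."""
    return max(clip.end for clip in clips)


def active_at(clips: list[Clip], t: float) -> list[str]:
    """Names of the clips playing at time t, i.e. the layers that overlap there."""
    return [clip.name for clip in clips if clip.start <= t < clip.end]


# Illustrative layout only: the timings below are assumptions, not Martini defaults.
INTRO = [
    Clip("music_bed", start=0.0, duration=6.0),
    Clip("voice_tag", start=5.0, duration=5.0),   # enters over the tail of the bed
    Clip("whoosh", start=10.0, duration=2.0),     # transition into the episode
]
```

With this layout, `intro_length(INTRO)` gives 12.0 seconds, and `active_at(INTRO, 5.5)` shows the music bed and voice tag overlapping, which is the point where you would duck the bed under the voice in the mixer.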
Fish Audio
Fish Audio S2-Pro is the multilingual, open-source choice for the host voice tag of a podcast intro, and it is especially valuable for shows with international audiences or co-host duets. The S2-Pro model handles 80+ languages with automatic detection, takes natural-language bracket cues like [confidently] or [warmly] for delivery direction, and supports multi-speaker dialogue inside a single Audio node. On Martini, you build the same three-element intro architecture (music bed, voice tag, SFX) but use Fish Audio for the voice element when the show needs language flexibility or self-hostable infrastructure. Voice consent: if you're cloning a host's voice for the tag rather than picking a prebuilt voice, get explicit written permission first; Fish Audio is open-source, so consent enforcement sits with you.
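As a rough illustration of the bracket cues, the Python sketch below assembles a voice-tag script as plain text before you paste it into the Audio node. It only builds strings and does not call Fish Audio. The helper names are hypothetical, and placing the cue directly before the line it directs is an assumption, since the text above does not specify where cues go.

```python
def with_cue(cue: str, line: str) -> str:
    """Prefix a script line with a bracketed delivery cue, e.g. "[warmly] ...".

    Assumption: the cue sits directly before the line it directs.
    """
    cue = cue.strip().strip("[]").strip()
    if not cue:
        raise ValueError("cue must not be empty")
    return f"[{cue}] {line.strip()}"


def build_voice_tag(lines: list[tuple[str, str]]) -> str:
    """Join (cue, line) pairs into one script, one cued line per row."""
    return "\n".join(with_cue(cue, line) for cue, line in lines)


# Hypothetical script text for illustration.
SCRIPT = build_voice_tag([
    ("warmly", "Welcome back to the show."),
    ("confidently", "I'm your host."),
])
```

Keeping cues and lines as separate pairs makes it easy to re-prompt the delivery (swap "warmly" for another cue) without retyping the script itself.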