2 Models Available
An animation team can script a four-character scene with natural turn-taking, distinct voices, and emotion tags, all without booking voice actors. On Martini's canvas, set up a script node with speaker turns, route it through ElevenLabs Eleven v3 Dialogue (the dedicated multi-speaker endpoint), Fish Audio S2-Pro multi-speaker, or Minimax Speech, and use inline tags like [whispers], [laughs], and [excited] to direct emotional delivery. The output is dialogue ready for a multi-character animated short, audio drama, or interactive prototype. Pick a model below to walk through the multi-speaker production flow.
ElevenLabs
ElevenLabs Dialogue v3 is the multi-speaker endpoint of Eleven v3. It is built for natural turn-taking between distinct character voices, with inline emotion tags ([whispers], [laughs], [excited], [sighs]) directing per-line delivery. Where standard Eleven v3 is one voice reading a paragraph, Dialogue v3 lets you assign different voices to different speakers and have them read a scripted scene with natural pacing, breath, and emotional response. On Martini, you build dialogue scenes as Audio nodes on the canvas: one node per character if you want fine-grained control, or a single Dialogue v3 node for the full multi-speaker generation. The 21-voice library covers a wide range of character archetypes, and cloned-voice support lets you bring in custom characters when the prebuilt voices don't match.
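As a rough illustration of the per-line structure described above, the sketch below turns a scripted scene into a list of speaker turns, each with an assigned voice and an optional inline tag. It is not Martini's or ElevenLabs' actual request format: the `Turn` class, the `build_dialogue_inputs` helper, the `voice` and `text` field names, and the voice IDs are all hypothetical, and the tag set is limited to the four tags named in this section.

```python
from dataclasses import dataclass
from typing import Optional

# Only the tags named in this section; the set a model accepts may be larger.
KNOWN_TAGS = {"whispers", "laughs", "excited", "sighs"}


@dataclass
class Turn:
    """One scripted line: who speaks, what they say, and an optional tag."""
    speaker: str
    text: str
    tag: Optional[str] = None


def build_dialogue_inputs(turns, voices):
    """Return one dict per turn with the speaker's voice and tagged text.

    The 'voice' and 'text' keys are illustrative, not a documented schema.
    """
    inputs = []
    for turn in turns:
        if turn.speaker not in voices:
            raise KeyError(f"no voice assigned to speaker {turn.speaker!r}")
        if turn.tag is not None and turn.tag not in KNOWN_TAGS:
            raise ValueError(f"unrecognised tag {turn.tag!r}")
        # Inline tags go in square brackets ahead of the line they direct.
        prefix = f"[{turn.tag}] " if turn.tag else ""
        inputs.append({"voice": voices[turn.speaker], "text": prefix + turn.text})
    return inputs


scene = [
    Turn("Mara", "Did you hear that?", tag="whispers"),
    Turn("Jonas", "It's just the wind.", tag="laughs"),
    Turn("Mara", "Then why is the door open?"),
]
# Placeholder voice IDs; real IDs would come from the voice library.
voices = {"Mara": "voice-a", "Jonas": "voice-b"}
dialogue_inputs = build_dialogue_inputs(scene, voices)
```

Keeping the scene as structured turns rather than one long string makes it easy to swap a character's voice or tag without touching the rest of the script.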
Fish Audio
Fish Audio S2-Pro's multi-speaker dialogue mode is exclusive to S2-Pro within the Fish Audio family; the older S1 model doesn't support it. Use [Speaker:Name] syntax to assign different voices to different speakers, with natural-language bracket cues like [whispering], [laughing nervously], or [pause two seconds] directing per-line delivery. Coverage is 80+ languages with automatic detection on the same voice IDs, which makes Fish Audio the strongest pick for multilingual dialogue scenes (an audio drama shipping in English, Mandarin, and Japanese, for example) or for scenes that need an expressive range beyond ElevenLabs' fixed inline tag set. Open-source serving means you can self-host the dialogue generation outside Martini for sensitive or pre-release content.
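The [Speaker:Name] convention and free-form bracket cues described above can be sketched as a small script renderer. This is a minimal illustration only: the `render_script` helper is hypothetical, and placing the cue directly after the speaker label, one turn per line, is an assumption about layout rather than a documented Fish Audio requirement.

```python
def render_script(turns):
    """Render (speaker, cue, text) tuples as a tagged multi-speaker script.

    `cue` is a free-form natural-language direction, or None for no cue.
    """
    lines = []
    for speaker, cue, text in turns:
        # Reject names that would break the bracket syntax.
        if any(ch in speaker for ch in "[]:"):
            raise ValueError(f"invalid speaker name {speaker!r}")
        cue_part = f"[{cue}] " if cue else ""
        lines.append(f"[Speaker:{speaker}] {cue_part}{text}")
    return "\n".join(lines)


script = render_script([
    ("Mara", "whispering", "Did you hear that?"),
    ("Jonas", "laughing nervously", "It's just the wind."),
    ("Mara", "pause two seconds", "Then why is the door open?"),
    ("Jonas", None, "I'll go check."),
])
print(script)
```

Because the cues are plain natural language rather than a fixed tag list, the renderer does not validate them; it only guards the speaker name against characters that would break the brackets.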