ElevenLabs
ElevenLabs Eleven v3 produces the most natural-sounding English voiceovers of any TTS model on Martini. It offers 21 distinct voices, from warm narrator tones (Rachel, Sarah) to authoritative male voices (Roger, Brian, Daniel), each with realistic emotional inflection that adapts to your script's content. At 10 credits per ~100 characters, it costs more than Minimax Speech (which excels at Chinese), but its English voice quality and emotional expressiveness are unmatched on the platform. ElevenLabs also offers a faster Turbo v2.5 variant (6 credits) and a Multilingual v2 model for non-English languages.
ElevenLabs offers 21 voices, each with a distinct personality. For product narration and brand videos, try Rachel (warm, professional female) or Brian (confident, authoritative male). For tutorials and explainers, try Sarah (clear, friendly) or Daniel (calm, instructional). For storytelling and podcasts, try Aria (expressive, versatile) or Callum (engaging male narrator). Generate a short test sentence with 2-3 voices before committing to a full script — voice-content fit has more impact on quality than any other factor.
The most common mistake in TTS scripts is writing formal text that sounds stiff when spoken. Write conversationally: use contractions ("we'll" not "we will"), shorter sentences, and natural transitions ("Now, let's look at..." not "The following section demonstrates..."). ElevenLabs v3 handles emotional nuance — if you want excitement, write excitedly. If you want calm authority, write with measured, declarative sentences. The model infers tone from the writing style.
Punctuation is your primary pacing tool. Periods create natural pauses between ideas. Commas create brief pauses within sentences. Ellipses (...) create dramatic or contemplative pauses. Em dashes (—) create sharp transitions. Line breaks between paragraphs add slightly longer pauses than periods. For a 30-second ad, aim for 80-90 words. For tutorial narration, slow the pace to 120-130 words per minute (60-65 words per 30 seconds) and add more punctuation breaks.
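The punctuation cues above can be tallied before you generate audio, as a rough check on how much pausing a script asks for. The sketch below is only an illustration: `pacing_profile` is a hypothetical helper, not part of Martini or ElevenLabs, and it counts characters rather than predicting how long the model will actually pause.

```python
import re


def pacing_profile(script: str) -> dict:
    """Count the punctuation cues that shape pacing in a TTS script.

    Hypothetical helper for illustration; it does not call any TTS API.
    """
    ellipsis_pattern = r"\.{3}|\u2026"
    ellipses = len(re.findall(ellipsis_pattern, script))
    # Strip ellipses first so their dots are not counted as sentence ends.
    without_ellipses = re.sub(ellipsis_pattern, " ", script)
    return {
        "sentence_ends": len(re.findall(r"[.!?]", without_ellipses)),
        "commas": without_ellipses.count(","),
        "ellipses": ellipses,
        "em_dashes": script.count("\u2014"),
        "paragraph_breaks": len(re.findall(r"\n\s*\n", script.strip())),
    }


sample = (
    "Welcome to our new collection. Each piece is carefully crafted "
    "from sustainable materials... designed to last a lifetime. "
    "Discover what makes us different."
)
print(pacing_profile(sample))
```

A script with very few sentence ends relative to its length is likely to sound rushed, so a tally like this can flag passages worth breaking up before you spend credits on a render.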
The real power of ElevenLabs on Martini is the canvas pipeline: connect the Audio output directly to a Lipsync node (OmniHuman or Kling LipSync) to create a talking head video, or connect it to a Video node to pair narration with AI-generated visuals. This enables complete ad production — script → voiceover → video — in a single workflow without leaving Martini.
Brand narration — the ellipsis before "designed to last a lifetime" creates a contemplative pause that emphasizes the value proposition. Short, declarative sentences give the voice a confident, premium feel. Try this with Rachel or Brian for different brand personalities.
Welcome to our new collection. Each piece is carefully crafted from sustainable materials... designed to last a lifetime. Discover what makes us different.
Tutorial narration — the sequential structure ("First... Then... Finally") gives the TTS natural pacing markers. The exclamation mark after "you're all set" signals ElevenLabs to add upbeat energy at the end. Try this with Sarah or Daniel for clear, instructional delivery.
In this tutorial, we'll walk through three simple steps to set up your account. First, click the sign-up button on the homepage. Then, enter your email and choose a password. Finally, verify your email — and you're all set!
Eleven v3 costs 10 credits per ~100 characters. For budget-conscious long-form narration, use the Turbo v2.5 variant (6 credits for the same length): it is slightly less expressive but 40% cheaper.
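To compare the two rates for a given script, a rough estimate can be worked out from the character count. This is a sketch under stated assumptions: it prorates linearly over the "~100 characters" figure, it assumes the Turbo rate applies to the same ~100-character unit, and it ignores any rounding or minimum charge Martini may apply. `estimate_credits` is a hypothetical helper, not a Martini API.

```python
# Rates quoted in the text; the per-100-character unit for Turbo is an assumption.
ELEVEN_V3_CREDITS_PER_100_CHARS = 10
TURBO_V25_CREDITS_PER_100_CHARS = 6


def estimate_credits(char_count: int, credits_per_100_chars: float) -> float:
    """Approximate credit cost, prorated linearly by character count."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return char_count / 100 * credits_per_100_chars


script_length = 1500  # characters, e.g. len(script_text)
v3_cost = estimate_credits(script_length, ELEVEN_V3_CREDITS_PER_100_CHARS)
turbo_cost = estimate_credits(script_length, TURBO_V25_CREDITS_PER_100_CHARS)
savings = 1 - turbo_cost / v3_cost

print(f"Eleven v3:  ~{v3_cost:.0f} credits")
print(f"Turbo v2.5: ~{turbo_cost:.0f} credits ({savings:.0%} cheaper)")
```

Because the published figure is approximate, treat the result as a budgeting guide rather than an exact charge.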
The 21 voices are: Rachel, Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill. Always test 2-3 before committing.
For non-English voiceovers, use ElevenLabs TTS Multilingual v2 instead — it supports 29+ languages. For Chinese specifically, Minimax Speech 2.5 HD produces more natural Mandarin.
Write scripts at 120-150 words per minute for comfortable listening speed. A 60-second ad should be 120-150 words, not 200+.
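The words-per-minute guideline converts directly into a word budget for a target runtime, and back into an estimated runtime for a finished script. The sketch below uses the 120-150 words-per-minute range from the tip above; the function names are hypothetical, and the actual spoken length will vary with the voice and punctuation.

```python
def word_budget(duration_seconds: float,
                min_wpm: int = 120,
                max_wpm: int = 150) -> tuple[int, int]:
    """Return the (low, high) word count for a target duration."""
    low = round(duration_seconds * min_wpm / 60)
    high = round(duration_seconds * max_wpm / 60)
    return low, high


def estimated_seconds(script: str, wpm: int = 130) -> float:
    """Estimate runtime from a whitespace-delimited word count."""
    words = len(script.split())
    return words / wpm * 60


print(word_budget(60))  # word range for a 60-second ad
print(word_budget(30))  # word range for a 30-second spot
```

Running `estimated_seconds` on a draft before generating audio is a cheap way to catch scripts that overshoot their slot.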
ElevenLabs Eleven v3 produces the most human-sounding English TTS on Martini — emotional inflection, natural breathing patterns, and expressive delivery that sounds like a professional voice actor rather than AI. The trade-off vs. Minimax Speech: ElevenLabs is the clear winner for English, but Minimax Speech 2.5 HD produces more natural Chinese (especially Mandarin tonal accuracy). For multilingual projects, use ElevenLabs Multilingual v2 for Western languages and Minimax for Chinese/Asian languages.
Connect ElevenLabs TTS Eleven v3 with other AI models on Martini's infinite canvas. No GPU required — start free.