ElevenLabs AI Audio - ElevenLabs

ElevenLabs

ElevenLabs is a leading AI voice family for expressive text-to-speech, multilingual narration, dialogue, voice design, and text-to-sound effects. On Martini, use Eleven v3, Multilingual v2, Turbo v2.5, Dialogue v3, and Sound Effects v2 in node-based audio workflows.

Eleven v3 is ElevenLabs' latest expressive speech synthesis model, built for emotional delivery, inline audio tags, and natural multi-speaker dialogue across 70+ languages. Multilingual v2 remains the stable choice for long-form narration, corporate voiceover, e-learning, and projects where consistency matters more than maximum expressiveness. Flash v2.5 is ElevenLabs' current low-latency recommendation, but Martini keeps Turbo v2.5 available for existing workflows that already depend on it; according to ElevenLabs, Turbo v2.5 is functionally equivalent to Flash v2.5, except that Flash usually has lower latency. Sound Effects v2 uses the official eleven_text_to_sound_v2 model for whooshes, ambience, UI sounds, impacts, seamless loops, and other production audio details. On Martini, these audio nodes can be chained with video, image, and script nodes, so a creator can draft a scene, generate narration, add SFX, and keep the full production graph together.

ElevenLabs Variants

Variant	Description
ElevenLabs TTS Eleven v3	Expressive TTS via provider model eleven_v3, with audio tags, emotional delivery, 70+ languages, and a 5,000 character request limit.
ElevenLabs Dialogue Eleven v3	Multi-speaker dialogue mode for natural conversations, character discussions, dramatic reads, and scripted exchanges.
ElevenLabs TTS Multilingual v2	Stable, high-quality multilingual TTS for narration, e-learning, corporate video, and long-form audio in 29 languages.
ElevenLabs TTS Turbo v2.5	Low-latency multilingual TTS kept for existing workflows; Flash v2.5 is ElevenLabs' newer low-latency recommendation.
ElevenLabs Sound Effects v2	Text-to-sound generation via eleven_text_to_sound_v2 for ambience, impacts, transitions, UI feedback, loops, and cinematic layers.

Capabilities

Text-to-Speech
Dialogue
Sound Effects
Voice Cloning
Music Generation
Multilingual

Best For

Expressive character voiceover with emotional direction
Multi-speaker dialogue, scripted scenes, and audio drama
Long-form narration where voice consistency matters
Fast multilingual voice generation for existing Turbo workflows
Text-prompted sound effects for videos, games, and social clips

Strengths

Eleven v3 supports inline tags for emotion, delivery direction, and non-verbal reactions like laughs or sighs
Natural multi-speaker dialogue is available through Eleven v3 dialogue workflows
Multilingual v2 is strong for stable long-form narration and number-heavy content
Large voice ecosystem with voice cloning, voice design, and many premade voices
Sound Effects v2 covers production audio needs beyond spoken narration

Limitations

Eleven v3 is more variable and has higher latency than the Turbo v2.5 and Flash v2.5 models, so it is not the best fit for real-time agents
Flash v2.5 is ElevenLabs' current recommendation over Turbo v2.5 for new low-latency use cases
Long-form content may need chunking because Eleven v3 has a lower per-request character limit than Multilingual v2 or Flash v2.5
Sound effects quality depends heavily on concrete prompts for timing, texture, intensity, and environment
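
The chunking mentioned above can be sketched in a few lines. This is a minimal illustration, not Martini's or ElevenLabs' own logic: the function name chunk_text is hypothetical, the 5,000 character default is the Eleven v3 limit quoted in the variants table, and splitting on sentence-ending punctuation is an assumed heuristic.

```python
import re

# Per-request limit quoted for Eleven v3 in the variants table.
V3_CHAR_LIMIT = 5000


def chunk_text(text: str, limit: int = V3_CHAR_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Assumed heuristic: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split as a last resort.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # The sentence does not fit; close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own request. Splitting at sentence boundaries keeps tags and punctuation together, which should help delivery stay natural across chunk joins.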

Tips & Best Practices

Use Eleven v3 when acting quality matters: add short tags such as [whispers], [laughs], [sighs], or [excited] near the words they should affect.
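
As a rough illustration of tagged input, the sketch below extracts the inline tags from a script and wraps the script in a request body. The helper names are hypothetical, and the text and model_id field names are an assumption about the provider request format rather than something this page documents; nothing is sent over the network.

```python
import re

# Matches lowercase bracketed tags such as [whispers] or [laughs].
TAG_PATTERN = re.compile(r"\[[a-z][a-z ]*\]")


def find_audio_tags(script: str) -> list[str]:
    """Return the inline audio tags in order of appearance."""
    return TAG_PATTERN.findall(script)


def build_tts_payload(script: str, model_id: str = "eleven_v3") -> dict:
    """Build a JSON-style request body (assumed field names); no request is made."""
    return {"text": script, "model_id": model_id}


script = "[whispers] I think someone is here. [laughs] Just kidding!"
tags = find_audio_tags(script)
payload = build_tts_payload(script)
```

Listing the tags before generation is a cheap way to check that each one sits next to the words it should affect.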

Use Multilingual v2 for audiobooks, brand narration, training videos, and anything that needs consistent delivery across longer text.

Keep Turbo v2.5 for compatibility with existing Martini workflows, but evaluate Flash v2.5 before building a new real-time voice product outside Martini.

For dialogue, write speaker turns clearly and keep each line direct; emotional tags work best when they are local and sparse.
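
One way to keep speaker turns unambiguous is to write them as "Name: line" and parse them into a list before generation. The sketch below assumes that script convention; the function name, the voice_id and text keys, and the placeholder voice IDs are all hypothetical and may not match what a Martini dialogue node or the provider expects.

```python
def parse_dialogue(script: str, voices: dict[str, str]) -> list[dict[str, str]]:
    """Turn 'Speaker: line' text into a list of turns with an assigned voice."""
    turns = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines between turns are ignored
        speaker, sep, text = line.partition(":")
        speaker = speaker.strip()
        if not sep or speaker not in voices:
            raise ValueError(f"Unrecognized speaker turn: {line!r}")
        turns.append({"voice_id": voices[speaker], "text": text.strip()})
    return turns


# Placeholder voice IDs; real IDs come from your own voice library.
voices = {"Mara": "voice_a", "Jon": "voice_b"}
script = "Mara: [sighs] We missed the train.\n\nJon: Then we walk."
turns = parse_dialogue(script, voices)
```

Failing loudly on an unknown speaker catches typos in names before any audio is generated.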

For SFX, describe source, action, room, distance, and intensity: "close metallic door slam in a narrow concrete hallway, short reverb" beats "door sound".
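
The same checklist can be turned into a small prompt builder so that no element is forgotten. This is only a sketch: the SfxPrompt class and its field order are illustrative choices, not a format required by Sound Effects v2.

```python
from dataclasses import dataclass


@dataclass
class SfxPrompt:
    """Hypothetical container for the parts of a sound effect description."""
    source: str
    action: str
    room: str = ""
    distance: str = ""
    intensity: str = ""
    extra: str = ""  # free-form detail such as reverb or texture

    def render(self) -> str:
        # Order: distance, intensity, source, action, then the room and extras.
        head = " ".join(
            part for part in (self.distance, self.intensity, self.source, self.action) if part
        )
        if self.room:
            head += f" in {self.room}"
        if self.extra:
            head += f", {self.extra}"
        return head


prompt = SfxPrompt(
    source="metallic door",
    action="slam",
    room="a narrow concrete hallway",
    distance="close",
    extra="short reverb",
).render()
```

With these inputs the builder reproduces the example prompt quoted above, and leaving a field empty simply drops it from the text.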

Use ElevenLabs on Martini

Connect ElevenLabs with video, image, script, and music nodes on Martini's infinite canvas. No GPU required — start free.

Frequently Asked Questions

Which ElevenLabs model should I use on Martini?

Use Eleven v3 when expressive performance and dialogue matter, Multilingual v2 for stable long-form narration, Turbo v2.5 when you need compatibility with existing low-latency Martini workflows, and Sound Effects v2 for non-speech production audio.
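
The recommendation above amounts to a lookup from use case to model. A minimal sketch follows; eleven_v3 and eleven_text_to_sound_v2 are the provider model names given on this page, while eleven_multilingual_v2 and eleven_turbo_v2_5 are assumed identifiers, and the use-case keys are hypothetical labels.

```python
# Use-case labels are illustrative; two of the model IDs are assumptions (see comments).
MODEL_FOR_USE_CASE = {
    "expressive": "eleven_v3",                 # named on this page
    "dialogue": "eleven_v3",                   # dialogue runs on Eleven v3
    "long_form": "eleven_multilingual_v2",     # assumed identifier
    "existing_low_latency": "eleven_turbo_v2_5",  # assumed identifier
    "sound_effects": "eleven_text_to_sound_v2",   # named on this page
}


def pick_model(use_case: str) -> str:
    """Return the suggested model for a use case, or raise on an unknown label."""
    try:
        return MODEL_FOR_USE_CASE[use_case]
    except KeyError:
        known = ", ".join(sorted(MODEL_FOR_USE_CASE))
        raise ValueError(f"Unknown use case {use_case!r}; expected one of: {known}") from None
```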

Does Eleven v3 support emotion tags and multiple speakers?

Yes. Eleven v3 supports inline audio tags for emotion, delivery, and non-verbal reactions, and ElevenLabs exposes dialogue endpoints for natural multi-speaker audio.

Is Turbo v2.5 still the latest low-latency ElevenLabs model?

No. ElevenLabs now recommends Flash v2.5 over Turbo v2.5 for new low-latency use cases because Flash usually has lower latency. Martini keeps Turbo v2.5 available for workflows that already rely on it.

Related Audio Models

Fish Audio

Fish Audio S2

Fish Audio S2-Pro is Fish Audio's next-generation expressive text-to-speech model for natural voice generation, open-ended emotion tags, multi-speaker dialogue, voice cloning, and 80+ language workflows.
