Fish Audio

Fish Audio S2

Fish Audio S2-Pro is Fish Audio's next-generation expressive text-to-speech model. It offers natural voice generation, open-ended emotion tags, multi-speaker dialogue, voice cloning, and support for 80+ languages.

Fish Audio currently recommends s2-pro for new projects. S2-Pro adds natural-language bracket control such as [whispers sweetly] or [laughing nervously], supports multi-speaker dialogue, covers 80+ languages, targets a 100 ms time-to-first-audio, and ships with an open-source SGLang-based serving stack. The previous s1 model remains available for existing integrations and is still useful when a workflow depends on its parenthesis-based emotion syntax. On Martini, this page presents Fish Audio as an expressive voice foundation model to compare against ElevenLabs, Minimax Speech, and other TTS systems; it is not part of Martini's production generation menu unless a runtime integration is added separately.

Try Fish Audio S2 Free

Fish Audio S2 Variants

Variant	Description
Fish Audio S2-Pro	Recommended current model with bracket-style natural language control, multi-speaker dialogue, 80+ languages, and open-source serving.
Fish Audio S1	Previous 4B-parameter model with parenthesis-based emotional control, kept for existing integrations.

Capabilities

Text-to-Speech

Dialogue

Sound Effects

Voice Cloning

Music Generation

Multilingual

Best For

Expressive voiceovers with natural-language emotion and delivery cues
Multi-speaker dialogue and character audio prototypes
Developers who want open-source serving and fine-tuning options
Voice cloning workflows that need direct reference voices
Multilingual audio experiments across broad language coverage

Strengths

S2-Pro is the recommended current Fish Audio model for new projects
Open-ended bracket control is more flexible than a fixed list of emotion tokens
Multi-speaker dialogue is exclusive to S2-Pro in the Fish Audio TTS API
80+ language support with automatic language detection
Open-source model and serving stack reduce vendor lock-in for teams that can self-host

Limitations

On Martini, Fish Audio has an SEO and comparison page only; it cannot be used for generation until a production runtime integration is added
S2-Pro uses bracket syntax where S1 uses parentheses, so old parenthesis-based prompts may need rewriting
Self-hosting and fine-tuning require infrastructure and audio engineering effort
Voice quality still depends on clear reference voices, transcripts, and prompt discipline
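
The prompt-rewriting limitation above can be sketched as a small text transform. This is only an illustration, not an official migration tool: it assumes S1 cues appear as short parenthesized phrases, and the `KNOWN_CUES` allow-list and the `parens_to_brackets` helper are hypothetical names introduced here. The allow-list exists so that ordinary parentheticals in the script are left alone.

```python
import re

# Hypothetical allow-list: replace with the cues your own S1 prompts use.
KNOWN_CUES = frozenset({"excited", "sad", "whispering", "laughing"})

# Matches a parenthesized phrase with no nested parentheses.
_PAREN = re.compile(r"\(([^()]+)\)")


def parens_to_brackets(text: str, known_cues: frozenset = KNOWN_CUES) -> str:
    """Rewrite (cue) as [cue] only when the cue is in the allow-list."""

    def swap(match: re.Match) -> str:
        cue = match.group(1).strip()
        if cue.lower() in known_cues:
            return f"[{cue}]"
        # Not a recognized cue: treat it as normal prose and keep it as-is.
        return match.group(0)

    return _PAREN.sub(swap, text)


if __name__ == "__main__":
    print(parens_to_brackets("(excited) We won! (see page 3)"))
```

Review the converted prompts by ear afterwards; a syntactic rewrite does not guarantee that S2-Pro interprets a cue the same way S1 did.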

Tips & Best Practices

Use s2-pro for new Fish Audio projects; reserve s1 for legacy prompts or compatibility testing.

Keep bracket cues local: place [whispering], [pause], [laughing nervously], or similar tags exactly where the delivery should change.

For dialogue, use explicit speaker tags and keep turns short enough to inspect and re-generate quickly.

When cloning voices, use clean reference audio with matching transcript text before optimizing prompt style.

Compare Fish Audio against Eleven v3 when you need expressive dialogue, and against Multilingual v2 when you need long-form stability.
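
The bracket-cue and dialogue tips above can be checked before sending a script for generation. The sketch below is a pre-flight helper under stated assumptions: the `Speaker: line` row format is a placeholder convention for this example, not Fish Audio's speaker-tag syntax, and the 40-word default is an arbitrary threshold to tune. All function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches a [bracket cue] with no nested brackets.
_CUE = re.compile(r"\[([^\[\]]+)\]")


def list_cues(text: str) -> list:
    """Return (character offset, cue) pairs so cue placement can be inspected."""
    return [(m.start(), m.group(1)) for m in _CUE.finditer(text)]


@dataclass
class Turn:
    speaker: str
    text: str


def split_turns(script: str) -> list:
    """Parse 'Speaker: line' rows (placeholder format, see lead-in)."""
    turns = []
    for line in script.splitlines():
        if ":" not in line:
            continue  # skip blank lines and rows without a speaker label
        speaker, _, text = line.partition(":")
        turns.append(Turn(speaker.strip(), text.strip()))
    return turns


def long_turns(turns: list, max_words: int = 40) -> list:
    """Flag turns whose spoken text, with cues removed, exceeds max_words."""
    return [t for t in turns if len(_CUE.sub(" ", t.text).split()) > max_words]


if __name__ == "__main__":
    script = "Ana: [whispering] Are you awake?\nBen: [laughing nervously] No.\n"
    for turn in split_turns(script):
        print(turn.speaker, list_cues(turn.text))
```

Flagged turns are candidates for splitting, which keeps each regeneration small when a single line needs a different delivery.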

Use Fish Audio S2 on Martini

Connect Fish Audio S2 with video, image, script, and music nodes on Martini's infinite canvas. No GPU required — start free.

Get Started Free

Frequently Asked Questions

What is the latest Fish Audio TTS model?

Fish Audio currently recommends s2-pro for new projects. It adds natural-language bracket control, multi-speaker dialogue, 80+ languages, low time-to-first-audio, and an open-source serving stack. S1 remains available for existing integrations.

Is Fish Audio S2 available as a Martini generation model?

No. This is an SEO and comparison page. Fish Audio is not in Martini's production audio generation menu; adding it would require a separate runtime provider integration, billing, UI controls, and webhook handling.

How is Fish Audio different from ElevenLabs?

Fish Audio S2 emphasizes open-source serving, flexible bracket-style control, and self-hosting options. ElevenLabs emphasizes a mature hosted voice ecosystem, Eleven v3 expressiveness, Multilingual v2 stability, and text-to-sound effects.

Related Audio Models

ElevenLabs

ElevenLabs is a leading AI voice family for expressive text-to-speech, multilingual narration, dialogue, voice design, and text-to-sound effects. On Martini, use Eleven v3, Multilingual v2, Turbo v2.5, Dialogue v3, and Sound Effects v2 in node-based audio workflows.

