2 Models Available
A podcaster or course creator clones their own voice from a 30-second sample, then generates new narration without re-recording. On Martini's canvas, drop a clean reference clip into an audio node, route it into ElevenLabs Voice Cloning, Fish Audio S2-Pro voice cloning, or Minimax Voice Design, and chain the cloned voice into downstream script-to-speech, dubbing, or lip-sync nodes. Use this for founder-voice training narration, course modules, or localizing existing video. Only clone voices you own or have permission to use. Pick a model below to walk through the cloning workflow.
ElevenLabs
ElevenLabs offers two voice cloning tiers that map directly to how much reference audio you have. Instant Voice Cloning trains on a 10-second sample and is ready in seconds, which is enough for internal narration drafts, prototype dubs, and personal video voiceover. Professional Voice Cloning needs 30+ minutes of clean studio audio, but the resulting voice can carry an entire course or audiobook without drifting. On Martini, both modes feed Eleven v3 (or Multilingual v2 for non-English work), so once your voice is registered you can generate new narration in 70+ languages with inline emotion tags. Critical: only clone voices you own or have explicit written permission to clone. ElevenLabs requires voice verification for your own voice, and consent matters whether the platform enforces it or not.
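As a rough illustration of how the two tiers map to the amount of reference audio, here is a minimal sketch. The function name `pick_cloning_tier` and the returned labels are hypothetical and are not part of the ElevenLabs or Martini APIs; only the 10-second and 30-minute thresholds come from the description above.

```python
# Thresholds taken from the tier descriptions above.
INSTANT_MIN_SECONDS = 10            # Instant Voice Cloning: 10-second sample
PROFESSIONAL_MIN_SECONDS = 30 * 60  # Professional Voice Cloning: 30+ minutes


def pick_cloning_tier(clean_audio_seconds: float) -> str:
    """Suggest a tier from the amount of clean reference audio (hypothetical helper)."""
    if clean_audio_seconds >= PROFESSIONAL_MIN_SECONDS:
        return "professional"
    if clean_audio_seconds >= INSTANT_MIN_SECONDS:
        return "instant"
    # Less audio than the Instant tier expects: record more before cloning.
    return "insufficient"


if __name__ == "__main__":
    for seconds in (5, 30, 45 * 60):
        print(seconds, pick_cloning_tier(seconds))
```

Having enough audio for the Professional tier does not oblige you to use it; for a quick draft, Instant may still be the practical choice.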
Fish Audio
Fish Audio S2-Pro is the open-source alternative to ElevenLabs cloning, with two real differentiators: natural-language bracket control inside the prompt (`[whispering]`, `[laughing nervously]`, `[pause]`) and an open serving stack you can self-host. Voice cloning needs a clean reference audio sample plus a matching transcript. Fish Audio uses the transcript text to disambiguate phonemes, so a misaligned transcript hurts cloning quality more than it does on ElevenLabs. Coverage is 80+ languages with automatic detection. Critical: only clone voices you own or have explicit written permission to clone. Fish Audio is open-source, which means consent enforcement is on you, not the platform, so make the rights clearance explicit before you upload reference audio.
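To make the bracket-control idea concrete, the sketch below separates the bracketed cues in a script from the words that would actually be spoken. It is plain string processing for reviewing a prompt before you send it; it does not call Fish Audio, and the helper name `split_cues` is a hypothetical chosen for this example.

```python
import re

# Matches one bracketed cue such as [whispering] or [laughing nervously].
CUE_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_cues(script: str) -> tuple[list[str], str]:
    """Return (cues, spoken_text) for a bracket-annotated script (hypothetical helper)."""
    cues = CUE_PATTERN.findall(script)
    # Remove the cues, then collapse the leftover whitespace.
    spoken_text = " ".join(CUE_PATTERN.sub(" ", script).split())
    return cues, spoken_text


if __name__ == "__main__":
    script = (
        "[whispering] I shouldn't be here. [pause] Did you hear that? "
        "[laughing nervously] Never mind."
    )
    cues, spoken = split_cues(script)
    print(cues)
    print(spoken)
```

Listing the cues separately makes it easier to spot a typo in a tag or a cue that was meant to be read aloud as text.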