2 Models Available
A marketer takes a brand spokesperson portrait plus an ElevenLabs-generated script and ships a 30-second talking-head ad with no on-camera talent. On Martini's canvas, route the portrait into a lip-sync tool node, send the ElevenLabs Eleven v3 audio track alongside, and pick Kling Avatar (tight talking head), OmniHuman (presenter with gesture and torso), or Kling O3 Video Edit (restyle). Most lip-sync models cap at 30-60 seconds per call, so chunk longer scripts. Pick a model below to walk through the UGC explainer or dub workflow.
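The chunking step above can be sketched in Python. This is a minimal sketch, not a Martini feature: the 150 words-per-minute speaking rate is an assumed estimate, and the 30-second default mirrors the low end of the per-call cap mentioned above.

```python
# Split a script into chunks that each fit under a lip-sync model's
# per-call duration cap, estimating spoken duration from word count.
# ASSUMPTION: ~150 words per minute; tune wpm for your voice/pacing.

def chunk_script(script: str, max_seconds: int = 30, wpm: int = 150) -> list[str]:
    max_words = max_seconds * wpm // 60  # words that fit in one call
    # Naive sentence split so chunks break at sentence boundaries.
    normalized = script.replace("!", ".").replace("?", ".")
    sentences = [s.strip() for s in normalized.split(".") if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(". ".join(current) + ".")
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(". ".join(current) + ".")
    return chunks
```

Each returned chunk then feeds one lip-sync call; the resulting clips are concatenated downstream on the canvas.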
Kling
Kling AI Avatar is the focused-face lip-sync model: it takes a portrait plus an audio track and produces a tight talking-head video in which the mouth, jaw, and lower face animate naturally to the audio waveform. Framing stays head-and-shoulders; for full-body presenter video with gesture and torso movement, use OmniHuman instead. Kling AI Avatar runs as an audio-driven node with no text prompt and no configurable parameters, so quality is determined entirely by the portrait and the audio. Most lip-sync calls cap at 30-60 seconds per generation; chunk longer scripts into multiple calls and concatenate downstream. The companion `tools/lip-sync` page covers routing details; this how-to focuses on the Kling-Avatar-paired pipeline specifically.
ByteDance
OmniHuman 1.5 is the full-upper-body lip-sync model: it animates not just the face but also the shoulders, arms, hands, and torso in response to the audio, producing presenter-style talking-head videos that look like recorded footage rather than a still portrait with moving lips. The architecture is portrait + audio → synced video with natural micro-expressions, blink timing, head sway, and gesture. Where Kling AI Avatar gives you tight close-up framing, OmniHuman gives you a presenter who can read a script while gesturing naturally, making it the right pick for executive presentations, keynote-style marketing, courses with on-screen talent, or UGC ads where presence matters. Output runs at 720p in 1:1, 16:9, or 9:16 aspect ratios. The companion `tools/lip-sync` page covers tool routing; this how-to focuses on the OmniHuman-paired pipeline specifically.
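When sizing the portrait or downstream canvas for OmniHuman's three aspect options, it helps to know the pixel dimensions each implies. A small sketch, assuming "720p" means the shorter side is 720 px (the exact dimensions the model returns are an assumption here, not confirmed by this page):

```python
# Approximate pixel dimensions for a 720p output in each supported aspect.
# ASSUMPTION: "720p" = shorter side fixed at 720 px.

ASPECTS = {"1:1": (1, 1), "16:9": (16, 9), "9:16": (9, 16)}

def output_size(aspect: str, short_side: int = 720) -> tuple[int, int]:
    w, h = ASPECTS[aspect]
    scale = short_side / min(w, h)
    return round(w * scale), round(h * scale)
```

For example, 16:9 works out to 1280x720 (landscape presenter) and 9:16 to 720x1280 (vertical UGC), which is why 9:16 is the usual pick for short-form ad placements.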