ByteDance
OmniHuman is ByteDance's audio-driven portrait animation model that turns a still photo and an audio track into a talking-head video with synchronized lip movements, facial expressions, and natural head gestures.
Version 1.0 produces lip sync and head motion from a front-facing photo paired with speech audio. Version 1.5 improves lip-sync accuracy, handles a wider range of portrait styles, including illustrated and stylized faces, and produces more natural head gestures. Both versions can be paired with text-to-speech models in a Martini workflow for end-to-end talking-head production.
| Variant | Description |
|---|---|
| OmniHuman v1 | Audio-driven portrait animation with lip sync, expressions, and head gestures. |
| OmniHuman v1.5 | Improved lip-sync accuracy, better handling of diverse portrait styles (including illustrated and stylized faces), and more natural head gestures. |
Connect OmniHuman with other AI models on Martini's infinite canvas. No GPU required — start free.