Kling
Kling O3 (Video 3.0 Omni) is Kling's unified multimodal flagship: a four-variant family that fuses text, image, video and audio inputs into one model. Capabilities span text-to-video, image-to-video with end-frame control, character reference generation, and prompt-guided video-to-video editing (Omni Edit). It shares the Kling 3.0 backbone, with native 4K at up to 60fps and multi-shot sequencing.
The Kling O3 family centers on reference-heavy, multimodal workflows. The base O3 model handles text-to-video and image-to-video with tail image control, letting you fix both the first and last frames for precise motion planning. O3 Reference adds character reference images for consistent appearances across clips and supports voice control over individual elements. O3 Video Edit (Omni Edit) takes existing footage and swaps characters, environments or specific elements while preserving the original motion and timing. O3 Video Ref combines video-to-video editing with reference images for the highest level of control.
All variants share Kling 3.0's native 4K (3840×2160) up to 60fps with 16-bit HDR, six-shot multi-cut sequences up to 15 seconds, and synchronized native audio in English, Chinese, Japanese, Korean and Spanish, with Standard and Pro quality tiers throughout.
| Variant | Description |
|---|---|
| Kling O3 | Text-to-video and image-to-video with tail image (end-frame) control. |
| Kling O3 Reference | Adds character reference images for consistent appearance across generations. |
| Kling O3 Video Edit | Video-to-video editing (Omni Edit) that swaps characters, environments or specific elements while preserving the original motion and timing. |
| Kling O3 Video Reference | Video-to-video editing with reference images for guided style and character control. |
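
Read as a decision rule, the table comes down to two questions: do you have source footage to edit, and do you have reference images? The sketch below encodes that rule. The `O3Variant` enum and the `pick_variant` function are hypothetical names made up for this illustration; they are not part of any Kling or Martini API, and the mapping is only one reading of the table above.

```python
from enum import Enum


class O3Variant(Enum):
    """The four variants listed in the table (labels copied from it)."""
    BASE = "Kling O3"
    REFERENCE = "Kling O3 Reference"
    VIDEO_EDIT = "Kling O3 Video Edit"
    VIDEO_REFERENCE = "Kling O3 Video Reference"


def pick_variant(has_source_video: bool, has_reference_images: bool) -> O3Variant:
    """Hypothetical helper: choose a variant from the inputs you have."""
    if has_source_video and has_reference_images:
        # Editing existing footage, guided by reference images.
        return O3Variant.VIDEO_REFERENCE
    if has_source_video:
        # Editing existing footage with a prompt only.
        return O3Variant.VIDEO_EDIT
    if has_reference_images:
        # New generation with a consistent character appearance.
        return O3Variant.REFERENCE
    # Plain text-to-video or image-to-video, optionally with an end frame.
    return O3Variant.BASE
```

For example, `pick_variant(has_source_video=True, has_reference_images=False)` selects `O3Variant.VIDEO_EDIT`, which matches the Omni Edit row.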
The Pro tier generally offers better detail and consistency than Standard, but it costs more credits and takes longer to generate.
Connect Kling O3 with other AI models on Martini's infinite canvas. No GPU required — start free.