Kling
Kling 3.0 is the best video model for ads featuring people. It generates the most natural human motion, facial expressions, and lip movement of any model on Martini. With Standard and Pro quality tiers, it scales from quick storyboarding to final ad-quality output. If your video ad shows a person — drinking coffee, unboxing a product, giving a testimonial — Kling 3.0 Pro should be your first choice.
Kling 3.0 offers two quality tiers. Standard is faster and cheaper: use it for storyboarding, testing prompt ideas, and iterating on compositions. Pro produces noticeably smoother motion and more realistic facial detail, which is essential for any final ad that features a person's face. Pro is the default, and for ads with people it is worth the cost difference.
Connect a product photo or talent image to the Video node as input. Kling 3.0 will animate from this frame, preserving the exact product appearance, person's likeness, or brand colors. This is critical for ads — text-to-video alone will approximate but never match your actual product or brand ambassador.
Kling 3.0 excels at following specific motion direction. Write your prompt as a single-shot brief: "The woman picks up the product with her right hand, turns it to show the label, then looks at camera with a subtle smile. Slow push-in, soft background blur." Describe what the subject DOES, not just what the scene LOOKS like.
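The single-shot brief described above can be assembled from its parts: a subject, an ordered list of actions, and a camera note. The following is a minimal sketch in Python; the function name `build_shot_brief` and its parameters are hypothetical helpers for organizing prompt text, not part of Martini or Kling.

```python
def build_shot_brief(subject: str, actions: list[str], camera: str) -> str:
    """Build a one-shot prompt that leads with what the subject does.

    Hypothetical helper: it only formats text and calls no external API.
    """
    if not actions:
        raise ValueError("a shot brief needs at least one action")
    if len(actions) == 1:
        action_text = actions[0]
    else:
        # Join the actions in order and mark the final beat with "then".
        action_text = ", ".join(actions[:-1]) + ", then " + actions[-1]
    # Camera and look notes come last, after the action sentence.
    return f"{subject} {action_text}. {camera}"


brief = build_shot_brief(
    "The woman",
    [
        "picks up the product with her right hand",
        "turns it to show the label",
        "looks at camera with a subtle smile",
    ],
    "Slow push-in, soft background blur.",
)
print(brief)
```

Keeping the actions as a list makes it easy to reorder or drop a beat while iterating on Standard tier before committing to a Pro render.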
Run the same prompt twice: once in 16:9 for YouTube and display ads, once in 9:16 for TikTok, Reels, and Shorts. Kling 3.0 supports a wide range of aspect ratios. This gives you platform-ready assets from one creative concept, and the composition adapts naturally to each format.
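One way to keep the two renders tied to a single creative concept is to derive both from one job description. The sketch below assumes a hypothetical `RenderJob` record; its field names are illustrative only and do not correspond to a documented Martini or Kling API.

```python
from dataclasses import dataclass, replace


# Hypothetical job record; field names are assumptions for illustration.
@dataclass(frozen=True)
class RenderJob:
    prompt: str
    aspect_ratio: str
    tier: str = "pro"


# Platform groupings as listed in the text above.
PLATFORMS_BY_RATIO = {
    "16:9": ["YouTube", "display ads"],
    "9:16": ["TikTok", "Reels", "Shorts"],
}


def platform_variants(prompt: str, tier: str = "pro") -> list[RenderJob]:
    """Return one job per aspect ratio, all sharing the same prompt and tier."""
    base = RenderJob(prompt=prompt, aspect_ratio="16:9", tier=tier)
    return [replace(base, aspect_ratio=ratio) for ratio in PLATFORMS_BY_RATIO]


jobs = platform_variants("A young professional takes a sip from a branded coffee cup")
for job in jobs:
    print(job.aspect_ratio, "->", ", ".join(PLATFORMS_BY_RATIO[job.aspect_ratio]))
```

Because both jobs share the same prompt string, any later prompt revision carries over to both formats.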
Lifestyle ad with human performance — Kling 3.0's strength is the "satisfied expression" micro-expression: the slight squint, the corner-of-mouth lift. Other models tend to produce a frozen smile.
A young professional in smart casual attire takes a sip from a branded coffee cup, looks up with a satisfied expression, soft office background with natural light through large windows, steadicam, lifestyle advertisement
Unboxing shot with hand interaction — the physical interaction between hands, paper, and box requires fine motor control that Kling 3.0 handles well. The overhead angle simplifies composition.
Close-up of hands carefully unwrapping a premium gift box, revealing a product inside, tissue paper rustling naturally, overhead angle, warm ambient lighting, ASMR-style product reveal
Always use Pro tier for final ads with people. The facial detail difference between Standard and Pro is immediately visible — Standard faces can look slightly waxy or stiff.
Kling 3.0 Motion Control is a separate model variant for precise camera path planning. Use it when you need an exact camera trajectory (e.g., a 180° product orbit): upload a reference video that defines the motion path.
For testimonial-style ads, combine Kling 3.0 for the video with ElevenLabs TTS for the voiceover. Connect both on the canvas for a complete ad pipeline.
Kling 3.0 generates 5-second clips by default. Plan your ad as multiple 5-second shots and stitch them together, rather than trying to fit everything into one generation.
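The multi-shot planning above comes down to simple arithmetic: divide the target ad length by the 5-second clip length and assign one story beat per clip. A minimal sketch, assuming fixed 5-second clips; the helper names `clips_needed` and `plan_shots` are hypothetical.

```python
import math

CLIP_SECONDS = 5  # default Kling 3.0 clip length stated in the text


def clips_needed(ad_seconds: int, clip_seconds: int = CLIP_SECONDS) -> int:
    """Number of clips required to cover the full ad duration."""
    return math.ceil(ad_seconds / clip_seconds)


def plan_shots(beats: list[str], clip_seconds: int = CLIP_SECONDS) -> list[dict]:
    """Give each story beat its own clip with start and end timestamps."""
    shots = []
    for index, beat in enumerate(beats):
        start = index * clip_seconds
        shots.append(
            {"shot": index + 1, "start": start, "end": start + clip_seconds, "prompt": beat}
        )
    return shots


beats = [
    "Hands unwrap a premium gift box, overhead angle",
    "The product is lifted out and turned to show the label",
    "The person looks at camera with a subtle smile",
]
assert len(beats) == clips_needed(15)  # a 15-second ad needs three clips
for shot in plan_shots(beats):
    print(f"Shot {shot['shot']}: {shot['start']}s to {shot['end']}s | {shot['prompt']}")
```

Each entry in the shot list becomes its own generation, and the resulting clips are stitched together in order.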
Kling 3.0 Pro produces the best human motion and facial expressions among all video models on Martini. Standard tier is noticeably cheaper but the quality difference is visible in faces. Videos generate at up to 1080p, 5 seconds per clip. Pro generation takes 1-3 minutes. For product-only shots without people, Sora 2 may produce more physically accurate results; for anything with a human subject, Kling 3.0 Pro is the clear winner.
Connect Kling 3.0 with other AI models on Martini's infinite canvas. No GPU required — start free.
OpenAI
Sora 2 is OpenAI's video model, and its standout strength is physics simulation — liquids pour realistically, fabrics drape naturally, and objects interact with believable weight and momentum. For video ads, this means product shots look physically convincing without the uncanny "AI float" that plagues other models. On Martini, Sora 2 costs 100 credits for a 10-second clip or 150 credits for 15 seconds, with only two aspect ratios: 16:9 (landscape) and 9:16 (portrait). There are no quality tiers, speed options, or other knobs to tune — Sora 2 is a zero-config model where all your creative energy goes into the prompt and reference image.
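Because Sora 2 has only two durations and two aspect ratios, budgeting a batch of takes is a small lookup. The sketch below uses the credit prices quoted above; the function `sora2_cost` and its parameters are hypothetical and not part of any Martini API.

```python
# Credit prices and aspect ratios as quoted in the text above.
SORA2_CREDITS = {10: 100, 15: 150}  # clip length in seconds -> credits
SORA2_ASPECT_RATIOS = ("16:9", "9:16")


def sora2_cost(duration_seconds: int, aspect_ratio: str, takes: int = 1) -> int:
    """Total credits for a number of takes at one duration and aspect ratio."""
    if duration_seconds not in SORA2_CREDITS:
        raise ValueError(f"unsupported duration: {duration_seconds}s")
    if aspect_ratio not in SORA2_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    return SORA2_CREDITS[duration_seconds] * takes


# Three 15-second portrait takes.
print(sora2_cost(15, "9:16", takes=3))  # 450
```

Both durations work out to 10 credits per second, so the choice between them depends on the shot rather than on price efficiency.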
Google's Veo 3 is the only video model on Martini that generates synchronized audio alongside video. Every other model produces silent video that requires separate audio work. For ads, this is transformative: you get ambient sound, sound effects, and even music in a single generation step. The latest version (Veo 3.1) offers Standard and Fast tiers with support for reference images.
Minimax
Hailuo 02 by Minimax is the workhorse for video ad production — reliably generating clean, well-composed product commercials with consistent color accuracy. Where Sora 2 excels at physics and Kling 3.0 at people, Hailuo 02 excels at commercial polish: product reveals, beauty shots, and food content with the kind of clean, controlled composition that clients expect from ad agencies. Its Standard and Pro tiers let you iterate cheaply and deliver expensively.