Kling
Kling 3.0 is the best model on Martini for animating images that contain people. Its Pro tier generates the most natural facial expressions, body movement, and hair physics of any image-to-video model. Two tiers are available: Standard for fast iteration and Pro for delivery-grade output. The quality gap on human faces is dramatic — Pro handles the micro-expressions (blinking rhythm, mouth corners lifting, subtle head tilts) that make the difference between "obviously AI" and "wait, is that real?" For landscapes and objects without people, Standard is perfectly sufficient.
The source image matters more than the prompt for image-to-video quality — it's the single biggest factor in whether your animation looks natural or forced. Images with implied motion produce dramatically better results: a person mid-stride (the model continues the walking motion), wind-blown hair (the model adds flowing movement), waves about to crash (the model completes the wave). Static, symmetrical, formally posed images are the hardest to animate convincingly because the model has to invent motion from scratch rather than continuing motion that's already implied. If your source image is a stiff headshot against a plain background, even Kling 3.0 Pro will struggle. Consider using an Image node first to generate a more dynamic starting image — for example, use FLUX Kontext to edit a formal headshot into a more candid pose before feeding it to the Video node.
This is the most common and most expensive mistake in image-to-video: re-describing what's already visible. Kling 3.0 already sees your image — every pixel, every color, every detail. Your prompt should be 100% motion direction, 0% scene description. Good: "She blinks, tilts her head left, and a warm smile forms. Hair sways gently in a breeze. Static camera." Bad: "A beautiful woman with brown hair wearing a blue dress in a sunlit garden smiles warmly." When you re-describe the scene, you're competing with what the model already sees, which can cause it to try to "reconcile" your text description with the visual — leading to color shifts, detail changes, or the model subtly altering the image to match your text rather than just adding motion to what's already there. The rule is simple: verbs only. What does the subject DO? How does the camera MOVE? Nothing else.
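The motion-only rule can be roughly checked before a prompt is submitted. The sketch below is a hypothetical helper, not part of Martini or Kling: the `SCENE_WORDS` list and the `find_scene_words` name are assumptions chosen for illustration, and a plain word match will miss many descriptive phrases. It flags words that usually signal scene description rather than motion direction.

```python
import re

# Hypothetical word list (an assumption, not a Martini feature): terms that
# usually describe how the scene looks rather than how it moves.
SCENE_WORDS = {
    "beautiful", "wearing", "dress", "garden", "sunlit", "background",
    "brown", "blue", "red", "green", "color", "colour",
}


def find_scene_words(prompt: str) -> list[str]:
    """Return the descriptive words found in the prompt, in order of appearance."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return [w for w in words if w in SCENE_WORDS]


good = ("She blinks, tilts her head left, and a warm smile forms. "
        "Hair sways gently in a breeze. Static camera.")
bad = ("A beautiful woman with brown hair wearing a blue dress "
       "in a sunlit garden smiles warmly.")

print(find_scene_words(good))  # []
print(find_scene_words(bad))
# ['beautiful', 'brown', 'wearing', 'blue', 'dress', 'sunlit', 'garden']
```

An empty result does not prove the prompt is motion-only; it only means none of the listed words appear. Extend the list to match the kind of images you animate.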
The quality gap between Standard and Pro is specifically about human rendering. Standard tier can produce waxy skin, frozen expressions, or jerky mouth movements — artifacts that are immediately noticeable when viewers are looking at a face. Pro tier handles facial micro-expressions with far greater realism: the subtle squint of a genuine smile, the natural rhythm of blinking (not robotic on/off), the slight movement of eyebrows during expression changes. For landscapes, nature scenes, abstract art, or any image without a human face, Standard produces results that are visually indistinguishable from Pro. The decision is simple: is there a human face in the frame? If yes, Pro. If no, Standard.
Every image-to-video prompt should include an explicit camera instruction, even if that instruction is "static camera." Without camera direction, Kling 3.0 may add its own camera motion — sometimes effectively, sometimes distractingly. "Static camera" keeps all attention on the subject's motion and is best for portrait animations and close-ups. "Slow push forward" creates intimacy by gradually moving closer to the subject — effective for emotional content. "Gentle orbit" reveals depth and dimension — ideal for product images or architectural photos. "Pull back" creates a dramatic reveal of context around the subject. The key insight: Kling 3.0 creates natural parallax between foreground and background layers when you specify camera movement, meaning a simple "slow push forward" on a landscape photo creates a 3D depth effect from a flat image. This parallax effect is good but not as pronounced as Ray 2's — if camera movement is your primary goal, Ray 2 produces more cinematic camera physics at higher resolutions.
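One way to make sure no prompt goes out without camera direction is to append a default when none is found. This is a minimal sketch under stated assumptions: `ensure_camera_direction` and `CAMERA_PHRASES` are hypothetical names, the phrase list covers only the four directions named above, and the check is a plain substring match rather than anything the model itself does.

```python
# Camera directions named in this guide, lowercased for substring matching.
# The list is illustrative; add any other phrasing you use.
CAMERA_PHRASES = ("static camera", "push forward", "orbit", "pull back")


def ensure_camera_direction(prompt: str, default: str = "Static camera.") -> str:
    """Return the prompt unchanged if it already names a camera move,
    otherwise append the default camera instruction."""
    lowered = prompt.lower()
    if any(phrase in lowered for phrase in CAMERA_PHRASES):
        return prompt
    return prompt.rstrip() + " " + default


portrait = "Hair moves gently in a light breeze."
landscape = ("The waves gently crash on the shore, foam spreading across "
             "the sand, seabirds glide overhead. Slow subtle camera push forward.")

print(ensure_camera_direction(portrait))
# Hair moves gently in a light breeze. Static camera.
print(ensure_camera_direction(landscape) == landscape)  # True
```

Defaulting to "Static camera." follows the guidance above for portraits and close-ups; pass a different `default` for product or architectural shots where an orbit or pull back is the better fallback.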
Landscape animation — this prompt demonstrates the motion-only rule perfectly. Notice zero scene description: no mention of beach, ocean, sky colors, sand texture, or time of day. It's 100% motion direction: waves crash (water movement), foam spreads (surface movement), birds glide (aerial movement), camera pushes (viewer movement). Four distinct motion vectors create a rich, layered animation from what might be a static photo. Kling 3.0 Standard is ideal here — no human faces means no need for Pro tier.
The waves gently crash on the shore, foam spreading across the sand, seabirds glide overhead. Slow subtle camera push forward.
Portrait animation — subtle, natural movements work best for portrait photos. This prompt asks for three micro-motions (blink, head tilt, smile) plus one environmental motion (hair in breeze), all at low intensity. Asking for dramatic action (jumping, running, dancing) from a static portrait photo will look unnatural because the model has to invent body positions that don't exist in the source image. Match the motion intensity to what the source image implies. The "static camera" instruction is critical — without it, the model might add a slow zoom or pan that competes with the subtle facial animation.
She blinks naturally and tilts her head slightly, a warm smile crosses her face. Hair moves gently in a light breeze. Static camera.
The #1 rule of image-to-video prompting: describe motion, not scenes. If your prompt mentions colors, backgrounds, clothing, or what the subject looks like, you're competing with the image the model already sees — causing reconciliation artifacts. Verbs and camera directions only.
Standard tier and Pro tier both produce 5-second clips. The tier upgrade is only worth it for human faces. For a batch of 10 landscape animations, Standard delivers results with no visible quality loss compared with Pro.
Pro tier doesn't just improve faces — it pays back the wait. You'll typically regenerate Standard results 3-4 times trying to get natural facial expressions, then switch to Pro anyway. Starting with Pro for face animations is faster end-to-end despite the longer single render.
Always specify explicit camera direction. "Static camera" is a valid and important instruction: it tells the model to keep the frame locked so all motion comes from the subject. Without any camera instruction, Kling 3.0 may add its own camera movement, which can be distracting for portrait work.
Kling 3.0 generates 5-second clips from your source image. Standard tier is one of the quickest image-to-video options on Martini — faster than Seedance 2.0 Standard and Ray 2. Pro tier maintains the highest fidelity to the original image while adding the most natural human motion of any model. The decision between Kling 3.0 and the other animate-images models comes down to what's in the frame: people → Kling 3.0 Pro (unmatched facial expressions), camera work → Ray 2 (cinematic dollies and orbits with filmic grain), dramatic action or illustration → Seedance 2.0 (high-energy motion, anime/illustration specialty). For a production workflow, many creators draft in Kling Standard to test motion direction, then finalize in Kling Pro for human subjects or switch to Ray 2 at 720p for landscapes that need cinematic camera physics.
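The "what's in the frame" decision above can be written down as a small routing function. This is a sketch, not a Martini API: the `Shot` fields and `pick_model` are hypothetical, the returned strings are display names rather than real model identifiers, and the priority order (people first, then camera work, then dramatic action) is an assumption for shots that fit more than one category.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """Hypothetical description of a source image and the animation goal."""
    has_people: bool = False
    camera_driven: bool = False
    dramatic_or_illustrated: bool = False


def pick_model(shot: Shot) -> str:
    """Map a shot to the model this guide recommends for it.

    Priority order is an assumption: faces win over camera work,
    camera work wins over dramatic action.
    """
    if shot.has_people:
        return "Kling 3.0 Pro"
    if shot.camera_driven:
        return "Ray 2"
    if shot.dramatic_or_illustrated:
        return "Seedance 2.0"
    # No people and no special motion needs: Standard is sufficient.
    return "Kling 3.0 Standard"


print(pick_model(Shot(has_people=True)))               # Kling 3.0 Pro
print(pick_model(Shot(camera_driven=True)))            # Ray 2
print(pick_model(Shot(dramatic_or_illustrated=True)))  # Seedance 2.0
print(pick_model(Shot()))                              # Kling 3.0 Standard
```

For drafting, the same shot can still be tested in Kling Standard first, as described above; the function only names the model for the final render.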
ByteDance
Seedance 2.0 by ByteDance is optimized for dramatic, high-energy image animations — the kind of dynamic action that Kling 3.0 handles competently but Seedance handles exceptionally. Capes billowing, swords swinging, particles exploding, environmental destruction: Seedance turns these into fluid, cinematic clips. The model offers three tiers: Fast for rapid motion exploration, Standard for publishable quality, and Seedance 2 Pro for maximum detail. It supports 6 aspect ratios including 21:9 ultra-widescreen, and works with both image-to-video and text-to-video. The Omni Pro variant additionally supports video-to-video and reference images for even more control.
Luma
Luma Ray 2 is the specialist for camera-driven image animation on Martini. While Kling 3.0 excels at moving subjects (people, objects), Ray 2 excels at moving the camera — producing smooth dollies, orbits, zooms, and pans that feel like a real cinematographer's work rather than digital effects. It also adds a distinctive filmic quality (natural grain, cinematic color grading) that other models don't replicate. Ray 2 offers three resolution tiers (540p, 720p, 1080p) for 5-second clips, with output detail and render time scaling up at each step. The lighter Ray Flash 2 variant generates at 540p faster than full Ray 2 — useful for testing camera angles before committing to a high-resolution final render.