Kling
Kling 3.0 is the best model on Martini for animating images that contain people. Its Pro tier generates the most natural facial expressions, body movement, and hair physics of any image-to-video model. Standard tier costs 19 credits/second (95 credits for a 5-second clip), while Pro costs 25 credits/second (125 credits for 5 seconds). The price gap is modest, but the quality gap on human faces is dramatic — Pro handles the micro-expressions (blinking rhythm, mouth corners lifting, subtle head tilts) that make the difference between "obviously AI" and "wait, is that real?" For landscapes and objects without people, Standard is perfectly sufficient.
The source image matters more than the prompt for image-to-video quality — it's the single biggest factor in whether your animation looks natural or forced. Images with implied motion produce dramatically better results: a person mid-stride (the model continues the walking motion), wind-blown hair (the model adds flowing movement), waves about to crash (the model completes the wave). Static, symmetrical, formally posed images are the hardest to animate convincingly because the model has to invent motion from scratch rather than continuing motion that's already implied. If your source image is a stiff headshot against a plain background, even Kling 3.0 Pro will struggle. Consider using an Image node first to generate a more dynamic starting image — for example, use FLUX Kontext to edit a formal headshot into a more candid pose before feeding it to the Video node.
This is the most common and most expensive mistake in image-to-video: re-describing what's already visible. Kling 3.0 already sees your image — every pixel, every color, every detail. Your prompt should be 100% motion direction, 0% scene description. Good: "She blinks, tilts her head left, and a warm smile forms. Hair sways gently in a breeze. Static camera." Bad: "A beautiful woman with brown hair wearing a blue dress in a sunlit garden smiles warmly." When you re-describe the scene, you're competing with what the model already sees, which can cause it to try to "reconcile" your text description with the visual — leading to color shifts, detail changes, or the model subtly altering the image to match your text rather than just adding motion to what's already there. The rule is simple: verbs only. What does the subject DO? How does the camera MOVE? Nothing else.
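As a rough illustration of the motion-only rule, the check below flags scene-description words in a prompt before you spend credits on it. It is only a sketch: `SCENE_WORDS` and `scene_words_in` are hypothetical names, the word list is a tiny hand-picked sample rather than anything Kling or Martini provides, and a real check would need a much broader vocabulary.

```python
import re

# Hypothetical, deliberately small sample of appearance/scene words.
# This list is an assumption for illustration, not part of any Martini or Kling API.
SCENE_WORDS = {
    "beautiful", "brown", "blue", "red", "green",
    "dress", "wearing", "garden", "sunlit", "background",
}


def scene_words_in(prompt: str) -> list[str]:
    """Return the scene-description words found in the prompt, sorted."""
    tokens = re.findall(r"[a-z]+", prompt.lower())
    return sorted({t for t in tokens if t in SCENE_WORDS})


good = ("She blinks, tilts her head left, and a warm smile forms. "
        "Hair sways gently in a breeze. Static camera.")
bad = ("A beautiful woman with brown hair wearing a blue dress "
       "in a sunlit garden smiles warmly.")

print(scene_words_in(good))  # no scene words flagged
print(scene_words_in(bad))   # flags the appearance and setting words
```

An empty result does not prove a prompt is good; it only means none of the sample words appeared.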
The quality gap between Standard (19 credits/second) and Pro (25 credits/second) is specifically about human rendering. Standard tier can produce waxy skin, frozen expressions, or jerky mouth movements, artifacts that are immediately noticeable when viewers are looking at a face. Pro tier handles facial micro-expressions with far greater realism: the subtle squint of a genuine smile, the natural rhythm of blinking (not robotic on/off), the slight movement of eyebrows during expression changes. For landscapes, nature scenes, abstract art, or any image without a human face, Standard produces results that are visually indistinguishable from Pro, so the 24% cost savings (95 vs 125 credits per 5s) is worth it. The decision is simple: is there a human face in the frame? If yes, Pro. If no, Standard.
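The tier arithmetic and the face-based decision rule can be sketched in a few lines. The per-second rates are the ones quoted in this guide; the function and variable names are illustrative assumptions, not part of any Martini API.

```python
# Per-second credit rates as quoted in this guide.
CREDITS_PER_SECOND = {"standard": 19, "pro": 25}


def clip_cost(tier: str, seconds: int = 5) -> int:
    """Credits for one clip of the given length at the given tier."""
    return CREDITS_PER_SECOND[tier] * seconds


def pick_tier(has_human_face: bool) -> str:
    """Decision rule from the text: a face in frame means Pro, otherwise Standard."""
    return "pro" if has_human_face else "standard"


standard = clip_cost("standard")
pro = clip_cost("pro")

# Saving from choosing Standard, measured against the Pro price.
saving_pct = round(100 * (pro - standard) / pro)
# Premium for choosing Pro, measured against the Standard price.
premium_pct = round(100 * (pro - standard) / standard)

print(standard, pro, saving_pct, premium_pct)
```

The two percentages differ because they use different baselines: the same 30-credit gap is a smaller share of the Pro price than of the Standard price.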
Every image-to-video prompt should include an explicit camera instruction, even if that instruction is "static camera." Without camera direction, Kling 3.0 may add its own camera motion — sometimes effectively, sometimes distractingly. "Static camera" keeps all attention on the subject's motion and is best for portrait animations and close-ups. "Slow push forward" creates intimacy by gradually moving closer to the subject — effective for emotional content. "Gentle orbit" reveals depth and dimension — ideal for product images or architectural photos. "Pull back" creates a dramatic reveal of context around the subject. The key insight: Kling 3.0 creates natural parallax between foreground and background layers when you specify camera movement, meaning a simple "slow push forward" on a landscape photo creates a 3D depth effect from a flat image. This parallax effect is good but not as pronounced as Ray 2's — if camera movement is your primary goal, Ray 2 (120 credits at 540p, 190 at 720p) produces more cinematic camera physics.
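One way to make sure no prompt goes out without a camera instruction is to assemble prompts through a small helper that appends one by default. This is a sketch under assumptions: `CAMERA_MOVES` and `build_motion_prompt` are hypothetical names, and the camera phrases are simply the ones named in this guide.

```python
# Camera phrases taken from the guide; the dictionary keys are made up for illustration.
CAMERA_MOVES = {
    "static": "Static camera",
    "push": "Slow push forward",
    "orbit": "Gentle orbit",
    "pull_back": "Pull back",
}


def build_motion_prompt(motions: list[str], camera: str = "static") -> str:
    """Join motion sentences and always end with an explicit camera instruction."""
    parts = [m.strip().rstrip(".") for m in motions if m.strip()]
    parts.append(CAMERA_MOVES[camera])
    return ". ".join(parts) + "."


portrait = build_motion_prompt([
    "She blinks naturally and tilts her head slightly",
    "Hair moves gently in a light breeze",
])
print(portrait)
```

Defaulting to the static option means a forgotten argument produces a locked frame rather than leaving the camera choice to the model.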
Landscape animation — this prompt demonstrates the motion-only rule perfectly. Notice zero scene description: no mention of beach, ocean, sky colors, sand texture, or time of day. It's 100% motion direction: waves crash (water movement), foam spreads (surface movement), birds glide (aerial movement), camera pushes (viewer movement). Four distinct motion vectors create a rich, layered animation from what might be a static photo. Kling 3.0 Standard (95 credits for 5s) is ideal here — no human faces means no need for Pro tier.
The waves gently crash on the shore, foam spreading across the sand, seabirds glide overhead. Slow subtle camera push forward.
Portrait animation — subtle, natural movements work best for portrait photos. This prompt asks for three micro-motions (blink, head tilt, smile) plus one environmental motion (hair in breeze), all at low intensity. Asking for dramatic action (jumping, running, dancing) from a static portrait photo will look unnatural because the model has to invent body positions that don't exist in the source image. Match the motion intensity to what the source image implies. The "static camera" instruction is critical — without it, the model might add a slow zoom or pan that competes with the subtle facial animation.
She blinks naturally and tilts her head slightly, a warm smile crosses her face. Hair moves gently in a light breeze. Static camera.
The #1 rule of image-to-video prompting: describe motion, not scenes. If your prompt mentions colors, backgrounds, clothing, or what the subject looks like, you're competing with the image the model already sees — causing reconciliation artifacts. Verbs and camera directions only.
Standard tier at 19 credits/second costs 95 credits per 5-second clip; Pro at 25 credits/second costs 125 credits. The 30-credit difference is only worth it for human faces. For a batch of 10 landscape animations, Standard saves 300 credits (950 vs 1,250) with no visible quality loss.
Pro tier doesn't just improve faces — it pays for itself. You'll typically regenerate Standard results 3-4 times trying to get natural facial expressions (285-380 credits wasted), then switch to Pro anyway. Starting with Pro for face animations is cheaper in practice despite the higher per-clip cost.
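The claim is easy to check with the per-clip prices quoted above. The sketch below assumes the scenario the text describes: some number of Standard attempts that get discarded, followed by one Pro render that is kept. The names are illustrative only.

```python
STANDARD_CLIP = 95  # credits per 5-second Standard clip, as quoted in this guide
PRO_CLIP = 125      # credits per 5-second Pro clip, as quoted in this guide


def cost_standard_then_pro(standard_attempts: int) -> int:
    """Total credits if Standard attempts are discarded and Pro is used in the end."""
    return standard_attempts * STANDARD_CLIP + PRO_CLIP


for attempts in (1, 3, 4):
    total = cost_standard_then_pro(attempts)
    print(attempts, total, total - PRO_CLIP)  # attempts, total spent, credits wasted
```

Under these assumptions Standard only wins on a face if the very first attempt is usable (95 credits); two Standard attempts already cost 190 credits, more than a single Pro clip.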
Always specify explicit camera direction. "Static camera" is a valid and important instruction: it tells the model to keep the frame locked so all motion comes from the subject. Without any camera instruction, Kling 3.0 may add its own camera movement, which can be distracting for portrait work.
Kling 3.0 generates 5-second clips from your source image. At Standard tier (19 credits/second, 95 credits per clip), it's one of the most affordable image-to-video options on Martini — cheaper than Seedance 2.0 Standard (20 credits/second) and far cheaper than Ray 2 (120-320 credits depending on resolution). Pro tier (25 credits/second, 125 credits per clip) maintains the highest fidelity to the original image while adding the most natural human motion of any model. The decision between Kling 3.0 and the other animate-images models comes down to what's in the frame: people → Kling 3.0 Pro (unmatched facial expressions), camera work → Ray 2 (cinematic dollies and orbits with filmic grain), dramatic action or illustration → Seedance 2.0 (high-energy motion, anime/illustration specialty). For a production workflow, many creators draft in Kling Standard (95 credits) to test motion direction, then finalize in Kling Pro (125 credits) for human subjects or switch to Ray 2 (190 credits at 720p) for landscapes that need cinematic camera physics.
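The routing described above can be written as a simple lookup. This is a minimal sketch: the category keys and the `pick_model` name are assumptions for illustration, and the fallback to Kling 3.0 Standard reflects the guide's advice for landscapes and objects without people rather than any built-in Martini behavior.

```python
# Category -> model, following the guide's recommendations. Keys are hypothetical labels.
ROUTES = {
    "people": "Kling 3.0 Pro",
    "camera": "Ray 2",
    "action": "Seedance 2.0",
    "illustration": "Seedance 2.0",
}


def pick_model(content: str) -> str:
    """Return the suggested model for a content category, defaulting to Kling 3.0 Standard."""
    return ROUTES.get(content, "Kling 3.0 Standard")


# Draft-then-finalize budgets using the per-clip prices quoted in this guide.
draft = 95                       # Kling Standard draft
people_total = draft + 125       # finalize in Kling Pro
landscape_total = draft + 190    # finalize in Ray 2 at 720p

print(pick_model("people"), people_total)
print(pick_model("camera"), landscape_total)
```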
Connect Kling 3.0 with other AI models on Martini's infinite canvas. No GPU required — start free.
ByteDance
Seedance 2.0 by ByteDance is optimized for dramatic, high-energy image animations — the kind of dynamic action that Kling 3.0 handles competently but Seedance handles exceptionally. Capes billowing, swords swinging, particles exploding, environmental destruction: Seedance turns these into fluid, cinematic clips. The model offers a tiered cost structure: Fast at 10 credits/second (5s clip = 50 credits), Standard at 20 credits/second (5s = 100 credits), and Pro at a flat 25 credits per 5s clip. It supports 6 aspect ratios including 21:9 ultra-widescreen, and works with both image-to-video and text-to-video. The Omni Pro variant additionally supports video-to-video and reference images for even more control.
Luma
Luma Ray 2 is the specialist for camera-driven image animation on Martini. While Kling 3.0 excels at moving subjects (people, objects), Ray 2 excels at moving the camera — producing smooth dollies, orbits, zooms, and pans that feel like a real cinematographer's work rather than digital effects. It also adds a distinctive filmic quality (natural grain, cinematic color grading) that other models don't replicate. Ray 2 uses a resolution-based pricing model: 540p at 120 credits per 5-second clip, 720p at 190 credits, and 1080p at 320 credits. The budget option, Ray Flash 2, generates at 540p for 75 credits per 5-second clip — roughly 40% cheaper for testing camera angles before committing to a high-resolution final render.