AI Video Reference Images
Most AI video tools take one reference image and forget it by the second clip. Martini chains multiple reference images across the canvas — character, style, scene — and feeds them into Vidu, Kling O3, Seedance 2, and Nano Banana 2 so subject and style stay locked across every generation, every cut, every shot in your sequence.
What this feature solves
Reference images are how AI video models stay on-brief. Without one, the model generates whatever fits the prompt — and what fits the prompt this generation rarely matches what fit it last generation. With a reference, the model anchors to the visual you uploaded, holds the subject, and varies the prompt around that anchor. The problem is that most consumer-facing AI video tools accept one reference at a time and forget it the moment you start a new clip. The brand looks different shot to shot, the character drifts, the style wanders, and the resulting edit reads as a montage of unrelated generations.
The deeper issue is reference type. A real production needs three different references at once — character (who is in the shot), style (the visual treatment), scene (the environment or composition). Single-reference tools make you pick one and lose the others. Multi-reference workflows that keep all three locked are how cinematic-quality multi-shot AI video gets made, and they only exist on a canvas where the references travel forward with the workflow.
Then there is the cross-model problem. Different video models have different reference-handling vocabularies. Vidu treats reference images as authoritative for character and scene; Kling O3 with reference mode handles style and camera continuity; Seedance 2 anchors product and brand fidelity. The right approach is the right model per shot, all reading the same set of references — but that only works if the canvas keeps every reference accessible to every downstream node.
Why Martini is different
Martini treats reference images as canvas-level anchors, not per-clip uploads. Drop the character reference, the style reference, and the scene reference into image nodes once. Wire each into every downstream video node. Kling O3 reads the style; Vidu reads the character; Seedance 2 reads the product reference — and the same anchors flow into every cut in the sequence. References never get forgotten between generations because they live on the canvas, not inside one tab.
Multi-reference chaining unlocks production-grade consistency. For a branded film with a recurring character in a defined visual style across multiple environments, you wire the character reference, the style reference, and the per-scene reference into each video node. Each model receives all three and the cut lands looking intentional. For a product campaign, the product image, the brand color reference, and the scene composition reference travel together — every cut shares the same DNA.
Cross-model reference handling is the unlock. Different shots want different models, but the same references should drive all of them. Martini lets you fan one set of references into Vidu, Kling O3, and Seedance 2 in parallel for the same shot, and pick the model that holds the references best for that specific cut. The references stay locked; only the engine changes. That is impossible inside any single-model tool.
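As a mental model, the fan-out can be sketched as plain data: one locked reference set copied into a node per engine. The Python below is an illustrative sketch of the concept only, not Martini's actual API; every class and function name here is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reference:
    """A canvas-level anchor: character, style, scene, or product image."""
    role: str
    path: str

@dataclass
class VideoNode:
    """One take of a shot: an engine plus the references wired into it."""
    model: str
    references: list

def fan_out(references, models):
    """Copy one locked reference set across several engines for the same shot."""
    return [VideoNode(model=m, references=list(references)) for m in models]

refs = [Reference("character", "model_portrait.png"),
        Reference("style", "brand_moodboard.png")]
takes = fan_out(refs, ["vidu", "kling-o3", "seedance-2"])
# Same anchors in every take; only the engine changes.
assert all(t.references == refs for t in takes)
```

The point of the sketch: the references are defined once and never mutated per take, which is what "the references stay locked; only the engine changes" means in data terms.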
Common use cases
Lock a character across a multi-shot branded film
Pin one character reference and drive Kling O3, Vidu, and Seedance 2 video nodes from the same anchor so the protagonist stays identical across every cut.
Hold style consistency across a campaign
Use a style reference image (mood, color, treatment) as the second anchor and chain it into every video node so the campaign reads as one visual world.
Pin scene and environment continuity
Wire a scene reference (location, set, environment) into every shot in a sequence so the world stays continuous across cuts that should feel connected.
Multi-reference product video for ecommerce
Combine product reference, brand color reference, and scene composition reference into Seedance 2 and Vidu nodes for SKU campaigns that hold the brand.
Cross-model comparison with locked references
Fan one reference set into multiple models in parallel for the same shot, compare which engine holds the references best, and pick per-cut winners.
Editorial photo shoots translated to motion
Use the editorial photo set as a reference library and animate each frame with a model that respects the original treatment.
Recommended model stack
vidu
Video: Authoritative reference handling for character and scene continuity across video.
kling-o3
Video: Reference-mode video that holds style, character, and camera continuity through complex shots.
seedance-2
Video: Strong product and brand reference adherence for commercial and ecommerce work.
nano-banana-2
Image: Generate or refine the canonical reference images that feed every downstream video node.
kling-3
Video: Cinematic camera language that respects the upstream reference image.
flux-kontext
Image: Edit-ready image references when you need to vary scene or composition while preserving the subject.
How the workflow works in Martini
1. Build the reference library on the canvas
Drop your character, style, scene, and product references into image nodes — one per anchor. Label each clearly so the downstream nodes are easy to wire.
2. Pick the right reference set per shot
For a character-driven shot, wire the character reference. For a scene-driven shot, wire the scene reference. For a hero product cut, wire all of them. Each video node only needs the anchors that matter for that specific shot.
3. Choose the model that handles your reference type
Vidu and Kling O3 (reference mode) are best for character and scene anchors. Seedance 2 leads on product and brand fidelity. Pick per shot rather than per project.
4. Write prompts that complement the references
Tell the model what should happen — camera move, action, mood — without re-describing what the references already show. Less prompt, more references.
5. Fan out to compare reference handling
For tricky shots, duplicate the video node and swap models. Same references, different engines. Pick the take that holds the anchors best.
6. Sequence and export with consistency intact
Order the cuts in a sequence builder. The shared references make the edit feel intentional. NLE export drops the timeline into Premiere or DaVinci ready to grade.
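The steps above can be condensed into a small data-flow sketch: a labeled reference library (step 1) and per-shot wiring that pulls only the anchors that matter (steps 2 and 3). This is a hypothetical illustration of the canvas model, not Martini's actual node API; all names are made up for the example.

```python
# Hypothetical sketch: labeled reference library plus per-shot wiring.
library = {
    "character":     "model_portrait.png",
    "style":         "brand_moodboard.png",
    "scene/rooftop": "rooftop_scout.jpg",
}

def wire_shot(model, anchor_keys, prompt):
    """Build one video node from only the anchors that matter for this shot."""
    missing = [k for k in anchor_keys if k not in library]
    if missing:
        raise KeyError(f"unlabeled reference(s): {missing}")
    return {
        "model": model,
        "references": {k: library[k] for k in anchor_keys},
        # The prompt carries action and camera; the references carry the look.
        "prompt": prompt,
    }

shot1 = wire_shot("vidu", ["character", "style", "scene/rooftop"],
                  "slow dolly-in at golden hour")
```

Note how the prompt describes only motion and mood, per step 4: the references already define who is in the shot and how it is treated.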
Example workflow
A fashion brand is producing a 60-second campaign film with five shots, all featuring the same model in the same brand-styled wardrobe across different urban scenes. The team builds the reference library: a model portrait reference (character), a brand mood-board image (style), and one location reference per scene. They wire the model and style references into every video node, plus the matching per-scene location reference into each shot. Shot 1 (rooftop): Vidu with the model, style, and rooftop references. Shot 2 (subway): Kling O3 with the same model and style references plus the subway location. Shot 3 (cafe): Seedance 2 for the seated product moment, again with the model, style, and cafe references. The model looks identical across every shot because the character anchor never moved; the brand world holds because the style reference stays pinned. NLE export drops the timeline into DaVinci for color and finishing.
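Under the same illustrative assumptions as above, the shot plan reduces to shared anchors plus one per-scene extra, with the engine chosen per shot. This is a hypothetical sketch; shot names, models, and keys are taken from the example, but the structure is not Martini's API.

```python
# Hypothetical sketch: the campaign's shot plan as data. Shared anchors
# travel into every cut; each shot adds only its own scene reference.
shared = ["character", "style"]          # pinned across the whole sequence
shot_plan = [
    {"shot": "rooftop", "model": "vidu",       "extra": ["scene/rooftop"]},
    {"shot": "subway",  "model": "kling-o3",   "extra": ["scene/subway"]},
    {"shot": "cafe",    "model": "seedance-2", "extra": ["scene/cafe"]},
]
for s in shot_plan:
    s["references"] = shared + s["extra"]  # every cut shares the same DNA
```

Because `shared` is prepended to every shot's reference list, the character and style anchors appear in every cut by construction: that is the data-level version of "the character anchor never moved."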
Tips and common mistakes
Tips
- Use the highest-resolution references you have. Models down-sample but they cannot up-rez detail that was never in the source.
- Separate references by type — character, style, scene, product. Mixing them into one composite weakens each.
- Run hero shots through two or three models for the same reference set. The strongest reference handler varies by shot type.
- Keep the canonical reference pinned on the canvas. Never re-generate it mid-project — drift compounds shot by shot.
- Save the reference library as part of the canvas template. The next campaign reuses the references as a starting point.
Common mistakes
- Combining character, style, and scene into one composite reference image. Models read it as a single anchor and lose specificity.
- Re-uploading references per video node instead of wiring them from canvas-level image nodes. The lineage breaks.
- Picking a model that does not respect references for a reference-critical shot. Use Vidu, Kling O3, or Seedance 2 — not a freestyle model.
- Writing prompts that re-describe the reference. The reference does that job — the prompt should describe the action and camera.
- Forgetting style continuity. Locking the character without a style reference still produces a shot-to-shot visual jumble.
Related models and tools
Tool
AI Video Frame Extraction
Extract frames from video for reference and image-to-video workflows.
Tool
AI Lip Sync
Lip-sync tools on Martini for syncing voice and dialogue to portraits and video.
Tool
AI Image Upscaling
Upscale images and keyframes before final video generation on Martini.
Provider
Google
Google's Veo video, Imagen image, and Nano Banana model workflows on Martini.
Provider
Kling
Kling 3, O3, and Avatar video model workflows on Martini.
Provider
Vidu
Vidu's reference-driven video and character consistency workflows on Martini.
Provider
ByteDance
ByteDance's Seedance video and Seedream image model families on Martini.
Provider
OpenAI
OpenAI's GPT Image and Sora video model workflows available on Martini.
Related features
AI Character Consistency Across Images and Video
Keep a subject consistent across image and video generations on Martini using reference workflows.
Multi-Shot AI Video — Build Connected Scenes, Not Isolated Clips
Plan, generate, and sequence multi-shot AI video on Martini — keep characters, style, and motion consistent across shots.
AI Image to Video — Animate Stills Into Production-Ready Shots
Turn still images into production-ready video shots on Martini's canvas — multi-model, reference-aware, NLE-export ready.
AI Influencer Video Generator — Repeatable Character Pipeline
Design, generate, and scale AI influencer videos on Martini — character library, voice cloning, lip-synced video, all on one canvas.
AI Product Video Generator — From Product Image to Ad Video
Create product ads and demos from product images on Martini's canvas — chain product photo to multi-shot video across Seedance, Runway Gen-4, and GPT Image.
AI Ad Creative Generator — Multi-Format Ad Visuals and Video
Generate ad visuals and videos across Ideogram, Flux, Seedance, and Runway on Martini — every aspect ratio, every variant, one canvas.
AI Avatar Video Generator — Talking Avatars from Image and Audio
Create talking avatar videos from image and audio on Martini's canvas — Kling Avatar, OmniHuman, ElevenLabs, locked identity across every clip.
AI Talking Head Video — Spokesperson, Course, and Narration
Produce spokesperson, course, and narration videos on Martini's canvas — Kling Avatar, OmniHuman, ElevenLabs, Fish Audio, locked identity end to end.
Video to Video AI — Restyle, Edit, Transform Source Footage
Restyle, transform, and edit source video on Martini's canvas — Runway Aleph, Kling O3, Wan chained into multi-shot pipelines.
AI Video Generator — Multi-Model AI Video Production on Martini
Multi-model AI video generation with text, image, reference, and editing workflows on Martini's canvas.
Text to Video AI — Generate Video From Prompts on Martini
Generate video from prompts and chain outputs into scenes on Martini's multi-model canvas.
Consistent Character AI Video — Reference-Driven Video on Martini
Preserve character identity through reference-driven video models on Martini.
AI Explainer Video — Educational and B2B Demo Videos
Generate explainer videos, B2B demos, and educational content on Martini's canvas.
Frequently asked questions
How many reference images can I use for one video shot?
Vidu and Kling O3 (reference mode) accept multiple reference images per generation — typically up to four anchors covering character, style, scene, and an additional context reference. Seedance 2 supports a primary reference plus secondary product anchors. The right number is the smallest set that locks what matters; more is not always better.
Which model is best for character reference?
Vidu is the strongest character-reference handler in the registry — it treats the character image as authoritative and preserves face, wardrobe, and proportions across the take. Kling O3 with reference mode is the close second and adds stronger camera continuity. For talking-head character video specifically, Kling Avatar is purpose-built.
How do I keep style consistent across a multi-shot edit?
Use a single style reference image (mood board, color palette, treatment example) and wire it into every video node in the sequence alongside any scene-specific anchors. Style references work best when they are uncluttered and dominated by the visual treatment you want to lock — not by competing subjects.
Can I mix references from different sources?
Yes — character from a portrait shoot, style from a mood board, scene from a location scout. Just keep each reference focused on its own role; do not composite them into one image. The canvas holds them as separate anchors and feeds each into the video model intentionally.
What resolution should reference images be?
Highest available — 1024px on the long edge minimum, ideally 2048px or above. Reference quality directly drives generation quality, and downsampling is cheap while up-rezing is impossible.
How does this compare to single-reference tools like Runway?
Runway accepts a single reference per generation in most modes. Martini lets you wire multiple reference images on the canvas and feed any combination into Vidu, Kling O3, or Seedance 2 per shot — and chain the same references across an entire multi-cut sequence. For one-clip work, Runway direct is fine. For multi-shot brand work, the canvas reference workflow is the difference.
Build it on the canvas
Open Martini and wire this workflow up in minutes. Free to start — no card required.