Vidu
Generate consistent character stills on Martini using Vidu Reference-to-Image, which accepts 1-7 reference images per generation and outputs character stills that flow directly into Vidu video nodes for matched motion. Vidu's reference workflow is optimized for the image-to-video character pipeline: the same model family that locks identity on the still also handles the motion, eliminating cross-model identity drift at the modality boundary. For producers who plan to ship character video content downstream, Vidu Reference-to-Image is the cleanest single-vendor path.
Vidu Reference-to-Image accepts up to seven reference inputs per generation. Build the reference stack: canonical face portrait (slot 1), wardrobe reference (slot 2), pose reference (slot 3), lighting moodboard (slot 4), location still (slot 5), accessory references (slots 6-7). The model balances all seven references: slot 1 carries face identity, while the remaining slots steer context.
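The slot convention above can be sketched as an ordered structure. This is an illustrative sketch only — the role names, `build_reference_stack` helper, and filenames are hypothetical, not Vidu's or Martini's actual API:

```python
# Hypothetical sketch of the seven-slot reference stack described above.
# Slot order matters: slot 1 always carries the canonical face.
REFERENCE_ROLES = [
    "canonical_face",      # slot 1: identity lock
    "wardrobe",            # slot 2
    "pose",                # slot 3
    "lighting_moodboard",  # slot 4
    "location",            # slot 5
    "accessory_a",         # slot 6
    "accessory_b",         # slot 7
]

def build_reference_stack(images: dict) -> list:
    """Order reference images by role; slot 1 (face) is mandatory."""
    if "canonical_face" not in images:
        raise ValueError("slot 1 must be the canonical face reference")
    stack = [images[role] for role in REFERENCE_ROLES if role in images]
    if not 1 <= len(stack) <= 7:
        raise ValueError("Vidu accepts 1-7 references per generation")
    return stack

# A two-reference variant run: face lock plus wardrobe context.
stack = build_reference_stack({
    "canonical_face": "face.png",
    "wardrobe": "hiking_moodboard.png",
})
```

Ordering by a fixed role list, rather than by insertion order, is what keeps the face reference pinned to slot 1 no matter which context slots an episode fills.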
Vidu's differentiator is single-vendor character continuity from image to video. If your downstream is a Vidu video node (Q2 or Q3), generate the character still here — the same training distribution carries identity across modalities. If your downstream is Sora 2 or Kling 3, Nano Banana 2 is a stronger pick for the still. Match the still model to the video model when possible.
Like Nano Banana 2, Vidu reads identity from the reference, not from the prompt. Write the prompt around what changes: pose, expression, scene, lighting, action. "Same character, walking through a forest at golden hour, casual hiking outfit, light backpack, looking forward, slight smile, three-quarter angle." Identity stays in the reference; everything else flows through the prompt.
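The prompt-around-what-changes pattern can be made mechanical. A minimal sketch, assuming a simple join of change clauses (the `change_prompt` helper is hypothetical):

```python
# Illustrative prompt scaffold: identity lives in the reference image,
# so the prompt only describes what changes (pose, scene, wardrobe, etc.).
def change_prompt(pose: str, scene: str, wardrobe: str,
                  expression: str, angle: str) -> str:
    return ", ".join([
        "Same character",  # no physical description: the reference owns identity
        pose, scene, wardrobe, expression, angle,
    ])

prompt = change_prompt(
    pose="walking",
    scene="through a forest at golden hour",
    wardrobe="casual hiking outfit, light backpack",
    expression="looking forward, slight smile",
    angle="three-quarter angle",
)
```

Note what the scaffold deliberately omits: hair color, face shape, age — anything that belongs to identity stays out of the prompt and in the slot-1 reference.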
Run three Vidu Reference-to-Image generations from the same canvas — front view, three-quarter (45-degree profile), full profile. Each shares the slot-1 face reference; only the angle prompt changes. This three-frame sheet becomes the master reference for downstream Vidu video nodes; the same model family means tighter consistency across the still-to-motion handoff.
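The three-angle sheet reduces to one base prompt and a varying angle clause. A minimal sketch of that loop (names are illustrative):

```python
# Sketch of the three-angle turnaround: one shared slot-1 face reference,
# only the trailing angle clause changes per generation.
ANGLES = ["front view", "three-quarter (45-degree) view", "full profile"]

def turnaround_prompts(base: str) -> list:
    return [f"{base}, {angle}" for angle in ANGLES]

sheet = turnaround_prompts(
    "Same character, studio lighting, neutral grey background"
)
# Three prompts, identical except for the final angle clause;
# all three generations reuse the same slot-1 face reference.
```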
For an episodic series (e.g., a 12-week AI character content run), duplicate the Vidu Reference-to-Image node per episode setting. Same slot-1 face reference; episode-specific scene/wardrobe in slots 2-5; episode-specific prompt. Each episode then chains into Vidu Q2 or Q3 video for the motion shot. The model continuity between still and motion is the key advantage.
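The per-episode duplication pattern can be sketched as a config list — constant face reference, varying context. All names and filenames here are hypothetical:

```python
# Hypothetical per-episode configuration: the slot-1 face reference is
# constant across the series; context references and prompts vary.
FACE_REF = "canonical_face.png"  # reused for every episode

episodes = [
    {"week": 1, "context_refs": ["hiking_wardrobe.png", "forest.png"],
     "prompt": "Same character, forest trail at golden hour"},
    {"week": 2, "context_refs": ["business_casual.png", "rooftop.png"],
     "prompt": "Same character, rooftop at sunset, 9:16 vertical"},
]

def episode_stack(ep: dict) -> list:
    # Slot 1 is always the face; context refs fill slots 2 onward.
    return [FACE_REF, *ep["context_refs"]]

stacks = [episode_stack(ep) for ep in episodes]
# Every episode's stack shares the same slot-1 identity reference.
```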
Wire the Vidu Reference-to-Image output as the starting frame for a Vidu Q2 or Q3 video node. Q2 accepts 1-7 character references, matching this still's reference workflow. Q3 is the general-purpose video baseline. The same character identity carries from still to motion — the cleanest character-to-video pipeline on the canvas when you stay within the Vidu family.
Anchor frame for the Vidu character pipeline. Slot 1 reused on every downstream still and video generation.
[Reference slot 1: canonical face portrait] + Generate the canonical front-view character still. Studio lighting, neutral grey background, sharp focus, three-quarter body framing, 1024x1024 resolution.
Multi-reference for episode-specific scene. Slot 1 = face, slot 2 = wardrobe context. Vidu balances both.
[Slot 1: face] + [Slot 2: wardrobe moodboard - hiking apparel] + Same character, walking through a forest at golden hour, casual hiking outfit, light backpack, looking forward, slight smile, three-quarter angle.
Episodic content frame for vertical short-form. The vertical aspect feeds directly into Vidu Q2 vertical video.
[Slot 1: face] + [Slot 2: location reference - urban rooftop at sunset] + Same character, standing on rooftop at sunset, business casual outfit (open collar shirt, dark trousers), looking out at the skyline, contemplative expression, profile angle, 9:16 vertical.
Three-reference workflow for richer context. Pose reference helps Vidu compose the body language consistently.
[Slots 1-3: face + wardrobe + pose reference] + Same character at a coffee shop counter, morning light, holding a takeaway cup, looking three-quarter to camera right, slight smile, casual outfit (cream sweater, jeans), 4:5 social aspect.
Slot 1 is for the canonical face. Always. Reordering breaks the lock.
Use 7 references for high-stakes hero stills, 1-3 for variants and exploration. Each reference adds processing time.
Match the still model to your downstream video model. Vidu Reference-to-Image to Vidu Q2/Q3 is the single-vendor pipeline; mixing vendors across stills and motion can introduce drift.
For episodic content, save the slot-1 reference and the prompt scaffolding as a Martini template. Each episode reuses the same character lock with new scene prompts.
Aspect ratio matters for the downstream video. Generate the still at the same aspect as the planned video clip — re-cropping at the video node introduces composition drift.
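A quick aspect-match check makes this concrete. The helper below is a sketch of the guard, not a Martini feature — it just compares reduced width:height ratios:

```python
from fractions import Fraction

# Sketch of the aspect-ratio guard: generate the still at the same
# aspect as the planned video clip to avoid re-crop composition drift.
def matches_video(still_wh: tuple, video_wh: tuple) -> bool:
    return Fraction(*still_wh) == Fraction(*video_wh)

matches_video((1080, 1920), (720, 1280))  # both reduce to 9:16 -> True
matches_video((1024, 1024), (720, 1280))  # 1:1 vs 9:16 -> False
```

`Fraction` reduces each ratio, so any resolution pair at the same aspect matches regardless of pixel dimensions.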
Pair with Vidu Q2 video for character-led shots (1-7 character references), Vidu Q3 for general-purpose motion. Q1 requires multiple reference images per call.
Vidu Reference-to-Image returns 1024x1024 or rectangular outputs (configurable per aspect) with character identity holding at ~90% across 30+ generations from the same slot-1 reference. Generation time 20-40s per output. Up to 7 reference inputs per generation. Output drops onto the canvas as the locked still — primary downstream is a Vidu Q2 or Q3 video node for the motion shot, keeping the entire character pipeline within the Vidu model family for tighter still-to-motion consistency.
Connect Vidu Reference-to-Image with other AI models on Martini's infinite canvas. No GPU required — start free.