Build an AI persona once on Nano Banana 2 and ship a character sheet that holds across pose, outfit, and scene shifts on the Martini canvas. Nano Banana 2 is the strongest face-locker in the stack: it accepts up to 10 reference images and outputs at 1K, 2K, or 4K, with face consistency that survives 50+ generations from the same canonical reference. For AI influencer producers who keep one persona identical across a 12-week content series, this is the load-bearing model — every other model in the chain inherits the lock from here.
Start with a single high-quality character portrait — uploaded photo, AI-generated baseline, or a brief that Nano Banana 2 generates. Drop it onto the canvas as the canonical reference image. This becomes slot 1 for every subsequent generation. Identity locks at this image; pose, outfit, and scene can vary downstream, but the face does not.
Nano Banana 2 reads up to 10 reference images per generation. Beyond the canonical face: wardrobe references (slot 2), pose references (slot 3), lighting moodboards (slot 4), location stills (slot 5). The model weighs every reference you supply: face from slot 1, outfit from slot 2, pose from slot 3, and so on. This multi-reference approach is what makes Nano Banana 2 the strongest character-consistency tool in the stack.
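The slot layout can be thought of as an ordered list where position determines role. A minimal Python sketch of that idea; the request shape, role names, and `build_request` helper are illustrative assumptions, not Martini's actual API:

```python
# Hypothetical representation of a Nano Banana 2 generation request.
# Slot order determines role: slot 1 is always the canonical face.
ROLE_BY_SLOT = {
    1: "canonical_face",
    2: "wardrobe",
    3: "pose",
    4: "lighting_moodboard",
    5: "location_still",
}

def build_request(references, prompt, resolution="2K"):
    """Assemble a generation request; slot 1 must be the canonical face."""
    if not references:
        raise ValueError("slot 1 (canonical face) is required")
    if len(references) > 10:
        raise ValueError("Nano Banana 2 accepts at most 10 reference images")
    slots = [
        {"slot": i, "role": ROLE_BY_SLOT.get(i, "extra"), "image": img}
        for i, img in enumerate(references, start=1)
    ]
    return {"references": slots, "prompt": prompt, "resolution": resolution}

req = build_request(
    ["face.png", "wardrobe_board.png"],
    "Same character, mid-workout, gym setting",
)
```

The ordering constraint is the point: because roles are positional, reordering references mid-series changes what the model treats as the face.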
Don't describe the character ("brown hair, brown eyes, oval face") — the reference image holds that. Describe what changes: pose, outfit, scene, lighting, expression. "Same character, three-quarter profile, casual streetwear, on a Tokyo night street, neon reflections, slight smile." Identity locks via reference; everything else flows through the prompt.
For a complete character sheet, run three generations from the same canvas — front view, three-quarter (45-degree) profile, and full profile. Each shares slot 1; only the angle prompt changes. This three-frame sheet becomes the master reference for every downstream generation across the 12-week series.
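The three-frame sheet is the same generation repeated with only the angle swapped. A small sketch of that prompt pattern (the helper function is illustrative; slot 1 stays fixed across all three runs):

```python
# Three-frame character sheet: same canonical reference, only the angle changes.
ANGLES = ["front view", "three-quarter (45-degree) profile", "full profile"]

def character_sheet_prompts(base_prompt, angles=ANGLES):
    """One prompt per angle; slot 1 (canonical face) is unchanged between runs."""
    return [f"Same character, {angle}, {base_prompt}" for angle in angles]

sheet = character_sheet_prompts("studio lighting, neutral background, 2K resolution")
```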
Once the character sheet locks, duplicate the Nano Banana 2 node 12+ times — one per wardrobe/location combination (gym workout, coffee shop, beach sunset, evening event, etc.). Each duplicate keeps slot 1 (the canonical face) identical; only the variation prompt changes. The fan-out is how AI influencer producers ship a full content batch in one session.
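The fan-out reduces to one duplicated node per variation, each carrying the identical slot-1 reference. A sketch of that structure in Python (node shape and file names are illustrative, not Martini's actual node format):

```python
# Fan-out: one duplicated Nano Banana 2 node per wardrobe/location combination.
CANONICAL_FACE = "face.png"  # slot 1, identical on every duplicate

VARIATIONS = [
    ("athletic wear", "gym, dramatic side light"),
    ("casual streetwear", "coffee shop, soft window light"),
    ("swimwear and cover-up", "beach at sunset, golden hour"),
    ("emerald cocktail dress", "evening event, ballroom lighting"),
]

nodes = [
    {
        "slot_1": CANONICAL_FACE,          # the lock: never changes
        "prompt": f"Same character, {outfit}, {scene}",  # the variation
    }
    for outfit, scene in VARIATIONS
]
```

Extending the batch means appending to `VARIATIONS`; the face reference never moves.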
For image-to-video character work, route the locked Nano Banana 2 output as the starting frame into a Sora 2, Kling 3, or Vidu Q2 video node. The video model inherits the character identity from the still — face stays locked even when the camera moves and the body animates. This image-to-video hand-off is unique to Martini's canvas; competitors break character at the modality boundary.
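Conceptually, the hand-off passes the locked still as the video node's first frame. A hypothetical sketch; the `to_video_node` function and its field names are assumptions for illustration:

```python
# Hypothetical hand-off: the locked still becomes the starting frame
# of a downstream video node, which inherits the character identity.
def to_video_node(still_path, motion_prompt, model="Sora 2"):
    """Route a locked character still into a video node as its first frame."""
    return {
        "model": model,             # Sora 2, Kling 3, or Vidu Q2
        "start_frame": still_path,  # identity inherited from the still
        "prompt": motion_prompt,    # motion only; no identity description needed
    }

node = to_video_node("character_4k.png", "slow dolly-in, subject turns to camera")
```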
Anchor frame for the entire character sheet. Slot 1 will be reused on every downstream generation across the 12-week series.
Generate the canonical front-view character portrait. [Reference image slot 1: upload of canonical face]. Studio lighting, neutral background, sharp focus, 1K resolution, three-quarter body framing.
Variation prompt focuses on what changes (angle, outfit, scene). The reference holds identity; the prompt holds the rest.
[Reference slot 1: canonical face] + Same character, three-quarter profile angle, casual streetwear (oversized hoodie, dark jeans), Tokyo neon-lit street at night, slight smile, slight head tilt left. 4:5 aspect, 2K resolution.
Multi-reference workflow. Slot 1 = face, slot 2 = wardrobe context. Nano Banana 2 reads both and balances them.
[Reference slot 1] + [Reference slot 2: wardrobe moodboard - athletic apparel] + Same character, mid-workout, gym setting, dramatic side light, sweat detail, focused expression, 1:1 aspect, 2K resolution.
Different scene, same identity. The 4K resolution gives downstream video models enough detail to upscale cleanly into vertical content.
[Reference slot 1] + Same character, evening event outfit (cocktail dress in deep emerald), holding a champagne flute, soft golden ballroom lighting, three-quarter angle, 9:16 aspect, 4K resolution.
Slot 1 is sacred. Always put the canonical face reference in slot 1. Reordering mid-series breaks the lock.
Use 1K for drafts, 2K for production stills, 4K only when downstream video upscales need the headroom. Generation cost scales with resolution.
For best face lock, the canonical reference should be a clean, well-lit, front or three-quarter portrait. Profile-only references produce weaker locks.
Add wardrobe and scene moodboards in slots 2-5 only when needed. Empty slots beat noisy slots — Nano Banana 2 weighs every reference.
Avoid describing facial features in the prompt ("brown eyes, oval face"). The reference image already has them; describing them again can introduce drift.
For image-to-video hand-off, generate the still at 4K and let the video model downsample. Higher input resolution = cleaner motion output.
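The resolution guidance above can be encoded as a simple lookup, pairing each use case with the generation-time range quoted for Nano Banana 2 (the dict shape and purpose names are illustrative):

```python
# Resolution by purpose: drafts at 1K, production stills at 2K,
# 4K only when a downstream video model needs the headroom.
PROFILES = {
    "draft":            {"resolution": "1K", "gen_time_s": (15, 30)},
    "production_still": {"resolution": "2K", "gen_time_s": (30, 60)},
    "video_handoff":    {"resolution": "4K", "gen_time_s": (60, 120)},
}

def pick_resolution(purpose):
    """Return the recommended output resolution for a given use."""
    profile = PROFILES.get(purpose)
    if profile is None:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return profile["resolution"]
```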
Nano Banana 2 returns 1K, 2K, or 4K outputs, with face consistency holding at ~95% across 50+ generations from the same slot-1 reference. Generation time runs 15-30s at 1K, 30-60s at 2K, and 60-120s at 4K. It accepts up to 10 reference inputs per generation, the highest reference count among Martini image models. Output drops onto the canvas as the locked still; route it into the wardrobe/scene fan-out, into FLUX Kontext for outfit edits, or into Sora 2/Kling 3 video nodes for animated character content. The reference-image conditioning is the technical novelty that no prompt-only workflow can replicate.
Connect Nano Banana 2 with other AI models on Martini's infinite canvas. No GPU required — start free.
Black Forest Labs
Edit your locked character into new outfits, scenes, and poses on Martini using FLUX Kontext — built specifically for instructed image edits that preserve subject identity. Where Nano Banana 2 generates the canonical character sheet, FLUX Kontext is the wardrobe-and-scene editor that takes that locked still and modifies it without losing the face. The two-model chain (Nano Banana 2 to lock identity, FLUX Kontext to vary wardrobe) is the cleanest character-consistency pipeline on the canvas.
Vidu
Generate consistent character stills on Martini using Vidu Reference-to-Image — accepts 1-7 reference images per generation and outputs character stills that flow directly into Vidu video nodes for matched motion. Vidu's reference workflow is optimized for the image-to-video character pipeline: the same model family that locks identity on the still also handles the motion, eliminating cross-model identity drift at the modality boundary. For producers who plan to ship character video content downstream, Vidu Reference-to-Image is the cleanest single-vendor path.