Vidu
Vidu Q2 Subject Ref accepts 1-7 character reference images per generation — the densest character-reference slot among the three models in this scenario. For an AI influencer producer keeping "Mia" identical across a 12-week content series, that 7-image character sheet (front, three-quarter, profile, full-body, hands, expression range) gives Vidu more identity vectors than any single-anchor model. The result is the strongest face/jaw/hairline lock across multiple shots, especially when wardrobe and location vary.
Before opening Vidu, generate the character sheet on Nano Banana 2 (the strongest face-locker for the source images). Aim for 5-7 distinct angles and expressions: front portrait, three-quarter, profile, full-body, neutral expression, smiling expression, hands close-up. This sheet is the identity bedrock — Vidu will sample identity from every image you provide.
On Martini's canvas, drop the Vidu Q2 Subject Ref node and attach all 7 character images. Vidu accepts up to 7 — fewer is fine, but 7 is the maximum identity density. Pin the front portrait as the primary reference; the others fill in the angles and expressions the prompt may invoke.
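The reference-stack setup above can be sketched as a small payload builder. This is a minimal illustration only: the endpoint shape and field names (`model`, `reference_images`, `primary_reference`) are assumptions for the sketch, not Vidu's or Martini's documented API — the one hard constraint it encodes is the 1-7 reference limit.

```python
# Hypothetical sketch of a Subject Ref generation payload.
# Field names are assumptions, not Vidu's documented API.

MAX_REFS = 7  # Vidu Q2 Subject Ref accepts 1-7 reference images

def build_subject_ref_payload(reference_images, prompt, primary=0):
    """Build a generation payload; the primary reference defaults to the first image."""
    if not 1 <= len(reference_images) <= MAX_REFS:
        raise ValueError(
            f"Vidu Q2 Subject Ref takes 1-{MAX_REFS} images, got {len(reference_images)}"
        )
    return {
        "model": "vidu-q2-subject-ref",
        "reference_images": reference_images,
        "primary_reference": reference_images[primary],  # pin the front portrait
        "prompt": prompt,
    }

# The 7-angle "Mia" character sheet from the step above (filenames illustrative).
sheet = [
    "mia_front.png", "mia_three_quarter.png", "mia_profile.png",
    "mia_full_body.png", "mia_hands.png", "mia_neutral.png", "mia_smile.png",
]
payload = build_subject_ref_payload(sheet, "medium close-up, Mia looks toward camera")
```

The point of the validation is the workflow rule, not the API: load the full 7-image sheet whenever you have it, and pin the front portrait as primary.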
Vidu reads prompts in concert with references. For a profile shot of "Mia" walking, the prompt "profile shot, character walking left to right, soft daylight" tells Vidu to lean on the profile reference. For a front-facing dialogue shot, "medium close-up, character looks toward camera and speaks" leans on the front reference. Match prompt direction to the reference set.
For a 12-week content series, the wardrobe and location vary per episode — but identity must lock. With 7 references loaded, vary only the wardrobe and environment language in the prompt: "Mia in red leather jacket, downtown street at night" → "Mia in cream sweater, autumn forest path" → "Mia in business casual, modern office." Identity stays anchored; the rest is creative variation.
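The wardrobe-and-location swap above is just templating: the prompt skeleton stays fixed across episodes and only the variable fields change, while identity lives entirely in the reference stack. A minimal sketch (the template string and episode fields are illustrative, not a Vidu prompt schema):

```python
# Sketch: per-episode prompts vary only wardrobe, location, and camera.
# The character's face is never described in the prompt — identity
# comes from the 7-image reference stack, not the text.
PROMPT_TEMPLATE = "Mia in {wardrobe}, {location}, {camera}, {duration} seconds"

episodes = [
    {"wardrobe": "red leather jacket", "location": "downtown street at night",
     "camera": "slow forward dolly", "duration": 6},
    {"wardrobe": "cream sweater", "location": "autumn forest path",
     "camera": "slight handheld breathing", "duration": 5},
    {"wardrobe": "business casual", "location": "modern office",
     "camera": "static medium shot", "duration": 4},
]

prompts = [PROMPT_TEMPLATE.format(**ep) for ep in episodes]
```

Each week's episode is then one template fill plus the unchanged reference stack, which is what keeps identity reading identical across the series.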
For a follow-up shot that needs to mirror a previous clip's motion (Mia tilts her head exactly the same way she did in episode 3), switch to Vidu V2V Pro — it merges video and image references in a single pass. Drop the previous motion clip as video reference + the same 7-image character sheet, and Vidu generates new footage with both motion and identity locked.
Once the Vidu Q2 Subject Ref node is set with the 7-image character sheet, save the canvas as a "Mia series" template. Each new episode reuses the same canvas with the wardrobe + location prompt swapped. Identity reads identical across the series because the reference stack does not change. This is the workflow that turns a 12-week content series into a single canvas.
Wardrobe + location variation, identity locked by the 7-image reference stack. Note that Mia's face is never described — Vidu reads it from the references.
Mia in red leather jacket, downtown street at night, neon signs reflect on wet pavement, slow forward dolly, 6 seconds
Profile-specific shot. Vidu samples the profile angle from the reference sheet.
Profile shot of Mia walking left to right, autumn forest path, soft afternoon light, slight handheld breathing, 5 seconds
Front-facing dialogue beat. Vidu pulls the front reference and its expression range.
Medium close-up of Mia looking toward camera, soft golden hour key light, ambient outdoor cafe sound, 4 seconds
Vidu Q2 Subject Ref accepts up to 7 reference images — load the maximum for the densest identity lock.
Build the character sheet on Nano Banana 2 first; it is the strongest source-image face-locker.
Vary wardrobe + location in prompts but never describe the character's face — Vidu reads that from references.
Save the canvas as a series template — reuse the same Vidu node + 7 references across episodes.
For motion mirroring across episodes, switch to Vidu V2V Pro to merge video + image references.
Vidu Q2 Subject Ref outputs 5-8 second clips at 1080p with the strongest character-identity lock among the three models on this page — the 7-image reference slot is the differentiator. Render times run 90-150 seconds. Best for AI influencer episodic content, a recurring spokesperson, or fashion/beauty catalogs where identity must hold across 12+ shots. For motion mirroring across episodes use Vidu V2V Pro; for budget-tier reference work use Seedance 2 Omni; for choreographed tight action use Kling O3 Reference.
Connect Vidu Q2 Subject Ref with other AI models on Martini's infinite canvas. No GPU required — start free.
Get Started Free
Kling
Kling O3 Reference adds character reference images for consistent appearance across clips and supports voice control over individual elements. Sharing the Kling 3.0 backbone (native 4K, 16-bit HDR, Omni Native Audio), it is the right pick when an AI influencer or brand spokesperson needs to deliver lip-synced dialogue across multiple cuts at festival-grade detail. It is stronger than Vidu on choreographed tight action but less reference-dense than Vidu Q2 (Vidu accepts 7 images; Kling O3 Reference reads fewer, with stricter ranking).
View guide
ByteDance
Seedance 2 Omni adds character reference images to a generation that already accepts up to 12 reference assets — a unique combination of identity lock plus broad multimodal context (audio reference, location reference, palette reference). For an AI influencer producer running high-volume content where each episode varies wardrobe, location, and mood while identity stays anchored, Seedance Omni delivers strong per-clip economics. It is the pragmatic middle option between Vidu Q2 (densest reference) and Kling O3 Reference (tightest choreography).
View guide