Nano Banana 2 Workflows for Multi-Image Reference and Character Consistency
Multi-image reference and character consistency workflows on Martini using Nano Banana 2.
Key takeaways
- Nano Banana 2 is the canvas's strongest model for keeping one character recognizable across many generations — this is the use case it was tuned for.
- It accepts multi-image references in the same prompt, which is what sets it apart from single-reference models.
- Build your character library once: generate three to five seed reference images of the character at different angles, then re-feed those references on every downstream generation.
- Chain Nano Banana 2 into Vidu, Kling Avatar, or Seedance 2 Omni for video that keeps the same identity. The image-side consistency carries through.
- Pair with Flux Kontext for surgical edits and clothing swaps without breaking the character's face — a recurring pattern for AI-influencer and recurring-spokesperson workflows.
What Nano Banana 2 actually does
Nano Banana 2 is the second-generation evolution of the Nano Banana image model, tuned specifically for the problem that breaks every other image model: keeping one character looking like the same person across many independent generations. The model ships with native multi-image reference handling, which means you can pass it three or four images of your character — front, three-quarter, profile, expression study — and ask it to compose a new frame that respects all of them. Single-reference models will hold the face for a few generations and then drift; Nano Banana 2 holds it across dozens.
Underneath, the model behaves like a quietly opinionated character animator. It is willing to invent new poses, new outfits, and new lighting setups, but it will fight you if you try to drift the face away from the references you passed in. That is exactly the behavior you want for a character library. The cost is that Nano Banana 2 is not the right pick for one-off aesthetic shots where you want stylistic novelty — for that, run Flux or Imagen 4 instead.
The 2.0 generation also handles fabric, hair, and skin tone with noticeably better stability than the original Nano Banana release. This matters in practice because the failure mode of older character models was usually subtle — the face was right but the skin tone shifted between frames, or the fabric of the outfit changed weave. Nano Banana 2 closes most of those gaps.
Building the character library — the canonical first step
Before you do any production work with Nano Banana 2, generate a character library. This is three to five canonical reference images of your character at the angles and expressions you will need downstream. The library is the source of truth that every later generation references. Spend twenty minutes here and the rest of your project gets easier; skip it and every shot will feel almost-right.
On the Martini canvas, this looks like: drop a Nano Banana 2 image node and write a detailed character description (age, build, hair, ethnicity, distinguishing features, signature wardrobe). Generate four or five takes, pick the strongest as the canonical front view, and pin it in the version tray. Then duplicate the node four times, each time wiring the canonical front view in as a reference, and prompt for the additional angles: three-quarter left, three-quarter right, profile, smiling expression close-up. You now have a five-image library that defines the character.
For an AI influencer or recurring spokesperson, expand this to ten or fifteen seed images covering more outfits, more emotional expressions, and a couple of full-body poses. The library lives on the canvas as a folder of pinned takes; every downstream Nano Banana 2 node gets one or more of them wired in as references.
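To make the wiring concrete, here is the library step sketched as plain Python data. Everything in it is a hypothetical illustration rather than a Martini API: the node fields, the model slug, and the take ID are invented names, and on the canvas this wiring happens visually.

```python
# Hypothetical sketch only: Martini's canvas is wired visually, and these
# node fields, the model slug, and the take ID are invented for illustration.

CHARACTER = (
    "woman in her late twenties, athletic build, shoulder-length auburn hair, "
    "light freckles, signature olive bomber jacket"
)  # example character description; substitute your own spec

ANGLES = [
    "three-quarter left view",
    "three-quarter right view",
    "profile view",
    "smiling expression close-up",
]

def library_nodes(canonical_front_id: str) -> list[dict]:
    """One Nano Banana 2 node per angle, each wired to the pinned front view."""
    return [
        {
            "model": "nano-banana-2",            # invented model slug
            "references": [canonical_front_id],  # the pinned canonical take
            "prompt": f"the same character, {CHARACTER}, {angle}, "
                      "neutral studio lighting",
        }
        for angle in ANGLES
    ]

for node in library_nodes("take_front_v3"):      # hypothetical take ID
    print(node["prompt"])
```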
Multi-image reference prompting
The thing Nano Banana 2 does that single-reference models cannot is read multiple images at once and weight them in the composition. The prompt structure that works is: action description first, then explicit reference attribution. For example, "the same character standing at a coffee bar in morning light, referencing image 1 for face and hair, image 2 for outfit, and image 3 for full-body proportions." Be explicit about which reference governs which attribute and the model will follow you.
A common mistake is passing too many references and watering down the signal. Three to four references per generation is the sweet spot. If you pass eight, the model has to average across all of them and you lose the precision that makes Nano Banana 2 worth picking in the first place. Treat references like ingredients, not like a buffet.
For composed scenes — the character interacting with a product, or the character placed in a specific environment — the right pattern is to pass the character references plus one environmental or product reference. Then prompt for the interaction explicitly: "character holding the bottle from image 4, looking down at the label, soft window light from frame left." This consistently produces frames that respect both the character identity and the product spec.
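As a minimal sketch of that prompt structure, the helper below assembles the action-first, attribution-second pattern and softly enforces the three-to-four reference guidance. The function and its shape are assumptions for illustration; only the prompt text it produces is the real payload.

```python
import warnings

def build_prompt(action: str, attributions: dict[int, str]) -> str:
    """Action description first, then explicit per-reference attribution.

    Hypothetical helper for illustration; the output string is the only
    part that actually goes to the model.
    """
    if len(attributions) > 4:
        # Three to four references is the sweet spot; more averages the signal.
        warnings.warn("more than four references dilutes the identity signal")
    clauses = [f"image {i} for {role}" for i, role in sorted(attributions.items())]
    return f"{action}, referencing " + ", ".join(clauses)

# Character-only shot:
print(build_prompt(
    "the same character standing at a coffee bar in morning light",
    {1: "face and hair", 2: "outfit", 3: "full-body proportions"},
))

# Composed scene: character references plus one product reference:
print(build_prompt(
    "character holding the bottle from image 4, looking down at the label, "
    "soft window light from frame left",
    {1: "face and hair", 2: "outfit", 4: "the product bottle"},
))
```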
Pairing Nano Banana 2 with Flux Kontext for edits
The other model that earns a permanent slot next to Nano Banana 2 on most canvases is Flux Kontext. Nano Banana 2 generates the character; Kontext handles edits and clothing swaps without breaking the face. The pattern is the same as the GPT Image 2 / Kontext pairing — generation first, surgical edits downstream — but it matters more for character work because re-prompting Nano Banana 2 for an outfit change tends to subtly shift the face along with the outfit.
On the canvas, drop a Nano Banana 2 node, generate the character in the base outfit, pin the take, then drop a Flux Kontext node downstream wired to that take. Use Kontext to swap the outfit, change the background, fix a hand, or adjust an accessory. The face stays. Repeat the Kontext step for each variant — same character, fifteen outfits, all from one Nano Banana 2 base.
This pairing is the production backbone of an AI-influencer feed. One canonical character library, one base generation per shot type, and Kontext fanning out the wardrobe and environment variants downstream. It is cheaper, faster, and more consistent than re-generating from scratch.
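A sketch of that fan-out in the same hypothetical data shape: one pinned Nano Banana 2 base take, one Flux Kontext edit node per outfit. The field names and the take ID are invented for the example, not a documented Martini API.

```python
# Hypothetical sketch of the Kontext fan-out; node fields and the take ID
# are invented for illustration.

OUTFITS = ["navy suit", "gym kit", "rain jacket", "summer linen shirt"]

def kontext_fanout(base_take_id: str, outfits: list[str]) -> list[dict]:
    """One edit node per outfit; each changes only the clothing."""
    return [
        {
            "model": "flux-kontext",   # invented model slug
            "input": base_take_id,     # the pinned Nano Banana 2 base take
            "prompt": f"change only the outfit to {outfit}; "
                      "keep face, hair, pose, and lighting identical",
        }
        for outfit in outfits
    ]

for node in kontext_fanout("take_base_v1", OUTFITS):
    print(node["prompt"])
```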
Chaining Nano Banana 2 into video
Character consistency on the image side is the prerequisite for character consistency in video. The chains that hold identity best from Nano Banana 2 stills are: into Seedance 2 Omni for cinematic motion that respects the source image, into Vidu when you need fast iteration on character video at lower cost, and into Kling Avatar when the shot is a talking head and lip-sync is the deciding factor.
The wired-image pattern from the Seedance 2 handbook applies here. Pin a Nano Banana 2 still, wire it into a Seedance 2 Omni node, and write a tight motion prompt. The Omni variant is the one that genuinely respects the image input as a character reference rather than just a starting frame. Run the same character image through three or four parallel Seedance Omni nodes with different motion prompts and you get a multi-shot sequence that holds identity across cuts.
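Sketched with the same invented node shape, the multi-shot pattern is just the one still fanned into parallel Omni requests; the still ID and field names are assumptions for illustration.

```python
# Hypothetical sketch: one pinned still into parallel Seedance 2 Omni nodes.

MOTION_PROMPTS = [
    "slow push-in as she looks up from the laptop",
    "handheld orbit left as she turns to follow the camera",
    "static wide shot, she stands and walks out of frame right",
]

def multi_shot(still_id: str) -> list[dict]:
    """Same character still, different motion prompt per parallel node."""
    return [
        {"model": "seedance-2-omni", "image": still_id, "prompt": p}
        for p in MOTION_PROMPTS
    ]

for node in multi_shot("take_scene_v2"):   # hypothetical still ID
    print(node["prompt"])
```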
For talking-head video, the chain is Nano Banana 2 still into Kling Avatar with the dialogue audio wired in. Avatar handles the lip-sync; the Nano Banana 2 reference holds the face. This is the cleanest production pattern for a recurring spokesperson video where the same character delivers different scripts across episodes.
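The talking-head chain is smaller still. Again a hypothetical shape, with invented field names: one reference still, one dialogue track, one Avatar node per episode.

```python
# Hypothetical sketch of the spokesperson chain; field names are invented.

def spokesperson_episode(still_id: str, audio_id: str) -> dict:
    """Avatar drives the lip-sync; the wired still holds the face."""
    return {
        "model": "kling-avatar",   # invented model slug
        "image": still_id,         # pinned Nano Banana 2 reference still
        "audio": audio_id,         # this episode's dialogue track
    }

episodes = [spokesperson_episode("take_spokesperson_v1", f"script_ep{n}.wav")
            for n in range(1, 4)]  # same face, three different scripts
```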
What this actually changes
Before models like Nano Banana 2, character consistency in AI workflows was a problem you solved with manual touch-up, with LoRA training, or with brute-force regeneration until the face matched. None of those scaled. Nano Banana 2 makes character consistency a property of the workflow rather than a property of post-production, which is the difference between "we make AI content sometimes" and "we run a character-driven channel."
The Martini-specific unlock is the canvas pattern: one character library pinned in the version tray, one shared reference set wired into every downstream image and video node, and Kontext fan-out for variant production. That whole loop runs without leaving the canvas, and any change to the canonical character library propagates to every downstream node automatically. That is what turns Nano Banana 2 from "good model" into the spine of a production system.
Workflow example
AI-influencer week-of-content production on Martini:
- Drop a Nano Banana 2 node, generate the character library (front, three-quarter left, three-quarter right, profile, smiling close-up), and pin the five takes.
- Drop seven Nano Banana 2 nodes for the week's seven posts, wire all five reference images into each, and prompt each for a different scene (coffee shop, gym, home office, restaurant, park, car, kitchen).
- Drop a Flux Kontext node downstream of each base to swap the outfit per post.
- For the three video posts in the week, wire the chosen still into a Seedance 2 Omni node with a fifteen-second motion prompt.
- Export the seven images and three videos through the NLE node, and you have a week of identity-consistent content from one character library.
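Laid out as plain data, the whole week's graph looks like the sketch below. Every identifier, field name, and model slug is invented to show the topology; on Martini the same wiring is done visually on the canvas.

```python
# Hypothetical end-to-end sketch of the week's node graph. All names are
# invented for illustration; the point is the topology: the full library
# wired into every base, Kontext downstream of each base, video where needed.

LIBRARY = ["lib_front", "lib_3q_left", "lib_3q_right", "lib_profile", "lib_smile"]
SCENES = ["coffee shop", "gym", "home office", "restaurant",
          "park", "car", "kitchen"]
VIDEO_POSTS = {0, 2, 5}   # e.g. three of the seven posts get a video

graph = []
for day, scene in enumerate(SCENES):
    base = {
        "id": f"base_{day}",
        "model": "nano-banana-2",
        "references": LIBRARY,   # the full pinned library, never chained takes
        "prompt": f"the same character in a {scene}, candid lifestyle shot",
    }
    outfit = {
        "id": f"outfit_{day}",
        "model": "flux-kontext",
        "input": base["id"],
        "prompt": "swap the outfit for this post; keep the face identical",
    }
    graph += [base, outfit]
    if day in VIDEO_POSTS:
        graph.append({
            "id": f"video_{day}",
            "model": "seedance-2-omni",
            "image": outfit["id"],
            "prompt": "fifteen seconds of natural motion matching the scene",
        })

print(f"{len(graph)} nodes for the week")   # 7 bases + 7 edits + 3 videos = 17
```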
Related reading
- How to Build a Consistent AI Character Across Images and Video. Reference workflows that keep character identity stable across image and video generations on Martini.
- GPT Image 2 Guide: Workflows, Strengths, and Where It Fits on Martini. How GPT Image 2 fits product, text, and reference image workflows on Martini's multi-model canvas.
- Seedance 2 Handbook: Variants, Best Workflows, and How to Use It on Martini. Hands-on guide to Seedance 2 — variants, strengths, and the production workflows it fits on Martini's canvas.
Frequently asked questions
- How is Nano Banana 2 different from Flux Kontext?
- Nano Banana 2 is the generation model that holds character identity across many independent generations. Flux Kontext is the editor that modifies a chosen frame without breaking it. They sit next to each other on most production canvases — Nano Banana 2 generates, Kontext edits.
- How many reference images should I pass in one Nano Banana 2 prompt?
- Three to four is the sweet spot. Below that you lose the multi-angle benefit; above that you start averaging across too many signals and lose precision. Treat references like ingredients, not a buffet.
- Will Nano Banana 2 keep my character recognizable across video?
- Yes, when you chain it correctly. Pin a Nano Banana 2 still, wire it into a Seedance 2 Omni or Kling Avatar video node, and the identity carries through. The image-side library is the prerequisite — invest there first.
- Can I use Nano Banana 2 for product photography?
- You can, but GPT Image 2 is usually the better pick for product fidelity and in-frame text. Nano Banana 2 is the right node when the product needs to appear with a recurring character; otherwise reach for GPT Image 2.
- How do I swap outfits without breaking the face?
- Generate the base character with Nano Banana 2, then wire it into a Flux Kontext node downstream and use Kontext to change only the clothing. This preserves the face entirely. Re-prompting Nano Banana 2 for an outfit change tends to subtly drift the face.
- What if the character drifts after fifteen or twenty generations?
- Re-pass the canonical reference library on every generation; do not chain references through intermediate generations. Each new shot should reference the original library directly. This keeps drift bounded.
Ready to try it on the canvas?
Open Martini and fan your prompt across every frontier model in one workflow.