Generate the canonical reference image for an Image-to-3D-World workflow on Martini using Nano Banana 2 — the cleaner the source, the more navigable the resulting scene. The output of the world node is a navigable canvas-internal scene preview you can orbit and screenshot, not a portable .obj, .fbx, .glb, or USD mesh file. Concept artists use this to lock a location once on Nano Banana 2, pass the locked still into the World Labs or Image-to-3D-World node, and capture matched-angle stills that feed downstream Sora 2 or Kling 3 nodes for shots that all share the same world.
Drop a Nano Banana 2 node onto the canvas and generate the source scene — a clean, well-lit, well-composed interior or exterior. Foreground composition matters more than far-field detail; the world node will reconstruct the unseen parts. "An empty mid-century living room with afternoon light through tall windows, polished wooden floor, single armchair near the fireplace, neutral palette, photorealistic, 4K resolution."
Nano Banana 2 outputs at 1K, 2K, or 4K. For the Image-to-3D-World handoff, generate at 4K — the world node has more depth and texture cues to work with on a higher-resolution input. Lower resolutions still work but the navigable scene will look softer at far-field detail. Slot 1 is the canonical scene reference; supporting references in slots 2-5 (lighting moodboard, palette swatch) bias the output without changing composition.
Drop a World Labs or Image-to-3D-World node onto the canvas and wire the Nano Banana 2 output as the input. Generation takes around 5 minutes for a full navigable world. The output is a canvas-internal scene preview — you can orbit, pan, and screenshot inside the canvas, but cannot export the world as a portable mesh or splat file from Martini.
Inside the navigable preview, pan and orbit to find usable angles. Capture stills from the four-angle pattern: front view, three-quarter left, three-quarter right, back/over-shoulder. Each capture lands as an image node on the canvas. These are your shot starting frames for downstream video — capture more than you need; regenerating the world produces a different scene.
Wire each captured still into its own Sora 2 or Kling 3 video node — image-to-video with the captured angle as the starting frame. Add cinematographic motion prompts ("slow camera push forward," "gentle orbit," "static camera"). Each video clip inherits the world; only the camera move changes. The locked location is what makes a multi-shot sequence read as one place instead of five different rooms the AI hallucinated.
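The capture-then-fan-out step above can be sketched as a shot list. Martini is a canvas tool with no public API, so every name here — the angle labels, motion prompts, and file paths — is illustrative, not a real interface:

```python
# Hypothetical shot-list builder: pairs each captured world still with a
# cinematographic motion prompt for its own downstream video node.
# All identifiers are illustrative; Martini exposes no scripting API.

ANGLES = ["front", "three-quarter-left", "three-quarter-right", "back-over-shoulder"]

MOTIONS = {
    "front": "slow camera push forward",
    "three-quarter-left": "gentle orbit left",
    "three-quarter-right": "gentle orbit right",
    "back-over-shoulder": "static camera",
}

def build_shot_list(world_id: str, video_model: str = "sora-2") -> list[dict]:
    """One video shot per captured angle; every shot shares the same locked world."""
    return [
        {
            "start_frame": f"{world_id}/{angle}.png",  # captured still from the preview
            "model": video_model,                      # "sora-2" or "kling-3"
            "motion_prompt": MOTIONS[angle],           # only the camera move changes
        }
        for angle in ANGLES
    ]

shots = build_shot_list("world-001")
```

The point the sketch makes explicit: the world is fixed per shot list, and only `motion_prompt` varies between clips, which is what keeps the sequence reading as one place.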
Once a world is locked, save the entire canvas — Nano Banana 2 reference, world node, captured stills, and the multi-shot Sora/Kling fan-out. Next project, swap the Nano Banana 2 prompt and re-run; the workflow is reusable for every new location. Treat captured stills as the deliverable; the world preview itself is an in-canvas reference, not an exportable asset.
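The reuse pattern — save the canvas once, swap only the source prompt per project — can be sketched as a template clone. The template keys and node names are hypothetical stand-ins for the saved canvas, not a Martini data format:

```python
import copy

# Hypothetical saved-canvas template: the graph structure is fixed;
# a new location means a new source prompt and a full re-run.

CANVAS_TEMPLATE = {
    "source_node": {"model": "nano-banana-2", "resolution": "4K", "prompt": None},
    "world_node": {"model": "image-to-3d-world", "input": "source_node"},
    "capture_angles": ["front", "three-quarter-left",
                       "three-quarter-right", "back-over-shoulder"],
    "video_fanout": {"model": "sora-2", "inputs": "capture_angles"},
}

def new_location(scene_prompt: str) -> dict:
    """Clone the saved canvas and swap only the source prompt."""
    canvas = copy.deepcopy(CANVAS_TEMPLATE)  # leave the template untouched
    canvas["source_node"]["prompt"] = scene_prompt
    return canvas

foggy_alley = new_location("A foggy Tokyo backstreet at dusk, neon signs, 4K")
```

The deep copy mirrors the discipline in the text: the saved canvas stays pristine, and each project works on its own re-run.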
Canonical scene reference for the world node. Clean composition + clear depth cues + 4K = strongest world reconstruction.
An empty mid-century living room with afternoon light through tall windows, polished wooden floor, single armchair near the stone fireplace, neutral warm palette, photorealistic, sharp focus throughout, 4K resolution.
Atmospheric exterior with strong depth cues. The fog gradient gives the world node clear depth signals to reconstruct.
A foggy alley at dusk in a Tokyo backstreet, neon signs glowing in mid-distance, wet cobblestones reflecting light, single figure walking away from camera, rain particles in air, photorealistic, atmospheric depth, 4K resolution.
Multi-reference workflow. Lighting moodboard in slot 2 biases the output without changing composition.
[Nano Banana 2 reference slot 2: lighting moodboard - golden hour] + A coastal cliff at golden hour, lighthouse small in the upper-right third, mist rising from the rocks below, path winding inland, warm gold palette, photorealistic, 4K resolution.
Interior with strong vertical depth. The two-story composition gives the world node depth signals on multiple axes.
A grand library interior with two stories of bookshelves, ladders connecting upper level, soft warm lamp light, central reading table, leather chairs, deep wood tones, photorealistic, atmospheric depth, 4K resolution.
Generate the source at 4K. Lower resolutions still work, but the navigable world will look softer at far-field detail.
Foreground composition matters more than far-field detail. The world node reconstructs the unseen parts; clean foreground = strong reconstruction.
Use clean, well-lit, single-subject sources. Cluttered references produce weaker, less coherent worlds.
Capture stills BEFORE iterating. Re-running the world node produces a different scene; screenshot first, iterate later.
For shot-to-shot consistency in downstream video, capture all angles you need from one world generation. Don't plan to come back later.
The world output is canvas-internal — you cannot export it as .obj, .fbx, .glb, or USD from Martini. Captured stills are the deliverable.
The Nano Banana 2 source still returns at 1K-4K resolution. The downstream world node returns a navigable canvas-internal scene preview that can be orbited and screenshotted but cannot be exported as a portable mesh file. Captured stills (the actual deliverable) land as image nodes on the canvas, ready to feed Sora 2 / Kling 3 / Runway Gen4 video nodes for matched-angle shots. World generation runs ~5 minutes; foreground holds at high fidelity, far-field is suggestive only. For exportable mesh assets, route through Martini's Tripo3D or Hunyuan3D Image-to-3D nodes instead — those produce GLB/FBX, the world node does not.
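The routing rule in the recap — world node for navigable previews, Tripo3D/Hunyuan3D for exportable meshes — reduces to a single decision. The node names come from the text above; the function itself is an illustrative sketch, not a Martini API:

```python
def pick_3d_node(need_exportable_mesh: bool) -> str:
    """Route a source image to the right 3D node on the canvas.

    World node: navigable canvas-internal preview only (orbit + screenshot).
    Tripo3D / Hunyuan3D: exportable GLB/FBX mesh assets.
    Function name and return strings are hypothetical labels.
    """
    if need_exportable_mesh:
        return "tripo3d"  # or "hunyuan3d"; both output GLB/FBX
    return "image-to-3d-world"  # locked location for matched-angle stills

# A multi-shot sequence needs a locked location, not a mesh:
node = pick_3d_node(need_exportable_mesh=False)
```
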
Connect Nano Banana 2 with other AI models on Martini's infinite canvas. No GPU required — start free.
Black Forest Labs
Generate the source reference image for an Image-to-3D-World workflow on Martini using FLUX.2 — its prompt-faithful rendering produces clean, literal scene compositions that the world node can reconstruct reliably. The world node's output is a navigable canvas-internal scene preview you can orbit and screenshot, not a portable .obj, .fbx, .glb, or USD mesh file. Concept artists use FLUX.2 when they need an alt-look reference (different palette, lighting, or style) from what Nano Banana 2 produces — same workflow, different aesthetic.
OpenAI
Use Sora 2 as the downstream camera-move engine for an Image-to-3D-World workflow on Martini — the captured stills from the navigable world feed directly into Sora 2 video nodes for matched-angle motion shots. The world node's output is a canvas-internal navigable scene preview, not a portable .obj, .fbx, .glb, or USD mesh. Sora 2 takes the captured stills as starting frames and produces video clips that all share the same locked location, with cinematographic camera moves that respect the spatial structure of the source world.