OpenAI
Use Sora 2 as the downstream camera-move engine for an Image-to-3D-World workflow on Martini — the captured stills from the navigable world feed directly into Sora 2 video nodes for matched-angle motion shots. The world node's output is a canvas-internal navigable scene preview, not a portable .obj, .fbx, .glb, or USD mesh. Sora 2 takes the captured stills as starting frames and produces video clips that all share the same locked location, with cinematographic camera moves that respect the spatial structure of the source world.
Sora 2 is the camera-move engine, not the world generator. Upstream: drop a Nano Banana 2 or FLUX.2 image node, generate the source scene at 4K, then wire into a World Labs or Image-to-3D-World node. The navigable scene preview lands inside the canvas (~5-minute generation) — orbit and capture stills before moving to Sora 2.
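To make the wiring concrete, here is a minimal sketch of that upstream chain as plain Python data. `Node`, its fields, and the `params` keys are illustrative stand-ins, not Martini's actual node API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                                   # "image", "world", "sora2_video", ...
    params: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)  # upstream nodes

# Source scene at 4K, wired into the Image-to-3D-World node.
source = Node("image", {"model": "flux.2", "resolution": "4k",
                        "prompt": "mid-century living room, soft afternoon light"})
world = Node("world", {"model": "image-to-3d-world"}, inputs=[source])

# The world node's output stays canvas-internal: downstream Sora 2 nodes
# consume stills captured from it, never the world object itself.
```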
Inside the navigable preview, capture stills from the four-angle pattern: front view, three-quarter left, three-quarter right, back/over-shoulder. Each capture lands as an image node. These are your Sora 2 starting frames — capture more than you need. Re-running the world node produces a different scene, so screenshot first, iterate later.
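A sketch of that capture pass, assuming a hypothetical `capture_stills` helper; in Martini each capture lands as an image node on the canvas, modeled here as a plain dict.

```python
ANGLES = ["front", "three-quarter-left", "three-quarter-right", "back-over-shoulder"]

def capture_stills(world_session: str, spares: int = 2) -> list[dict]:
    """One still per angle in the four-angle pattern, plus a few spares."""
    stills = [{"angle": a, "world": world_session} for a in ANGLES]
    # Over-capture now: re-running the world node yields a different scene,
    # so spares from this session are the only safety net.
    stills += [{"angle": f"spare-{i}", "world": world_session} for i in range(spares)]
    return stills

print(capture_stills("world-run-001"))
```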
Sora 2 has deep understanding of 3D space, motion, and scene continuity — captured stills from a navigable world are the cleanest input format. Wire each captured angle into its own Sora 2 image-to-video node. The video model inherits the spatial structure from the still and produces motion that respects parallax, occlusion, and depth.
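The fan-out itself is mechanical; a sketch, with the node dicts standing in for real Sora 2 image-to-video nodes:

```python
def fan_out(stills: list[dict], duration_s: int = 8, aspect: str = "16:9") -> list[dict]:
    """One Sora 2 image-to-video node per captured angle; every clip inherits
    the same locked location because every start frame came from one world."""
    return [{"kind": "sora2_video", "start_frame": s,
             "duration_s": duration_s, "aspect": aspect} for s in stills]

nodes = fan_out([{"angle": "front"}, {"angle": "three-quarter-left"}])
print(len(nodes), "Sora 2 nodes queued")
```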
Use cinematographic verbs in the Sora 2 prompt: "slow camera push forward," "gentle orbit clockwise," "static camera, character moves out of frame right," "dolly forward with parallax." Sora 2 maps these directly to its training distribution. Avoid generic verbs ("move closer," "spin") — they leave the model guessing and produce inconsistent shot-to-shot behavior.
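One way to keep prompts on the cinematographic register is to template them; a sketch using the verbs above (the `CAMERA_MOVES` table and `sora2_prompt` helper are ours, not a Sora 2 API):

```python
# Cinematographic verbs that map directly to Sora 2's training distribution.
CAMERA_MOVES = {
    "push":   "slow camera push forward",
    "orbit":  "gentle orbit clockwise",
    "static": "static camera",
    "dolly":  "dolly forward with parallax",
}

def sora2_prompt(move: str, subject: str, seconds: int, aspect: str = "16:9") -> str:
    # Lead with the camera verb so the move is unambiguous; never fall back
    # to generic verbs like "move closer" or "spin".
    return f"{CAMERA_MOVES[move]}, {subject}, {seconds} seconds, {aspect}"

print(sora2_prompt("push", "through the living room toward the fireplace", 8))
```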
For sequences longer than one Sora 2 clip, route the last frame of clip N into a frame-extraction tool node, then feed it as the starting frame of clip N+1. Combined with the locked 3D world reference, this gives both spatial AND temporal continuity. The world locks the location; frame chaining locks the motion thread.
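A sketch of the chaining loop. The ffmpeg call substitutes for the frame-extraction tool node (this `-sseof`/`-update` idiom is a standard way to grab a clip's final frame), and `render` is a placeholder for whatever triggers a Sora 2 image-to-video run.

```python
import subprocess

def last_frame(clip: str, out_jpg: str) -> str:
    """Stand-in for the frame-extraction tool node: decode the last second of
    the clip and keep overwriting a single image, leaving the final frame."""
    subprocess.run(
        ["ffmpeg", "-y", "-sseof", "-1", "-i", clip,
         "-update", "1", "-q:v", "2", out_jpg],
        check=True,
    )
    return out_jpg

def chain_clips(render, first_still: str, prompts: list[str]) -> list[str]:
    """Clip N's last frame becomes clip N+1's starting frame. `render` is a
    placeholder for a Sora 2 image-to-video run: (start_frame, prompt) -> path."""
    start, clips = first_still, []
    for i, prompt in enumerate(prompts):
        clip = render(start, prompt)
        clips.append(clip)
        start = last_frame(clip, f"chain_{i:02d}.jpg")
    return clips
```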
Drop the Sora 2 outputs into Martini's sequence builder in story order. Each clip runs 5-10s and cut markers are preserved. Layer audio (ElevenLabs Eleven v3 + Minimax Music), then export as a native sequence to Premiere, DaVinci Resolve, or Final Cut. The locked 3D world is what makes the multi-shot sequence read as one place; the NLE export is the final delivery.
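For a quick local preview of the assembled order before the NLE handoff, ffmpeg's concat demuxer works as a stand-in; Martini's native export remains the actual delivery path.

```python
import pathlib
import subprocess

def rough_assembly(clips: list[str], out: str = "sequence_preview.mp4") -> str:
    """Concatenate story-order clips with ffmpeg's concat demuxer. Stream copy
    needs matching codec/resolution/frame rate, which clips rendered with the
    same Sora 2 settings should share."""
    listing = pathlib.Path("clips.txt")
    listing.write_text("".join(f"file '{pathlib.Path(c).resolve()}'\n" for c in clips))
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                    "-i", str(listing), "-c", "copy", out], check=True)
    return out
```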
Establishing shot via slow push. The captured still locks the location; Sora 2 adds the camera move.
[Captured still from world: front view of empty mid-century living room] + Sora 2 prompt: slow camera push forward through the living room toward the fireplace, soft afternoon light from the windows on camera left, no character, 8 seconds, 16:9.
Medium shot via orbit. Same world; new angle. The orbit instruction maps to Sora 2's 3D-aware training.
[Captured still: three-quarter left angle of same living room] + Sora 2 prompt: gentle orbit clockwise around the center of the room, lighting unchanged, depth of field shallow on the armchair, 6 seconds, 16:9.
Detail close-up via static zoom. The static instruction tells Sora 2 not to add unwanted parallax.
[Captured still: tight angle on the fireplace] + Sora 2 prompt: static camera, slow zoom in toward the fireplace mantelpiece, no other motion in frame, atmospheric, 5 seconds, 16:9.
Reverse shot with parallax dolly. Sora 2's strongest move type — the depth structure of the still drives the parallax effect.
[Captured still: reverse over-shoulder angle, looking back toward the windows] + Sora 2 prompt: dolly forward with parallax, soft afternoon light revealing dust motes in the air, 7 seconds, 16:9.
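Taken together, the four recipes collapse into a small data-driven shot list; a sketch, with the angle labels as placeholders:

```python
# The four recipes above as (captured angle, Sora 2 prompt, duration) rows.
SHOT_LIST = [
    ("front",              "slow camera push forward through the living room "
                           "toward the fireplace, soft afternoon light, no character", 8),
    ("three-quarter-left", "gentle orbit clockwise around the center of the room, "
                           "shallow depth of field on the armchair",                   6),
    ("fireplace-tight",    "static camera, slow zoom in toward the fireplace "
                           "mantelpiece, no other motion in frame",                    5),
    ("reverse",            "dolly forward with parallax, looking back toward the "
                           "windows, dust motes in the air",                           7),
]

for angle, prompt, seconds in SHOT_LIST:
    print(f"still[{angle}] -> Sora 2: {prompt}, {seconds} seconds, 16:9")
```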
Sora 2 is the camera-move engine, not the world generator. The world comes from a Nano Banana 2 / FLUX.2 source + World Labs / Image-to-3D-World node upstream.
Capture stills from the world BEFORE running Sora 2. Re-running the world produces a different scene; capture once, fan out to many Sora 2 nodes.
Use cinematographic verbs (dolly, orbit, push, pull, static, parallax) — they map to Sora 2's training distribution. Generic verbs produce inconsistent results.
For sequences, use last-frame chaining: Sora 2 clip N's last frame = clip N+1's starting frame. Combined with the locked world, both spatial and temporal continuity are preserved.
Sora 2 image-to-video clips are 5-10s. For longer takes, chain shorter clips rather than asking for a single impossibly long clip.
The world node output is canvas-internal — Sora 2 uses captured stills, not the navigable world directly. Export from Martini = NLE-ready video sequence, not a 3D file.
Sora 2 returns 5-10s 1080p video clips per node, with strong 3D spatial reasoning that respects the depth structure of captured stills from a navigable world. Generation time 60-120s per clip. Cinematographic camera moves (dolly, orbit, push, parallax) are Sora 2's strongest territory. Output drops onto the canvas; chain via sequence builder for multi-shot delivery, NLE export for native Premiere/DaVinci sequences. The world remains canvas-internal; Sora 2 outputs are exportable video deliverables.
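Given the 5-10s clip bounds above, budgeting a longer take into chainable pieces is simple arithmetic; a sketch (`plan_clips` is ours, and the bounds are parameters, not hard API limits):

```python
import math

def plan_clips(total_seconds: float, max_clip: float = 10.0,
               min_clip: float = 5.0) -> list[float]:
    """Split a target runtime into Sora 2-sized clips for last-frame chaining,
    rather than requesting one over-long take."""
    n = max(1, math.ceil(total_seconds / max_clip))
    length = total_seconds / n
    if length < min_clip and n > 1:   # guard against clips under the floor
        n -= 1
        length = total_seconds / n
    return [round(length, 2)] * n

print(plan_clips(34))   # -> [8.5, 8.5, 8.5, 8.5]
```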
Connect Sora 2 with other AI models on Martini's infinite canvas. No GPU required — start free.
Generate the canonical reference image for an Image-to-3D-World workflow on Martini using Nano Banana 2 — the cleaner the source, the more navigable the resulting scene. The output of the world node is a navigable canvas-internal scene preview you can orbit and screenshot, not a portable .obj, .fbx, .glb, or USD mesh file. Concept artists use this to lock a location once on Nano Banana 2, pass the locked still into the World Labs or Image-to-3D-World node, and capture matched-angle stills that feed downstream Sora 2 or Kling 3 nodes for shots that all share the same world.
Black Forest Labs
Generate the source reference image for an Image-to-3D-World workflow on Martini using FLUX.2 — its prompt-fidelity rendering produces clean, literal scene compositions that the world node can reconstruct reliably. The world node's output is a navigable canvas-internal scene preview you can orbit and screenshot, not a portable .obj, .fbx, .glb, or USD mesh file. Concept artists use FLUX.2 when they need an alt-look reference (different palette, different lighting, different style) from what Nano Banana 2 produces — same workflow, different aesthetic.