OpenAI
Lock down a location once with a navigable 3D world on Martini, capture matched-angle stills, and feed each one into Sora 2 as a starting frame for a five-shot sequence that reads as one set instead of five different rooms. Sora 2 has a deep understanding of 3D space, motion, and scene continuity: stills captured from a navigable world translate into consistent camera moves that honor parallax, occlusion, and depth. The 3D world is a canvas-internal reference, not an exportable .obj/.fbx/.glb/USD file; Sora 2 reads the captured stills and ships exportable video clips that all share the same spatial anchor.
Source the world from Nano Banana 2 or FLUX.2 → World Labs / Image-to-3D-World node, or from a text prompt routed through the Marble 3D node. Generation runs about 5 minutes. The output is a navigable, canvas-internal scene preview you can orbit, pan, and screenshot; it cannot be exported as .obj/.fbx/.glb/USD from Martini. The world is the spine of the multi-shot sequence: Sora 2 derives every shot from it.
Inside the navigable world, capture five matched-angle stills planned around your sequence: wide establishing (frame 1), medium (frame 2), tight close-up (frame 3), reverse over-shoulder (frame 4), wide closing tag (frame 5). Each lands as an image node. Capture more than you need: re-running the world produces a different scene, so screenshot everything before you generate.
On the canvas, place five Sora 2 image-to-video nodes in story order — wide establishing on the left, closing tag on the right. Wire each captured still into its corresponding Sora 2 node as the starting frame. Each clip inherits the world from the still; only the camera move and per-shot prompt change.
Sora 2 maps cinematographic verbs directly to its training distribution. Shot 1 (wide establishing): "slow camera push forward, atmospheric." Shot 2 (medium): "gentle orbit clockwise around the central subject." Shot 3 (close-up): "static camera, subject moves slightly within frame." Shot 4 (reverse): "dolly forward with parallax." Shot 5 (closing tag): "slow camera pull back revealing the full space." Generic verbs ("move closer") leave Sora 2 guessing.
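Written out as plain data, the five-shot plan is easy to sanity-check before any node is wired. A minimal sketch in Python: the still filenames are hypothetical placeholders for the captures from the previous step, the camera prompts and durations come from this guide, and shot 5's duration is an assumption inside the 5-10s clip range.

```python
# Five-shot plan as plain data: one entry per Sora 2 image-to-video node,
# in story order. Still filenames are hypothetical placeholders; camera
# prompts and durations are the ones used in this guide (shot 5's duration
# is assumed: pick anything in the 5-10 s clip range).
SHOT_PLAN = [
    {"shot": 1, "framing": "wide establishing",     "still": "still_01_wide.png",
     "camera": "slow camera push forward, atmospheric",              "seconds": 8},
    {"shot": 2, "framing": "medium",                "still": "still_02_medium.png",
     "camera": "gentle orbit clockwise around the central subject",  "seconds": 6},
    {"shot": 3, "framing": "tight close-up",        "still": "still_03_close.png",
     "camera": "static camera, subject moves slightly within frame", "seconds": 5},
    {"shot": 4, "framing": "reverse over-shoulder", "still": "still_04_reverse.png",
     "camera": "dolly forward with parallax",                        "seconds": 7},
    {"shot": 5, "framing": "wide closing tag",      "still": "still_05_tag.png",
     "camera": "slow camera pull back revealing the full space",     "seconds": 8},
]

# Quick sanity check before wiring: total should land in the 25-50 s window.
for s in SHOT_PLAN:
    print(f'shot {s["shot"]}: {s["framing"]:<20} {s["still"]} -> "{s["camera"]}" ({s["seconds"]}s)')
print(f'sequence runtime: {sum(s["seconds"] for s in SHOT_PLAN)}s')
```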
For sequences beyond five shots, use last-frame chaining: route Sora 2 clip N's last frame into a frame-extraction tool node and feed it in as the starting frame of clip N+1. Combined with the locked 3D world reference, this gives both spatial AND temporal continuity. The world locks the location; frame chaining locks the motion thread across cuts.
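Off-canvas, the extraction half of this step is a few lines of OpenCV. A minimal sketch, assuming the clips live on disk: generate_clip is a hypothetical placeholder for your Sora 2 image-to-video submission, the continuation prompts are made up, and on the canvas Martini's frame-extraction tool node plays the role of last_frame.

```python
# Last-frame chaining sketch: pull clip N's final frame with OpenCV, then
# use it as the starting frame of clip N+1. `generate_clip` is a hypothetical
# placeholder for your actual Sora 2 image-to-video step.
import cv2

def last_frame(clip_path: str, out_path: str) -> str:
    cap = cv2.VideoCapture(clip_path)
    # Seek to the final frame (frame indices are 0-based).
    cap.set(cv2.CAP_PROP_POS_FRAMES, cap.get(cv2.CAP_PROP_FRAME_COUNT) - 1)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError(f"could not read last frame of {clip_path}")
    cv2.imwrite(out_path, frame)
    return out_path

def generate_clip(start_frame: str, prompt: str) -> str:
    """Placeholder: submit start_frame + prompt to a Sora 2 node, return the clip path."""
    raise NotImplementedError("wire this to your Sora 2 image-to-video step")

# Extend the locked five-shot sequence: clip N's tail seeds clip N+1.
clip = "shot_05.mp4"
for n, prompt in enumerate(["slow pan right along the wall",
                            "static camera, light fades to dusk"], start=6):
    seed = last_frame(clip, f"chain_seed_{n:02d}.png")
    clip = generate_clip(seed, prompt)
```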
Drop the five Sora 2 outputs into Martini's sequence builder in story order. Total runtime is 25-50s for a five-shot sequence (5-10s per clip). Layer audio (ElevenLabs Eleven v3 dialogue + Minimax Music ambient bed). Export as a native sequence to Premiere, DaVinci Resolve, or Final Cut. The locked 3D world is what makes the multi-shot sequence read as one set; the NLE export is the final delivery.
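If you ever need to assemble outside Martini's sequence builder, ffmpeg's concat demuxer does the same stitch. A sketch assuming the five clips share codec and resolution (they should, coming from one set of Sora 2 settings) and that ambient.wav is a hypothetical pre-rendered audio bed, for example from Minimax Music.

```python
# Offline assembly sketch using ffmpeg's concat demuxer.
import subprocess

clips = [f"shot_{i:02d}.mp4" for i in range(1, 6)]  # story order, left to right

with open("shots.txt", "w") as f:
    f.writelines(f"file '{c}'\n" for c in clips)

# Stitch without re-encoding; streams are copied as-is.
subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                "-i", "shots.txt", "-c", "copy", "sequence.mp4"], check=True)

# Mux the ambient bed under the picture; -shortest trims audio to video length.
# Note this replaces any per-clip audio; dialogue layering is better done in
# the NLE after export.
subprocess.run(["ffmpeg", "-y", "-i", "sequence.mp4", "-i", "ambient.wav",
                "-map", "0:v", "-map", "1:a", "-c:v", "copy",
                "-shortest", "sequence_with_bed.mp4"], check=True)
```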
Shot 1: establishing. The wide composition sets the spatial expectation; subsequent shots inherit this anchor.
[Captured still: wide establishing of mid-century living room from front] + Sora 2 prompt: slow camera push forward through the room toward the fireplace, soft afternoon light from the windows on camera left, no character, atmospheric, 8 seconds, 16:9.
Shot 2: medium via orbit. Sora 2's 3D-aware training handles the orbit motion, respecting parallax within the established space.
[Captured still: three-quarter left of same living room] + Sora 2 prompt: gentle orbit clockwise around the central armchair, lighting unchanged, depth of field shallow on the chair, 6 seconds, 16:9.
Shot 3: close-up via static zoom. The static-camera instruction stops Sora 2 from adding unwanted parallax that would break continuity with the wider shots.
[Captured still: tight close on the fireplace mantelpiece] + Sora 2 prompt: static camera, slow zoom in toward the mantelpiece, embers glow softly, no other motion, 5 seconds, 16:9.
Shot 4: reverse with parallax dolly. The depth structure of the captured still drives the parallax effect.
[Captured still: reverse over-shoulder from the fireplace looking back at the windows] + Sora 2 prompt: dolly forward with parallax revealing dust motes in afternoon light, 7 seconds, 16:9.
Capture stills from the world BEFORE running Sora 2. Re-running the world produces a different scene; capture once, fan out to many Sora 2 nodes.
Use cinematographic verbs (dolly, orbit, push, pull, static, parallax) — they map to Sora 2's training distribution. Generic verbs produce inconsistent results.
For multi-shot sequences, plan the camera-move arc before generating: wide → medium → close → reverse → tag is the workhorse pattern.
Sora 2 image-to-video clips are 5-10s. For longer takes, chain shorter clips with last-frame stitching rather than asking for one impossibly long clip.
Far-field background detail can drift between clips even with the same starting frame. Frame the action so the hero sits in the foreground; treat far-field detail as suggestive only.
The 3D world is canvas-internal — Sora 2 uses captured stills, not the navigable world directly. Export from Martini = NLE-ready video sequence, not a 3D file.
Sora 2 returns 5-10s 1080p video clips per node, with strong 3D spatial reasoning that respects the depth structure of stills captured from a navigable world. Generation time is 60-120s per clip; a five-shot sequence runs 5-10 minutes end-to-end on the canvas. The 3D world remains canvas-internal (not exportable as .obj/.fbx/.glb/USD); the Sora 2 outputs are exportable video deliverables. Chain via the sequence builder for multi-shot delivery, and use NLE export for native Premiere/DaVinci sequences.
Connect Sora 2 with other AI models on Martini's infinite canvas. No GPU required — start free.
Kling
Drive Kling 3.0 multi-shot sequences on Martini using captured stills from a navigable 3D world. Kling supports 2-6 scenes per video with explicit per-scene descriptions, which makes it the strongest single-pass multi-shot pick when paired with locked location backplates. The 3D world is a canvas-internal reference, not an exportable .obj/.fbx/.glb/USD file. Kling reads the captured stills as starting frames and renders multi-shot videos that all share the locked location, with 3D Spacetime Joint Attention handling parallax and occlusion across cuts.
Runway
Use Runway Gen4 Turbo on Martini for fast iteration on 3D-world-derived shots: captured stills from the navigable world feed into Gen4 image-to-video nodes that ship 5- or 10-second clips in under a minute. The 3D world is a canvas-internal reference, not an exportable .obj/.fbx/.glb/USD file. Gen4 Turbo is the speed pick when the brief lands at 4 PM and a sequence ships at 9: per-clip generation completes faster than Sora 2 or Kling 3.0, which makes it the right tool for rapid multi-shot iteration before committing the bigger render budget to the hero clips.