OpenAI
Use Sora 2 as the downstream camera-move engine for a Text-to-3D-Scene workflow on Martini — captured stills from the navigable Marble scene feed into Sora 2 video nodes for cinematographic shots that respect the scene's spatial structure. Sora 2 does not generate the scene itself; the scene comes from a text-conditioned Marble 3D node (or from an upstream Midjourney/FLUX.2 frame routed into Marble). Marble's output is a canvas-internal navigable preview, not a portable .obj, .fbx, .glb, or USD mesh — Sora 2 takes the captured stills as starting frames and produces motion clips that all share the same locked location.
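To make the graph shape concrete, here is a minimal sketch of the node wiring as plain Python data. Martini is a visual canvas, so the `Node` class and the node-type strings below are illustrative stand-ins for the description above, not a real SDK.

```python
# Illustrative only: plain-Python stand-ins for the canvas nodes described above.
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # "marble_3d", "image_capture", "sora2_i2v", "sequence"
    config: dict = field(default_factory=dict)

# Scene generation: text-conditioned, or image-conditioned via a Midjourney/FLUX.2 frame
marble = Node("marble_3d", {"prompt": "foggy alley at dusk, neon signs, wet cobblestones"})

# Stills captured inside the navigable preview become the Sora 2 starting frames
captures = [Node("image_capture", {"angle": a})
            for a in ("front", "three_quarter_left", "three_quarter_right", "back_over_shoulder")]

# One image-to-video node per captured angle; outputs land in the sequence builder
shots = [Node("sora2_i2v", {"start_frame": c, "duration_s": 8}) for c in captures]
sequence = Node("sequence", {"clips": shots})
```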
Sora 2 needs a starting frame from a real spatial source. Upstream: drop a 3D node configured for Marble. For text-only: write the location prompt directly ("foggy alley at dusk, neon signs, wet cobblestones"). For stronger reconstruction: generate a concept frame on Midjourney or FLUX.2 first and wire it into Marble as image conditioning. Marble runs ~5 minutes; output is a navigable canvas-internal preview.
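A minimal sketch of the two conditioning modes, assuming a hypothetical `run_marble` helper that stands in for the Marble 3D node (the name and signature are assumptions, not a documented call):

```python
# Hypothetical helper: the name and signature are illustrative, not a real SDK.
def run_marble(prompt: str, reference_image: str | None = None) -> str:
    """Stand-in for the Marble 3D node: returns a handle to the canvas-internal preview."""
    mode = "image-conditioned" if reference_image else "text-only"
    print(f"Marble run ({mode}), ~5 min: {prompt!r}")
    return "marble_preview_handle"

# Text-only: the location prompt goes straight into the node
preview = run_marble("foggy alley at dusk, neon signs, wet cobblestones")

# Stronger reconstruction: a Midjourney/FLUX.2 concept frame as image conditioning
preview = run_marble("foggy alley at dusk, neon signs, wet cobblestones",
                     reference_image="concept_frame.png")
```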
Inside the navigable Marble preview, capture stills from the four-angle pattern: front view, three-quarter left, three-quarter right, back/over-shoulder. Each capture lands as an image node. Capture more than you need — re-running Marble produces a different scene, so screenshot first, iterate later. These are the Sora 2 starting frames.
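The capture pass, sketched as a checklist; the filenames are hypothetical, since the actual captures happen by hand inside the navigable preview:

```python
# The four-angle capture pattern as a checklist. Captures are manual, inside the
# Marble preview; this just records which angles exist before any Marble re-run.
ANGLES = ["front", "three_quarter_left", "three_quarter_right", "back_over_shoulder"]

captured_stills = {}
for angle in ANGLES:
    # Hypothetical filenames: each capture lands on the canvas as an image node.
    captured_stills[angle] = f"marble_capture_{angle}.png"

# Capture more than you need: a Marble re-run produces a different scene,
# so every still you want must exist before iterating.
assert len(captured_stills) == len(ANGLES)
```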
Sora 2 has a deep understanding of 3D space, motion, and scene continuity, and captured stills from a Marble scene are a clean input format for it. Wire each captured angle into its own Sora 2 image-to-video node. The video model inherits the spatial structure from the still and produces motion that respects parallax, occlusion, and depth.
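Sketching the fan-out under stated assumptions: `sora2_image_to_video` and the capture filenames below are illustrative stand-ins, not documented endpoints.

```python
# Hypothetical stand-in for a Sora 2 image-to-video node; not a documented endpoint.
def sora2_image_to_video(start_frame: str, prompt: str, seconds: int = 8,
                         aspect_ratio: str = "16:9") -> str:
    """Returns a path to the generated clip (illustrative only)."""
    return start_frame.replace(".png", "_clip.mp4")

captured_stills = {                      # stills captured from the Marble preview
    "front": "marble_capture_front.png",
    "three_quarter_left": "marble_capture_three_quarter_left.png",
}

# Fan out: one Sora 2 node per captured angle, so each clip inherits the
# parallax, occlusion, and depth cues baked into its own still.
clips = {angle: sora2_image_to_video(still,
                                     prompt="slow camera push forward through the alley")
         for angle, still in captured_stills.items()}
```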
Use cinematographic verbs in the Sora 2 prompt: "slow camera push forward through the alley toward the ramen shop," "gentle orbit clockwise around the central fixture," "static camera, neon flickers in the foreground." Sora 2 maps these directly to its training distribution. Avoid generic verbs ("move closer," "spin") — they leave the model guessing.
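As a concrete contrast, a small lookup from generic verbs to the cinematographic phrasing used in this guide's examples (the mapping paraphrases this page, not an official prompt spec):

```python
# Generic verb -> cinematographic phrasing, paraphrased from the examples in this guide.
PROMPT_REWRITES = {
    "move closer": "slow camera push forward through the alley toward the ramen shop",
    "spin":        "gentle orbit clockwise around the central fixture",
    "hold still":  "static camera, neon flickers in the foreground",
}

def sharpen(prompt: str) -> str:
    """Swap a generic camera verb for its cinematographic equivalent, if one is known."""
    return PROMPT_REWRITES.get(prompt, prompt)

print(sharpen("move closer"))
```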
For sequences longer than one Sora 2 clip, route the last frame of clip N into a frame-extraction tool node, then feed it as the starting frame of clip N+1. Combined with the locked Marble scene, this gives both spatial AND temporal continuity. The scene locks the location; frame chaining locks the motion thread.
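The chaining loop as a sketch; `extract_last_frame` and `sora2_image_to_video` are hypothetical stand-ins for the frame-extraction tool node and the Sora 2 node.

```python
# Hypothetical stand-ins for the frame-extraction tool node and the Sora 2 node.
def extract_last_frame(clip_path: str) -> str:
    return clip_path.replace(".mp4", "_last.png")

def sora2_image_to_video(start_frame: str, prompt: str, seconds: int = 8) -> str:
    return start_frame.replace(".png", "_clip.mp4")

shot_prompts = [
    "slow camera push forward through the alley toward the ramen shop",
    "gentle orbit clockwise around the lantern outside the shop",
    "dolly forward with parallax revealing depth of the alley behind",
]

# Clip N's last frame becomes clip N+1's starting frame: the Marble scene locks
# the location, this loop locks the motion thread.
start_frame = "marble_capture_front.png"
clips = []
for prompt in shot_prompts:
    clip = sora2_image_to_video(start_frame, prompt, seconds=8)
    clips.append(clip)
    start_frame = extract_last_frame(clip)
```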
Drop the Sora 2 outputs into Martini's sequence builder in story order. Each clip is 5-10s. Layer audio (ElevenLabs Eleven v3 + Minimax Music). Export as native sequence to Premiere, DaVinci Resolve, or Final Cut. The locked Marble scene made the multi-shot read as one place; the NLE export is the final delivery.
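The assembly step, sketched with illustrative stand-ins for the sequence builder and the NLE export (none of the names below are a real export API):

```python
# Illustrative stand-ins for the sequence builder and NLE export; not a real API.
from dataclasses import dataclass

@dataclass
class SequenceClip:
    video: str                     # Sora 2 output, 5-10 s
    dialogue: str | None = None    # e.g. an ElevenLabs Eleven v3 render
    music: str | None = None       # e.g. a Minimax Music cue

timeline = [
    SequenceClip("shot_01_push.mp4", music="alley_theme.mp3"),
    SequenceClip("shot_02_orbit.mp4"),
    SequenceClip("shot_03_closeup.mp4", dialogue="vo_line_01.mp3"),
]

def export_sequence(clips: list[SequenceClip], target: str = "premiere") -> str:
    """Stand-in for the native sequence export (Premiere, DaVinci Resolve, Final Cut)."""
    return f"sequence_for_{target}.xml"

export_sequence(timeline, target="davinci_resolve")
```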
Establishing shot via slow push. The captured still locks the location; Sora 2 adds the camera move with parallax through the alley.
[Captured still from Marble: front view of foggy Tokyo alley] + Sora 2 prompt: slow camera push forward through the alley toward the ramen shop, neon signs flickering in mid-distance, rain particles in air, atmospheric, 8 seconds, 16:9.
Medium shot via orbit. Same scene; new angle. Orbit instruction maps to Sora 2's 3D-aware training.
[Captured still: three-quarter angle on the ramen shop entrance] + Sora 2 prompt: gentle orbit clockwise around the lantern outside the shop, soft lantern light, no character, 6 seconds, 16:9.
Detail close-up via static zoom. Static instruction tells Sora 2 not to add unwanted parallax.
[Captured still: tight close on the vending machine] + Sora 2 prompt: static camera, slow zoom in toward the vending machine, neon reflections shimmer on wet cobblestone in the foreground, 5 seconds, 16:9.
Reverse shot with parallax dolly. Sora 2's strongest move type — depth structure of the still drives parallax.
[Captured still: reverse over-shoulder, looking out of the alley] + Sora 2 prompt: dolly forward with parallax revealing depth of the alley behind, rain falls heavier in foreground, 7 seconds, 16:9.
Sora 2 is the camera-move engine, not the 3D scene generator. Marble (text or image conditioned) generates the scene upstream.
For best results, route a Midjourney or FLUX.2 frame into Marble as image conditioning. Text-only Marble runs are weaker than image-conditioned.
Capture stills BEFORE iterating Marble. Re-running produces a different scene; capture once, fan out to many Sora 2 nodes.
Use cinematographic verbs (dolly, orbit, push, pull, static, parallax) — they map to Sora 2's training distribution.
For sequences, use last-frame chaining: clip N's last frame = clip N+1's starting frame. Combined with the locked Marble scene, spatial and temporal continuity are preserved.
The Marble scene is canvas-internal — Sora 2 uses captured stills, not the navigable scene directly. Export from Martini = NLE-ready video, not a 3D file.
Sora 2 returns 5-10s 1080p video clips per node, with strong 3D spatial reasoning that respects the depth structure of captured stills from a Marble scene. Generation time 60-120s per clip. Cinematographic camera moves are Sora 2's strongest territory. The Marble scene remains canvas-internal (not exportable as .obj/.fbx/.glb/USD); Sora 2 outputs are exportable video deliverables. Chain via sequence builder for multi-shot delivery, NLE export for native Premiere/DaVinci sequences.
Connect Sora 2 with other AI models on Martini's infinite canvas. No GPU required — start free.
Midjourney
Generate the cinematic concept frame on Martini using Midjourney v7 — then feed that frame into the Marble 3D node to draft a navigable scene from a description that started as text. Marble's output is a viewable canvas-internal scene preview, not a clean .obj, .fbx, .glb, or USD mesh file. Directors with no concept frame use Midjourney to produce the painterly, mood-rich anchor first ("foggy alley at dusk, neon signs, wet cobblestones"), then route the locked frame into Marble for the spatial draft. Image-conditioned Marble runs hold geometry and lighting more reliably than text-only — Midjourney + Marble is the cleanest text-to-3D-scene pipeline on the canvas.
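A minimal sketch of that hand-off, with hypothetical stand-ins for both nodes; `midjourney_frame` and `run_marble` are illustrative names, not documented calls:

```python
# Hypothetical stand-ins for the Midjourney v7 node and the Marble 3D node.
def midjourney_frame(prompt: str) -> str:
    return "concept_frame.png"          # the painterly, mood-rich anchor

def run_marble(prompt: str, reference_image: str | None = None) -> str:
    return "marble_preview_handle"      # canvas-internal navigable preview, not a mesh file

prompt = "foggy alley at dusk, neon signs, wet cobblestones"
frame = midjourney_frame(prompt)                       # concept frame first
preview = run_marble(prompt, reference_image=frame)    # image-conditioned Marble run
```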
Black Forest Labs
Generate the literal-staging concept frame on Martini using FLUX.2 — then feed that frame into the Marble 3D node to produce a navigable scene from a text description. Marble's output is a viewable canvas-internal scene preview, not a clean .obj, .fbx, .glb, or USD mesh file. Where Midjourney provides the painterly atmosphere, FLUX.2 is the prompt-fidelity pick: it renders the scene with literal foreground/mid-ground/background depth structure, which is exactly what Marble's image-conditioned mode needs to reconstruct geometry reliably.
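The same hand-off sketched for FLUX.2, again with illustrative stand-ins; the depth-structured prompt is an example in the spirit of this section, not a required format:

```python
# Hypothetical stand-ins for the FLUX.2 node and the Marble 3D node.
def flux2_frame(prompt: str) -> str:
    return "staging_frame.png"

def run_marble(prompt: str, reference_image: str | None = None) -> str:
    return "marble_preview_handle"

# FLUX.2 rewards literal staging: spell out foreground / mid-ground / background.
prompt = ("foreground: wet cobblestones and a vending machine; "
          "mid-ground: ramen shop entrance under neon signs; "
          "background: fog-dimmed alley receding into darkness")
preview = run_marble(prompt, reference_image=flux2_frame(prompt))
```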