3 Models Available
A director with no concept frame describes a location ("foggy alley at dusk, neon signs, wet cobblestones") and gets a navigable Marble (World Labs) scene in minutes. On Martini's canvas, type the location prompt into a world node, optionally chain a Nano Banana 2 or FLUX.2 concept frame in front to strengthen image conditioning (Marble is weaker on text alone), then capture stills from the scene and route them into Sora 2 video nodes. Treat the output as a spatial mood board, not a finished mesh export. Pick a model below to walk through the text-to-3D pre-viz workflow.
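To make the routing order concrete, here is a minimal sketch of that node chain as plain data. The Node helper, the model identifier strings, and the connection scheme are all illustrative assumptions for this walkthrough, not Martini's actual API; the sketch only shows how the optional concept frame, the world node, and the video nodes relate.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str            # canvas label
    model: str           # e.g. "flux.2", "marble", "sora-2" (assumed identifiers)
    prompt: str = ""
    inputs: list["Node"] = field(default_factory=list)

# Optional concept frame chained in front to strengthen image conditioning
# (Marble is weaker on text alone).
concept = Node("concept frame", "flux.2",
               "foggy alley at dusk, neon signs, wet cobblestones")

# World node: the location prompt, plus the optional upstream frame.
world = Node("world", "marble",
             "foggy alley at dusk, neon signs, wet cobblestones",
             inputs=[concept])

# Stills captured from the navigable scene feed Sora 2 video nodes.
shots = [Node(f"shot {i}", "sora-2", "slow dolly through the alley",
              inputs=[world]) for i in range(3)]

for shot in shots:
    print(" -> ".join(n.name for n in (concept, world, shot)))
    # concept frame -> world -> shot 0, and so on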
Midjourney
Generate the cinematic concept frame on Martini using Midjourney v7, then feed that frame into the Marble 3D node to draft a navigable scene from what began as a pure text description. Marble's output is a viewable canvas-internal scene preview, not a clean .obj, .fbx, .glb, or USD mesh file. Directors with no concept frame use Midjourney to produce the painterly, mood-rich anchor first ("foggy alley at dusk, neon signs, wet cobblestones"), then route the locked frame into Marble for the spatial draft. Image-conditioned Marble runs hold geometry and lighting more reliably than text-only runs, which makes Midjourney + Marble the cleanest text-to-3D-scene pipeline on the canvas.
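A small sketch of that conditioning decision: prefer the locked Midjourney frame when one exists, and fall back to text-only otherwise. The draft_scene function and its parameters are assumed for illustration; they are not a documented Marble or Martini call.

from typing import Optional

def draft_scene(prompt: str, concept_frame: Optional[str] = None) -> dict:
    if concept_frame is not None:
        # Image-conditioned run: the locked frame anchors geometry and lighting.
        return {"mode": "image+text", "frame": concept_frame, "prompt": prompt}
    # Text-only fallback: expect looser geometry and more lighting drift.
    return {"mode": "text", "prompt": prompt}

print(draft_scene("foggy alley at dusk", "mj_v7_frame_001.png"))
print(draft_scene("foggy alley at dusk"))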
Black Forest Labs
Generate the literal-staging concept frame on Martini using FLUX.2, then feed that frame into the Marble 3D node to produce a navigable scene from a text description. Marble's output is a viewable canvas-internal scene preview, not a clean .obj, .fbx, .glb, or USD mesh file. Where Midjourney provides painterly atmosphere, FLUX.2 is the prompt-fidelity pick: it renders the scene with a literal foreground/mid-ground/background depth structure, which is exactly what Marble's image-conditioned mode needs to reconstruct geometry reliably.
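One way to exploit that prompt fidelity is to write the FLUX.2 prompt with the depth layers spelled out explicitly. The layer wording below is an illustrative assumption about prompt phrasing, not a documented FLUX.2 syntax; the point is simply that each depth plane gets its own clause.

layers = {
    "foreground": "wet cobblestones, shallow puddles reflecting neon",
    "mid-ground": "figure under an umbrella, parked bicycle",
    "background": "fog bank, stacked neon signage receding down the alley",
}

prompt = "; ".join(f"{depth}: {detail}" for depth, detail in layers.items())
print(prompt)
# foreground: wet cobblestones, ...; mid-ground: ...; background: ...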
OpenAI
Use Sora 2 as the downstream camera-move engine for the text-to-3D-scene workflow on Martini: captured stills from the navigable Marble scene feed into Sora 2 video nodes, yielding cinematographic shots that respect the scene's spatial structure. Sora 2 does not generate the scene itself; the scene comes from a text-conditioned Marble 3D node (or from an upstream Midjourney/FLUX.2 frame routed into Marble). Marble's output is a canvas-internal navigable preview, not a portable .obj, .fbx, .glb, or USD mesh; Sora 2 takes the captured stills as starting frames and produces motion clips that all share the same locked location.
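A sketch of that still-to-clip fan-out: each Sora 2 node starts from a still captured inside the same Marble scene, so every clip inherits the same locked location. The queue_clip helper, the still filenames, and the camera-move labels are hypothetical, used only to show the one-still-per-clip pairing.

stills = ["alley_cam_A.png", "alley_cam_B.png", "alley_cam_C.png"]
moves = ["slow push-in", "lateral tracking left", "low-angle tilt up"]

def queue_clip(start_frame: str, camera_move: str) -> dict:
    # Starting-frame conditioning keeps each clip inside the same space.
    return {"model": "sora-2", "start_frame": start_frame, "move": camera_move}

jobs = [queue_clip(s, m) for s, m in zip(stills, moves)]
for job in jobs:
    print(job)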