Midjourney
Generate the cinematic concept frame on Martini using Midjourney v7, then feed that frame into the Marble 3D node to draft a navigable scene from what began as a text description. Marble's output is a viewable canvas-internal scene preview, not a clean .obj, .fbx, .glb, or USD mesh file. Directors with no concept frame use Midjourney to produce the painterly, mood-rich anchor first ("foggy alley at dusk, neon signs, wet cobblestones"), then route the locked frame into Marble for the spatial draft. Image-conditioned Marble runs hold geometry and lighting more reliably than text-only runs, which makes Midjourney + Marble the cleanest text-to-3D-scene pipeline on the canvas.
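The full route, end to end: text prompt → Midjourney v7 concept frame → Marble 3D node (image-conditioned) → captured stills → Sora 2 / Kling 3 video nodes → NLE export to Premiere/DaVinci.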
Drop a single cinematic anchor reference onto the canvas — a film still, a moodboard photo, or a generated baseline frame. Wire it into Midjourney v7 as the style reference (sref). This is what holds the painterly cinematography across alternate concept attempts; without it, generating multiple candidates produces inconsistent palettes and lighting voices.
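The pattern every concept prompt in this guide follows (the reference URL is a placeholder for your anchor image):
[scene description], 16:9 cinematic --sref [anchor image URL] --sw 200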
Midjourney v7 reads cinematic language. Skip "fantasy alley scene" and write: "Foggy alley at dusk, neon signs glowing in mid-distance, wet cobblestones reflecting sign light, rain particles in air, anamorphic compression, atmospheric depth, 16:9 cinematic." This gives Midjourney the painterly atmosphere that Marble's text-only mode struggles to produce.
Midjourney v7 returns four variants per generation. Place all four on the canvas as image nodes laid out left-to-right. Compare palette, lighting direction, depth structure, and atmospheric voice. Pick the strongest candidate, the one that best matches the location brief, and route it into the Marble 3D node downstream. The rejected candidates stay on the canvas as an archive for reference.
Drop a 3D node configured for Marble onto the canvas. Wire the chosen Midjourney frame as the image-conditioning input. Image-conditioned Marble runs are stronger than text-only — geometry and lighting hold together more reliably. Generation runs around 5 minutes for the full navigable scene; the output is a canvas-internal scene preview, not a portable mesh.
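The wiring, in canvas terms: Midjourney v7 image node (chosen frame) → Marble 3D node (image-conditioning input) → navigable scene preview, canvas-internal only.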
Inside the Marble preview, pan and orbit to find usable angles. Capture stills from four angles: front, three-quarter left, three-quarter right, back/over-shoulder. Each capture lands as an image node. These are starting frames for downstream video — capture more than you need; re-running Marble produces a different scene from the same source frame.
Wire each captured still into its own Sora 2 or Kling 3 video node: image-to-video with the captured angle as the starting frame. Add cinematographic motion prompts. Each video clip inherits the Marble scene; only the camera move changes. The locked location is what makes a multi-shot sequence read as one place across cuts. Use NLE export to deliver the sequence natively to Premiere/DaVinci.
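Illustrative motion prompts for the video nodes (the wording here is ours, not a required syntax; any cinematographic camera-move description works):
Slow dolly-in toward the neon signs, rain drifting through the light beams, camera low over the wet cobblestones.
Lateral tracking shot along the alley wall, steady pace, parallax on the mid-ground signage.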
Cinematic concept frame for Marble. The painterly atmosphere is what Marble's image-conditioned mode reconstructs into a navigable scene.
Foggy alley at dusk, neon signs glowing in mid-distance, wet cobblestones reflecting sign light, rain particles in air, anamorphic compression, atmospheric depth, 16:9 cinematic --sref [reference URL] --sw 200 --stylize 250
Interior with strong vertical depth. Midjourney's cinematic light becomes the lighting bake for the resulting Marble scene.
Grand library interior with two stories of bookshelves, ladders connecting upper level, soft warm lamp light, central reading table, leather chairs, deep wood tones, atmospheric depth, 16:9 --sref [same reference URL] --sw 200
Atmospheric exterior. A lower --sw lets the palette evolve while keeping the shot within the same cinematography family.
Coastal cliff at golden hour, lighthouse small in upper-right third, mist rising from the rocks below, path winding inland, warm gold palette, anamorphic, 16:9 --sref [same reference URL] --sw 150
Clean interior. Midjourney provides the painterly anchor; Marble reconstructs the spatial draft from it.
Empty mid-century living room with afternoon light through tall windows, polished wooden floor, single armchair near the stone fireplace, neutral warm palette, atmospheric depth, 16:9 --sref [same reference URL] --sw 200
Use --sref + --sw 150-250 across alternate concept candidates. The style reference holds cinematography while you explore composition.
Pin --stylize at 200-300 for cinematic concept frames. Higher than 400 risks over-painterly results that confuse Marble's reconstruction.
Generate at 16:9 (Marble's preferred aspect for cinematic scenes). 21:9 ultra-wide also works; vertical aspects produce weaker world reconstruction.
Avoid --weird on concept frames destined for Marble. Painterly weirdness breaks the depth signals Marble needs to reconstruct geometry.
For best Marble reconstruction, the Midjourney frame should have clear foreground/mid-ground/background depth structure. Atmospheric haze + parallax cues = strongest reconstruction.
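Putting the tips together, one frame prompt that satisfies all five (the reference URL is a placeholder):
Foggy alley at dusk, clear foreground cobblestones, neon signs in mid-distance, building silhouettes behind atmospheric haze, anamorphic compression, 16:9 cinematic --sref [reference URL] --sw 200 --stylize 250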
The Marble output is canvas-internal — you cannot export it as .obj, .fbx, .glb, or USD from Martini. Captured stills are the deliverable.
Midjourney v7 returns four cinematic candidates per generation, 1024-2048 px wide. The Marble 3D node downstream returns a navigable canvas-internal scene preview (not exportable as .obj/.fbx/.glb/USD). Generation time: Midjourney 30-60s per batch, Marble ~5 minutes for the full world. Captured stills land on the canvas ready to feed Sora 2 / Kling 3 / Runway Gen4 video nodes. For exportable mesh assets, route through Martini's Tripo3D or Hunyuan3D image-to-3D nodes; the Marble node does not export portable mesh.
Connect Midjourney v7 with other AI models on Martini's infinite canvas. No GPU required — start free.
Black Forest Labs
Generate the literal-staging concept frame on Martini using FLUX.2 — then feed that frame into the Marble 3D node to produce a navigable scene from a text description. Marble's output is a viewable canvas-internal scene preview, not a clean .obj, .fbx, .glb, or USD mesh file. Where Midjourney provides the painterly atmosphere, FLUX.2 is the prompt-fidelity pick: it renders the scene with literal foreground/mid-ground/background depth structure, which is exactly what Marble's image-conditioned mode needs to reconstruct geometry reliably.
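An illustrative FLUX.2 prompt in that literal-staging register (the staging language here is ours; FLUX.2 takes plain natural-language description):
Narrow alley at dusk. Foreground: wet cobblestones with shallow puddles. Mid-ground: three neon signs glowing on the left wall. Background: fog thickening between tall brick buildings. 16:9 cinematic framing.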
OpenAI
Use Sora 2 as the downstream camera-move engine for a Text-to-3D-Scene workflow on Martini — captured stills from the navigable Marble scene feed into Sora 2 video nodes for cinematographic shots that respect the scene's spatial structure. Sora 2 does not generate the scene itself; the scene comes from a text-conditioned Marble 3D node (or from an upstream Midjourney/FLUX.2 frame routed into Marble). Marble's output is a canvas-internal navigable preview, not a portable .obj, .fbx, .glb, or USD mesh — Sora 2 takes the captured stills as starting frames and produces motion clips that all share the same locked location.
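An illustrative Sora 2 motion prompt for one captured still (wording is ours): Starting from this frame, slow push-in toward the mid-ground, camera height constant, scene lighting and layout unchanged.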