Black Forest Labs
Generate a literally staged concept frame on Martini using FLUX.2, then feed that frame into the Marble 3D node to produce a navigable scene from a text description. Marble's output is a viewable canvas-internal scene preview, not a clean .obj, .fbx, .glb, or USD mesh file. Where Midjourney provides painterly atmosphere, FLUX.2 is the prompt-fidelity pick: it renders the scene with literal foreground/mid-ground/background depth structure, which is exactly what Marble's image-conditioned mode needs to reconstruct geometry reliably.
FLUX.2 takes compositional prompts literally. Write the scene with explicit foreground/mid-ground/background structure: "Wide shot of a Tokyo backstreet at dusk, foreground left a vending machine glowing pink, mid-ground center wet cobblestone alley with neon reflections, background right a small ramen shop with warm interior light, atmospheric depth, photorealistic, 4K, 16:9." The depth-axis structure gives Marble the strongest signal for reconstructing geometry.
FLUX.2 Pro's prompt adherence is meaningfully tighter on detailed compositions. For any frame destined for Marble image-conditioning, pin the Pro tier — the literal staging gives Marble more accurate depth signals. Base tier still works for exploratory drafts; switch to Pro before the final source generation.
Save the realism language in a Text node: "Photorealistic cinematic photography, atmospheric depth, soft natural light, sharp foreground with bokeh in the far field, no text overlays." Wire it into the FLUX.2 source node as a brand-style prefix. This keeps the look consistent if you generate multiple source candidates and ensures the foreground stays sharp — Marble reconstructs the foreground best.
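If you draft prompts outside the canvas, the composition logic is just string assembly. A minimal sketch, assuming a hypothetical helper (the function and constant names are not a Martini API; the canvas does this prefix wiring for you):

```python
# Hypothetical helper mirroring the Text-node prefix wiring on the canvas.
# Martini concatenates prefix + prompt for you; this just shows the string logic.
REALISM_PREFIX = (
    "Photorealistic cinematic photography, atmospheric depth, soft natural "
    "light, sharp foreground with bokeh in the far field, no text overlays"
)

def staged_prompt(setting: str, foreground: str, midground: str,
                  background: str) -> str:
    """Compose a depth-staged FLUX.2 prompt: explicit foreground /
    mid-ground / background slots are what give Marble its depth signals."""
    return (
        f"{REALISM_PREFIX}. Wide shot of {setting}, "
        f"foreground {foreground}, mid-ground {midground}, "
        f"background {background}, atmospheric depth, photorealistic, 4K, 16:9"
    )

print(staged_prompt(
    setting="a Tokyo backstreet at dusk",
    foreground="left a vending machine glowing pink",
    midground="center wet cobblestone alley with neon reflections",
    background="right a small ramen shop with warm interior light",
))
```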
Drop a 3D node configured for Marble onto the canvas. Wire the FLUX.2 output as the image-conditioning input. Image-conditioned Marble runs are stronger than text-only — geometry and lighting hold together more reliably with a literally staged source. Generation runs around 5 minutes for the full navigable scene; the output is canvas-internal, not an exportable mesh.
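Purely illustrative: a sketch of what this wiring expresses, written as a Python dict. This is not Martini's actual graph format; every node type and field name here is an assumption.

```python
# Illustrative only -- NOT Martini's real graph format. Node types and
# field names are assumptions; the canvas expresses this with cables.
graph = {
    "nodes": {
        "style":  {"type": "text",  "value": "<realism prefix>"},
        "source": {"type": "flux2", "tier": "pro", "aspect": "16:9",
                   "prompt": "<depth-staged scene prompt>",
                   "prefix_from": "style"},
        "scene":  {"type": "3d",    "model": "marble",
                   "image_conditioning_from": "source"},  # ~5 min run
    },
    # Edges mirror the cables you drag: Text node -> FLUX.2 -> Marble.
    "edges": [("style", "source"), ("source", "scene")],
}
```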
Inside the Marble preview, capture stills from four angles: front, three-quarter left, three-quarter right, back/over-shoulder. Each capture lands as an image node. Re-running Marble from the same FLUX.2 source produces a different scene (it's probabilistic), so capture all the angles you need from one run before iterating.
Wire each captured still into its own Sora 2, Kling 3, or Runway Gen4 video node — image-to-video with the captured angle as the starting frame. Add cinematographic motion prompts. The locked Marble scene anchors the location, and each video clip inherits it. Drop the outputs into the sequence builder, then export to Premiere or DaVinci Resolve as a native scene-locked sequence.
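A sketch of the fan-out from one Marble run to four video nodes. The angle names match the captures above; the motion prompts are examples, and the node/field names are assumptions rather than a real API:

```python
# Illustrative fan-out: one Marble run -> four captured stills -> four
# image-to-video nodes. Model and field names are assumptions.
captures = ["front", "three_quarter_left", "three_quarter_right",
            "back_over_shoulder"]

motion_prompts = {
    "front":               "slow dolly-in toward the ramen shop, steady horizon",
    "three_quarter_left":  "lateral tracking shot past the vending machine",
    "three_quarter_right": "gentle crane-up revealing the alley depth",
    "back_over_shoulder":  "handheld push through the alley, shallow focus",
}

video_nodes = [
    {"type": "video", "model": "sora2",   # or kling3 / runway_gen4
     "start_frame": f"{angle}.png",       # the captured still for this angle
     "prompt": motion_prompts[angle]}
    for angle in captures
]
# All four clips inherit the same locked Marble location; drop them into
# the sequence builder in capture order before the NLE export.
```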
Source frame for Marble. Foreground/mid-ground/background structure gives Marble clear depth signals.
[Realism prefix] + Wide shot of a Tokyo backstreet at dusk, foreground left a vending machine glowing pink, mid-ground center wet cobblestone alley with neon reflections, background right a small ramen shop with warm interior light, atmospheric depth, photorealistic, 4K, 16:9, FLUX.2 Pro tier.
Interior with strong vertical depth. Two-story composition gives Marble depth signals on multiple axes.
[Realism prefix] + Grand library interior, foreground a central reading table with leather chairs, mid-ground rows of bookshelves on both sides, background a tall window with afternoon light, ladders connecting upper level, deep wood tones, photorealistic, 4K, 16:9.
Atmospheric exterior with explicit depth-axis staging. Mist gradient supports Marble's depth reconstruction.
[Realism prefix] + Coastal cliff at golden hour, foreground a winding path with tall grass, mid-ground a lighthouse small in the upper-right, background mist over the sea horizon, warm gold palette, photorealistic, 4K, 16:9.
Clean interior reference. Foreground/mid-ground/background structure + clean composition = strongest Marble reconstruction.
[Realism prefix] + Empty mid-century living room, foreground polished wooden floor, mid-ground a single armchair near a stone fireplace, background tall windows with afternoon light, neutral warm palette, photorealistic, 4K, 16:9.
Use FLUX.2 Pro for source frames destined for Marble image-conditioning. Prompt-literal compositions give Marble tighter depth signals than text-only mode.
Write explicit foreground/mid-ground/background staging. The depth-axis structure is what Marble reconstruction needs most (a quick pre-flight check is sketched after these tips).
Generate at 4K, 16:9. Lower resolutions still work but Marble's far-field will look softer; vertical aspects produce weaker reconstruction.
Avoid cluttered scenes. Single hero composition with clean depth cues outperforms busy multi-element scenes for Marble.
Capture all needed angles from ONE Marble run before iterating. Re-running produces a different scene from the same source frame.
The Marble output is canvas-internal — you cannot export it as .obj, .fbx, .glb, or USD from Martini. Captured stills are the deliverable.
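If you script prompt generation, the tips above are easy to lint before spending a Pro-tier run. A minimal sketch, assuming you only want to catch missing depth slots or resolution/aspect tokens; the function name is hypothetical:

```python
# Minimal pre-flight check for the tips above: flags prompts missing the
# depth slots or the 4K / 16:9 tokens before a Pro-tier generation.
REQUIRED = ["foreground", "mid-ground", "background", "4K", "16:9"]

def lint_marble_source_prompt(prompt: str) -> list[str]:
    """Return the required tokens missing from a FLUX.2 source prompt."""
    lower = prompt.lower()
    return [tok for tok in REQUIRED if tok.lower() not in lower]

issues = lint_marble_source_prompt(
    "Wide shot of a Tokyo backstreet at dusk, 4K, 16:9"
)
print(issues)  # ['foreground', 'mid-ground', 'background']
```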
FLUX.2 returns 1024-2048 px wide source references with literal compositional fidelity. The Marble 3D node downstream returns a navigable canvas-internal scene preview (not exportable as .obj/.fbx/.glb/USD). Generation time: 30-60 s for the FLUX.2 source on Pro tier, ~5 minutes for the full Marble scene. Captured stills land on the canvas ready to feed Sora 2 / Kling 3 / Runway Gen4 video nodes. For exportable mesh assets, route through Martini's Tripo3D or Hunyuan3D image-to-3D nodes — those produce GLB/FBX; the Marble node does not.
Connect FLUX.2 with other AI models on Martini's infinite canvas. No GPU required — start free.
Midjourney
Generate the cinematic concept frame on Martini using Midjourney v7 — then feed that frame into the Marble 3D node to draft a navigable scene from what began as a text description. Marble's output is a viewable canvas-internal scene preview, not a clean .obj, .fbx, .glb, or USD mesh file. Directors starting without a concept frame use Midjourney to produce the painterly, mood-rich anchor first ("foggy alley at dusk, neon signs, wet cobblestones"), then route the locked frame into Marble for the spatial draft. Image-conditioned Marble runs hold geometry and lighting more reliably than text-only — Midjourney + Marble is the cleanest text-to-3D-scene pipeline on the canvas.
OpenAI
Use Sora 2 as the downstream camera-move engine for a Text-to-3D-Scene workflow on Martini — captured stills from the navigable Marble scene feed into Sora 2 video nodes for cinematographic shots that respect the scene's spatial structure. Sora 2 does not generate the scene itself; the scene comes from a text-conditioned Marble 3D node (or from an upstream Midjourney/FLUX.2 frame routed into Marble). Marble's output is a canvas-internal navigable preview, not a portable .obj, .fbx, .glb, or USD mesh — Sora 2 takes the captured stills as starting frames and produces motion clips that all share the same locked location.