Image to 3D World on Martini
Image to 3D World turns a single visual reference — a product photo, location frame, character backdrop, or concept still — into a navigable 3D world you can reuse across an entire project. On Martini it lives as a canvas node that other image and video nodes can pull matched angles from, so every downstream shot stays anchored to the same space.
What it creates
Image to 3D World takes a single reference image and reconstructs the implied space around it as a navigable scene. Inside Martini you get an interactive viewport on the node — you can orbit, dolly, and reframe the camera to pull matched stills from many angles instead of generating each viewpoint separately. The geometry and lighting are inferred from the reference, so a clean, well-lit input gives the strongest result.
The model is meant to make a single visual reference reusable. Once a location is locked, every downstream node — character composites, product shots, video starting frames — can read from the same world and stay in the same space. That is the difference between a one-off image and a project-wide location library.
Treat the output as a navigable reference scene rather than a finished export-ready 3D asset. It is excellent for pre-viz, location reuse, social filter backdrops, and reference frames; it is not a substitute for a finished mesh, splat, or render destined for a game engine or DCC pipeline.
Inputs and outputs
The input is one reference image — uploaded, generated by an upstream Nano Banana 2 or Flux node, or hand-painted. Cleaner, less cluttered references with clear depth cues produce the most coherent worlds. The output is a navigable scene rendered in an in-canvas preview viewport, with the ability to capture stills from any camera angle. Captured frames behave like normal image outputs and can be wired into image, video, or storyboard nodes elsewhere on the canvas. The scene itself is referenceable inside Martini rather than exported as a 3D file.
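If it helps to reason about the contract, the node's input and output can be sketched as types. This is purely a mental model; the names and shapes below are assumptions, since Martini wires this node visually rather than through a typed API.

```ts
// Illustrative sketch of the node's contract. Every name here is an
// assumption for explanation; Martini exposes this node visually, not
// through a typed API.

// Opaque handles: a capture behaves like any other image output, while
// the world itself is referenceable only inside Martini.
type ImageRef = { kind: "image"; id: string };
type WorldRef = { kind: "world"; id: string };

interface ImageTo3DWorldInput {
  reference: ImageRef; // one image: upload, Nano Banana 2 / Flux output, or hand-painted
}

interface ImageTo3DWorldOutput {
  world: WorldRef;      // navigable scene shown in the in-canvas viewport
  captures: ImageRef[]; // stills framed from any camera angle in that scene
}
```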
Best workflows
- Product environment — drop a product reference shot in, generate the implied environment, and pull matched angles for a full marketing set without a real photo shoot.
- Location library — turn key location reference frames into reusable navigable worlds your whole project can pull from, so every shot stays in the same space.
- Social filter backgrounds — reuse a single backdrop reference as a navigable space for character composites, AR-style posts, and short-form video sequences.
- Set reference for actor compositing — capture matched-angle stills from the world and use them as backplates when compositing AI characters or live-action subjects.
- Storyboard locations — keep every panel in the same world rather than letting an AI image model hallucinate a slightly different room each frame.
- Sora 2 starting frames — feed angle captures into Sora 2 as image-to-video inputs so cinematic camera moves stay locked to one consistent location (see the wiring sketch after this list).
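To make that last workflow concrete, here is a hedged sketch of how the chain might be wired, written as plain data. The graph shape, node type names, and port paths are all hypothetical; on the real canvas you wire these connections visually.

```ts
// Hypothetical wiring for the Sora 2 starting-frames workflow. The graph
// format, node types, and port paths are illustrative, not a real Martini
// serialization.
const locationPipeline = {
  nodes: {
    ref:   { type: "nano-banana-2", prompt: "sunlit loft interior, golden hour" },
    world: { type: "image-to-3d-world", reference: "ref.image" },
    shotA: { type: "sora-2", mode: "image-to-video", startFrame: "world.captures[0]" },
    shotB: { type: "sora-2", mode: "image-to-video", startFrame: "world.captures[1]" },
  },
};
// Both shots read their starting frames from the same world node, so camera
// moves in shotA and shotB stay locked to one consistent location.
```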
How to use it in Martini
1. Add an Image to 3D World node to the canvas and connect a reference image to its input port. The reference can come from a Nano Banana 2 or Flux generation, an upload, or any other image-producing node — clean, well-lit references give the strongest reconstructions.
2. Run the node and wait for the navigable scene to appear in the preview viewport. Use the in-canvas controls to orbit, dolly, and pan until you find the angles you actually want to use downstream. Foreground geometry is the strongest area; pushing the camera far past the framed region surfaces reconstruction artifacts.
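As a rough mental model of that falloff, here is a sketch assuming made-up distance units and thresholds; the viewport itself is driven interactively, not through code.

```ts
// Loose model of the viewport's quality falloff. The tiers mirror this
// page's guidance (hero / usable / suggestive); the numeric thresholds
// are invented purely for illustration.
type Quality = "hero" | "usable" | "suggestive";

interface ViewportCamera {
  yaw: number;                   // orbit around the scene pivot, degrees
  pitch: number;                 // orbit elevation, degrees
  pan: { x: number; y: number };
  dolly: number;                 // distance from the framed region, arbitrary scene units
}

function expectedQuality(cam: ViewportCamera): Quality {
  if (cam.dolly < 1) return "hero";   // foreground geometry: strongest
  if (cam.dolly < 3) return "usable"; // just outside the framed region
  return "suggestive";                // far excursions surface stretching and artifacts
}
```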
3. Capture stills from each angle you need. Each capture becomes a regular image output on the node, ready to wire into anything else on the canvas — Nano Banana 2 for hero stills, Flux for stylized variants, Sora 2 for image-to-video, or storyboard nodes for paneling.
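Conceptually, each capture fans out like any other image output; a small sketch with assumed angle and node names:

```ts
// Each capture is a regular image output, so anything that accepts an
// image can consume it. Angle and node names are assumptions.
const captures = ["wide-establishing", "medium-reverse", "tight-hero"];

const wiring = captures.map((angle) => ({
  from: `world.capture:${angle}`,
  to: ["nano-banana-2", "flux", "sora-2", "storyboard"], // any image-consuming node
}));
```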
4. Pin the world as the project location anchor. Re-run downstream nodes against captured angles instead of regenerating the location from scratch every time, so character placement, lighting, and backdrop stay consistent across the project.
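The anchoring pattern is capture once, reuse everywhere. A minimal sketch, with hypothetical names:

```ts
// Capture once, reuse everywhere: downstream runs read a frozen still
// instead of regenerating the location. All names are illustrative.
type ImageRef = { id: string };

const anchor = new Map<string, ImageRef>(); // angle name -> captured still

function angleFor(name: string, capture: () => ImageRef): ImageRef {
  const hit = anchor.get(name);
  if (hit) return hit;     // reuse keeps placement, lighting, and backdrop consistent
  const still = capture(); // first request: frame it in the viewport and capture once
  anchor.set(name, still);
  return still;
}
```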
5. For larger projects, branch the canvas: one branch captures wide establishing angles for storyboards, another pulls medium angles for video starting frames, and a third pulls tight product or character angles for finishing stills.
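A simple plan object captures the branching idea; branch and angle names here are assumptions:

```ts
// One world, three branches: each branch pulls its own set of angles.
// Branch and angle names are illustrative.
const branchPlan = {
  storyboard: ["wide-establishing-01", "wide-establishing-02"], // panel backdrops
  video:      ["medium-dolly-in", "medium-pan-left"],           // Sora 2 starting frames
  finishing:  ["tight-product-hero", "tight-character-ots"],    // stills for Nano Banana 2 / Flux
};
```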
Limitations
- The output is a navigable scene rendered inside Martini, not an exportable glTF, USD, or Gaussian splat asset for use in Blender, Unreal, or Unity.
- Navigation depth is bounded — the model reconstructs the space implied by the reference, so pushing the camera well outside the framed area produces stretched geometry and reconstruction artifacts.
- Lighting is inferred from the input image and effectively baked in. You cannot relight the world after generation, so plan the reference frame for the lighting mood you want.
- Cluttered or low-quality reference images produce noticeably weaker worlds than clean, well-composed ones. Wide, busy outdoor scenes tend to hold together less than focused interiors or architectural shots.
- Thin geometry (railings, foliage edges, fine wires) and very far-field detail are the weakest parts of the reconstruction; design downstream shots so those areas are not the hero of the frame.
Frequently asked questions
How is this different from the Image to 3D World feature page?
The feature page is the workflow article — what the technique is, why it matters, and how it fits into a creative pipeline. This model page is the canvas-level entry: the actual node you drop into a Martini canvas, with inputs, outputs, pairing, and limitations spelled out.
Can I export the world to a 3D file?
No — the world lives as a navigable reference inside Martini, not as an exported mesh or splat. Capture the angles you need as stills and feed them downstream, or use a dedicated 3D model generator if you need a portable asset.
What kind of reference image works best?
Clean, well-lit images with clear depth cues — interiors, architecture, focused product shots, and concept frames hold together best. Cluttered crowd scenes or very wide landscapes are the hardest cases.
Can I generate a world from text alone on this node?
This node is image-conditioned. If you only have a text idea, generate a reference frame first with Nano Banana 2 or Flux and feed that into the node — image-conditioned worlds are noticeably more coherent than text-only ones.
How do I make sure every video shot stays in the same location?
Generate the world once, capture matched-angle stills from each viewpoint you need, and feed those into Sora 2 or another video model as image-to-video starting frames. The shared world reference is what keeps shots locked to a single space.
Why does the world look great in the framed area but stretched off to the side?
The model reconstructs the space implied by the reference image. Stay close to the framed region for clean results; treat the framed area as the hero region, the space just outside it as usable, and anything beyond that as suggestive only.
Ready to build with Image to 3D World?
Open Martini, drop a world node, and chain it into your image and video pipeline. No GPU required.