3D & World
AI World Generator on Martini
Build the world once, shoot it from every angle. The umbrella for the 3D cluster — text-to-world and image-to-world both live here. Where image-to-3d-world handles image input, this page covers the full text and image pipeline for navigable scenes you can film with multiple AI camera takes. Sister pages: image-to-3d-world (image input only) and ai-3d-model-generator (asset and pre-vis framing).
What this feature solves
Most 3D AI tools dead-end at a render. You generate a single image of a scene, get one camera angle, and that is the deliverable. The moment a director asks for a wider shot of the same scene, a different camera height, or a parallax move through the environment, the tool cannot help — the scene is a single render, not a navigable space. Pre-production teams who actually need to plan a shoot in a location that does not exist yet cannot do it with image-only AI tools, and the gap between concept and pre-vis stays wide.
The deeper break is shot reuse. A real production needs the same world across multiple cuts — the wide establish, the medium, the close-up, the over-the-shoulder, all in the same imaginary space. Tab-based AI tools force you to re-prompt the world every time, and the world drifts: the building changes, the lighting shifts, the spatial layout reorganizes. Storyboard and pre-vis fall apart because the location no longer reads as one place. Worlds need to be persistent assets, not one-shot prompts.
And there is the pipeline gap to actual video. Even when AI 3D tools produce a navigable scene, getting from that scene to a usable shot that matches the rest of a video sequence is rarely seamless. The 3D scene exports as imagery or a render, but reconciling that render with the video model output for the next cut typically requires a separate compositing tool. The world cluster remains an island disconnected from the video production pipeline.
Why Martini is different
Martini's canvas treats the world as an upstream node that feeds multiple downstream camera shots. Generate a world from text or image — the world model captures the scene, lighting, and spatial structure — and wire that world reference into multiple video nodes for different camera takes. The same world feeds the wide establish on Sora 2, the medium on Kling 3, the close-up on Seedance 2. Spatial consistency holds because every shot anchors to the same scene reference. The shoot becomes possible because the location finally exists as a reusable asset.
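In data terms, the pattern is a one-to-many edge: one world node, many video nodes holding the same scene reference. A minimal sketch of that relationship (all type and field names here are hypothetical, invented for illustration — this is a mental model of the pattern, not Martini's actual schema):

```typescript
// Hypothetical node shapes illustrating the world → shots fan-out.
// None of these names come from a real Martini API.
type WorldNode = { id: string; source: "text" | "image"; prompt: string };
type VideoNode = { model: string; camera: string; worldRef: string };

const world: WorldNode = {
  id: "world-01",
  source: "text",
  prompt: "derelict orbital station, weightless corridors",
};

// Three camera takes, three different engines — one shared world reference.
const shots: VideoNode[] = [
  { model: "sora-2", camera: "wide establish", worldRef: world.id },
  { model: "kling-3", camera: "medium tracking", worldRef: world.id },
  { model: "seedance-2", camera: "hero close-up", worldRef: world.id },
];

// Spatial consistency holds because every shot anchors to the same id.
const consistent = shots.every((s) => s.worldRef === world.id);
console.log(consistent); // true
```

The design point is that the world is upstream state, not per-shot prompt text: adding a fourth take means adding one more entry that points at `world.id`, never re-describing the location.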
Both text and image input are first-class. Drop a concept image into the world model for image-to-world, or write a prompt for text-to-world. The output is a navigable scene that the canvas can reference. Sister pages — image-to-3d-world for image-only input, ai-3d-model-generator for asset and pre-vis framing — handle different entry points; this page sits as the umbrella that spans both and connects to the video pipeline downstream. The cluster acts as a coherent set rather than three competing tools.
Be honest about what the world is and is not. Martini's world output is a referenceable scene for video shot generation and pre-vis framing, not a glTF or USDZ asset for export into a game engine or CAD tool. The wedge is pre-production and storyboard work — give a director a navigable concept space, fan out video shots from it, and integrate those shots back into the cut. For users who need exportable engine geometry, the canvas is the wrong starting point and we say so plainly.
Common use cases
Build a navigable concept space for a director review
Generate the world from text or image, capture multiple camera takes from inside it, and present a real spatial review rather than a stack of disconnected stills.
Pre-vis a multi-cut location shoot before booking
Lock the location as a navigable AI scene, capture the establish, medium, and close-up cuts as video, and use the pre-vis to plan the live-action day.
Storyboard a short film with persistent locations
Each location in the script becomes a world node. Multiple cuts in each location reuse the world reference for spatial consistency.
Camera fan-out across the same scene for ad creative
Generate the campaign world once, then run multiple video models with different camera moves anchored to the same scene reference.
Hero plate for a music-video sequence
A single world reference feeds five or six video shots that all read as the same imaginary place across the cut.
Pre-production for an episodic series
Build a world per recurring location and reuse the canvas template across episodes so the location stays consistent week to week.
Recommended model stack
sora-2
video
Long-take camera moves through a generated world for establishing shots.
kling-3
video
Cinematic camera moves through the world for medium and close-up cuts.
runway-gen4
video
Reliable iteration on camera takes inside the world, producing editor-ready cuts.
nano-banana-2
image
Generate concept stills inside the world for storyboard and pre-vis frames.
flux
image
High-fidelity world concept stills for the world model input.
midjourney
image
Stylized concept input for text-driven world generation.
How the workflow works in Martini
1. Decide on the entry point: text or image
For pure concept work with no reference, start with a text prompt for the world. For grounded work with a reference (storyboard, location photo, concept art), start with an image input.
2. Generate the world via the world model node
Drop a world model node onto the canvas. Run the prompt or image input. The output is a navigable scene reference that downstream nodes can consume.
3. Refine the world if needed
Iterate the prompt, re-render, or chain through a world refinement step. The goal is a world that holds up under multiple camera angles, not a one-shot best frame.
4. Wire the world into multiple video nodes
Connect the world reference to several video nodes — Sora 2 for the establish, Kling 3 for the medium, Seedance 2 for a hero close-up. Each video shot anchors to the same world.
5. Capture the camera fan-out
Each video node generates a different camera move through the world. Same scene, different angles. Pre-vis or finished cut depending on the deliverable.
6. Sequence and export the shots
Drop the captured cuts into the sequence builder, then use NLE export to hand off to Premiere Pro or DaVinci Resolve. The cluster of shots reads as the same location across the cut.
Example workflow
A short-film director is pre-vising the opening of a sci-fi piece set in an abandoned space station. They write a prompt for the world model: "derelict orbital station, faded yellow lighting, soft hum, drifting debris, weightless interior corridors." The world generates as a navigable scene reference. They wire it into four video nodes: a Sora 2 wide establish drifting down the central corridor, a Kling 3 medium tracking past a control panel, a Seedance 2 hero close-up of a flickering light, and a Runway Gen-4 over-the-shoulder of a stand-in protagonist. All four shots anchor to the same world. The spatial layout reads as the same station across every cut. The director sequences the four shots, exports a one-minute pre-vis, and shows it to the producer for greenlight. The world stays as a reusable canvas template for any future shot they need inside the same station.
Tips and common mistakes
Tips
- Start with a clear scene description — material, light, scale, atmosphere. Generic prompts produce generic worlds.
- For grounded work, use image-to-world with a concept frame as input. The world inherits the input's spatial cues.
- Test the world with two or three camera angles before committing. A world that only renders well from one angle is a still, not a world.
- Save the world canvas as a reusable template for any project that returns to that location.
- Pair with the camera-control tool to direct camera moves more precisely inside generated worlds.
Common mistakes
- Expecting glTF or USDZ exportable geometry. Martini worlds are referenceable scenes for video and pre-vis, not engine assets.
- Re-prompting the world for every shot. The wedge is reuse — wire one world into multiple downstream cuts.
- Asking the world to handle production-grade lighting fidelity for live action. AI worlds are pre-vis quality, not VFX-final renders.
- Skipping the camera fan-out. One shot off a world is just an image; multiple shots off the same world is the point.
- Treating the world as a finished cut. The world is upstream; the video shots downstream are the deliverable.
Related models and tools
Tool
AI Video Frame Extraction
Extract frames from video for reference and image-to-video workflows.
Tool
AI Image Upscaling
Upscale images and keyframes before final video generation on Martini.
Provider
OpenAI
OpenAI's GPT Image and Sora video model workflows available on Martini.
Provider
Google
Google's Veo video, Imagen image, and Nano Banana model workflows on Martini.
Provider
ByteDance
ByteDance's Seedance video and Seedream image model families on Martini.
Provider
Kling
Kling 3, O3, and Avatar video model workflows on Martini.
3D model
Marble 3D AI
Marble 3D and world generation workflows on Martini.
3D model
Image to 3D
Convert images into 3D assets and scenes on Martini.
3D model
Gaussian Splat AI
Gaussian splat 3D outputs on Martini's canvas.
World model
World Labs
World Labs image/text-to-navigable-world workflows on Martini.
World model
Image to 3D World
Turn a visual reference into a reusable navigable 3D world on Martini.
Related features
Image to 3D World — Convert References Into Navigable Scenes
Convert image references into navigable world and 3D scene workflows on Martini.
AI 3D Model Generator — Generate 3D Assets for Scenes
Generate 3D assets, scene references, and dimensional scenes on Martini's canvas — Sora 2, Kling 3, Nano Banana 2 chained into 3D-aware video and world workflows.
AI Storyboard Generator — Plan Shots, Generate Frames, Then Animate
Plan shots, generate storyboard frames, and convert frames into video on Martini's canvas.
AI Video Workflow — Node-Based Production From Concept to Final Sequence
Build node-based AI video production pipelines on Martini's canvas — from concept and storyboard to final NLE-ready sequence.
Frequently asked questions
How is this different from image-to-3d-world?
image-to-3d-world handles image input specifically. ai-world-generator is the umbrella covering text-to-world and image-to-world together, plus the camera fan-out pattern that turns the world into multiple video shots. Use image-to-3d-world for the image-input wedge; come here for the broader world workflow.
How is this different from ai-3d-model-generator?
ai-3d-model-generator focuses on individual 3D assets and pre-vis framing. ai-world-generator focuses on full navigable scenes that feed multiple video shots. Both live in the 3D cluster but address different deliverables.
Can I export the generated world as glTF, USDZ, or engine geometry?
No. Martini worlds are referenceable scenes inside the canvas for downstream video shot generation and pre-vis framing. They are not exportable engine geometry. For game-engine pipelines, the canvas is the wrong starting point.
Which video models work best with generated worlds?
Sora 2 for long lyrical camera moves, Kling 3 for cinematic medium and close-up shots, Seedance 2 for hero detail, Runway Gen-4 for reliable iteration. Different shots within the same world benefit from different engines — the camera fan-out across multiple models is the pattern.
Can the world stay consistent across multiple shots?
Yes. The world reference node persists on the canvas, and every downstream video shot anchors to the same reference. Spatial consistency holds across the camera fan-out — wide establish, medium, close-up, over-the-shoulder all read as the same place.
How does this fit into a real production pipeline?
It fits at pre-vis and storyboard. Generate worlds for each location, capture multiple camera takes, sequence and export to NLE for the director and producer review. The pipeline integration is at the canvas-to-NLE handoff; downstream finishing happens in Premiere Pro or DaVinci Resolve.
Build it on the canvas
Open Martini and wire this workflow up in minutes. Free to start — no card required.