3D & World
Image to 3D World
You have a reference image of a place — a concept frame, a location photo, a moodboard still — and you need to move through it like a 3D scene. Martini turns the reference into a navigable world on the canvas, then chains camera moves into video nodes for cinematic shots you can export to your NLE. Pre-vis, set design, and immersive scene work, all from one image.
What this feature solves
Concept artists, directors, and game designers regularly hit the same wall: a single still tells you what a place looks like, but it does not tell you what it feels like to move through. You cannot stage a shot, plan a camera path, or pre-vis blocking from a flat image. The traditional fix is a full 3D build in Blender or Unreal, which takes days and demands skills that most production teams do not have on staff.
AI 3D and world generation is starting to close that gap, but most tools either spit out untextured meshes or render single fly-throughs with no production usability. The result is impressive demo footage but unusable production input. Teams need a workflow where the 3D scene becomes the upstream of real video shots, not a separate gallery deliverable.
Set design and pre-vis especially need rapid iteration. Director wants a different angle on the same world? A different time of day? A push-in instead of a dolly? Without a canvas where the world generation, camera moves, and downstream video shots all live together, every iteration is a new tool, a new wait, a new manual handoff.
Why Martini is different
Martini lets the world live as an upstream node feeding video nodes. Generate the navigable scene from your reference image, then connect it to multiple video nodes — each one a different camera move through the same environment. The world is the source of truth; the video shots are derived takes that share the same space, lighting, and composition rules.
Reference inputs control the world. A concept frame, a location photo, a Midjourney render — any of them can drive the 3D generation. The canvas treats the reference as the anchor so iterations on the world (different lighting, different camera, different time of day) maintain the spatial integrity of the original reference.
Camera moves chain into video output. Once the world exists, push, pull, dolly, and crane moves become parameters on downstream video nodes. Each move renders as a real cinematic clip, exports through NLE export, and edits as part of your production sequence — not as a one-off render trapped inside a 3D viewer.
Common use cases
Director pre-vis from concept frames
Turn a concept frame into a navigable scene so the director can plan camera angles before scouting or building sets.
Set and location design exploration
Generate a 3D scene from a moodboard reference and test different angles, lighting, and atmospheres before committing to construction.
Game level and environment concepting
Use a reference image as the seed for a navigable environment that the art team can iterate on for engine implementation.
Architectural and interior visualization
Convert renderings or photos of spaces into navigable scenes for client walkthroughs and design review.
Music video and short-film world building
Build the visual world of a music video or short from a concept image and shoot multiple camera moves through the same space.
Cinematic camera move tests for editorial
Generate a world, then run multiple camera moves on it as separate video nodes to test which one cuts best in editorial.
Recommended model stack
sora-2 (video): Long-take coherence for sustained navigation through a generated world.
kling-3 (video): Cinematic camera moves for crane, dolly, and push shots through the scene.
runway-gen4 (video): Director-controlled camera language for editorial-grade world shots.
midjourney (image): Generate or refine the upstream world reference image.
flux (image): Photoreal reference frames for grounded environment generation.
How the workflow works in Martini
1. Prepare the reference image
Start with a strong, well-composed reference of the location or environment — a concept frame, a Midjourney render, or a photo. The richer the reference, the more detailed the navigable world.
2. Drop the reference onto the canvas
Add the image as an anchor node. Label it clearly — this is the source of truth for the entire world chain.
3. Generate the navigable scene
Wire the reference into a 3D-capable scene node. The model interprets depth, layout, and lighting to construct a navigable environment derived from your reference.
4. Add camera move video nodes
For each shot you need, add a video node — Sora 2 for long takes, Kling 3 for cinematic moves, Runway Gen-4 for editorial polish. Set the camera path (push, pull, crane, dolly) per node.
5. Run shots and review the world
Render each camera move and review for spatial consistency. The same world should appear in every shot with the same architecture, lighting, and atmosphere.
6. Sequence and export
Connect the chosen takes into the sequence builder, then NLE export to Premiere, DaVinci, or Final Cut for editorial finish.
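The six steps above form a simple node graph: one reference, one derived world, several camera-move video nodes fanning out, and a sequence at the end. Martini does not expose a public scripting API for its canvas, so the sketch below is purely illustrative — every class, field, and node name is invented to make the shape of the chain concrete:

```python
from dataclasses import dataclass, field

# Hypothetical data model for the reference -> world -> shots chain.
# All type and field names are invented for illustration; they are
# not Martini API.

@dataclass
class Node:
    name: str
    inputs: list = field(default_factory=list)  # upstream nodes

@dataclass
class VideoNode(Node):
    model: str = "sora-2"
    camera_move: str = "push"  # push | pull | dolly | crane

# Steps 1-2: the reference image is the anchor for the whole chain.
reference = Node("cathedral_reference.png")

# Step 3: one world node derived from the reference.
world = Node("navigable_world", inputs=[reference])

# Step 4: several camera-move video nodes fan out from the same world.
shots = [
    VideoNode("nave_push", inputs=[world], model="sora-2", camera_move="push"),
    VideoNode("vault_crane", inputs=[world], model="kling-3", camera_move="crane"),
    VideoNode("window_dolly", inputs=[world], model="runway-gen4", camera_move="dolly"),
]

# Steps 5-6: review the shots, then sequence the chosen takes for NLE export.
sequence = Node("performance_cut", inputs=shots)

# Every shot traces back to the same upstream world node, which is
# what keeps spatial language consistent across takes.
assert all(shot.inputs == [world] for shot in shots)
```

The key design point the sketch captures: the world is generated once and shared, so iterating on a shot never regenerates the space it moves through.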
Example workflow
A music video director is building a one-shot performance through an abandoned cathedral. They start with a Midjourney render of the cathedral interior — soft window light, tall vaults, dust motes. The render drops onto the canvas as the reference. They generate the navigable scene from the reference, then add four video nodes: a slow push through the nave on Sora 2, a Kling 3 crane up to the vaults, a Runway Gen-4 dolly past the windows, and a Sora 2 reverse pull-out from the altar. Each shot inherits the same spatial language and lighting. The sequence builder cuts the four shots into a 45-second performance arc, and NLE export drops the cut into DaVinci for grading. The world that started as one frame becomes a set the director can shoot in.
Tips and common mistakes
Tips
- Use a high-detail reference image. The world generation interprets every cue in the source — vague references produce vague worlds.
- Plan the camera moves before you generate. Knowing what shots you need shapes which video models to chain in.
- Mix camera move models. Sora 2 for sustained takes, Kling 3 for cinematic language, Runway Gen-4 for editorial polish.
- Lock the world before running camera moves. Once the world looks right, the camera shots become surgical.
- Save the canvas as a template if you build worlds frequently — the reference-to-shot chain is reusable.
Common mistakes
- Starting from a low-quality or ambiguous reference. The model fills in gaps with assumptions you may not want.
- Treating the world generation as the final deliverable. The world is the upstream of real video shots — keep going.
- Using one camera move model for every shot. Different moves need different engines for the right cinematic feel.
- Skipping the world review step. If the world drifts between shots, fix the reference or the world node before running camera moves.
- Trying to export a single fly-through as a deliverable. Compose multiple camera moves into a real cut for production usability.
Related models and tools
Tool
AI Image Upscaling
Upscale images and keyframes before final video generation on Martini.
Tool
AI Video Frame Extraction
Extract frames from video for reference and image-to-video workflows.
Provider
OpenAI
OpenAI's GPT Image and Sora video model workflows available on Martini.
Provider
Google
Google's Veo video, Imagen image, and Nano Banana model workflows on Martini.
Provider
ByteDance
ByteDance's Seedance video and Seedream image model families on Martini.
Provider
Kling
Kling 3, O3, and Avatar video model workflows on Martini.
3D model
Marble 3D AI
Marble 3D and world generation workflows on Martini.
3D model
Image to 3D
Convert images into 3D assets and scenes on Martini.
3D model
Gaussian Splat AI
Gaussian splat 3D outputs on Martini's canvas.
World model
World Labs
World Labs image/text-to-navigable-world workflows on Martini.
World model
Image to 3D World
Turn a visual reference into a reusable navigable 3D world on Martini.
Related features
AI Image to Video — Animate Stills Into Production-Ready Shots
Turn still images into production-ready video shots on Martini's canvas — multi-model, reference-aware, NLE-export ready.
AI 3D Model Generator — Generate 3D Assets for Scenes
Generate 3D assets, scene references, and dimensional scenes on Martini's canvas — Sora 2, Kling 3, Nano Banana 2 chained into 3D-aware video and world workflows.
AI Video Workflow — Node-Based Production From Concept to Final Sequence
Build node-based AI video production pipelines on Martini's canvas — from concept and storyboard to final NLE-ready sequence.
AI World Generator — Build Reusable Worlds on Martini
Generate reusable worlds for shots, stories, and campaigns on Martini's canvas.
Frequently asked questions
What kind of reference image works best?
High-detail, well-composed environment shots — concept renders, location photos, or Midjourney/Flux generations of spaces. The model uses depth, layout, and lighting cues from the reference, so detailed sources produce richer navigable worlds. Avoid heavily stylized or abstract references unless that style is the goal.
Can I generate camera moves through the same world for multiple shots?
Yes. Generate the world once, then add multiple downstream video nodes — each one a different camera move (push, pull, crane, dolly). All shots share the same spatial language, lighting, and architecture because they derive from the same upstream world generation.
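This one-world, many-shots fan-out can be sketched as a small routing function. Everything here is hypothetical — the world identifier, the configuration fields, and the move-to-model routing are invented to illustrate the pattern, not drawn from any Martini API:

```python
# Hypothetical sketch of the one-world, many-shots fan-out.
# Field names and model routing are invented for illustration.

world_id = "world_cathedral_01"  # single upstream world generation

# Route each camera move to the model suggested in the stack above.
MOVE_TO_MODEL = {
    "push": "sora-2",
    "pull": "sora-2",
    "crane": "kling-3",
    "dolly": "runway-gen4",
}

def shot_configs(world, moves):
    """Build one video-node configuration per camera move,
    all pointing at the same upstream world."""
    return [
        {"source_world": world, "camera_move": m, "model": MOVE_TO_MODEL[m]}
        for m in moves
    ]

shots = shot_configs(world_id, ["push", "crane", "dolly", "pull"])

# All four shots reference the same world, which is why spatial
# language, lighting, and architecture stay consistent between them.
assert {s["source_world"] for s in shots} == {world_id}
```

Adding a fifth shot is just another entry in the moves list; the world itself never regenerates.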
How is this different from a 3D scene in Blender or Unreal?
Blender and Unreal give you a true 3D engine with full control and zero dependence on AI. They also take days of skilled work to build a scene from scratch. Image-to-3D-world workflows on Martini compress that to minutes for pre-vis and concept work, with the trade-off that the output is generative, not engine-native.
Will the camera shots cut together cleanly?
Yes — that is the point of generating the world once and chaining camera moves downstream. Every shot inherits the same world generation, so the spatial language stays consistent. The sequence builder assembles the cuts and NLE export ships them as one timeline.
Can I use this for game environment concepting?
Yes. The output is generative video, not engine-ready geometry, so the workflow is best for concept and pre-vis stages. Art teams use the camera moves as reference for engine implementation by environment artists.
How does this fit a real production pipeline?
Image-to-3D-world is a pre-vis and concept tool. The natural pipeline is: reference image → navigable world → multiple camera moves → sequence → NLE export → editorial review → live-action shoot or further AI production. The canvas keeps the whole chain visible and revisable.
Build it on the canvas
Open Martini and wire this workflow up in minutes. Free to start — no card required.