AI Short Film Workflow
This workflow takes a one to three minute narrative short from script to NLE-ready cut on Martini's canvas. Storyboard frames, character reference, location reference, per-shot generation, continuity checks, audio design, and titling all live on the same graph in cut order. The canvas mirrors the timeline: drop the script left, lay the storyboard frames in sequence, fan out video generation, layer audio underneath, and export the bundle to your NLE for picture finishing.
When to use this workflow
- Drafting a one to three minute narrative short over a weekend before booking talent or locations
- Producing a 30-60 second branded story spot with eight to twelve shots and one returning protagonist
- Pre-visualizing a scene from a festival short so the director can block coverage before the shoot
- Standing up an episodic AI series pilot that needs cross-shot continuity from frame one
- Producing a film-school exercise or portfolio piece with the same scope as a live-action short
Required inputs
- A finished or near-finished script for the short, broken into scenes and beats
- A shot list with framing intent per shot (master, two-shot, single, insert, reaction)
- Character reference images for every speaking role (one canonical portrait each)
- Location reference images or descriptions for every distinct setting
- A look-and-feel reference (lighting, color, mood) the cinematic frames will inherit
- Audio plan: who speaks, where ambience changes, where SFX or score lift
Steps
1. Drop the script as a text node, pin character and style anchors
Open the canvas and drop the script as a text node on the left edge. Add image nodes alongside for each canonical character portrait and the location and style references. These anchors stay pinned and visible while you build the rest of the canvas — the script is the source of truth, and the reference rail to its right is the visual contract every downstream frame inherits. Resist scattering anchors across the canvas; one rail keeps the pipeline auditable.
2. Generate a storyboard, one frame per shot
Drop one image node per shot, laid left to right in cut order across the canvas. Use Midjourney for cinematic look development on hero frames, GPT Image 2 for structured shot-intent boards where framing precision matters, and Nano Banana 2 for shots that depend on the canonical character anchor. Each frame gets the same character and style references wired in — the frames vary by camera framing and action, not by re-prompting identity. The result is an animatic-ready board the team can review in one pass.
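The board structure above can be sketched as data: every frame node shares the same reference anchors, and only the per-shot framing prompt varies. This is a minimal illustration — the shot IDs, file names, and dict shape are hypothetical, not Martini's actual node schema.

```python
# Hypothetical board: one frame node per shot, all wired to the same
# character/style anchors. Prompts differ only in framing and action.
SHOT_LIST = [
    ("01", "wide master, diner interior, protagonist enters"),
    ("02", "medium two-shot across the booth"),
    ("03", "close single, reaction on the line"),
]
ANCHORS = {"character": "lead_v1.png", "style": "noir_look.png"}

def build_board(shots, anchors):
    """One frame node per shot; every node inherits the same references."""
    return [{"shot": sid, "prompt": framing, "references": dict(anchors)}
            for sid, framing in shots]

board = build_board(SHOT_LIST, ANCHORS)
# every frame carries identical references; only the prompt changes per shot
```

The point the sketch makes: identity lives in the shared references, never in the prompt, so regenerating one frame can't drift the cast.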
3. Iterate per frame, not per board
When a frame is wrong, regenerate that single node — never the whole storyboard. Re-running the board destroys the frames that already work and forces continuity re-locking. Keep good frames frozen and tweak only the broken ones. The canvas makes this granular: each frame is its own node with its own retry button. Shot four can iterate twelve times while shots one through three sit untouched, and the team only reviews the changed frame, not the whole sequence.
4. Lock the cinematic look on hero frames
Pick the three or four hero frames that define the look of the short — the establishing shot, the hero close-up, the climactic reveal. Run those on Midjourney for cinematic composition and Flux for high-fidelity rendering. The look those frames lock becomes the style reference every other frame inherits. Run GPT Image 2 on more structural shots where blocking and framing matter more than mood — coverage shots, reverses, inserts. Mixing image models per shot intent is a feature, not a bug.
5. Chain each storyboard frame into a video node
Wire each approved storyboard still into its own video node. Use Sora 2 for hero takes that need long-take coherence and within-shot multi-action. Use Kling 3 for cinematic camera moves shot to shot. Use Vidu for character-locked cuts where the cast appears across multiple shots. Each video node carries its own motion prompt — describe the action and camera, not the character or scene. The image already locked the visual identity; the video prompt only describes what moves.
6. Run a continuity check before audio
Once the first video pass finishes, scrub through the cuts in order on the canvas and check for continuity drift — eye-line mismatches, prop placement shifts, lighting direction changes between shot four and shot five. Flag the weak shots and re-run only those. For shots that demand continuous action across cuts, use last-frame extraction from the prior clip as the first-frame anchor of the next — this is the cleanest way to chain action without breaking the model coherence limit at eight to ten seconds.
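The last-frame chaining logic can be shown as a small planning sketch: a continuous action longer than the per-clip coherence limit is split into segments, and each segment after the first anchors on the prior clip's last frame. Function and anchor names here are illustrative, not Martini API.

```python
from __future__ import annotations
from dataclasses import dataclass

COHERENCE_LIMIT_S = 8.0  # conservative per-clip limit from this workflow

@dataclass
class Segment:
    prompt: str
    duration: float
    first_frame_anchor: str | None  # the prior clip's extracted last frame

def chain_long_take(prompt: str, total_duration: float,
                    limit: float = COHERENCE_LIMIT_S) -> list[Segment]:
    """Split one long take into clips chained by last-frame hand-offs."""
    segments: list[Segment] = []
    remaining = total_duration
    index = 0
    while remaining > 0:
        anchor = None if index == 0 else f"last_frame(clip_{index - 1})"
        segments.append(Segment(prompt=prompt,
                                duration=min(limit, remaining),
                                first_frame_anchor=anchor))
        remaining -= limit
        index += 1
    return segments

clips = chain_long_take("slow dolly-in on the protagonist", 20.0)
# → three clips of 8 s, 8 s, 4 s; clips two and three anchor on the
#   last frame of the clip before them
```

On the canvas the anchor is a wire from a last-frame-extraction node, not a string, but the split-and-anchor structure is the same.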
7. Run audio passes — VO, ambience, Foley
Add an ElevenLabs node for dialogue and voiceover with inline emotion tags where the script supports them. Add a Hunyuan Foley node for SFX (footsteps, doors, props) tied to specific cuts. Layer ambience underneath each scene — different rooms get different beds. Audio runs in parallel with picture iteration on the canvas, so by the time picture is locked, the dialogue and SFX layers are ready to drop into the sequence builder. Bake audio on the canvas — handling it after NLE export means re-syncing per clip in the editor.
8. Sequence in cut order, then export to NLE
Drop a sequence-builder node and wire every video and audio output into it in cut order. Preview the whole short on the canvas — this is your animatic, with sound. Adjust trim and shot order without leaving the graph. Then add an NLE export node and ship to Premiere, DaVinci Resolve, or Final Cut. Use ProRes 422 for the master if the editor will color in DaVinci; H.264 for cutting proxies if the team is iterating on the edit. Match the final-cut frame rate (24 fps for cinema, 25 for European broadcast).
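The codec and frame-rate choices above reduce to a small preset table. This is a hypothetical lookup for illustration — the real export node configures these through its own settings, and the preset names are invented.

```python
# Hypothetical export presets matching the specs named in this step.
EXPORT_PRESETS = {
    "davinci_master": {"codec": "prores_422", "fps": 24, "container": "mov"},
    "edit_proxy":     {"codec": "h264",       "fps": 24, "container": "mp4"},
    "broadcast_eu":   {"codec": "prores_422", "fps": 25, "container": "mov"},
}

def preset_for(target: str) -> dict:
    """Look up an export preset, failing loudly on an unknown deliverable."""
    if target not in EXPORT_PRESETS:
        raise ValueError(f"no preset for {target!r}")
    return EXPORT_PRESETS[target]
```

Encoding the deliverable as a named preset rather than ad-hoc settings is what lets every regeneration re-export at the same spec.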
9. Add titling and on-screen graphics
Titles, lower-thirds, and end-card credits go in the NLE, not on the canvas. Premiere uses Essential Graphics, DaVinci uses Fusion or Resolve's text tools, Final Cut uses Motion. The canvas hands off picture and audio; the NLE owns titling, color, and final mix. This separation is intentional — the canvas is the generation factory, the NLE is the finishing room. Treat regenerations as canvas-side fixes that re-export at the same spec, not as titling decisions.
10. Publish and save the canvas as a film template
Export the finished master at the platform spec — H.264 1080p for festival submissions, ProRes 422 for archive, MP4 for streaming uploads. Save the canvas itself as a film template. The next short reuses the wiring: swap the script, swap the canonical character, swap the location reference, and the chain produces a new bundle without rebuilding nodes. A second short that took two weeks the first time becomes a four-day templated job by the third.
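The template-reuse idea can be sketched as swapping anchor references while the node wiring stays fixed. Everything here — slot names, file names, the template dict — is hypothetical, standing in for a saved Martini canvas.

```python
import copy

def instantiate_template(template: dict, swaps: dict) -> dict:
    """New project from a saved template: swap anchors, reuse the wiring."""
    project = copy.deepcopy(template)
    for slot, ref in swaps.items():
        if slot not in project["anchors"]:
            raise KeyError(f"unknown anchor slot: {slot!r}")
        project["anchors"][slot] = ref
    return project

film_template = {
    "anchors": {"script": "short_01.txt", "character": "lead_v1.png",
                "location": "diner_ref.png", "style": "noir_look.png"},
    "nodes": ["storyboard", "video_chain", "audio", "sequence", "nle_export"],
}
episode_2 = instantiate_template(
    film_template,
    {"script": "short_02.txt", "character": "lead_v2.png"},
)
# the node chain is untouched; only the swapped anchors differ
```

The deep copy matters: the saved template stays pristine so the next short can instantiate from it again.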
Martini canvas notes
- The reference rail on the left of the canvas (script + character + location + style) is wired into every downstream node — drop references once, inherit everywhere.
- Storyboard frames lay left-to-right in cut order, so the canvas literally mirrors the final timeline before any video is generated.
- Per-frame regeneration is granular — only the changed node runs, the rest stay frozen, so iteration cost stays bounded as the project grows.
- Last-frame extraction wires the final frame of one clip into the first-frame anchor of the next, chaining action across the eight-to-ten-second model coherence limit.
- The sequence builder previews picture and audio together on the canvas, so the animatic with sound exists before anything ships to the NLE.
Variations
30-second branded narrative
Eight shots, one returning protagonist, single location, ElevenLabs voiceover and SFX bed. Use Sora 2 for the hero, Kling 3 for coverage.
90-second festival short
Twelve to fifteen shots across two or three locations. Lock cinematic look on Midjourney, fan video across Sora 2 and Kling 3, finish in DaVinci at 24 fps.
Episodic AI series pilot
Three minutes, one returning lead, recurring set. Save the canvas as a series template; episodes two onward reuse the wiring with a swapped script.
Dialogue-heavy scene exercise
Six shots, two characters, shot-reverse-shot coverage. Lock both leads as dual anchors, use Kling 3 for character-aware reaction beats, ElevenLabs for both voices.
Related features
Multi-Shot AI Video — Build Connected Scenes, Not Isolated Clips
Plan, generate, and sequence multi-shot AI video on Martini — keep characters, style, and motion consistent across shots.
AI Storyboard Generator — Plan Shots, Generate Frames, Then Animate
Plan shots, generate storyboard frames, and convert frames into video on Martini's canvas.
AI Character Consistency Across Images and Video
Keep a subject consistent across image and video generations on Martini using reference workflows.
Consistent Character AI Video — Reference-Driven Video on Martini
Preserve character identity through reference-driven video models on Martini.
AI Video NLE Export — From Generation to Premiere, DaVinci, Final Cut
Move AI-generated sequences from Martini into Premiere Pro, DaVinci Resolve, and Final Cut Pro.
Frequently asked questions
Should I storyboard before generating any video?
Yes — the storyboard is the cheapest sanity check in the workflow. Storyboard frames cost a fraction of video generation and let the team approve framing, blocking, and look before any motion runs. The canvas mirrors the cut, so what you sign off on at the storyboard stage is what video generation produces. Skipping the storyboard means rerunning expensive video nodes on shots that should have been caught at the stills stage.
How do I handle long takes that drift past eight seconds?
Chain shorter clips with last-frame hand-offs. Take the final frame of clip A, wire it as the first-frame anchor of clip B, and the cut reads as one continuous action even though it is two generations stitched together. The canvas makes this cheap because last-frame extraction is a node, not an external tool round-trip. For genuinely long single takes, Sora 2 holds coherence the longest of the recommended models.
When do I use Sora 2 versus Kling 3?
Sora 2 for hero takes that need long-take coherence and within-shot multi-action — the master shot, the climactic move, the long establishing pan. Kling 3 for cinematic camera moves shot to shot — coverage, reverses, hero close-ups. The canvas runs different models on different shots in the same sequence; this is the right call, not a complication.
Where does audio belong in this workflow?
On the canvas, baked alongside picture before NLE export. Run ElevenLabs for dialogue, Hunyuan Foley for SFX, and ambience beds underneath. Audio runs in parallel with picture iteration on the graph, so by the time the picture is locked, the audio layers are ready to wire into the sequence builder. Handling audio after export means per-clip re-syncing in the editor — slower and more error-prone than baking it on the canvas.
What frame rate should I export at?
Match the destination. Cinema and most festival submissions want 24 fps; European broadcast is 25 fps; streaming and most online platforms accept 24 or 30. Set the frame rate on the NLE export node before exporting; do not let the NLE conform later — that introduces interpolation artifacts. For dual-deliverable shorts (cinema + streaming), generate at 24 fps and let the streaming target use the same master.
Can I add titles and credits on the canvas or do they belong in the NLE?
In the NLE. Premiere's Essential Graphics, DaVinci's Fusion or text tools, and Final Cut's Motion all handle titling better than any canvas node — typography, motion graphics, brand kit lockup are NLE strengths. The canvas hands off picture and audio; titling is the editor's job. This division of labor keeps the canvas lean and lets the editor finish in their familiar toolset.
Build it on the canvas
Open Martini and wire this workflow up in minutes. Free to start — no card required.