Midjourney
Draft a shot list as image nodes laid out left-to-right on the Martini canvas using Midjourney v7 — cinematic frames with editorial mood that read as cohesive storyboard panels for a commercial or short film. Midjourney is the strongest pick when the storyboard needs atmospheric weight and painterly cinematography rather than literal staging. Lock a single style reference, fan into 8-12 frames keyed to script beats, then feed the strongest panels straight into Sora 2 or Kling 3 for animatic motion tests.
Before Midjourney runs, sit with the script and list the visual beats: opening establishing shot, character entry, story turn, tension peak, resolution, closing tag. 8-12 beats is the working range for a 30-60s commercial or short-form narrative; 6 beats for a 15s ad. Each beat becomes one Midjourney node on the canvas.
Drop a single cinematic anchor reference onto the canvas — a film still, a photo from your moodboard, or a generated baseline frame. Wire it into Midjourney v7 as the style reference (sref) for every storyboard node. This is what holds palette, lighting, grain, and tonal voice across all 12 frames; without it, frame-to-frame consistency falls apart.
Midjourney v7 reads cinematic language. Skip "the character looks sad" and write: "Medium close-up, soft window light from camera left, character in profile, slight tear track on cheek, shallow depth of field, anamorphic compression, 16:9." Specify shot size (wide, medium, close-up), camera angle, lighting direction, and aspect ratio per beat.
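The shot grammar above can be kept uniform across beats by composing each prompt from the same structured fields. A minimal sketch, assuming a hypothetical `beat_prompt` helper (not part of Midjourney or Martini):

```python
# Hypothetical helper: build a beat-level prompt from structured shot
# fields so every frame uses the same cinematic grammar.
def beat_prompt(shot_size, lighting, subject, extras="", ar="16:9"):
    parts = [shot_size, lighting, subject]
    if extras:
        parts.append(extras)
    # Join fields, then close with the aspect ratio per beat.
    return ", ".join(parts) + f", {ar}"

print(beat_prompt(
    "Medium close-up",
    "soft window light from camera left",
    "character in profile, slight tear track on cheek",
    "shallow depth of field, anamorphic compression",
))
```

Swapping only the field values per beat keeps shot size, lighting direction, and aspect ratio explicit in every frame description.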
On the Martini canvas, place the Midjourney nodes in chronological order left-to-right — frame 1 establishing shot at far left, frame 12 closing tag at far right. Each shares the style reference (sref); each has its own beat-specific shot description. The visual sequence reads as a board even before any frames generate.
Midjourney v7 alone is weaker than Nano Banana 2 on character lock. For storyboards with a recurring protagonist, generate the character once on Nano Banana 2, drop the locked portrait onto the canvas, and use it as a character reference (cref) on each Midjourney frame node. Sref locks style; cref locks character. Together they produce frame-to-frame coherence on both axes.
Each storyboard frame is also a video keyframe. Once the board is approved, route the chosen frames into Sora 2 or Kling 3 video nodes — image-to-video with the cinematic frame as the starting frame. The animatic ships from the same canvas, no re-prompting the video model from scratch. This is the canvas-native workflow advantage over single-tool storyboard apps.
Establishing shot anchor. The wide composition + cool palette sets the visual language for the rest of the board.
Frame 1 (establishing): Wide shot of a foggy coastal cliff at dawn, lone figure walking the path away from camera, cool blue palette, soft volumetric fog, anamorphic compression, 16:9 cinematic --sref [reference URL] --sw 200 --stylize 250
Story turn. Sref + cref combo locks both style and character. Nano Banana 2 portrait carries the face; Midjourney handles the cinematic atmosphere.
Frame 4 (story turn): Medium close-up of the same character looking back over shoulder, soft golden side light breaking through fog, slight surprise on face, shallow depth of field, 16:9 --sref [same reference URL] --cref [Nano Banana 2 portrait] --cw 100 --sw 200
Tension peak in tight close-up. Cref weight 100 keeps the face locked at this proximity.
Frame 8 (tension peak): Tight close-up of character's eyes, dramatic side light, single tear track, anamorphic compression with shallow depth of field, 16:9 --sref [same reference URL] --cref [same portrait] --cw 100 --sw 200
Closing tag with tonal shift. Lower --sw on the closing frame allows the palette to evolve while keeping the cinematography family intact.
Frame 12 (closing tag): Wide reverse over the cliff, character silhouetted against breaking sun, golden palette emerging, hopeful tonal shift, 16:9 --sref [same reference URL] --sw 150 --stylize 200
Use --sref + --sw 150-250 across all storyboard frames. Lower --sw lets palette evolve at story turns; higher --sw locks cinematography tighter.
Pair with --cref + --cw 80-100 for character-locked frames. Use a Nano Banana 2 portrait as the cref source — Midjourney alone is weaker on face lock.
Pin --stylize at 200-300 for cinematic storyboards. Higher than 400 risks over-painterly results that break the production look.
Match the aspect ratio to the final delivery (16:9 for film/commercial, 21:9 for cinematic widescreen, 9:16 for vertical short-form). Re-cropping panels later breaks the board read.
Avoid --weird on storyboards. It is for ideation only and will break frame-to-frame coherence.
For animatics, generate the full 8-12 frame board first, get approval, then feed selected frames to Sora 2 / Kling 3 video nodes — do not animate every panel.
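The parameter ranges above can be applied consistently by appending flags per frame role. A sketch under stated assumptions: `with_flags` is a hypothetical helper, and the placeholder URLs stand in for your own style anchor and Nano Banana 2 portrait.

```python
# Placeholder reference URLs (assumptions, not real assets).
SREF_URL = "https://example.com/style-anchor.png"
CREF_URL = "https://example.com/nb2-portrait.png"

def with_flags(prompt, character=False, closing=False):
    # Closing frames drop --sw and --stylize so the palette can evolve
    # while staying in the same cinematography family.
    sw = 150 if closing else 200
    stylize = 200 if closing else 250  # keep under 400 to avoid over-painterly drift
    flags = f" --sref {SREF_URL} --sw {sw} --stylize {stylize}"
    if character:
        # Character-locked frames pair cref with a high character weight.
        flags += f" --cref {CREF_URL} --cw 100"
    return prompt + flags

print(with_flags("Tight close-up of character's eyes, 16:9", character=True))
```

One helper per board keeps the sref/cref wiring identical across all 8-12 frame nodes instead of retyping flags per prompt.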
Midjourney v7 returns four variants per generation, 1024-2048px wide. Sref-locked storyboard frames hold cinematography at roughly 85-90% consistency across 12+ frames; generation time is 30-60s per batch. Output lands on the canvas as four selectable variants per beat — pick the strongest, route the others to reject. Each chosen frame doubles as a video keyframe: wire it downstream into Sora 2 / Kling 3 video nodes for the animatic. This frame-as-keyframe workflow is the canvas-native advantage single-tool storyboard apps cannot replicate.
Connect Midjourney v7 with other AI models on Martini's infinite canvas. No GPU required — start free.
OpenAI
Generate storyboard frames on Martini using GPT Image 2 — strong narrative reasoning makes it the right pick when the script reads like a beat-driven story rather than a literal shot list. GPT Image 2 interprets emotional context and composes scenes that carry meaning across panels, which is exactly what directors need when boarding a 30-60 second commercial or short-form narrative. Pair it with a saved style prompt block and feed selected frames into Sora 2 or Kling 3 for the animatic.
View guide
Black Forest Labs
Generate storyboard frames on Martini using FLUX.2 — the prompt-fidelity pick for boards where every shot is staged literally and frame composition must match the director's shot list verbatim. FLUX.2 renders explicit camera angles, character positions, and prop placements with literal accuracy, which is exactly what feature-quality pre-viz needs. Pair it with a saved cinematic style prompt, fan into 8-12 frames keyed to the shot list, and feed selected panels into Sora 2 or Kling 3 for animatic.
View guide