OpenAI
Sora 2 is OpenAI's flagship for cinematic short film work — realistic lighting, believable reflections, and camera moves that read like a real DP shot them. The base Sora 2 handles text-to-video and image-to-video at 1080p; Sora 2 Pro lifts fidelity and unlocks 15-second clips with clarity control. For an indie filmmaker drafting a 3-5 shot festival short over a weekend, Sora 2 hits the bar where the pre-viz looks production-ready before any crew is booked.
Before any video render, lay the shot list across Martini's canvas as image nodes — wide, medium, close-up, reverse, tag. Use the storyboard generator (GPT Image, FLUX, Midjourney) to lock character, palette, and location. Sora 2 generates dramatically better motion when fed a strong starting frame than from text alone.
Pin a Nano Banana 2 character sheet to the canvas as the identity anchor. For each shot, route the character reference into the Sora 2 image-to-video node alongside the storyboard frame. Identity reads more consistently across cuts when Sora is image-conditioned than when each shot is described from scratch in text.
Sora 2 understands cinematography vocabulary. Use real shot terms: "low-angle dolly forward, 35mm lens, soft golden hour key light from camera right, character walks toward us through wheat field, slight handheld breathing." This level of camera direction reads as production language to Sora and produces noticeably more cinematic motion than vague atmosphere.
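The camera-direction pattern above can be sketched as a small helper that assembles shot parameters into a single prompt string. The function name and field set are illustrative only, not part of any Sora 2 or Martini API; the point is the ordering: framing, movement, lens, light, action, duration.

```python
def build_shot_prompt(shot_type, camera_move, lens, lighting, action, duration_s):
    """Compose a cinematography-first prompt for an image-to-video node.

    Every parameter is free-text production vocabulary. This is a
    hypothetical helper, not part of the Sora 2 or Martini APIs.
    """
    parts = [shot_type, camera_move, lens, lighting, action, f"{duration_s} seconds"]
    # Drop empty fields so optional terms (e.g. no lens note) don't leave gaps.
    return ", ".join(p for p in parts if p)

prompt = build_shot_prompt(
    shot_type="medium shot",
    camera_move="low-angle dolly forward",
    lens="35mm lens",
    lighting="soft golden hour key light from camera right",
    action="character walks toward us through wheat field, slight handheld breathing",
    duration_s=8,
)
print(prompt)
```

Keeping the prompt assembled from discrete fields also makes it easy to vary one axis per take, e.g. swapping the camera move while holding lighting and action fixed.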
Sora 2 base is fine for blocking and most shots. Reserve Sora 2 Pro for the hero opener, the climax shot, or the closing tag — anywhere clarity control and the 15-second window matter. For a 3-5 shot short, expect to use Pro on 1-2 hero cuts and base on the rest. The cost difference is meaningful at a short-film scale.
For shots that should read as a continuous moment (character runs out of frame in shot A, runs into frame in shot B), route the last frame of shot A through a frame-extraction tool node and feed it as the starting reference for shot B. Sora respects starting frames tightly, so the cuts feel like the same world even when the camera angle changes.
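Outside the canvas, the same last-frame handoff can be done with ffmpeg: `-sseof` seeks relative to end-of-file, so one command pulls the final frame of shot A as a still to feed shot B. A minimal sketch, with placeholder filenames:

```python
import subprocess

def last_frame_cmd(video_in, frame_out):
    """Build an ffmpeg command that grabs the final frame of a clip.

    -sseof -0.1 seeks to 0.1 s before end-of-file; -frames:v 1 writes a
    single frame. Paths here are illustrative placeholders.
    """
    return [
        "ffmpeg", "-y",
        "-sseof", "-0.1",   # seek relative to end of file (input option)
        "-i", video_in,
        "-frames:v", "1",   # emit exactly one video frame
        "-q:v", "2",        # high-quality still
        frame_out,
    ]

cmd = last_frame_cmd("shot_a.mp4", "shot_a_last.png")
# subprocess.run(cmd, check=True)  # uncomment to run; requires ffmpeg on PATH
```

The resulting PNG then goes in as the image-to-video starting reference for shot B.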
Once all 3-5 shots are rendered, route them into Martini's sequence builder for an ordered timeline, then export as a native sequence to Premiere, DaVinci, or Final Cut. Layer dialogue (ElevenLabs Eleven v3 Dialogue) and ambience (ElevenLabs SFX v2) on the audio tracks. The result is a festival-ready pre-viz reel.
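If you assemble locally instead of exporting from Martini's sequence builder, ffmpeg's concat demuxer stitches the rendered shots without re-encoding, provided they share codec, resolution, and frame rate (which same-model renders do). A sketch with placeholder filenames:

```python
import os
import tempfile

def concat_cmd(shots, output):
    """Write a concat list file and build the ffmpeg stream-copy command.

    Stream copy (-c copy) is lossless but requires matching codec,
    resolution, and frame rate across shots. Filenames are placeholders.
    """
    listing = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
    for shot in shots:
        listing.write(f"file '{os.path.abspath(shot)}'\n")
    listing.close()
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",  # concat demuxer; allow absolute paths
        "-i", listing.name,
        "-c", "copy",                  # stream copy: no re-encode
        output,
    ]

cmd = concat_cmd(["shot_01.mp4", "shot_02.mp4", "shot_03.mp4"], "previz_reel.mp4")
```

Dialogue and ambience tracks would still be layered afterward in the NLE, as described above.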
Opening establisher. The cinematography vocabulary ("anamorphic flares", "low-angle dolly") gets Sora into a cinematic register.
Wide establishing shot, character silhouette walks across a misty wheat field at dawn, low-angle dolly forward, 35mm anamorphic lens flares, soft golden key light, 8 seconds
Character beat in the middle of the short. Adding "slight handheld breathing" gives the shot a documentary feel without losing composition.
Medium shot, character's face turns toward camera, wind catches loose hair, slight handheld breathing, soft rim light from behind, golden hour ambient, 6 seconds
Detail insert that earns the close-up. Sora 2 base handles macro motion well; reserve Pro for the climax.
Close-up of weathered hands picking a single wheat stalk, shallow depth of field, soft top light, ambient bee buzz, 5 seconds
15-second hero window for the closing — Sora 2 Pro only. Clarity control + longer duration = the ending you want.
Closing tag shot, character disappears behind a hill at sunset, sky burns orange, slow pull-out reveals the empty field, 12 seconds, Sora 2 Pro
Always image-condition Sora 2 with a storyboard frame from GPT Image / FLUX / Midjourney — text-only prompts produce more variance.
Use Sora 2 Pro for the opener and closer; base for middle blocking. Cost-aware short-film budgeting.
Cinematography terms ("dolly", "rim light", "anamorphic") work better than mood adjectives ("epic", "beautiful").
Keep clip duration under 12 seconds on base and under 15 on Pro; prompt adherence dilutes over longer clips.
Pin the character reference to every shot prompt so identity holds across cuts.
Sora 2 produces 1080p footage at base, 1080p with clarity control on Pro. Realistic lighting and reflections are the family's signature, and image-conditioned generations look noticeably more polished than text-only ones. Render times run 90-180 seconds on base, 180-240 on Pro. The Storyboard variant of Sora 2 Pro is for multi-shot in one pass — covered separately in the create-multi-shot-video scenario.
Connect Sora 2 with other AI models on Martini's infinite canvas. No GPU required — start free.
Kling
Kling 3.0 is the first major video model to render native 4K (3840x2160) at the diffusion stage rather than via post-process upscaling — sharper textures, accurate film grain, finer hair, fabric, and skin detail than any upscaler can recover. For a short film bound for a festival projector, that detail floor matters. Kling also bakes Omni Native Audio into the same pass (English, Chinese, Japanese, Korean, Spanish), so dialogue lip-sync and ambience can ship without a separate audio chain.
View guide
Google
Veo 3.1 ships native audio synthesis baked into the same generation pass as the video — describe the ambient sound right in the prompt and Veo synchronizes it to the picture. For an indie short film where the dialogue, footsteps, and music bed all have to land in time, Veo 3.1 is the cleanest end-to-end option. Output goes up to 1080p with Fast and Standard tiers, plus an Extend variant that continues an existing clip in V2V mode for seamless multi-clip assembly.
View guide