Google

How to Create an AI Short Film with Google Veo 3.1

Google Veo 3.1 synthesizes audio in the same generation pass as the video: describe the ambient sound in the prompt and Veo synchronizes it to the picture. For an indie short film where dialogue, footsteps, and the music bed all have to land in time, that makes Veo 3.1 a clean end-to-end option. Output is 720p or 1080p in Fast and Standard tiers, and an Extend variant continues an existing clip in video-to-video (V2V) mode for seamless multi-clip assembly.

Step-by-Step Guide

Describe ambient sound in the prompt itself

Veo 3.1's native audio synthesis means ambience is part of the prompt, not a separate node. Write: "rain on a tin roof, distant thunder, fire crackles in a wood stove, soft folk guitar in the background." Veo synthesizes all of those layers in sync with the picture, which avoids the manual alignment step of chaining ElevenLabs SFX afterward.
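One way to keep the soundscape wording consistent from shot to shot is to assemble each prompt from parts. The sketch below is plain string handling, not a Veo or Martini API; the function name build_prompt and its parameters are hypothetical.

```python
def build_prompt(visual: str, ambience: list[str], seconds: int) -> str:
    """Join a visual description, ambient sound cues, and a duration into one prompt."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Drop empty cues and stray whitespace so the prompt has no dangling commas.
    cues = [cue.strip() for cue in ambience if cue.strip()]
    parts = [visual.strip(), *cues, f"{seconds} seconds"]
    return ", ".join(parts)


opening = build_prompt(
    "Wide establishing shot of a remote cabin at dusk",
    [
        "rain on tin roof",
        "distant thunder",
        "fire crackles in wood stove",
        "soft folk guitar in background",
    ],
    8,
)
print(opening)
```

Keeping the cues in a list makes it easy to reuse the same ambience across adjacent shots and change only the visual description.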

Lock the protagonist with a reference image

Veo 3.1 supports reference images for style and character guidance. Pin a character sheet (for example, one generated with Nano Banana 2) to the canvas and feed it into each shot's Veo node. This keeps identity consistent across cuts, and because the audio renders in-pass, the character's footsteps and dialogue stay matched to their movement.

Pick Fast for blocking, Standard for hero shots

Veo 3.1 Fast returns drafts in 60-120s; Standard takes 120-180s but produces noticeably crisper detail and better audio fidelity. For a 3-5 shot short, run Fast on the first pass to lock the prompt language, then re-render the hero opener and closer on Standard with the proven prompt.
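As a rough worked example of what that two-pass workflow costs in render time, the sketch below adds up the per-shot ranges quoted above. It assumes one Fast pass per shot, one Standard pass per hero shot, sequential rendering, and no retries; the function name render_budget is hypothetical.

```python
FAST_RANGE_S = (60, 120)       # Fast draft time per shot, from the guide
STANDARD_RANGE_S = (120, 180)  # Standard render time per shot, from the guide


def render_budget(total_shots: int, hero_shots: int) -> tuple[int, int]:
    """Return (best, worst) total seconds: Fast for every shot, then Standard for hero shots."""
    if not 0 <= hero_shots <= total_shots:
        raise ValueError("hero_shots must be between 0 and total_shots")
    best = total_shots * FAST_RANGE_S[0] + hero_shots * STANDARD_RANGE_S[0]
    worst = total_shots * FAST_RANGE_S[1] + hero_shots * STANDARD_RANGE_S[1]
    return best, worst


best, worst = render_budget(total_shots=5, hero_shots=2)
print(f"{best / 60:.0f} to {worst / 60:.0f} minutes")  # prints "9 to 16 minutes"
```

Prompt iteration on the Fast pass will add to this, so treat the result as a floor rather than a schedule.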

Use the Extend variant to assemble continuous clips

For a continuous take that runs longer than a single Veo render, the Extend variant takes an existing clip and continues it seamlessly in V2V mode. Render the first 8 seconds with Veo 3.1 Standard, then route the output into a Veo 3.1 Extend node with a continuation prompt. The result is a longer continuous shot without a visible cut.
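To plan a long take, it helps to know how many Extend passes a target duration needs. The sketch below assumes an 8-second base render and that each Extend pass adds another 8 seconds, as in this guide's chase example; the actual length added per pass may differ, so both values are parameters. The function name extend_passes is hypothetical.

```python
import math


def extend_passes(target_seconds: float,
                  base_seconds: float = 8,
                  extend_seconds: float = 8) -> int:
    """Number of Extend passes needed after the base render to reach target_seconds."""
    if base_seconds <= 0 or extend_seconds <= 0:
        raise ValueError("clip lengths must be positive")
    remaining = target_seconds - base_seconds
    if remaining <= 0:
        return 0  # the base render already covers the shot
    # Round up: a partial segment still costs a full Extend pass.
    return math.ceil(remaining / extend_seconds)


for target in (8, 16, 20):
    print(target, "seconds ->", extend_passes(target), "Extend pass(es)")
# 8 -> 0, 16 -> 1, 20 -> 2
```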

Mix dialogue lines into the prompt for in-pass lip-sync

Like Kling, Veo 3.1 generates dialogue lip-sync when you supply a script line in the prompt. For the climactic dialogue beat, write the line in quotes: "Character whispers: 'We need to leave now.' Soft golden hour light, medium close-up, ambient cricket sound, 6 seconds." Lip-sync renders in-pass.
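If dialogue prompts are generated from a script file, the spoken line needs to stay clearly delimited. The sketch below is a hypothetical string helper (dialogue_prompt is not a Veo or Martini function) that wraps the line in double quotes and swaps any double quotes inside it for single quotes so the delimiters stay unambiguous.

```python
def dialogue_prompt(framing: str, speaker_action: str, line: str,
                    details: list[str], seconds: int) -> str:
    """Build a prompt with one quoted dialogue line followed by look and sound details."""
    # Inner double quotes would end the quoted line early, so soften them.
    spoken = line.strip().replace('"', "'")
    if not spoken:
        raise ValueError("line must not be empty")
    tail = ", ".join([*details, f"{seconds} seconds"])
    return f'{framing}. {speaker_action}: "{spoken}" {tail}'


print(dialogue_prompt(
    "Medium close-up",
    "Character whispers",
    "We need to leave now.",
    ["Soft golden hour light from camera right", "ambient cricket sound"],
    6,
))
```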

Export the assembled timeline at 1080p

Once all 3-5 shots are on the canvas, route them through the sequence builder and export a native 1080p sequence. Veo 3.1 caps at 1080p, so for 4K festival delivery, route the timeline through a video-upscale tool node (2x is enough; reserve 4x for hero shots only). Audio is already baked in, so no separate audio export is needed.
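The arithmetic behind "2x is enough" can be checked directly. The sketch below assumes 1080p means 1920x1080 and 4K means UHD (3840x2160); a festival spec that asks for DCI 4K (4096x2160) would need its own check. The function name upscaled is hypothetical.

```python
HD_1080P = (1920, 1080)
UHD_4K = (3840, 2160)


def upscaled(size: tuple[int, int], factor: int) -> tuple[int, int]:
    """Return the (width, height) after an integer upscale of both dimensions."""
    width, height = size
    return width * factor, height * factor


print(upscaled(HD_1080P, 2))  # (3840, 2160), which equals UHD 4K
print(upscaled(HD_1080P, 4))  # (7680, 4320), four times the pixel count of UHD 4K
```

A 2x pass already lands exactly on UHD 4K, so a 4x pass only pays off where extra headroom is wanted.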

Prompt Examples

Opening with full ambient soundscape baked in. No separate SFX or music nodes needed.

Wide establishing shot of a remote cabin at dusk, rain on tin roof, distant thunder, fire crackles in wood stove, soft folk guitar in background, 8 seconds

Dialogue beat with in-pass lip-sync. Veo renders the spoken line synchronized to mouth movement.

Medium close-up. Character whispers: "We need to leave now." Soft golden hour light from camera right, ambient cricket sound, 6 seconds

First half of a long continuous chase. Pipe to Veo Extend for the next 8 seconds without a visible cut.

Continuous follow shot, character runs through wet forest at night, breathing heavily, leaves rustle, distant siren, handheld camera, 8 seconds (then continue with Veo Extend)

Parameter Tips

Veo 3.1 ambient sound renders in the same pass — describe the soundscape directly in the prompt.

Use Fast for blocking and Standard for hero shots; Standard gives noticeably crisper detail and better audio fidelity.

For dialogue, write the line in quotes inside the prompt; Veo renders lip-sync in-pass.

Veo 3.1 Extend is V2V-only — feed it an existing clip plus a continuation prompt for seamless multi-clip assembly.

Output is capped at 1080p — for 4K festival delivery, chain a video-upscale tool node downstream.

What to Expect

Veo 3.1 outputs at 720p or 1080p with native audio synchronized in the same generation pass, so picture and sound stay tightly paired. Render times run 60-120s on Fast and 120-180s on Standard. Reference images guide style and character. The Extend variant is V2V-only and is useful for continuous shots that exceed a single render. Veo itself caps at 1080p, so for 4K final delivery, chain a video-upscale tool node downstream.

Use Google Veo 3.1 on Martini

Connect Google Veo 3.1 with other AI models on Martini's infinite canvas. No GPU required — start free.

Try Other Models for This Task

OpenAI

Sora 2

Sora 2 is OpenAI's flagship for cinematic short film work — realistic lighting, believable reflections, and camera moves that read like a real DP shot them. The base Sora 2 handles text-to-video and image-to-video at 1080p; Sora 2 Pro lifts fidelity and unlocks 15-second clips with clarity control. For an indie filmmaker drafting a 3-5 shot festival short over a weekend, Sora 2 hits the bar where the pre-viz looks production-ready before any crew is booked.

Kling

Kling 3.0

Kling 3.0 is the first major video model to render native 4K (3840x2160) at the diffusion stage rather than via post-process upscaling — sharper textures, accurate film grain, finer hair, fabric, and skin detail than any upscaler can recover. For a short film bound for a festival projector, that detail floor matters. Kling also bakes Omni Native Audio into the same pass (English, Chinese, Japanese, Korean, Spanish), so dialogue lip-sync and ambience can ship without a separate audio chain.
