ByteDance
Seedance 2 Omni adds character reference images to a generation pipeline that already accepts up to 12 reference assets — a unique combination of identity lock plus broad multimodal context (audio reference, location reference, palette reference). For an AI influencer producer running high-volume content where each episode varies wardrobe, location, and mood while identity stays anchored, Seedance Omni delivers strong per-clip economics. It is the pragmatic middle option between Vidu Q2 (densest character reference) and Kling O3 Reference (tightest choreography).
Seedance Omni's 12-asset reference slot is what differentiates it. Build the slot: 4 character references (front, three-quarter, profile, expressive) + 1 wardrobe reference + 1 location reference + 1 palette swatch + 1 audio tone reference. The character references anchor identity; the others anchor episode-specific context. Vidu and Kling do not accept this multimodal density.
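The 8-asset build above can be sketched as a data structure. This is an illustrative sketch only — the slot names, dict shapes, and `build_reference_set` helper are assumptions for clarity, not the actual Seedance API:

```python
# Hypothetical sketch of the 12-slot Omni reference set described above.
# Slot names and build_reference_set are illustrative, not a real API.

CHARACTER_VIEWS = ["front", "three_quarter", "profile", "expressive"]

def build_reference_set(character_refs, wardrobe, location, palette, audio_tone):
    """Assemble the reference payload: 4 identity anchors + 4 context refs."""
    if set(character_refs) != set(CHARACTER_VIEWS):
        raise ValueError(f"need exactly these character views: {CHARACTER_VIEWS}")
    refs = [{"type": "character", "view": v, "asset": character_refs[v]}
            for v in CHARACTER_VIEWS]
    refs += [
        {"type": "wardrobe", "asset": wardrobe},
        {"type": "location", "asset": location},
        {"type": "palette", "asset": palette},
        {"type": "audio_tone", "asset": audio_tone},
    ]
    assert len(refs) <= 12  # Omni caps the set at 12 assets
    return refs
```

The four character views fill a third of the 12-slot budget; the remaining slots are free for extra context assets if an episode needs them.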
Seedance 2.0 Omni Pro is the variant that combines all capabilities: text-to-video, image-to-video, video-to-video, and reference-driven generation. For a brand spokesperson running across multiple shot types in one campaign, Omni Pro is the right pick. Choose Omni Premium when batch turnaround matters, and Standard for cost-aware drafts.
For an episodic series, the 4 character references stay fixed across episodes. Only the wardrobe ref + location ref + palette + audio tone slots vary. Save the canvas with the character references locked, then swap only the context references per episode. This is the canvas-as-template pattern at its cleanest.
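The canvas-as-template pattern above can be sketched as code. The dict shapes and `episode_reference_set` helper are hypothetical illustrations; on the real canvas this state lives in the saved project, not in your own code:

```python
# Hypothetical sketch of the canvas-as-template pattern: identity refs are
# frozen once, and each episode supplies only the four context slots.

# Saved once; never changes across the series.
CHARACTER_TEMPLATE = [
    {"type": "character", "view": v, "asset": f"mia_{v}.png"}
    for v in ("front", "three_quarter", "profile", "expressive")
]

def episode_reference_set(template, context):
    """Reuse the locked character refs; swap only the context slots."""
    swappable = {"wardrobe", "location", "palette", "audio_tone"}
    assert set(context) == swappable, "each episode supplies all four context slots"
    return template + [{"type": t, "asset": a} for t, a in context.items()]

# Episode 3 changes wardrobe/location/palette/audio; identity is untouched.
ep3 = episode_reference_set(CHARACTER_TEMPLATE, {
    "wardrobe": "ep3_blazer.png",
    "location": "rooftop.png",
    "palette": "dusk_swatch.png",
    "audio_tone": "city_ambience.wav",
})
```

The design point is the separation: identity assets are append-only and shared, context assets are per-episode inputs, so no episode can accidentally drift the character.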
Seedance reads camera + motion language well. "Slow forward dolly toward Mia, soft golden hour rim light, 1:1 aspect, 6 seconds." Don't describe Mia's face — the references handle that. Describe the camera, the light, the action she takes, and the duration. Cinematography vocabulary unlocks Seedance's motion engine.
Seedance Omni's audio reference slot biases the in-pass ambient sound. For a series with consistent sonic branding (always warm cafe ambience for the lifestyle vlog series), pin one audio tone reference and reuse it across all episodes. The ambience then stays consistent across the series automatically.
Seedance supports six aspect ratios. Render the same character + context setup in 1:1 (Meta), 9:16 (TikTok/Reels), and 16:9 (YouTube) by duplicating the node and changing the aspect parameter. The references stay; the aspect changes. Identity reads identical across formats.
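The duplicate-node pattern above can be sketched as follows. The node fields (`model`, `prompt`, `references`, `aspect`) are hypothetical names for illustration, not the canvas's actual schema:

```python
# Hypothetical sketch of the duplicate-node pattern: one configured node,
# three aspect variants. Field names are illustrative, not a real schema.

BASE_NODE = {
    "model": "seedance-2-omni",
    "prompt": "Slow forward dolly toward Mia, soft golden hour rim light, 6 seconds",
    "references": ["mia_front.png", "mia_34.png", "mia_profile.png", "mia_expr.png"],
}

ASPECTS = {"meta": "1:1", "tiktok": "9:16", "youtube": "16:9"}

def duplicate_for_aspects(node, aspects=ASPECTS):
    """Copy the node once per platform; only the aspect parameter changes."""
    return {platform: {**node, "aspect": ratio} for platform, ratio in aspects.items()}

variants = duplicate_for_aspects(BASE_NODE)
```

Because every variant shares the same references and prompt, identity and framing intent are held constant and only the output geometry differs.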
Identity-locked Meta cutdown. The references do all the visual heavy lifting; the prompt handles camera and timing.
Slow forward dolly toward Mia, soft golden hour rim light, 1:1 aspect, 6 seconds. Wardrobe and location from references.
Vertical TikTok cutdown using the audio tone reference for ambience continuity.
Mia walks left to right through the location, slight handheld breathing, ambient cafe sound from audio reference, 9:16 vertical, 5 seconds
Wide YouTube cut with palette guidance from the swatch reference.
Mia in profile, slow turn toward camera, palette and lighting from palette swatch reference, 16:9, 8 seconds
Stack 4 character references + context references (wardrobe, location, palette, audio tone) in the 12-slot Omni reference set.
Save the canvas with character references locked; swap only context refs per episode for the cleanest template pattern.
Don't describe the character's face in prompts — Seedance reads identity from the reference stack.
Use the audio tone reference for series-consistent ambience without a separate audio chain.
Render the same setup in multiple aspects by duplicating the node, not by re-prompting from scratch.
Pick Omni Pro for the densest multi-modal work; Omni Premium when batch speed matters; Standard for drafts.
Seedance 2 Omni outputs 4-15 second clips at 1080p across six aspect ratios with strong character identity from the 4-image reference + multimodal context. Render times: Standard 60-120s, Pro 90-180s. Best for high-volume episodic content where audio + palette + location must hold while identity locks. For tightest identity at maximum reference density use Vidu Q2; for choreographed action with native dialogue lip-sync use Kling O3 Reference; Seedance is the pragmatic middle pick.
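The stated render times support a quick batch estimate. A back-of-envelope sketch, assuming sequential rendering (a parallel queue would shorten the wall time):

```python
# Back-of-envelope batch timing from the render times stated above
# (Standard 60-120 s, Pro 90-180 s per clip). Sequential rendering assumed.

RENDER_RANGE_S = {"standard": (60, 120), "pro": (90, 180)}

def batch_minutes(tier, clip_count):
    """Best/worst-case wall time in minutes for a sequential batch."""
    lo, hi = RENDER_RANGE_S[tier]
    return clip_count * lo / 60, clip_count * hi / 60

# One episode rendered in three aspects on Pro: 4.5 to 9.0 minutes.
lo, hi = batch_minutes("pro", 3)
```
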
Connect Seedance 2.0 with other AI models on Martini's infinite canvas. No GPU required — start free.
Vidu
Vidu Q2 Subject Ref accepts 1-7 character reference images per generation — the densest character-reference slot among the three models in this scenario. For an AI influencer producer keeping "Mia" identical across a 12-week content series, that 7-image character sheet (front, three-quarter, profile, full-body, hands, expression range) gives Vidu more identity vectors than any single-anchor model. The result is the strongest face/jaw/hairline lock across multiple shots, especially when wardrobe and location vary.
Kling
Kling O3 Reference adds character reference images for consistent appearance across clips and supports voice control over individual elements. Sharing the Kling 3.0 backbone (native 4K, 16-bit HDR, Omni Native Audio), it is the right pick when an AI influencer or brand spokesperson needs to deliver lip-synced dialogue across multiple cuts at festival-grade detail. Kling is stronger than Vidu on tightly choreographed action, but less reference-dense than Vidu Q2 (Vidu accepts 7 images; Kling O3 Reference reads fewer, with stricter ranking).