Alibaba

HappyHorse 1.0

Name: HappyHorse 1.0
Author: Alibaba

HappyHorse 1.0 is Alibaba's flagship multimodal AI video model — a unified 15B-parameter Transformer that generates 1080p video with native synchronized audio from a single text or image prompt. It topped public benchmarks at launch with an Elo score of 1381, leading the second-place model by 107 points.

HappyHorse 1.0 supports Text-to-Video (T2V), Image-to-Video (I2V) and Subject-to-Video (S2V): you can generate from a prompt, animate a still image, or insert a reference subject into a generated video while preserving its identity. Output runs up to 15 seconds at 1080p, with multiple shots and synchronized audio that includes lip-synced dialogue, ambient soundscapes and emotionally expressive vocal performances.

Video editing capabilities include Video-to-Video (V2V), which restyles existing footage while preserving structure and motion, and Subject-and-Video-to-Video (SV2V), which replaces or inserts subjects from a reference image while keeping the original motion, composition and unaffected regions intact.

On Martini, HappyHorse 1.0 is available through the Geneasy provider in text-to-video and first-frame image-to-video modes.


Figure: Illustrative sample of a HappyHorse 1.0 still on the Martini canvas, showing a multi-shot 1080p scene with an expressive speaking character (representative output, not a verbatim model render).

Capabilities

Text-to-Video

Image-to-Video

Video-to-Video

Reference Images

End Frame

Storyboard

Audio-Driven

Supported Aspect Ratios

16:9, 9:16, 1:1, 4:3, 3:4
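When you start from an existing still, it can help to pick the supported ratio closest to the image's own shape before generating. The following is a minimal sketch of that choice, not part of any Martini or HappyHorse API; the function name `closest_supported_ratio` and the log-space comparison are assumptions made for illustration.

```python
import math

# The five ratios listed on this page, as (width, height) pairs.
SUPPORTED_RATIOS = [(16, 9), (9, 16), (1, 1), (4, 3), (3, 4)]


def closest_supported_ratio(width: int, height: int) -> str:
    """Return the supported ratio nearest to width:height, e.g. "16:9".

    Hypothetical helper: the page does not say how a mismatched image
    is handled, so this only suggests a ratio to select.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)
    # Compare in log space so landscape and portrait are treated symmetrically.
    best = min(
        SUPPORTED_RATIOS,
        key=lambda r: abs(math.log(r[0] / r[1]) - target),
    )
    return f"{best[0]}:{best[1]}"
```

For example, a 1920x1080 still maps to 16:9 and a 3000x4000 portrait photo maps to 3:4.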

Best For

Lip-synced dialogue and emotionally expressive vocal performance in generated video
Multi-shot 1080p video from a single text or image prompt
Multilingual content where native lip-sync matters
Story- and ad-style content needing voice + ambient sound out of the box
First-frame I2V workflows that animate a hero still into a finished clip

Strengths

Unified 15B-parameter Transformer with joint audio-video generation
1080p output up to 15 seconds with multi-shot composition
Multilingual lip-sync, ambient soundscapes and expressive vocal performance
Topped public leaderboards on launch (Elo 1381, +107 over runner-up)

Limitations

Prompts are capped at 5,000 non-Chinese characters or 2,500 CJK characters
Reference images for I2V must be 10 MB or smaller
On Martini today, only T2V and first-frame I2V are exposed; S2V, V2V and SV2V are supported by the upstream model but are not available on Martini
Single quality tier — no Standard/Pro split inside the model

Tips & Best Practices

Write the spoken dialogue verbatim in the prompt — HappyHorse will generate matching lip-synced audio in the same pass.

Describe the desired vocal emotion (calm, excited, whispered) so the model can shape the performance accordingly.

For first-frame I2V, upload a clean hero still of 10 MB or smaller; the model will animate it into a multi-shot clip.
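The first two tips can be applied consistently by assembling prompts from structured parts. This is a small sketch of one way to do that; the `DialogueLine` type, the `build_prompt` function and the exact wording of the output are illustrative assumptions, since the page does not prescribe a prompt format.

```python
from dataclasses import dataclass


@dataclass
class DialogueLine:
    speaker: str       # who is talking, as described in the scene
    text: str          # the spoken words, written verbatim
    emotion: str = ""  # optional vocal emotion, e.g. "calm" or "whispered"


def build_prompt(scene: str, lines: list[DialogueLine]) -> str:
    """Join a scene description with verbatim dialogue and emotion cues.

    Hypothetical layout: the model is not known to require this phrasing.
    """
    parts = [scene.strip()]
    for line in lines:
        cue = f" ({line.emotion})" if line.emotion else ""
        parts.append(f'{line.speaker} says{cue}: "{line.text}"')
    return " ".join(parts)


prompt = build_prompt(
    "A chef plates a dessert in a sunlit kitchen.",
    [DialogueLine("The chef", "Dinner is served.", "calm")],
)
```

Keeping the dialogue in a separate field makes it easier to check that the spoken words reach the prompt unchanged.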

Use HappyHorse 1.0 on Martini

Connect HappyHorse 1.0 with other AI models on Martini's infinite canvas. No GPU required — start free.


Related Video Models

Wan

Sora 2

Seedance 2
