Issue Vol. 01 · 2026
Compiled May 2026
Models reviewed 10
Reading time ≈ 14 min
Bias disclosure No paid placements

AI video, after the hype.
Ten models. One verdict.

Every AI video tool now claims to be cinematic, real-time, and 4K. Most are exactly one of those things. This is the working field guide we wish someone had handed us when we started shipping AI video for clients — every serious model in 2026, what it is actually good at, where it lies to you, and which one to pick for the brief on your desk right now.

Editor's note

In December 2024, OpenAI released Sora. By the end of 2025, every major lab, every Chinese internet giant, and a small fleet of well-funded startups had shipped competing models. By spring 2026 the question stopped being "can AI generate video?" and became "which one of these ten tools does the specific job in front of me, and at what cost?"

That is the only question this guide tries to answer. We do not pretend a single model rules all briefs — every review below ends with a "best for" and a "skip if". If you want a marketing video, scroll to the comparison table. If you want to actually understand the trade-offs, read the reviews in order.

§ 01 · At a glance

Specs, side by side. The honest grid.

| Model | Vendor | Tier | Max length | Max res. | Native audio | Img→Vid | Vid→Vid | Access |
|---|---|---|---|---|---|---|---|---|
| Sora 2 | OpenAI | Frontier | 60 s | 1080p | ● | ● | ○ | ChatGPT · API* |
| Veo 3.1 | Google DeepMind | Frontier | 8 s | 4K | ● | via Flow | ○ | Gemini · Vertex |
| Gen-4.5 · Aleph | Runway | Frontier | 10 s | 1080p | partial | ● | ● | Web · API |
| Kling 2.x | Kuaishou | Frontier | 120 s | 1080p | ○ | ● | limited | Web · API |
| Pika | Pika Labs | Creator | 10 s | 1080p | lip-sync | ● | effects | Web · Discord |
| Ray 2 | Luma | Frontier | 10 s | 1080p | ● | ● | ○ | Web · API |
| Hailuo | MiniMax | Creator | 10 s | 1080p | partial | ● | limited | Web · API |
| Seedance 2.0 | ByteDance | Creator | 15 s | 1080p | partial | ● | ○ | Web · API |
| LTX-2 | Lightricks | Open | 60 s | 1080p | ● | ● | ○ | HF · self-host |
| Wan 2.x | Alibaba | Open | 10 s | 1080p | partial | ● | limited | HF · self-host |

● = supported  ·  ○ = not on the main path  ·  * Sora access has been in transition through 2026 — verify current availability with the vendor before procurement.

"The best model in 2026 is the one you can render today, legally, in the format your edit needs."

Heplony Field Guide · Editor's rule
§ 02 · Reviews

Ten models. What they are, where they lie.

01 / 10 · Frontier

Sora 2

OpenAI · Released Sep 2025
Access in transition

Sora is the model that lit the fuse. The original Sora landed in December 2024 as a research preview; Sora 2 followed in September 2025 inside a dedicated, social-feeling iOS app — clips up to a minute, native audio dialogue and ambience, and a cameo system that lets the model render a specific person's likeness on demand.

On a good prompt, Sora 2 still produces the most cinematic single takes of any model in this guide. Prompt adherence on hard briefs — moving cameras, multiple subjects, consistent lighting across a shot — beats most competitors. The catch in 2026 is access. OpenAI has been consolidating its product surface, the dedicated apps are no longer the recommended path, and the standalone API has been on a public wind-down trajectory. Treat any "stable Sora pipeline" claim with skepticism until you have written contracts in hand.

Best for

Hero shots, mood reels, anything where a single 10–60-second cinematic take has to land.

Skip if

You need a contracted, SLA-backed API today, or your edit is volume-heavy social cuts.

02 / 10 · Frontier

Veo 3.1

Google DeepMind · Veo 3 in May 2025, 3.1 later that year
Generally available

Veo 3 was the first frontier model to take native audio seriously, and Veo 3.1 cemented Google's lead there. Dialogue, score, ambience, and Foley arrive in the same file you generated — no post-pass through ElevenLabs or a stock library. 4K output is real, not upscaled, which matters the day a client asks for broadcast deliverables.

The trade-off is the eight-second clip cap. Long-form means stitching clips inside Flow, Google's filmmaking surface — workable for cuts, painful for a single uninterrupted take. Pricing also tilts enterprise: real production work tends to converge on the $250/mo Ultra tier or a Vertex AI contract. Both are fine if you are an agency; both feel like a wall if you are a solo creator.
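
If you go the API route rather than the Flow UI, generation is an asynchronous job you poll. A minimal sketch using Google's google-genai Python SDK — the model id string is our assumption; verify the current one in Google's docs before wiring it into anything billable:

```python
# pip install google-genai
import time
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Model id is an assumption -- check Google's current docs for the exact string.
operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="Slow dolly-in on a rain-soaked neon street at night, ambient city sound",
)

# Video generation runs as a long-running operation: poll until it resolves.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("veo_clip.mp4")
```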

Best for

Brand films, product reveals, anything where on-clip audio and 4K delivery matter.

Skip if

Your edit needs single takes longer than eight seconds, or your team cannot live with enterprise gating.

03 / 10 · Frontier

Gen-4.5 · Aleph

Runway · Gen-4 in 2025, Gen-4.5 the follow-up
Generally available

Runway has been doing this longer than anyone, and Gen-4.5 keeps the company's reputation for character and world consistency intact. Build a reference image, drop it into a scene, and you can carry that face, that costume, that lighting across multiple shots without the model forgetting. That alone is why agencies running episodic or branded character work still default to Runway.

Aleph is the second weapon: a real video-to-video model that takes existing footage and re-renders it in a new style, with a new subject, under new lighting — without the temporal jitter that killed early V2V attempts. Combined with Act-One and Act-Two for performance capture, Runway has assembled the most production-mature creative surface in the market. The catch is credit economics: iteration burns currency fast.
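
Runway's API is task-based: create a generation task, then poll it — and every retry is credits. A sketch against the official runwayml Python SDK; the model id, ratio, duration, and image URL are illustrative assumptions, so check Runway's current docs:

```python
# pip install runwayml
import time
from runwayml import RunwayML

client = RunwayML()  # reads RUNWAYML_API_SECRET from the environment

# Model id, ratio and duration are assumptions -- verify against current docs.
task = client.image_to_video.create(
    model="gen4_turbo",
    prompt_image="https://example.com/reference-frame.jpg",  # hypothetical URL
    prompt_text="She turns toward camera, warm tungsten key light, shallow focus",
    ratio="1280:720",
    duration=5,
)

# Tasks are async: poll until the render succeeds or fails.
while (task := client.tasks.retrieve(task.id)).status not in ("SUCCEEDED", "FAILED"):
    time.sleep(5)

print(task.output)  # list of output video URLs on success
```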

Best for

Episodic content, character-driven films, agencies doing reference-based generation.

Skip if

You need cheap-and-fast social cuts or 4K master delivery.

04 / 10 · Frontier

Kling 2.x

Kuaishou · 1.0 in June 2024; 2.x family through 2025
Generally available

Kling is the model the rest of the industry quietly benchmarks against. Two-minute generations at 1080p / 30 fps are unmatched in this guide — most competitors cap clips at 10 seconds, then ask you to stitch. Kling renders the long take. Physical realism is also disproportionately strong: water, hair, cloth simulation, the small physics the eye flags first.

The friction is operational. Kling's English-language UX has improved, but access historically required Chinese-side onboarding. There is no native audio path — every Kling clip needs sound mixed in after the fact. Content policy is real but opaque, and shifts without much warning. If your compliance team will not greenlight a Chinese pipeline, Kling is off the table regardless of how good the renders are.

Best for

Two-minute scenes, realistic motion, image-to-video at production length.

Skip if

You need audio baked in, or compliance will not sign off on a China-hosted pipeline.

05 / 10 · Creator

Pika · Pikaformance

Pika Labs · v1 in 2023; Pikaformance era 2025–2026
Generally available

Pika took a different bet from the frontier labs: instead of chasing benchmark wins, it built the lowest possible friction from prompt to first usable clip. Pikaffects — one-tap effects like "explode this", "melt this", "inflate this" — let non-technical users get something fun in 30 seconds. Pikaformance, the current generation, adds convincing lip-sync to arbitrary audio.

That is also the ceiling. Pika is not the tool you reach for when the brief is broadcast or long-form cinematic. Prompt adherence is good, not great, compared to Veo or Sora; the output ceiling on hero shots sits below even open-source LTX-2. But for social-first content, talking-head reactions, and meme-velocity output, no competitor matches Pika's time-to-first-good-clip.

Best for

Social-first creators, meme-velocity content, talking-head shorts with lip-sync.

Skip if

Your brief is broadcast spots or anything over ~10 seconds without obvious cuts.

06 / 10 · Frontier

Ray 2 · Dream Machine · Agents

Luma · Dream Machine in 2024; Ray 2 + Agents in 2025
Generally available

Luma sells a system, not a model. Ray 2 is the core video engine — fluid, physically plausible motion that handles momentum and inertia better than most peers. But the actual product is Luma Agents, an orchestration layer that runs image, video, audio, and copy generators against a single brief, holding consistency across them.

The pricing ladder — $30 / $90 / $300 per month — makes procurement a one-page conversation, which is rare in this category. Named agency customers (Publicis, Serviceplan, Dentsu) suggest the team has built the contractual surface that agencies actually need. The trade-off: if you only need single text-to-video and want the absolute lowest cost-per-clip, Luma is not the cheapest seat.

Best for

Agency teams running multi-modal campaigns who want one orchestrated workflow.

Skip if

You only need one-shot text-to-video at the lowest possible cost per clip.

07 / 10 · Creator

Hailuo

MiniMax · 2024 launch; rapid iteration 2025–2026
Generally available

Hailuo's bet is verticalisation. Instead of one prompt box, the platform ships a dozen pre-built tools — PetPal, BabyForm, ASMR Generator, Style Switch, CineScope, Ads Generator. Each one wraps the model in a use-case-specific UI that strips out prompt-craft entirely. For consumer-creator audiences this is enormous; for prompt-fluent power users it is incidental.

The underlying model is also genuinely strong on physical realism, particularly faces and bodies. Hailuo Agent automates multi-tool workflows — generate the still, animate it, swap the style, ship. The English documentation lags the Chinese-language version, and enterprise contracting outside China is harder than with Western vendors. As a creator product, that is fine. As an agency procurement decision, it is friction.

Best for

Creator economy, social, persona videos, and "verticalised" generators.

Skip if

You need a Western-vendor contract and SLA-grade support.

08 / 10 · Creator

Seedance 2.0

ByteDance Seed · Released Feb 2026
Generally available

Seedance 2.0 was the first big release of 2026. It is also the most controversial: shortly after launch, public reporting flagged copyright-infringement concerns on training data, which is the kind of disclosure that turns enterprise procurement teams cold immediately. For consumer-creator use the question is fuzzier; for any commercial work where IP indemnification matters, it is decisive.

On the model side, Seedance is unusually good at human motion, dance especially — the training signal is obvious if you watch a few outputs. Five to fifteen seconds is the operating range. Turnaround is quick versus frontier peers. If you live inside the ByteDance ecosystem already (TikTok-adjacent pipelines, CapCut), the integration story is real. Outside it, the legal cloud is the deciding factor.

Best for

Short-form social, motion-heavy content where dance and bodies are the subject.

Skip if

Your legal team is allergic to unresolved training-data provenance questions.

09 / 10 · Open source

LTX-2

Lightricks · Released October 2025 · open weights
Generally available

LTX-2 is the open-source moment of 2025 — open weights, 60-second native generations, and audio built into the model rather than bolted on after. If you have GPUs and an engineering team, you can stand up an in-house pipeline today that escapes per-clip pricing entirely. That is not just a financial argument; it is a sovereignty argument for anyone whose data cannot leave the building.

The trade-offs are honest. Self-hosting requires real GPU infrastructure to be production-grade; "runs on a consumer card" and "runs at production throughput" are different sentences. The hosted LTX Studio offers a managed path at a cost below Runway's, but with fewer creative-control surfaces. Output ceiling on hero shots still sits below Veo 3.1 or Sora 2. For most enterprise teams, LTX-2 is the right second pipeline, not the only pipeline.
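
What "stand up an in-house pipeline" looks like in practice, as a minimal sketch via Hugging Face diffusers. The checkpoint id is the published LTX-Video line and an assumption for LTX-2 specifically — check the Lightricks model card for current weights and recommended settings:

```python
# pip install diffusers transformers accelerate
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

# Checkpoint id is an assumption -- LTX-2 weights may ship under a different repo.
pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")  # production throughput wants a serious card, per the caveat above

frames = pipe(
    prompt="Aerial shot of a coastline at golden hour, gentle waves, soft haze",
    negative_prompt="worst quality, jitter, warped geometry",
    width=768,
    height=512,
    num_frames=121,          # about 5 s at 24 fps; longer takes cost VRAM fast
    num_inference_steps=50,
).frames[0]

export_to_video(frames, "ltx_clip.mp4", fps=24)
```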

Best for

Teams that need ownership, on-prem deployment, or escape from per-clip pricing.

Skip if

You have no GPU budget and zero appetite for self-hosting.

10 / 10 · Open source

Wan 2.x

Alibaba · Ongoing 2.x release cadence in 2025
Generally available

Wan is the other open-source story worth taking seriously. Alibaba has been releasing checkpoints on Hugging Face and ModelScope with a cadence and quality that surprised the open community. Multilingual prompt understanding is a quiet strength — Mandarin briefs translate to clean visual intent in a way most Western-trained models still fumble.

Quality on image-to-video is the headline. On a single still, Wan produces motion that looks frontier-adjacent — the first time the open ecosystem has been able to make that claim with a straight face. The ten-second cap is a real limit, and going from research checkpoint to production pipeline is the same engineering project as with any open model. If your team is Mandarin-fluent or self-host-capable, Wan belongs in your stack.
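
For the image-to-video path that is Wan's headline, a minimal self-host sketch via diffusers — the checkpoint id is an assumption; substitute the current Wan 2.x I2V release on Hugging Face:

```python
# pip install diffusers transformers accelerate
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Checkpoint id is an assumption -- pick the current Wan 2.x I2V repo.
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = load_image("product_still.png")  # hypothetical input frame
frames = pipe(
    image=image,
    prompt="Slow orbital move around the product, soft studio lighting",
    height=480,
    width=832,
    num_frames=81,  # about 5 s at 16 fps
).frames[0]

export_to_video(frames, "wan_clip.mp4", fps=16)
```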

Best for

Self-hosted teams who want frontier-class quality without proprietary lock-in.

Skip if

You need a turnkey hosted product with a Western support contract.

§ 03 · Verdict

Six briefs. Six picks. No hedging.

Cinematic hero spot
→ Sora 2 / Veo 3.1

Sora 2 if you have stable access and want the most cinematic single take. Veo 3.1 if delivery includes audio on the file and 4K masters.

Fallback · Kling 2.x for length
Character-driven series
→ Runway Gen-4.5

Reference-image consistency across shots, Aleph for re-styling existing footage, Act-One/Two for performance capture. Nothing else in this guide ships the same trinity.

Fallback · Luma Ray 2
Social-first, meme-velocity
→ Pika

Pikaffects make non-prompt users productive immediately. Pikaformance lip-sync is good enough that audiences stop noticing. Time-to-clip wins this category.

Fallback · Seedance 2.0 for dance
Multi-modal agency campaign
→ Luma Agents

When the brief is image + video + audio + copy held to one creative direction, Luma Agents is the only tool that orchestrates the whole brief end-to-end at the $90–$300 tier.

Fallback · Runway for video-first
Self-hosted & sovereign
→ LTX-2

Open weights, 60-second native output, audio in the model. If your data cannot leave the building or your CFO is tired of credit invoices, this is the answer.

Fallback · Wan 2.x on image-to-video
Image-to-video at length
→ Kling 2.x

Two-minute takes at 1080p from a single still beat every Western competitor on length, and physical realism on hair, water, and fabric is class-leading. Mix in audio after.

Fallback · Hailuo for faces
§ 04 · How to choose

Four questions, in order, before you sign anything.

01
Does the deliverable need on-clip audio?

If yes, the field narrows to Veo 3.1, Sora 2, Luma Ray 2, or LTX-2. Everything else needs audio mixed in post. Veo wins quality; LTX-2 wins ownership.

02
Is the single take longer than ten seconds?

If yes, you have three serious options: Sora 2 (60 s, access permitting), Kling 2.x (120 s, no native audio), or LTX-2 (60 s, open source). Everyone else makes you stitch.

03
Will legal sign off on training-data provenance?

If your client is a regulated brand or your jurisdiction enforces IP indemnification, be very careful with Seedance 2.0 and verify provenance language in every vendor contract. Veo and Runway have the cleanest enterprise paper.

04
Is per-clip cost or sovereignty the binding constraint?

If yes, go open-source: LTX-2 or Wan 2.x. The day-one savings vanish into GPU bills, but at volume the unit economics flip — and so does the data-sovereignty story.
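
The flip point is simple arithmetic. A sketch with deliberately made-up numbers — every figure below is an assumption, not a quoted price from any vendor in this guide:

```python
# All numbers are illustrative assumptions, not vendor pricing.
hosted_cost_per_clip = 1.50     # $ per 10 s clip on a hosted API (assumed)
gpu_hour = 2.00                 # $ per rented GPU-hour (assumed)
clips_per_gpu_hour = 6          # self-hosted throughput (assumed)
fixed_month = 2_000.00          # $ per month of engineering upkeep (assumed)

self_host_per_clip = gpu_hour / clips_per_gpu_hour  # ≈ $0.33

# Monthly volume at which self-hosting becomes cheaper than the hosted API:
break_even = fixed_month / (hosted_cost_per_clip - self_host_per_clip)
print(f"self-hosting wins above ~{break_even:.0f} clips/month")  # ~1714
```

Below that volume, stay hosted; above it, the invoice argument and the sovereignty argument point the same way.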

§ 05 · FAQ

Honest answers to the boring questions.

Is there a single "best" AI video model in 2026?

No, and anyone selling you one is selling a category, not a tool. Veo 3.1 leads on audio and 4K; Sora 2 leads on cinematic prompt adherence (when accessible); Runway Gen-4.5 leads on character consistency; Kling leads on length; LTX-2 leads on ownership. Pick the leader for your specific binding constraint.

Can these models be used in commercial work without legal risk?

It depends on the vendor and the brief. Frontier Western vendors (Google, Runway, Luma, OpenAI) ship explicit enterprise terms with varying degrees of IP indemnification — read them. Chinese vendors are operationally fine for many use cases but the contractual surface is thinner outside China. Seedance 2.0 currently carries public copyright-infringement concerns that warrant extra caution.

If your client is regulated — finance, pharma, defence — default to vendors who indemnify their training data in writing.

How long until AI video replaces traditional production?

It is replacing parts of it already — concept films, mood reels, illustrative B-roll, ad variants, presenter videos for internal comms. It is not replacing scripted episodic drama or location-dependent broadcast work, and probably will not by the end of the decade. The honest framing is that AI video is a new medium, not a substitute medium. Treat it that way.

What about avatar tools — Synthesia, HeyGen, D-ID?

Different category. Avatar tools generate a presenter speaking your script — useful for training, internal comms, localised announcements. They are not what you reach for when the brief is cinematic narrative or product film. Synthesia and HeyGen are the two we see in agency stacks; if your only need is a multilingual presenter, look there instead of at the models in this guide.

How do you handle audio when the model does not generate it?

Standard pipeline: dialogue via ElevenLabs, score via Suno or AIVA, ambience and Foley from library, master in DaVinci Resolve. For lip-sync to existing video, Pika's Pikaformance and Runway's Act-Two both work; the choice between them is workflow taste.
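
The last mile is a mux. A minimal sketch that marries a silent model clip to a mixed stem with ffmpeg, driven from Python — file names are placeholders, and ffmpeg must be on your PATH:

```python
import subprocess

# Mux an externally mixed audio stem onto a silent AI clip without re-encoding video.
subprocess.run(
    [
        "ffmpeg",
        "-i", "clip.mp4",        # silent video out of the model
        "-i", "mix.wav",         # dialogue + score + ambience + Foley, mixed in post
        "-map", "0:v", "-map", "1:a",
        "-c:v", "copy",          # leave the video stream untouched
        "-shortest",             # trim to the shorter of the two streams
        "clip_with_audio.mp4",
    ],
    check=True,
)
```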

How current is this guide?

Compiled May 2026. The category moves weekly — a fact stated here may be wrong by the time you act on it. For any procurement decision over a few thousand dollars, verify with the vendor's current docs before signing. We will republish the guide when the category moves materially.