Issue Vol. 01 · 2026
Compiled May 2026
Models reviewed 10
Reading time ≈ 14 min
Bias disclosure No paid placements

AI video, after the hype.
Ten models. One verdict.

Every AI video tool now claims to be cinematic, real-time, and 4K. Most are exactly one of those things. This is the working field guide we wish someone had handed us when we started shipping AI video for clients — every serious model in 2026, what it is actually good at, where it lies to you, and which one to pick for the brief on your desk right now.

Editor's note

In December 2024, OpenAI released Sora. By the end of 2025, every major lab, every Chinese internet giant, and a small fleet of well-funded startups had shipped competing models. By spring 2026 the question stopped being "can AI generate video?" and became "which one of these ten tools does the specific job in front of me, and at what cost?"

That is the only question this guide tries to answer. We do not pretend a single model rules all briefs — every review below ends with a "best for" and a "skip if". If you want a marketing video, scroll to the comparison table. If you want to actually understand the trade-offs, read the reviews in order.

§ 01 · At a glance

Specs, side by side. The honest grid.

| Model | Vendor | Tier | Max length | Max res. | Native audio | Img→Vid | Vid→Vid | Access |
|---|---|---|---|---|---|---|---|---|
| Sora 2 | OpenAI | Frontier | 60 s | 1080p | ● | ● | ○ | ChatGPT · API* |
| Veo 3.1 | Google DeepMind | Frontier | 8 s | 4K | ● | via Flow | ○ | Gemini · Vertex |
| Gen-4.5 · Aleph | Runway | Frontier | 10 s | 1080p | partial | ● | ● | Web · API |
| Kling 2.x | Kuaishou | Frontier | 120 s | 1080p | ○ | ● | limited | Web · API |
| Pika | Pika Labs | Creator | 10 s | 1080p | lip-sync | ● | effects | Web · Discord |
| Ray 2 | Luma | Frontier | 10 s | 1080p | ● | ● | ○ | Web · API |
| Hailuo | MiniMax | Creator | 10 s | 1080p | partial | ● | limited | Web · API |
| Seedance 2.0 | ByteDance | Creator | 15 s | 1080p | partial | ● | ○ | Web · API |
| LTX-2 | Lightricks | Open | 60 s | 1080p | ● | ● | ○ | HF · self-host |
| Wan 2.x | Alibaba | Open | 10 s | 1080p | partial | ● | limited | HF · self-host |

● = supported  ·  ○ = not on the main path  ·  * Sora access has been in transition through 2026 — verify current availability with the vendor before procurement.

"The best model in 2026 is the one you can render today, legally, in the format your edit needs."

Heplony Field Guide · Editor's rule
§ 02 · Reviews

Ten models. What they are, where they lie.

01 / 10 · Frontier

Sora 2

OpenAI · Released Sep 2025
Access in transition

Sora is the model that lit the fuse. The original Sora landed in December 2024 as a research preview; Sora 2 followed in September 2025 inside a dedicated, social-feeling iOS app — clips up to a minute, native audio dialogue and ambience, and a cameo system that lets the model render a specific person's likeness on demand.

On a good prompt, Sora 2 still produces the most cinematic single takes of any model in this guide. Prompt adherence on hard briefs — moving cameras, multiple subjects, consistent lighting across a shot — beats most competitors. The catch in 2026 is access. OpenAI has been consolidating its product surface, the dedicated apps are no longer the recommended path, and the standalone API has been on a public wind-down trajectory. Treat any "stable Sora pipeline" claim with skepticism until you have written contracts in hand.

Best for

Hero shots, mood reels, anything where a single 10–60-second cinematic take has to land.

Skip if

You need a contracted, SLA-backed API today, or your edit is volume-heavy social cuts.

02 / 10 · Frontier

Veo 3.1

Google DeepMind · Veo 3 in May 2025, 3.1 later that year
Generally available

Veo 3 was the first frontier model to take native audio seriously, and Veo 3.1 cemented Google's lead there. Dialogue, score, ambience, and Foley arrive in the same file you generated — no post-pass through ElevenLabs or a stock library. 4K output is real, not upscaled, which matters the day a client asks for broadcast deliverables.

The trade-off is the eight-second clip cap. Long-form means stitching clips inside Flow, Google's filmmaking surface — workable for cuts, painful for a single uninterrupted take. Pricing also tilts enterprise: real production work tends to converge on the $250/mo Ultra tier or a Vertex AI contract. Both are fine if you are an agency; both feel like a wall if you are a solo creator.
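
If you go the API route rather than the Flow UI, generation is an asynchronous job you poll. A minimal sketch using Google's google-genai Python SDK — the model id string is our assumption; verify the current one in Google's docs before wiring it into anything billable:

```python
# pip install google-genai
import time
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Model id is an assumption -- check Google's current docs for the exact string.
operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="Slow dolly-in on a rain-soaked neon street at night, ambient city sound",
)

# Video generation runs as a long-running operation: poll until it resolves.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("veo_clip.mp4")
```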

Best for

Brand films, product reveals, anything where on-clip audio and 4K delivery matter.

Skip if

Your edit needs single takes longer than eight seconds, or your team cannot live with enterprise gating.

03 / 10 · Frontier

Gen-4.5 · Aleph

Runway · Gen-4 in 2025, Gen-4.5 the follow-up
Generally available

Runway has been doing this longer than anyone, and Gen-4.5 keeps the company's reputation for character and world consistency intact. Build a reference image, drop it into a scene, and you can carry that face, that costume, that lighting across multiple shots without the model forgetting. That alone is why agencies running episodic or branded character work still default to Runway.

Aleph is the second weapon: a real video-to-video model that takes existing footage and re-renders it in a new style, with a new subject, under new lighting — without the temporal jitter that killed early V2V attempts. Combined with Act-One and Act-Two for performance capture, Runway has assembled the most production-mature creative surface in the market. The catch is credit economics: iteration burns currency fast.
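
Runway's API is task-based: create a generation task, then poll it — and every retry is credits. A sketch against the official runwayml Python SDK; the model id, ratio, duration, and image URL are illustrative assumptions, so check Runway's current docs:

```python
# pip install runwayml
import time
from runwayml import RunwayML

client = RunwayML()  # reads RUNWAYML_API_SECRET from the environment

# Model id, ratio and duration are assumptions -- verify against current docs.
task = client.image_to_video.create(
    model="gen4_turbo",
    prompt_image="https://example.com/reference-frame.jpg",  # hypothetical URL
    prompt_text="She turns toward camera, warm tungsten key light, shallow focus",
    ratio="1280:720",
    duration=5,
)

# Tasks are async: poll until the render succeeds or fails.
while (task := client.tasks.retrieve(task.id)).status not in ("SUCCEEDED", "FAILED"):
    time.sleep(5)

print(task.output)  # list of output video URLs on success
```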

Best for

Episodic content, character-driven films, agencies doing reference-based generation.

Skip if

You need cheap-and-fast social cuts or 4K master delivery.

04 / 10 · Frontier

Kling 2.x

Kuaishou · 1.0 in June 2024; 2.x family through 2025
Generally available

Kling is the model the rest of the industry quietly benchmarks against. Two-minute generations at 1080p / 30 fps are unmatched in this guide — most competitors cap clips at 10 seconds, then ask you to stitch. Kling renders the long take. Physical realism is also disproportionately strong: water, hair, cloth simulation, the small physics the eye flags first.

The friction is operational. Kling's English-language UX has improved, but access historically required Chinese-side onboarding. There is no native audio path — every Kling clip needs sound mixed in after the fact. Content policy is real but opaque, and shifts without much warning. If your compliance team will not greenlight a Chinese pipeline, Kling is off the table regardless of how good the renders are.

Best for

Two-minute scenes, realistic motion, image-to-video at production length.

Skip if

You need audio baked in, or compliance will not sign off on a China-hosted pipeline.

05 / 10 · Creator

Pika · Pikaformance

Pika Labs · v1 in 2023; Pikaformance era 2025–2026
Generally available

Pika took a different bet from the frontier labs: instead of chasing benchmark wins, it built the lowest possible friction from prompt to first usable clip. Pikaffects — one-tap effects like "explode this", "melt this", "inflate this" — let non-technical users get something fun in 30 seconds. Pikaformance, the current generation, adds convincing lip-sync to arbitrary audio.

That is also the ceiling. Pika is not the tool you reach for when the brief is broadcast or long-form cinematic. Prompt adherence is good, not great, compared to Veo or Sora; the output ceiling on hero shots sits below even open-source LTX-2. But for social-first content, talking-head reactions, and meme-velocity output, no competitor matches Pika's time-to-first-good-clip.

Best for

Social-first creators, meme-velocity content, talking-head shorts with lip-sync.

Skip if

Your brief is broadcast spots or anything over ~10 seconds without obvious cuts.

06 / 10 · Frontier

Ray 2 · Dream Machine · Agents

Luma · Dream Machine in 2024; Ray 2 + Agents in 2025
Generally available

Luma sells a system, not a model. Ray 2 is the core video engine — fluid, physically plausible motion that handles momentum and inertia better than most peers. But the actual product is Luma Agents, an orchestration layer that runs image, video, audio, and copy generators against a single brief, holding consistency across them.

The pricing ladder — $30 / $90 / $300 per month — makes procurement a one-page conversation, which is rare in this category. Named agency customers (Publicis, Serviceplan, Dentsu) suggest the team has built the contractual surface that agencies actually need. The trade-off: if you only need single text-to-video and want the absolute lowest cost-per-clip, Luma is not the cheapest seat.

Best for

Agency teams running multi-modal campaigns who want one orchestrated workflow.

Skip if

You only need one-shot text-to-video at the lowest possible cost per clip.

07 / 10 · Creator

Hailuo

MiniMax · 2024 launch; rapid iteration 2025–2026
Generally available

Hailuo's bet is verticalisation. Instead of one prompt box, the platform ships a dozen pre-built tools — PetPal, BabyForm, ASMR Generator, Style Switch, CineScope, Ads Generator. Each one wraps the model in a use-case-specific UI that strips out prompt-craft entirely. For consumer-creator audiences this is enormous; for prompt-fluent power users it is incidental.

The underlying model is also genuinely strong on physical realism, particularly faces and bodies. Hailuo Agent automates multi-tool workflows — generate the still, animate it, swap the style, ship. The English documentation lags the Chinese-language version, and enterprise contracting outside China is harder than with Western vendors. As a creator product, that is fine. As an agency procurement decision, it is friction.

Best for

Creator economy, social, persona videos, and "verticalised" generators.

Skip if

You need a Western-vendor contract and SLA-grade support.

08 / 10 · Creator

Seedance 2.0

ByteDance Seed · Released Feb 2026
Generally available

Seedance 2.0 was the first big release of 2026. It is also the most controversial: shortly after launch, public reporting flagged copyright-infringement concerns on training data, which is the kind of disclosure that turns enterprise procurement teams cold immediately. For consumer-creator use the question is fuzzier; for any commercial work where IP indemnification matters, it is decisive.

On the model side, Seedance is unusually good at human motion, dance especially — the training signal is obvious if you watch a few outputs. Five to fifteen seconds is the operating range. Turnaround is quick versus frontier peers. If you live inside the ByteDance ecosystem already (TikTok-adjacent pipelines, CapCut), the integration story is real. Outside it, the legal cloud is the deciding factor.

Best for

Short-form social, motion-heavy content where dance and bodies are the subject.

Skip if

Your legal team is allergic to unresolved training-data provenance questions.

09 / 10 · Open source

LTX-2

Lightricks · Released October 2025 · open weights
Generally available

LTX-2 is the open-source moment of 2025 — open weights, 60-second native generations, and audio built into the model rather than bolted on after. If you have GPUs and an engineering team, you can stand up an in-house pipeline today that escapes per-clip pricing entirely. That is not just a financial argument; it is a sovereignty argument for anyone whose data cannot leave the building.

The trade-offs are honest. Self-hosting requires real GPU infrastructure to be production-grade; "runs on a consumer card" and "runs at production throughput" are different sentences. The hosted LTX Studio offers a managed path at a cost below Runway's, but with fewer creative-control surfaces. Output ceiling on hero shots still sits below Veo 3.1 or Sora 2. For most enterprise teams, LTX-2 is the right second pipeline, not the only pipeline.
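
What "stand up an in-house pipeline" looks like in practice, as a minimal sketch via Hugging Face diffusers. The checkpoint id is the published LTX-Video line and an assumption for LTX-2 specifically — check the Lightricks model card for current weights and recommended settings:

```python
# pip install diffusers transformers accelerate
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

# Checkpoint id is an assumption -- LTX-2 weights may ship under a different repo.
pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")  # production throughput wants a serious card, per the caveat above

frames = pipe(
    prompt="Aerial shot of a coastline at golden hour, gentle waves, soft haze",
    negative_prompt="worst quality, jitter, warped geometry",
    width=768,
    height=512,
    num_frames=121,          # about 5 s at 24 fps; longer takes cost VRAM fast
    num_inference_steps=50,
).frames[0]

export_to_video(frames, "ltx_clip.mp4", fps=24)
```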

Best for

Teams that need ownership, on-prem deployment, or escape from per-clip pricing.

Skip if

You have no GPU budget and zero appetite for self-hosting.

10 / 10 · Open source

Wan 2.x

Alibaba · Ongoing 2.x release cadence in 2025
Generally available

Wan is the other open-source story worth taking seriously. Alibaba has been releasing checkpoints on Hugging Face and ModelScope with a cadence and quality that surprised the open community. Multilingual prompt understanding is a quiet strength — Mandarin briefs translate to clean visual intent in a way most Western-trained models still fumble.

Quality on image-to-video is the headline. On a single still, Wan produces motion that looks frontier-adjacent — the first time the open ecosystem has been able to make that claim with a straight face. The ten-second cap is a real limit, and going from research checkpoint to production pipeline is the same engineering project as with any open model. If your team is Mandarin-fluent or self-host-capable, Wan belongs in your stack.
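
For the image-to-video path that is Wan's headline, a minimal self-host sketch via diffusers — the checkpoint id is an assumption; substitute the current Wan 2.x I2V release on Hugging Face:

```python
# pip install diffusers transformers accelerate
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Checkpoint id is an assumption -- pick the current Wan 2.x I2V repo.
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = load_image("product_still.png")  # hypothetical input frame
frames = pipe(
    image=image,
    prompt="Slow orbital move around the product, soft studio lighting",
    height=480,
    width=832,
    num_frames=81,  # about 5 s at 16 fps
).frames[0]

export_to_video(frames, "wan_clip.mp4", fps=16)
```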

Best for

Self-hosted teams who want frontier-class quality without proprietary lock-in.

Skip if

You need a turnkey hosted product with a Western support contract.

§ 03 · Verdict

Six briefs. Six picks. No hedging.

Cinematic hero spot
→ Sora 2 / Veo 3.1

Sora 2 if you have stable access and want the most cinematic single take. Veo 3.1 if delivery includes audio on the file and 4K masters.

Fallback · Kling 2.x for length
Character-driven series
→ Runway Gen-4.5

Reference-image consistency across shots, Aleph for re-styling existing footage, Act-One/Two for performance capture. Nothing else in this guide ships the same trinity.

Fallback · Luma Ray 2
Social-first, meme-velocity
→ Pika

Pikaffects make non-prompt users productive immediately. Pikaformance lip-sync is good enough that audiences stop noticing. Time-to-clip wins this category.

Fallback · Seedance 2.0 for dance
Multi-modal agency campaign
→ Luma Agents

When the brief is image + video + audio + copy held to one creative direction, Luma Agents is the only tool that orchestrates the whole brief end-to-end at the $90–$300 tier.

Fallback · Runway for video-first
Self-hosted & sovereign
→ LTX-2

Open weights, 60-second native output, audio in the model. If your data cannot leave the building or your CFO is tired of credit invoices, this is the answer.

Fallback · Wan 2.x on image-to-video
Image-to-video at length
→ Kling 2.x

Two-minute takes at 1080p from a single still beat every Western competitor on length, and physical realism on hair, water, and fabric is class-leading. Mix in audio after.

Fallback · Hailuo for faces
§ 04 · How to choose

Four questions, in order, before you sign anything.

01
Does the deliverable need on-clip audio?

If yes, the field narrows to Veo 3.1, Sora 2, Luma Ray 2, or LTX-2. Everything else needs audio mixed in post. Veo wins quality; LTX-2 wins ownership.

02
Is the single take longer than ten seconds?

If yes, you have three serious options: Sora 2 (60 s, access permitting), Kling 2.x (120 s, no native audio), or LTX-2 (60 s, open source). Everyone else makes you stitch.

03
Will legal sign off on training-data provenance?

If your client is a regulated brand or your jurisdiction enforces IP indemnification, be very careful with Seedance 2.0 and verify provenance language in every vendor contract. Veo and Runway have the cleanest enterprise paper.

04
Is per-clip cost or sovereignty the binding constraint?

If yes, go open-source: LTX-2 or Wan 2.x. The day-one savings vanish into GPU bills, but at volume the unit economics flip — and so does the data-sovereignty story.
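
The flip point is simple arithmetic. A sketch with deliberately made-up numbers — every figure below is an assumption, not a quoted price from any vendor in this guide:

```python
# All numbers are illustrative assumptions, not vendor pricing.
hosted_cost_per_clip = 1.50     # $ per 10 s clip on a hosted API (assumed)
gpu_hour = 2.00                 # $ per rented GPU-hour (assumed)
clips_per_gpu_hour = 6          # self-hosted throughput (assumed)
fixed_month = 2_000.00          # $ per month of engineering upkeep (assumed)

self_host_per_clip = gpu_hour / clips_per_gpu_hour  # ≈ $0.33

# Monthly volume at which self-hosting becomes cheaper than the hosted API:
break_even = fixed_month / (hosted_cost_per_clip - self_host_per_clip)
print(f"self-hosting wins above ~{break_even:.0f} clips/month")  # ~1714
```

Below that volume, stay hosted; above it, the invoice argument and the sovereignty argument point the same way.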

§ 05 · FAQ

Honest answers to the boring questions.

Is there a single "best" AI video model in 2026?

No, and anyone selling you one is selling a category, not a tool. Veo 3.1 leads on audio and 4K; Sora 2 leads on cinematic prompt adherence (when accessible); Runway Gen-4.5 leads on character consistency; Kling leads on length; LTX-2 leads on ownership. Pick the leader for your specific binding constraint.

Can these models be used in commercial work without legal risk?

It depends on the vendor and the brief. Frontier Western vendors (Google, Runway, Luma, OpenAI) ship explicit enterprise terms with varying degrees of IP indemnification — read them. Chinese vendors are operationally fine for many use cases but the contractual surface is thinner outside China. Seedance 2.0 currently carries public copyright-infringement concerns that warrant extra caution.

If your client is regulated — finance, pharma, defence — default to vendors who indemnify their training data in writing.

How long until AI video replaces traditional production?

It is replacing parts of it already — concept films, mood reels, illustrative B-roll, ad variants, presenter videos for internal comms. It is not replacing scripted episodic drama or location-dependent broadcast work, and probably will not by the end of the decade. The honest framing is that AI video is a new medium, not a substitute medium. Treat it that way.

What about avatar tools — Synthesia, HeyGen, D-ID?

Different category. Avatar tools generate a presenter speaking your script — useful for training, internal comms, localised announcements. They are not what you reach for when the brief is cinematic narrative or product film. Synthesia and HeyGen are the two we see in agency stacks; if your only need is a multilingual presenter, look there instead of at the models in this guide.

How do you handle audio when the model does not generate it?

Standard pipeline: dialogue via ElevenLabs, score via Suno or AIVA, ambience and Foley from library, master in DaVinci Resolve. For lip-sync to existing video, Pika's Pikaformance and Runway's Act-Two both work; the choice between them is workflow taste.
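
The last mile is a mux. A minimal sketch that marries a silent model clip to a mixed stem with ffmpeg, driven from Python — file names are placeholders, and ffmpeg must be on your PATH:

```python
import subprocess

# Mux an externally mixed audio stem onto a silent AI clip without re-encoding video.
subprocess.run(
    [
        "ffmpeg",
        "-i", "clip.mp4",        # silent video out of the model
        "-i", "mix.wav",         # dialogue + score + ambience + Foley, mixed in post
        "-map", "0:v", "-map", "1:a",
        "-c:v", "copy",          # leave the video stream untouched
        "-shortest",             # trim to the shorter of the two streams
        "clip_with_audio.mp4",
    ],
    check=True,
)
```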

How current is this guide?

Compiled May 2026. The category moves weekly — a fact stated here may be wrong by the time you act on it. For any procurement decision over a few thousand dollars, verify with the vendor's current docs before signing. We will republish the guide when the category moves materially.