Text to video
We used to think making video meant cameras, crews, and editing software. Now we just type. Artificial intelligence (AI) reads our words and turns them into moving pictures.
How it works
At the core are diffusion models. They start from pure random noise and gradually denoise it until an image emerges. It’s like watching fog clear until a scene appears. For video, the model has to produce many frames that stay consistent with one another, so the denoising runs over space and time together rather than one independent frame at a time. Simple in theory; tricky in practice.
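To make the denoising idea concrete, here is a toy sketch in plain NumPy. It is not a real model: the `denoise_step` function simply nudges noisy pixels toward a fixed target frame, standing in for the learned network that would normally predict which noise to remove at each step.

```python
import numpy as np

def denoise_step(noisy_frame, target_frame, strength=0.1):
    # Stand-in for a learned denoiser: move a little toward the clean frame.
    # A real diffusion model would predict the noise to subtract instead.
    return noisy_frame + strength * (target_frame - noisy_frame)

def generate_frame(target_frame, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    frame = rng.normal(size=target_frame.shape)  # start from pure noise
    for _ in range(steps):                       # gradually "clear the fog"
        frame = denoise_step(frame, target_frame)
    return frame

# A fake 8x8 grayscale "scene" the toy denoiser converges toward.
target = np.linspace(0.0, 1.0, 64).reshape(8, 8)
frames = [generate_frame(target, seed=s) for s in range(4)]  # stacked frames -> a "video"
print(len(frames), frames[0].shape)
```

A real text-to-video system replaces the fixed target with a neural network conditioned on the prompt, and denoises all frames jointly so motion stays coherent.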
Sora
OpenAI’s Sora tries to handle this complexity. It builds videos with consistent motion, lighting, and detail, even when prompts get weird. Ask for “a corgi surfing on Mars at sunset,” and it tries to keep the corgi on the board, Mars looking Martian, and the sunset in sync. Sometimes it nails it, sometimes it trips. But it keeps getting better.
RunwayML
RunwayML takes the same idea but makes it friendlier for everyday creators. We type, it generates clips, and we stitch them into projects. It doesn’t always produce Hollywood polish, but that’s not the point. The point is we can storyboard ideas fast (ads, prototypes, experiments) without a film crew, as in the sketch below.
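That workflow is easy to script. The sketch below assumes a hypothetical REST endpoint (`GENERATE_URL` and the request and response fields are placeholders, not RunwayML’s actual API) and stitches the resulting clips together with ffmpeg’s concat demuxer.

```python
import subprocess
import requests

GENERATE_URL = "https://example.com/v1/text-to-video"  # placeholder endpoint, not a real API
API_KEY = "your-key-here"

def generate_clip(prompt: str, out_path: str) -> str:
    """Request one clip for a prompt and save it locally (hypothetical API shape)."""
    resp = requests.post(
        GENERATE_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt, "duration_seconds": 4},
        timeout=300,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # assume the endpoint returns an MP4 body
    return out_path

def stitch(clips: list[str], out_path: str) -> None:
    """Concatenate clips with ffmpeg's concat demuxer."""
    with open("clips.txt", "w") as f:
        f.writelines(f"file '{c}'\n" for c in clips)
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", "clips.txt", "-c", "copy", out_path],
        check=True,
    )

storyboard = [
    "wide shot of a desert highway at dawn",
    "close-up of a corgi on a surfboard",
    "sunset over a Martian canyon",
]
clips = [generate_clip(p, f"clip_{i}.mp4") for i, p in enumerate(storyboard)]
stitch(clips, "storyboard.mp4")
```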
Why it matters
For coders like us, text-to-video is a sandbox. We test prompts, see how models interpret them, then adjust. It feels less like editing and more like debugging, except the bug is in the model’s imagination.
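That prompt-by-prompt debugging is easy to make systematic. The sketch below just enumerates prompt variants with itertools.product; `generate_clip` is a placeholder for whichever model or API you happen to be poking at.

```python
from itertools import product

subjects = ["a corgi", "a tabby cat"]
actions = ["surfing", "skateboarding"]
settings = ["on Mars at sunset", "in a neon-lit city at night"]

def generate_clip(prompt: str) -> None:
    # Placeholder: swap in a real text-to-video call here.
    print(f"would generate: {prompt}")

# Sweep the prompt space and eyeball which details the model keeps or drops.
for subject, action, setting in product(subjects, actions, settings):
    generate_clip(f"{subject} {action} {setting}")
```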
We keep wondering: are we directing movies, or just coaxing code to dream in motion? Either way, it’s a new way to build. And we don’t even need a camera.