Published on March 12, 2026

That's no video — it's an array

By Jon Dahl · 5 min read · Video education

Once upon a time, video was content. Video was a thing you watched when you wanted entertainment or education or whatever. In other words, video was TV and movies. When video came online, online video was content too. Online video was Netflix or YouTube or Twitch or sports or news. You spent intentional, dedicated time watching video, probably because you wanted to watch video.

Then something changed, and online video became something new. In addition to video being content, video became features. Apps that used to not have video started getting video, because video made existing apps better. Homepages, docs, and guides now put video front and center, because homepages, docs, and guides are better with video. Social apps that started with text and images added video. Business tools added video. E-commerce sites added video.

But content and features are just broad use-cases, and they don't exhaust the uses of video. Video is a technology, not a use-case.

Online video is changing once again, and this time the driver is, no surprise, AI. The next shift is already underway: video is data.

What even is data?

[Image: Lucille Bluth from Arrested Development saying "I mean, it's one datum, Michael. What could it cost? 10 dollars?"]

Data is a collection of raw, unprocessed figures or facts.

In theory, video contains massive amounts of data. Think of a recording of a basketball game. What players are in the video? What are the words being spoken? How many free throws have been attempted, and how many have been made?

In practice, video is a binary representation of compressed still images, motion vectors, and audio samples, which are themselves representations of pixels, colors, and sound waves. It's impossible to make sense of what is happening in video and audio at the binary level: the bits have no direct correlation to the information they encode. This means getting data from video is exceptionally difficult, or prohibitively expensive, or both.

Or so it was, until AI made it easy. AI isn't good at everything (see OpenClaw deleting emails), but AI is really good at extracting data from video.

Multimodal models know how to turn audio into transcripts; how to translate those transcripts from one language to another; and even how to extract other information from audio, like speaker sentiment or music or background noise.
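To make that concrete, here's a minimal sketch of what extracted audio data can look like once it's structured. The field names, speaker labels, and sentiment values are hypothetical, not any particular model's output format:

```python
from dataclasses import dataclass

@dataclass
class TranscriptSegment:
    start: float    # seconds from the start of the audio
    end: float
    speaker: str    # diarization label, e.g. "speaker_1"
    text: str
    sentiment: str  # e.g. "positive", "neutral", "negative"

# What a model's output for a few seconds of audio might look like:
segments = [
    TranscriptSegment(0.0, 2.4, "speaker_1", "Welcome back to the show.", "positive"),
    TranscriptSegment(2.4, 5.1, "speaker_2", "Thanks for having me.", "neutral"),
]

# Once it's structured, it's queryable like any other data:
spoken_by_1 = " ".join(s.text for s in segments if s.speaker == "speaker_1")
```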

They also know how to turn video frames into sets of objects with time codes and bounding boxes; or descriptions of activities; or subjective classifications ("Is this professional football or amateur football?"); or arbitrary descriptions ("What is happening right now?").

In other words: AI makes it easy to turn video files into arrays of strings or numbers or tuples or vectors.
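As a hypothetical illustration of that claim, here is roughly what frame-level detections might look like flattened into plain Python structures. The labels, timecodes, and normalized bounding boxes are made up for the example:

```python
# Each detection: (timecode in seconds, label, (x, y, w, h) normalized bounding box)
detections = [
    (12.5, "player", (0.10, 0.20, 0.05, 0.15)),
    (12.5, "ball",   (0.48, 0.33, 0.02, 0.02)),
    (13.0, "player", (0.12, 0.21, 0.05, 0.15)),
]

# The video is now just arrays, so ordinary array operations apply:
labels = [label for _, label, _ in detections]           # array of strings
timecodes = sorted({t for t, _, _ in detections})        # array of numbers
ball_track = [(t, box) for t, label, box in detections if label == "ball"]
```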

This isn't literally new, but it's pretty much new.

Technically, it was always possible to turn videos into data. There were two ways to do it: computer vision or humans.

Computer vision (CV) software could turn video into data, but only in pre-determined, pre-trained ways. You could train a CV system to read license plates, for example, but your license-plate-reading system was only good at reading license plates. This made CV tools highly specialized, inflexible, and expensive to build.

AI overcomes these limitations. You can ask a multimodal model open-ended questions about a video and it will answer. "Describe this scene" was an impossible task for computer vision, but is possible for AI.

Alternatively, humans could turn videos into data via manual labeling. You could hire a team to watch video and note everything that happened in it, or to transcribe audio to text, or to describe scenes or count baskets or sort footage into categories. Obviously, though, that was labor-intensive, slow, relatively expensive, and hard to scale. If you want to extract data from ten thousand hours of video, you need something in the ballpark of ten thousand person-hours of work.

AI helps here too. It isn't free, but it is significantly faster and cheaper than human labeling, and in some cases it is even more accurate.

What can you do with this?

I don't know, and that's exciting.

Or rather: I know some of the things you can do with it, but not everything. "Video as data" isn't a single use-case; it's a category of use-cases that will be discovered and unlocked over the next ten years.

A few use-cases are already mainstream. Note-taking software turns meetings and conversations into datasets. UX research and user testing is another. Creator shopping platforms do really interesting things with AI to detect products in images and video. Sports leagues, major and minor, use platforms that track shots or plays or players with AI. Video becomes data in obvious ways in manufacturing, physical security, and self-driving cars.

These use-cases are all interesting, but even more interesting is what's coming next. What are you going to do with video as data?

[Image: Obi-Wan Kenobi and Han Solo from Star Wars with the caption "That's no video. It's an array."]

Written By

Jon Dahl – Co-founder and CEO at Mux. Former founder (Zencoder), ex-software developer, ex-philosophy grad student. Makes better BBQ than code these days.
