Published on April 27, 2023

How to transcode video 100x faster; or, a Gordian knot cut

By Jon Dahl · 6 min read · Product

Five years ago, in a blog post titled “The fastest video publishing in the multiverse,” I made the following claim:

Mux Video is the first video streaming platform that doesn’t make users wait for transcoding. We can prepare short video in seconds instead of minutes, and feature-length films in a couple of minutes instead of half an hour. Over time we expect to get these times even lower.

That earlier blog post showed numbers. In our early testing, we were about 10x faster than others, but those were simulated tests. At that point, Mux Video wasn’t even really launched. We had no real scale.

Five years later, how are we doing?

| Video duration | Median time to publish (all inputs) |
| --- | --- |
| 0-1 minute | 2.0 seconds |
| 1-5 minutes | 3.3 seconds |
| 5-20 minutes | 9.0 seconds |
| 20-60 minutes | 46.4 seconds |
| 60+ minutes | 93.5 seconds |

This is fast. In comparison, a typical transcoder takes about half the duration of a video to transcode. So a 10-minute video might take 5 minutes to be ready for playback. At Mux, a typical 10-minute video takes 9 seconds.
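To make that comparison concrete, here is a small back-of-the-envelope sketch. It assumes the "half the duration" rule of thumb from the paragraph above for a typical transcoder, and it treats the 9-second median for the 5-20 minute bucket as the publish time for a 10-minute video. The function name and its parameters are illustrative, not part of any Mux API.

```python
def speedup(duration_s: float, publish_s: float, typical_fraction: float = 0.5) -> float:
    """Return how many times faster publish_s is than a typical transcoder.

    typical_fraction is the assumed share of the video's duration that a
    typical transcoder needs (0.5 = "about half the duration").
    """
    typical_s = duration_s * typical_fraction
    return typical_s / publish_s


# A 10-minute video: roughly 300 seconds on a typical transcoder vs. a 9-second median.
ten_minute_speedup = speedup(duration_s=600, publish_s=9.0)
print(f"{ten_minute_speedup:.1f}x faster")
```

The same function can be pointed at any row of the table; the result is only as good as the rule-of-thumb baseline it assumes.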

That includes all videos submitted to Mux. We’re even faster with what we call “standard input.” Standard input refers to the dominant profile of files uploaded by users today, based on resolution (up to 2K), codecs (e.g., H.264), frame rate, bitrate, and so on. 82% of files ingested to Mux are standard input.

(This is generally user-controllable; see our Minimize processing time guide for tips on ensuring video publishes quickly.)

| Video duration | Median time to publish (standard input) | 95% time to publish (standard input) |
| --- | --- | --- |
| 0-1 minute | 1.8 seconds | 3.6 seconds |
| 1-5 minutes | 2.6 seconds | 18.7 seconds |
| 5-20 minutes | 5.5 seconds | 37.2 seconds |
| 20-60 minutes | 26.7 seconds | 11.0 minutes |
| 60+ minutes | 50.1 seconds | 16.1 minutes |

The typical 60-minute “standard” file takes less than 1 minute to publish at Mux, and even a slow outlier (the 95th percentile) takes about 16 minutes, which is still roughly twice as fast as a typical transcoder.

How do you publish video that quickly?

A “Gordian knot” is an intractable problem. Around 2500 years ago, an ox-cart was supposedly tied to a post in ancient Phrygia with a famously intricate knot, for somewhat inscrutable reasons. An oracle had said that whoever untied the knot would rule Asia, and so when Alexander the Great came to Phrygia, he obviously took on the challenge. When he couldn’t untie it the traditional way, he pulled out his sword to cut the knot. “Cutting the Gordian knot” means solving a seemingly impossible problem by reframing or sidestepping the problem.

A standard video publishing workflow looks like this:

[Diagram: upload, then storage, then transcoder, then storage, then playback]

Steps in this diagram happen serially. Video is uploaded to storage, then sent to a transcoder, then sent back to storage for playback. This transcoding step is computationally intensive and slow. This isn’t ideal; what you really want, if you care about user experience, is instant transcoding. But instant transcoding is an intractable problem — a computational Gordian knot.

Before starting Mux, I co-founded a company called Zencoder, which was one of the first cloud-based video transcoding products. Our system worked exactly like that diagram.

Back at Zencoder, we had two ideas that we never implemented.

First: What if we break every incoming video into smaller chunks (say, 10 seconds), and process them in parallel?

Second: What if we could stream video into and out of a transcoder, so that instead of waiting for transcoding to finish, a video could be watched while it was being transcoded?
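Those two ideas can be sketched in a few lines of Python. This is a toy illustration, not how Zencoder or Mux actually works: `transcode_chunk` is a hypothetical stand-in for a real encoder call, and the worker count is arbitrary. Only the 10-second chunk length comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Tuple

CHUNK_SECONDS = 10.0  # chunk length suggested in the text; everything else is illustrative


def split_into_chunks(duration_s: float, chunk_s: float = CHUNK_SECONDS) -> List[Tuple[float, float]]:
    """Split a video of duration_s seconds into (start, end) time ranges."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def transcode_chunk(chunk: Tuple[float, float]) -> str:
    """Hypothetical stand-in for a real encoder; returns a label instead of video data."""
    start, end = chunk
    return f"segment[{start:g}-{end:g}s]"


def transcode_streaming(duration_s: float, workers: int = 4) -> Iterator[str]:
    """Idea one: transcode chunks in parallel. Idea two: yield each one as soon as it is ready."""
    chunks = split_into_chunks(duration_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs the chunks concurrently but hands results back in playback order,
        # so a consumer can start on the first segment before the last one is done.
        for segment in pool.map(transcode_chunk, chunks):
            yield segment


for segment in transcode_streaming(25):
    print(segment)
```

The generator is the important design choice here: the caller consumes segments as they arrive instead of waiting for a finished file.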

When we built Mux Video, we implemented both of these ideas. We call this just-in-time (JIT) transcoding. Effectively, when frames of video leave the transcoder, they’re streamed out before the rest of the file is complete. This means our publishing workflow looks like this:

So how do you transcode video 100x faster? The answer is you don’t — you cheat. You sidestep the problem by eliminating the need to pre-transcode video in the first place.
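One way to picture the "cheat" is a store that only transcodes a segment the first time someone asks for it. The sketch below is a simplified model under my own assumptions: the class name, the in-memory cache, and the `fake_transcode` helper are all hypothetical, and a real system would involve storage, a CDN, and an actual encoder.

```python
from typing import Callable, Dict


class JitSegmentStore:
    """Transcode a segment on first request, then serve it from cache (illustrative only)."""

    def __init__(self, transcode: Callable[[int], bytes]) -> None:
        self._transcode = transcode
        self._cache: Dict[int, bytes] = {}
        self.transcode_calls = 0  # counts how much encoding work was actually done

    def get_segment(self, index: int) -> bytes:
        if index not in self._cache:
            # No pre-transcoding step: the work happens only when a viewer needs it.
            self.transcode_calls += 1
            self._cache[index] = self._transcode(index)
        return self._cache[index]


def fake_transcode(index: int) -> bytes:
    """Hypothetical encoder stand-in that returns placeholder bytes."""
    return f"segment-{index}".encode()


store = JitSegmentStore(fake_transcode)
store.get_segment(0)  # first request: transcoded now
store.get_segment(0)  # repeat request: served from cache
store.get_segment(3)  # a viewer seeks ahead; segments 1 and 2 are never encoded
print(store.transcode_calls)
```

Segments that nobody requests never cost any encoding time, which is why publishing can feel instant even though no transcoder got faster.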

What else do we do with JIT transcoding?

Just-in-time transcoding doesn’t only let us publish video quickly. It also gives Mux capabilities that no one else has. Examples:

Instant thumbnails: If you use a typical cloud transcoder, you need to specify in advance what thumbnails you want to extract from a video. If you ever want new thumbnails, you need to send the entire file back to the transcoder for a slow, expensive operation.

With Mux Video, you can instantly extract any thumbnail from a video using a simple GET request, like this:

GET image

Instant GIF generation: Same story with animated GIFs.

GET gif
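For a rough idea of what those GET requests look like in practice, here is a small URL-building sketch. The host, paths, and query parameters (`time`, `width`, `start`, `end`) reflect my understanding of Mux's public image URLs and should be treated as assumptions; check the current docs before relying on them. `PLAYBACK_ID` is a placeholder, and the helper function names are my own.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode

IMAGE_HOST = "https://image.mux.com"  # assumed image host; verify against the docs


def thumbnail_url(playback_id: str, time_s: Optional[float] = None, width: Optional[int] = None) -> str:
    """Build a URL for a still frame at time_s seconds (parameter names are assumptions)."""
    params: Dict[str, Union[int, float]] = {}
    if time_s is not None:
        params["time"] = time_s
    if width is not None:
        params["width"] = width
    query = f"?{urlencode(params)}" if params else ""
    return f"{IMAGE_HOST}/{playback_id}/thumbnail.jpg{query}"


def animated_gif_url(playback_id: str, start_s: float, end_s: float) -> str:
    """Build a URL for an animated GIF covering start_s to end_s (parameter names are assumptions)."""
    query = urlencode({"start": start_s, "end": end_s})
    return f"{IMAGE_HOST}/{playback_id}/animated.gif?{query}"


print(thumbnail_url("PLAYBACK_ID", time_s=35, width=640))
print(animated_gif_url("PLAYBACK_ID", start_s=10, end_s=15))
```

Because the image is generated when the URL is requested, picking a different frame is just a matter of changing the query string.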

Instant live clipping: Just-in-time transcoding means you can extract any clip from any video with basically zero delay. While a live event is running, you can select arbitrary portions of the video to turn into on-demand clips. View docs for live clipping.

Unified live and on-demand video: Because we treat files and live video in the same way, any live event that we ingest is available as an on-demand asset as soon as the event finishes. View docs for live streaming.

Cost efficiency: Last, but certainly not least: JIT technology means we don’t need to transcode segments of video that are never watched. This allows us to lower our ingest pricing. If you used (say) AWS transcoding to create an ABR ladder of 2 HD and 2-3 SD renditions, you’d pay between 25% and 100% more than our rates at Mux.

Why does this matter?

First, it doesn’t necessarily matter for every user. If you’re streaming long-form content that is uploaded by an in-house editor, it might not matter if a video takes 30 minutes to be ready to publish. You might upload a video long before you’re ready to actually publish it.

But if you’re dealing with user-generated content, it’s a big deal.

Fast time-to-publish is a better experience for users. No one likes waiting on a progress bar.

Fast publishing can increase user activity. When users have to wait a few minutes after uploading a video before they can take their next step (previewing a video, clicking “publish”), they’re more likely to abandon what they’re doing. If they get annoyed by the wait, they’re less likely to upload another video.

Instant clipping and thumbnails enable new workflows, like allowing users to choose their own thumbnails and poster images, or letting users change these over time.

Some content is more relevant when it’s fresh. If you’re dealing with breaking news or time-sensitive content, the difference between 2-second publishing and 2-minute publishing might be enormous.

Ultimately, if users are uploading video, fast publishing means a better user experience. What’s more important than that?

Written By

Jon Dahl

Co-founder of Zencoder, acquired by Brightcove, where Jon served as VP Technology. Makes better BBQ than code these days.
