Published on August 24, 2023

Cloud encoding APIs are a dying breed: lessons learned from building three cloud video startups

By Steve Heffernan · 8 min read · Engineering

A history told through Hacker News posts…

The founding team of Mux previously founded Zencoder (YC W'10). Zencoder was early in the cloud-based encoding space, and early in "cloud" in general. Our customers grilled us with questions on the security of the cloud, and then replaced their racks of expensive hardware encoders with our simple API that was ten times more cost efficient.

With a product that developers loved, it didn't take long to surpass and then double the encoding volume of the previous leader in the space.

With Zencoder, the cloud was our "disruptive technology," although building Zencoder was less about disrupting anything and more about engineers building a product that would make our own lives easier, using the latest tech available to us, with some lucky timing.

Alongside Zencoder we also built Video.js, the first open-source HTML5 video player, which today is used on millions of sites. At the time, HTML5 video was supported by only 10% of internet users. I was speaking at conferences (in a very nice striped shirt) on the virtues of HTML5 video and how it would replace Flash, to audiences of media publishers who did not want to hear anything about their Flash video players going away.

In 2012, Zencoder was acquired by Brightcove, the largest white-label video platform. Less than a year later, Amazon launched Elastic Transcoder, which closely resembled Zencoder's API and minutes-based pricing. However, even with Amazon's backing, Elastic Transcoder struggled to compete with Zencoder's momentum. A few Elastic Transcoder execs ultimately joined Brightcove, and the product has been replaced in the AWS lineup.

As with many products, Zencoder's success was built on a string of failures, each of which is a story on its own:

  • A failed startup that was going to be the "HD YouTube," just like 50 other companies in 2007, before YouTube became the HD YouTube
  • A self-hosted version of Zencoder that flopped (we sold one copy and eventually returned the money)
  • A partnership in an encoding service, where our partner was acquired by Google, who then squashed the project

With over a decade of experience building good and bad ahead-of-their-time encoding services, I think we've gained a unique perspective on this industry, including a mix of paranoia and excitement for the next technology that will disrupt what we've built. What's become increasingly clear is that standalone cloud encoding APIs are on their way out, for a couple of reasons.

Reason #1: Direct access to audience-specific data

Not the "we're gonna sell your info" type of audience data, but aggregated, anonymous details about device capabilities and internet connection speeds, as well as Quality of Experience (QoE) metrics to understand and set goals for viewers’ actual experience (e.g., playback startup time, playback stalling/rebuffering, and visual quality).

Standalone cloud encoders require you to know how to best encode all your videos. But in reality, you don't — at least not without a lot of manual work to understand your general audience. Even then, audiences change over time and are often different per video.

Most people researching how they should encode their video for streaming find their way to Apple's recommended bitrate ladder for HLS. Apple clearly warns that this is too generalized and recommends putting in more effort.

"These factors make universal encoding recommendations for content difficult. This specification includes initial bit rate recommendations that should be evaluated against your content, constraints and encoding workflow."

Yet what we've seen is that most people ignore this advice and use Apple's recommendations verbatim. At Zencoder, we would often manually help customers make better encoding decisions than they were making on their own, picking adaptive bitrate ladders based solely on their content type and helping them decide on cost vs. quality vs. encoding speed. That was better than nothing, but still far from optimal.

Audience data contains the answers to critical questions about expected network connections, device resolutions, device decoding capabilities, audience size, and how well the existing encoding strategy is performing. All of this information can drive decisions around how many adaptive streaming renditions to create at which bitrates, and which codecs should be used to optimize bandwidth. An encoder with access to this data can build a better encoding strategy — not only overall but also per video, while adapting to the audience over time. This can result in significant cost savings and a higher QoE for viewers, often both at the same time.
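To make that concrete, here is a minimal sketch of how audience data could trim a bitrate ladder. Everything in it is an assumption for illustration: the candidate renditions, the 95th-percentile cutoff, the `headroom` factor, and the function names are hypothetical, and this is not a description of how Mux or any other platform actually makes these decisions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # target video bitrate


# Hypothetical candidate ladder. The numbers are illustrative, not recommendations.
CANDIDATES = [
    Rendition(240, 400),
    Rendition(360, 800),
    Rendition(480, 1400),
    Rendition(720, 2800),
    Rendition(1080, 5000),
    Rendition(2160, 16000),
]


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def build_ladder(bandwidth_kbps, screen_heights, headroom=0.8):
    """Keep only the candidates this audience can realistically play.

    bandwidth_kbps: observed viewer throughput samples (assumed input).
    screen_heights: observed device display heights in pixels (assumed input).
    headroom: fraction of measured bandwidth we are willing to fill.
    """
    max_bitrate = percentile(bandwidth_kbps, 95) * headroom
    max_height = percentile(screen_heights, 95)
    ladder = [
        r for r in CANDIDATES
        if r.bitrate_kbps <= max_bitrate and r.height <= max_height
    ]
    # Always serve something, even to a very constrained audience.
    return ladder or [CANDIDATES[0]]
```

Run against a mostly mobile audience on slow connections, a sketch like this drops the top renditions entirely, which is where the savings come from. A real system would also weigh content complexity, codec support, and QoE results from the current ladder.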

Advanced video platforms are doing this today, and to stay competitive, other services will need to catch up. So all that to say, the future of video encoding is deeply integrated with audience data, and that's not something standalone encoding APIs have a clear path to today.

Reason #2: Just-in-time encoding

A standalone encoding API is transactional. You push a video file to it, which the encoding service then transcodes with your desired settings and pushes back to wherever you store your videos. Job done. The transaction takes some amount of time relative to the video's duration, and the video is unwatchable until the job is done. Advanced encoders may even parallelize encoding of segments of the video to speed up the process.

With just-in-time (JIT) encoding, on the other hand, the transcoding doesn't happen until the video is watched. If any part or specific rendition segment of a video is never watched, it’s never encoded.
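As a rough sketch of the idea (and not Mux's implementation), a JIT origin can be thought of as a cache that encodes a segment the first time a player asks for it. `JitOrigin`, the `encode_segment` callable, and the cache key layout are hypothetical names chosen for this example.

```python
class JitOrigin:
    """Encodes a segment on first request and serves it from cache afterwards."""

    def __init__(self, encode_segment):
        # encode_segment is an assumed callable:
        # (video_id, rendition, segment_index) -> encoded bytes
        self._encode_segment = encode_segment
        self._cache = {}
        self.encode_calls = 0  # how much encoding work was actually done

    def get_segment(self, video_id, rendition, segment_index):
        key = (video_id, rendition, segment_index)
        if key not in self._cache:
            # Nobody has watched this segment at this rendition yet,
            # so this is the first time it gets encoded.
            self.encode_calls += 1
            self._cache[key] = self._encode_segment(video_id, rendition, segment_index)
        return self._cache[key]

    def is_encoded(self, video_id, rendition, segment_index):
        return (video_id, rendition, segment_index) in self._cache
```

Segments that no player ever requests never reach `encode_segment`, so they cost nothing. In practice the cache would sit behind a CDN and the encode step has to finish fast enough that the first viewer doesn't notice.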

The first obvious benefit of this approach is cost savings. This is especially significant for user-generated video, considering that, depending on the product, somewhere around half of the videos uploaded are never watched at all. It's also beneficial for upgrading the encoding strategy of a large back catalog of videos without the expensive process of re-encoding it all at once.
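For a back-of-the-envelope feel for the savings, here is a small calculation. The parameter names and the idea of a "rendition hit fraction" (the share of renditions viewers actually request for the videos that do get watched) are assumptions for illustration, and the figures you plug in should come from your own data.

```python
def encoded_minutes(uploaded_minutes, renditions,
                    watched_video_fraction, rendition_hit_fraction=1.0):
    """Compare output minutes encoded up front vs. just in time.

    uploaded_minutes: total duration of uploaded source video.
    renditions: number of renditions in the ladder.
    watched_video_fraction: share of uploaded video that is ever watched.
    rendition_hit_fraction: assumed share of renditions actually requested
        for the video that is watched.
    Returns (upfront_minutes, jit_minutes).
    """
    upfront = uploaded_minutes * renditions
    jit = upfront * watched_video_fraction * rendition_hit_fraction
    return upfront, jit


def savings_fraction(upfront, jit):
    """Share of encoding work avoided by JIT, between 0 and 1."""
    return 1 - jit / upfront if upfront else 0.0
```

With half of uploads never watched, `watched_video_fraction=0.5` alone halves the encoding work, and any renditions that viewers never request reduce it further.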

The other benefit is publishing time, because you're not waiting for an encoding process to finish before making the video available to viewers. This makes any video uploading experience better by removing the wait before a next step can happen, such as sharing the video with friends or publishing a timely news story.

Just-in-time encoding requires ongoing, fast access to specially packaged video source files so that any part of the video can be transcoded quickly without introducing performance issues. It also requires CDN and origin tuning well beyond the defaults.

While standalone encoders could painfully reshape themselves to fit a JIT workflow, they still wouldn't have enough control over enough of the pipeline to guarantee a good JIT encoding experience for every customer.

To be clear, JIT encoding is not easy and not always done well, and there are a lot of challenges. Even AWS, with its now vast and confusing portfolio of video encoding products, hasn't attempted JIT encoding yet (JIT packaging, yes, but that's easier). But a growing number of platforms (including Mux) have solved these problems, and are pushing efficiency and performance expectations beyond what standalone encoders can support.

I'm an encoding service, what happens now?

Well, getting acquired was the answer for Zencoder. Not that we foresaw any of this in 2012. You might get the same advice from Elemental, Encoding.com, and Hybrik.

This shift in the encoding service market obviously won't happen overnight; otherwise there'd be more people talking about it. In many ways, Mux and other integrated video infrastructure platforms are in a similar place as we were with cloud encoding and HTML5 video in 2010, extolling the benefits of a solution that doesn't quite fit customers' mental models or expectations, while those benefits are steadily becoming unignorable.

But what do you think? I'm happy to be told I'm way off, or just answer questions. Ping me on X, formerly Twitter (if it still exists when you read this): @heff & @MuxHQ.

You can see an example of JIT encoding publishing performance with the demo on mux.com. The cost benefits of JIT are currently blended into Mux's simple pricing, but keep an eye out for upcoming pricing options. Get in touch if any of this is particularly interesting for your use case.


Written By

Steve Heffernan – Co-founder and Product Architect

Creator of Video.js and co-founder of Zencoder. Semi-professional drummer and apparently drove a Civic lowered more than Mux's live latency in high school.
