Last week at WWDC, Apple announced their usual cascade of software updates, and as has been tradition for the last 4 years, Roger Pantos took the stage to announce the latest swathe of changes to the HTTP Live Streaming (HLS) specification. This year's changes are intended to reduce the latency of live video streams, but at what cost?
HLS is a segmented delivery technology that enables live and on-demand video streaming to devices. While HLS was designed for Apple devices, it’s now used broadly across the video streaming ecosystem, including on browsers, smart TVs, set top boxes, and games consoles. It’s a simple protocol that’s easy to understand and implement. You deliver a master playlist (often called a manifest), a text file which describes the different resolution and bitrate combinations (renditions) of your content that you have available, and then a separate playlist for each of those renditions, which contains a list of media segments, their durations, and the URLs where they can be fetched.
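As a rough sketch of what that looks like on the wire, here’s a hypothetical master playlist and one of its rendition playlists. The filenames, bitrates, and durations are illustrative, not from any real stream:

```
# Master playlist (master.m3u8)
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8

# Rendition playlist (720p/playlist.m3u8)
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.0,
segment0.ts
#EXTINF:6.0,
segment1.ts
```

The player reads the master playlist once, picks a rendition, then repeatedly refetches that rendition playlist to discover new segments.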
While HLS is simple and scalable, it’s prone to high latency when used for live streaming. In this context, we’re talking about “wall-clock” or “glass-to-glass” latency: the time between something happening in the real world and it being seen by the end user. In HLS, latency is closely tied to the duration of the media segments you’re using. Generally, the practical lower limit of segment duration has been found to be around 2 seconds, which delivers a passable streaming experience at a latency of around 10 seconds. More traditional HLS streaming setups with longer segment durations can have latencies upwards of 30 seconds.
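The relationship between segment duration and latency can be sketched with a back-of-the-envelope model. The constants below (three buffered segments, a few seconds of encode/package/CDN overhead) are my own illustrative assumptions, not numbers from the HLS specification:

```python
def estimated_latency_seconds(segment_duration,
                              buffered_segments=3,
                              pipeline_overhead=4.0):
    # Players typically buffer ~3 segments before starting playback;
    # encoding, packaging, and CDN propagation add a few seconds more.
    # Both constants here are illustrative assumptions, not spec values.
    return buffered_segments * segment_duration + pipeline_overhead

print(estimated_latency_seconds(2.0))  # 10.0 - the "passable" ~10s case
print(estimated_latency_seconds(6.0))  # 22.0 - heading toward 30s
```

This is why simply shrinking segments only gets you so far: the player still waits on multiple whole segments before it starts rendering.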
This year at WWDC, Pantos took to the stage to announce that Apple have updated HLS to include a new low-latency mode. “Great!” you’re all saying, “Lower latency video, that’s a good thing, right?”. But here’s what’s interesting - this isn’t the first attempt to write a specification for low-latency HLS. There’s been a specification in open development within the video developer community for over a year, based on white papers over two years old. The community approach used, on the surface, simpler, more widely deployed and available technologies. So why didn’t Apple use the community’s work? Let’s take a look at the approach Apple took, and how it differs from what the community has been working on.
First, let’s look at how Apple’s Low Latency HLS solution works. You can watch the presentation here, and read the spec here, but here’s the 30,000 foot summary of the changes Apple have made in the name of low-latency:
- Partial segments
It’s now allowable to generate partial segments of media, in TS or CMAF chunks of around 250-300ms in duration, which contain several complete video or audio frames, but probably not a full GOP (Group of Pictures - a set of video frames which can be decoded as a standalone sequence). Apple refer to these as “Parts”.
Apple have introduced a new syntax to the HLS playlist format which allows these parts to be advertised at the head of a live HLS playlist, and to allow the client to download them. This means that the player can get smaller groups of frames sooner after they come out of the encoder, rather than waiting for 2 to 10 seconds of frames to be ready before they can be downloaded.
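The new playlist syntax for advertising parts looks roughly like this. The tag names come from the published draft, but the filenames and durations here are my own illustration:

```
#EXT-X-PART-INF:PART-TARGET=0.3
#EXTINF:6.0,
segment270.mp4
#EXT-X-PART:DURATION=0.3,URI="segment271.part0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.3,URI="segment271.part1.mp4"
```

Completed segments are still listed with `#EXTINF` as before; only the in-progress segment at the live edge is broken out into parts.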
Notably, as per the specification, these parts “must only” be added to the playlist once they are downloadable “at full speed”, which is an interesting note. We’ll talk more about why that likely is later on.
- HTTP/2 pushed segments
At the most fundamental level, HLS relies on polling the rendition playlist file to check for newly available segments, followed by another HTTP round trip to retrieve each new segment. When low-latency delivery is required, the overhead of these traditional HTTP requests becomes the limiting factor for how low the wall-clock latency can go.
Apple’s new approach to addressing this is to use HTTP/2 push to push out the shorter media “parts” in the response to a playlist request. The playlist, however, has to be fetched very frequently in Apple’s new approach: depending on target latency and part duration, this can be up to 3-4 times a second.
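Where that 3-4 figure comes from: if the client re-requests the playlist roughly once per part (a simplifying assumption on my part, not spec language), the poll rate is the inverse of the part duration:

```python
def playlist_polls_per_second(part_duration_seconds):
    # Simplifying assumption (mine, not spec language): an ALHLS client
    # re-requests the playlist roughly once per part, so the poll rate
    # is the inverse of the part duration.
    return 1.0 / part_duration_seconds

print(round(playlist_polls_per_second(0.25), 1))  # 4.0
print(round(playlist_polls_per_second(0.3), 1))   # 3.3
```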
- Blocking playlist requests
One of the new features Apple’s added is a mode where an HTTP request for a playlist can be held for a period until a particular segment or part is available. For example, a client playing segment 20 of a piece of content can request the playlist for that rendition again, saying that it only wants a response once segment 21 is available. This is intended to allow players to request a new playlist (and, as per point 2, a new piece of media) immediately after the previous one has arrived.
This functionality (along with the next two features) relies on some new client-server communication Apple has introduced into HLS. Apple has now reserved all query parameters starting with _HLS for its new “Origin API”, which can be used to manipulate the behavior of playlist generation.
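The blocking behavior can be sketched as a toy origin. The class and method names below are mine, not from Apple’s spec; only the semantics (block until media sequence number N exists, 503 if the wait runs too long) come from it:

```python
import threading

class PlaylistOrigin:
    """Toy model of ALHLS blocking playlist reloads. Names are my own;
    only the _HLS_msn semantics come from the specification."""

    def __init__(self):
        self._cond = threading.Condition()
        self._latest_msn = -1

    def publish_segment(self, msn):
        # Called by the packager when segment `msn` hits storage.
        with self._cond:
            self._latest_msn = max(self._latest_msn, msn)
            self._cond.notify_all()

    def get_playlist(self, blocking_msn=None, timeout=None):
        # A request carrying ?_HLS_msn=N maps to blocking_msn=N here.
        with self._cond:
            if blocking_msn is not None:
                arrived = self._cond.wait_for(
                    lambda: self._latest_msn >= blocking_msn, timeout)
                if not arrived:
                    # The spec has the server 503 if the wait runs too long.
                    return "503 Service Unavailable"
            return f"playlist through segment {self._latest_msn}"
```

A client request like `GET /playlist.m3u8?_HLS_msn=21` would map onto `get_playlist(blocking_msn=21)`, with the connection held open until segment 21 is published.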
- Playlist delta updates
One of the long-running headaches with HLS has been playlist bloat/overhead. With long-running sports streams with large live rewind windows, the list of segments within a rendition playlist can become very long, and even when using gzip, it’s easy for an HLS rendition playlist to become hundreds of kilobytes, if not larger. This becomes a more acute problem in a world where you need to download the playlist every few hundred milliseconds.
In this HLS update, Apple enables a way for “delta” playlists to be generated, which allows the playlist to only contain some of the segments contained within the full playlist. This allows players to request the full playlist once, maintain an internal state, and add to it using smaller delta playlists which only contain the latest few segments along with the low latency “parts” at the head of the playlist.
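As a rough illustration (the URIs and counts here are hypothetical), a client opts in via the new query parameter, and the server replaces the older segments with a skip marker:

```
GET /720p/playlist.m3u8?_HLS_skip=YES

#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-SKIP:SKIPPED-SEGMENTS=266
#EXTINF:6.0,
segment267.mp4
#EXTINF:6.0,
segment268.mp4
```

The client already holds the 266 skipped entries from an earlier full playlist fetch, so nothing is lost.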
I must say, I love this feature; it’s mostly well thought through, and really addresses a long-standing problem with HLS. I just hope that Apple allow this feature to be used outside of the low-latency mode, since playlist bloat is a real problem, regardless of whether the user wants low-latency or not.
- Faster bitrate switching
Finally, Apple have introduced a small feature which allows the playlist responses for a particular rendition to contain information about the most recent chunks and segments available in another rendition - in theory this allows the player to make a jump into another rendition without needing to make a full playlist request before it starts the switch.
Conceptually, this is a neat feature, but the current version of the specification seems to be missing sufficient detail to actually make this work reliably in practice (and Apple’s demos don’t actually support this feature today). It’s worth noting that this feature doesn’t seem to be designed to allow you to jump directly to a segment from another rendition, but rather to optimize a playlist request by requesting a blocking playlist update, and leveraging HTTP/2 push to potentially get a part along with that request.
I think with a little more thought, and design, this could be really useful, particularly if there was also a way to push the initialization segment of a CMAF stream in the playlist response too.
What’s likely to jump out at you if you’ve used HLS before is “Wow, that’s a lot of moving parts”, and you’d be right, this is a pretty complex addition to an otherwise simple specification. For bonus points you’re going to have to implement all of these features, and some more I haven’t talked about (yes, including HTTP/2) in order to get your low-latency HLS stream to work. For the time being at least, you’ll have to get your application (and thus your low latency implementation) tested by Apple to get into the app store, signaled by using a special identifier in your application’s manifest.
The biggest departure from traditional HLS approaches that these changes have caused is the significant increase in the state that needs to be communicated between the playlist generation process and the encoder process. Historically, the process was simple. The encoder generated a new segment, put it onto some storage (a CDN or object store), and updated the playlist to indicate the new segment was available. Now there has to be a lot more logic performed when the playlist is generated, including, in some cases, holding the connection open while a part becomes available to download.
Taken alone, I actually don’t think ALHLS is a bad specification. Is it complicated? Yes. Does it have a lot of moving parts? Yes. Does that make it fundamentally bad? No.
There are things I don’t think are elegant in there for sure - the reserved use of some query parameters to change playlist generation behavior isn’t to my liking, and neither is the blocking playlist request behavior. Let’s take a look in more detail at the areas that are going to be challenges from an implementation perspective:
Query parameter usage
Most playlist requests in 2019 use query parameters as part of their content security mechanism, meaning that part or all of the URL to the playlist is signed in order to stop unauthenticated users from accessing content. Introducing new, functional query parameters to the URL adds extra complexity to the signing and caching implementations for playlist requests, as well as introducing new challenges in third-party player development.
Blocking playlist reloads
Blocking playlist requests are certainly going to be a headache to maintain, and the current documented timeout behavior seems unclear and frustrating to monitor (503ing after 3x the target duration). Beyond this, the strategy raises some interesting and quite concerning security and performance questions for your web and CDN tiers.
HTTP/2 server push at scale
However, the biggest challenge for adoption with Apple’s approach is the mandatory use of HTTP/2. In the announcement, Apple touted HTTP/2 as “widely adopted […] by CDNs”. While on the surface this is true, this statement doesn’t really hold true for the features of HTTP/2 that Apple require you to use.
HTTP/2 server push works by allowing a server (a node in a CDN in this case) to push an object back to the client without the client asking for it. This is pretty cute, but it comes with 2 major headaches when we’re talking about using it at scale through name-brand CDNs:
- HTTP/2 push is unimplemented on many CDNs. While there’s good general HTTP/2 coverage on the big name CDNs, push is less widely implemented. This really sucks, given that one of the most important considerations when implementing a large-scale streaming service these days is having a multi-CDN strategy.
Where HTTP/2 push is implemented, it’s usually via the preload keyword in the Link headers of your origin response. This causes the CDN to link together the two objects in its cache, and push out the linked objects appropriately. However, this brings us to problem 2…
- Because you have to push the media along with your playlist response, you now have to use the same edge endpoint for your playlist requests and your media requests. This wasn’t previously the case, as HLS sensibly supports absolute URLs to media segments.
This will be a massive headache for many vendors who have spent years building systems which separate out the responsibility for playlist and media delivery, since they have very differing scale requirements. Playlists are small text files, which can be gzipped, and change frequently, while media segments are large binary chunks, which once created never change. Playlists are fast and easy to generate on the fly, media segments are less so.
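For reference, the Link-header preload mechanism described above typically looks something like this in an origin’s playlist response (the path is hypothetical):

```
HTTP/1.1 200 OK
Content-Type: application/vnd.apple.mpegurl
Link: </720p/segment271.part0.mp4>; rel=preload
```

The CDN sees the header, fetches and caches the referenced part, and pushes it alongside the playlist body - which is exactly why both objects now have to live behind the same edge hostname.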
Now let’s talk a little bit about how this differs from the community developed LHLS solution.
HLS.js, in conjunction with a variety of others including Mux, JW Player, Wowza, Elemental, and Akamai, has been collaborating on a community-driven approach to implement low latency streaming using HLS for well over a year. Much of the discussion around the formal standard can be found on this GitHub issue. The initial concept and terms come from a Periscope blog article published in mid-2017, describing how they had implemented their own approach for low latency HLS streaming. You can read this article here.
The approach is actually very simple (much simpler than ALHLS). Apart from some simple new playlist semantics, LHLS uses the same strategy used when delivering low-latency MPEG-DASH: HTTP/1.1 chunked transfer encoding. Chunked transfer encoding is a great fit here because it allows you to start sending your HTTP response as chunks of data as you have them, before the complete response is available.
This is helpful because it lets you send what Apple are calling “parts” of the segment of video as the encoder generates them, back to the client, which can start playing them as soon as it gets them, without needing to wait for a full segment to be available. The really great thing about chunked transfer mode is that it’s available on the overwhelming majority of CDNs - it's much more widely supported than HTTP/2 push is today.
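The wire format that makes this possible is trivially simple. Here’s a minimal sketch of the chunked-transfer framing (the function names are mine, not from any library):

```python
def encode_chunk(payload: bytes) -> bytes:
    # One HTTP/1.1 chunked-transfer frame: the payload length in hex,
    # CRLF, the payload itself, CRLF.
    return b"%x\r\n%s\r\n" % (len(payload), payload)

def end_of_stream() -> bytes:
    # A zero-length chunk tells the client the response is complete.
    return b"0\r\n\r\n"

# Each CMAF chunk ("part", in Apple's terms) can be framed and flushed
# to the socket the moment the encoder emits it - no need to wait for
# the full segment before the response begins.
frame = encode_chunk(b"moof+mdat bytes go here")
```

Because the framing is part of HTTP/1.1 itself, every hop from origin to CDN edge to player already understands it.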
Beyond its broad availability, this approach actually requires less work on the client device in comparison to ALHLS. On the surface, LHLS maintains the traditional HLS paradigm: poll for playlist updates, then grab segments. However, because a segment can be streamed back as it’s being encoded, you don’t actually have to reload the playlist that often. In ALHLS, you’ll still be polling the playlist many times a second looking for new parts to become available, even if they’re then pushed to you off the back of the manifest request.
It would have been amazing to see Apple bring some of the concepts that it’s been developing for ALHLS (mainly delta playlists) to LHLS - these approaches combined would have made an elegant, powerful solution, so why didn’t they?
So if LHLS is so great, and supported in the community, why didn’t Apple just get involved? Honestly, I don’t know. Apple’s decision to ignore pre-existing communities or standards isn’t particularly new, but Apple had certainly been giving signs in the last few years that they were starting to move into alignment with the rest of the video streaming industry.
While Apple never adopted the MPEG-DASH streaming standard (a competing standard to HLS), despite being involved in the DASH Industry Forum, a couple of years ago it started supporting fMP4 and CMAF media chunks. This support is now available on the overwhelming majority of Apple devices, which meant that the dream of delivering one set of media segments in one way through one endpoint, including low-latency modes, was finally starting to become a reality.
However, with DASH’s ongoing standardization of an LHLS style chunked transfer delivery of low-latency streaming, it now seems that Apple is forcing us back into a segregated delivery stack strategy in order to support ALHLS, even if it is only for the hot end of the stream.
The biggest challenge here for many HLS and video platform vendors is going to be the mandatory HTTP/2 push, but I also strongly suspect this is the key as to why Apple chose to go in the direction they did. One of the big challenges for both ALHLS and LHLS is the problem of bandwidth estimation. In order to deliver a great streaming experience, you have to be able to measure and respond to changes in a user’s bandwidth. Historically estimating a user’s available bandwidth has been easy - you measure how long the last media segment took to download, and then check the size of that segment, do some simple math, and this gives you a good bandwidth estimation.
In the chunked-transfer world, however, estimating bandwidth isn’t easy when you expect every segment’s download to take exactly as long as the segment took to generate; you need an alternative bandwidth measurement. That could mean timing the playlist fetches, using a small reference file, occasionally using a full segment, or something else.
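A small worked example makes the problem concrete. The function name and numbers below are illustrative, but the math is the standard segment-timing estimate:

```python
def throughput_bits_per_second(segment_bytes, download_seconds):
    # Classic HLS bandwidth estimate: bits transferred over wall time.
    return segment_bytes * 8 / download_seconds

# Traditional HLS: a 1 MB segment that arrives in 0.5s implies
# ~16 Mbit/s of available bandwidth - a useful signal.
print(throughput_bits_per_second(1_000_000, 0.5))  # 16000000.0

# Chunked transfer: the same 1 MB, 2-second segment trickles in at the
# encoder's real-time pace, so the download takes ~2s no matter how
# fast the link is. The naive estimate collapses to roughly the
# stream's own bitrate, telling you nothing about the headroom above it.
print(throughput_bits_per_second(1_000_000, 2.0))  # 4000000.0
```

The second result is why a player on a 50 Mbit/s connection and a player on a 5 Mbit/s connection can produce the same measurement - and why neither knows whether it’s safe to switch up.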
My working theory is that Apple didn’t want to solve this problem in any of these ways, leaving the only option to be to allow AVPlayer (Apple’s streaming framework) to be able to measure the performance of individual chunks of a chunked-transfer response. I suspect Apple decided that it wasn’t interested in adding any new functionality to its legacy HTTP 1.1 stack on devices in order to support this.
Now, this said, HTTP/2 push absolutely does not solve this problem. There are no APIs in modern browsers or devices which allow you to check the download performance of an HTTP/2 push response, but with blocking playlist requests, the situation is actually worse: measuring the performance of a blocking playlist fetch along with a segment load doesn’t give you an accurate measurement, and you can’t use your playlist download performance as a proxy. We have to assume that Apple has a way of measuring this performance on its own devices when HTTP/2 is used, for 2 reasons:
- It’s the only way that their low latency strategy will work with adaptive bitrate, and…
- They allude to it in their specification.
Another way to look at this decision is to classify it as “classic Apple”. It’s not like this is the first time Apple have taken a strongly opinionated deprecation timeline, though those decisions tend to be rooted in physical hardware. Headphone jacks, USB-A, a physical Escape key… just to name a few. Apple are the king of the dongle… maybe I can get an ALHLS-to-LHLS dongle too.
Apple’s beta of low latency is only compatible with iOS devices right now - not even the latest Safari Technology Preview on macOS supports it (and I’m told it won’t for “some time”). However, Apple devices are really only a small part of the HLS ecosystem.
It’s worth keeping in mind that the amount of HLS that gets delivered to non-Apple devices is huge - with players like HLS.js and Video.js with their own HLS implementations supporting billions of impressions every day. So let’s assume that the video industry just follows Apple’s spec, and pivots away from any current approaches that they’ve been pursuing over the last year or two.
So would ALHLS be easy to implement on modern browsers or other devices? No, not really. The choice of technologies (namely HTTP/2) Apple has selected is going to make it really hard for non-Apple devices to implement ALHLS, and yes, that includes HLS.js, which Apple uses on their own website for their own developer videos.
HTTP/2 is a young technology, the tooling to work with it is severely limited, and the web APIs in browsers just aren’t yet mature enough to build low-latency streaming technologies on top of. It’s likely Apple will be able to make it work well in Safari eventually, since it can leverage private APIs, but the rest of the browser world is going to have to change rapidly in order to support and debug third-party implementations of ALHLS.
Obviously, I’m sure that Apple performed a lot of due diligence and investigated LHLS extensively. However, there are some big challenges that need to be worked through before implementations of Apple’s specification will be ready.
Customers all over the video streaming industry are desperate for low latency solutions so that they can compete with the likes of Twitch or Twitter/Periscope. The community LHLS strategy is real and available to implement against today (and indeed many have), and there’s nothing stopping you from implementing it in a backwards-compatible way across all major browsers.
Apple’s ALHLS, however, is clearly many months away even on iOS, since it’s very likely this won’t ship until iOS 13 at the earliest. This, combined with the limited availability of HTTP/2 push on major CDNs, the requirement to use a single edge hostname, and Apple’s new app verification for ALHLS, means that we’re unlikely to see large-scale deployments of ALHLS for a while yet. If you also want to offer the same experience with the same technologies on desktop, or in other web players, you’ll have to wait for those players to catch up with the dramatically more complex ALHLS implementation. This leaves vendors and customers in a challenging position while the industry figures all this out. Do vendors continue to forge forward with chunked-transfer based solutions, or do they go all-in on Apple’s new ALHLS?
It’s fair to say there are a lot of people in the industry with a lukewarm (at best) reaction to Apple’s ALHLS specification, but ignoring community-developed options while pushing for an excessively future-facing approach isn’t exactly new ground. It really is a shame to see a lack of conversation, because in some areas, such as Swift, Apple are becoming a much more community-centric organization.
Oh well, I guess we’ve got some work to do! 💪
Cover photo: Snow White and the Seven Dwarfs, Disney 1937