I recently joined Mux as somebody who knows nothing about video. I mean, I'm pretty good at finding funny videos on Instagram, but Mux unfortunately wasn't hiring for that. Instead, I was hired as a Distributed Systems Engineer, and my first task was to get a basic understanding of how video streaming over HLS works.
How do video players and servers communicate over HTTP? Is it even HTTP? What's the "language" they speak with each other? How can you represent the structure of a video in plain text? I had questions, and Google was about to help me answer a lot of them.
Since I didn’t know what HLS even stands for, I decided to start poking around the happiest of paths: a POST to /video.
First, let’s build some context around HLS. Open up howvideo.works and read everything you see there. You should now have some understanding of manifests, renditions, and segments. If you don't, you probably spent too much time looking at those sweet, sweet animations. Go back and read again.
Visit mux.com and scroll down to the "Video, in seconds." header. Clicking the Run Request button below it will upload a video file to our servers, process it, and populate the response container to its left. Copy the playback_id out of there and curl it like this in your terminal:
Which will give you this:
What is returned is a list of rendition manifests, with plenty of other information stuffed in there for good measure. Let’s break down the top-level headers:
- EXTM3U - this file is an Extended M3U playlist file.
- EXT-X-VERSION - the compatibility version of the playlist file, its associated media, and its server.
- EXT-X-INDEPENDENT-SEGMENTS - every segment in the renditions listed here is standalone, and can be decoded without information from other segments.
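To make that structure concrete, here is a minimal Python sketch that reads a manifest like this one. The sample text is made up for illustration (the example.com URLs, bandwidths, and codecs are not real Mux output), and `parse_master` and `find_rendition` are hypothetical helper names. It assumes each rendition URL is preceded by an `EXT-X-STREAM-INF` line carrying attributes such as `RESOLUTION`, which is the layout RFC 8216 describes.

```python
import re

# Hypothetical top-level manifest for illustration only. The URLs,
# bandwidths, and codecs are invented, not real Mux output.
SAMPLE_MASTER = """#EXTM3U
#EXT-X-VERSION:5
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:BANDWIDTH=2516370,CODECS="mp4a.40.2,avc1.640020",RESOLUTION=1280x720
https://example.com/rendition-720p.m3u8?token=abc
#EXT-X-STREAM-INF:BANDWIDTH=1052660,CODECS="mp4a.40.2,avc1.64001f",RESOLUTION=854x480
https://example.com/rendition-480p.m3u8?token=def
"""

# KEY=VALUE pairs, where VALUE is either a quoted string (which may contain
# commas) or an unquoted run of non-comma characters.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(text):
    """Turn 'A=1,B="x,y"' into {'A': '1', 'B': 'x,y'}."""
    return {key: value.strip('"') for key, value in ATTRIBUTE.findall(text)}


def parse_master(playlist):
    """Return one dict per rendition: its attributes plus its 'uri'."""
    lines = [line.strip() for line in playlist.splitlines() if line.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an Extended M3U playlist")
    renditions = []
    pending = None  # attributes waiting for the URI line that follows them
    for line in lines[1:]:
        if line.startswith("#EXT-X-STREAM-INF:"):
            pending = parse_attributes(line.split(":", 1)[1])
        elif not line.startswith("#") and pending is not None:
            renditions.append({**pending, "uri": line})
            pending = None
    return renditions


def find_rendition(renditions, resolution):
    """Return the first rendition with the given RESOLUTION, or None."""
    for rendition in renditions:
        if rendition.get("RESOLUTION") == resolution:
            return rendition
    return None


if __name__ == "__main__":
    match = find_rendition(parse_master(SAMPLE_MASTER), "1280x720")
    print(match["uri"])
```

The quoted-value branch in the regular expression matters because `CODECS` contains a comma of its own; splitting the attribute list on every comma would cut it in half.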
Above the URL for each one is another header, EXT-X-STREAM-INF, describing the manifest itself. Use that info to find and copy the 1280x720 resolution manifest URL and curl it like so (you'll likely need to surround the URL with single quotes '):
You should see the following:
Note that this file contains an EXTM3U and an EXT-X-VERSION header too! Let's check out the other ones:
- EXTINF - specifies the duration, in seconds, of the segment that follows it.
- EXT-X-TARGETDURATION - specifies the maximum segment duration in seconds.
- EXT-X-PLAYLIST-TYPE:VOD - a value of VOD (Video On Demand) indicates that the playlist file must not change. A value of EVENT allows for segments to be appended to the end of the playlist.
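As a rough illustration of how those headers fit together, the sketch below pulls the target duration, playlist type, and per-segment durations out of a rendition manifest. The sample manifest and its example.com segment URLs are invented for this example, and `parse_media` is a hypothetical helper name rather than anything from a real library.

```python
# Hypothetical rendition manifest for illustration only. The segment URLs
# and durations are invented, not real Mux output.
SAMPLE_MEDIA = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:5
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:5.0,
https://example.com/segment-0.ts
#EXTINF:5.0,
https://example.com/segment-1.ts
#EXTINF:3.5,
https://example.com/segment-2.ts
#EXT-X-ENDLIST
"""


def parse_media(playlist):
    """Collect the target duration, playlist type, and segment list."""
    info = {"target_duration": None, "playlist_type": None, "segments": []}
    next_duration = None  # duration from the most recent EXTINF line
    for raw in playlist.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-TARGETDURATION:"):
            info["target_duration"] = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-PLAYLIST-TYPE:"):
            info["playlist_type"] = line.split(":", 1)[1]
        elif line.startswith("#EXTINF:"):
            # Format is "#EXTINF:<duration>,[<title>]"; keep the duration.
            next_duration = float(line.split(":", 1)[1].split(",", 1)[0])
        elif line and not line.startswith("#") and next_duration is not None:
            # A non-comment line after EXTINF is that segment's URI.
            info["segments"].append({"uri": line, "duration": next_duration})
            next_duration = None
    return info


if __name__ == "__main__":
    info = parse_media(SAMPLE_MEDIA)
    total = sum(segment["duration"] for segment in info["segments"])
    print(info["playlist_type"], info["target_duration"], total)
```

Summing the EXTINF values gives the length of the whole video, which is one way a player could know the total duration before downloading a single segment.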
We pipe the response to hexdump because a segment is binary data, and the raw response you’d otherwise see in your terminal is a lot less pretty. To see what this hexdump actually represents, open up QuickTime, navigate to File > Open Location, and paste the segment URL in there. You should now be enjoying video and audio content from the chunk that you requested.
By the way, you could have also pasted a manifest URL into QuickTime and gotten the full video instead of just a segment/chunk of it. Double neat!
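Given a manifest URL, the walk we just did by hand (manifest, then rendition, then segments) can be sketched in a few lines of Python. This is only an approximation of the idea, not how QuickTime or any real player is implemented: it always takes the first rendition unless told otherwise, and `fetch_text`, `uri_lines`, and `segment_urls` are hypothetical names. The one detail worth noting is `urljoin`, since RFC 8216 allows URIs inside a playlist to be relative to the playlist's own URL.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_text(url):
    """Download a playlist and decode it as UTF-8 text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def uri_lines(playlist):
    """Return the non-blank lines that are not tags or comments."""
    stripped = (line.strip() for line in playlist.splitlines())
    return [line for line in stripped if line and not line.startswith("#")]


def segment_urls(manifest_url, fetch=fetch_text, rendition_index=0):
    """Walk manifest -> rendition -> segments and return absolute URLs.

    `fetch` is injectable so the walk can be tried without a network.
    """
    manifest = fetch(manifest_url)
    # Rendition URIs may be relative to the top-level manifest's URL.
    rendition_url = urljoin(manifest_url, uri_lines(manifest)[rendition_index])
    rendition = fetch(rendition_url)
    # Segment URIs may be relative to the rendition manifest's URL.
    return [urljoin(rendition_url, uri) for uri in uri_lines(rendition)]
```

Passing a dictionary lookup as `fetch` (for example `fetch=playlists.__getitem__`, with URLs as keys and playlist text as values) lets you try the walk on made-up playlists before pointing it at a real URL.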
This was one of my first “Aha!” moments at Mux. I had uploaded plenty of videos to the internet, but had no idea how the players on my phone/browser/AppleTV requested & parsed any of those videos in a sane manner. Seeing the breakdown from video to manifest to chunk reminded me of when I first saw the relationship between HTML/JS and a website. I was looking at the foundation of something I already had so much familiarity with, yet still had so much more to learn about.
If you want more details on any of the things we've gone over today, take a peek at the IETF’s RFC 8216, a document that “describes a protocol for transferring unbounded streams of multimedia data”, also known as the HLS RFC. It should answer some of your questions, and probably raise a lot of new ones.
Thanks for following along! You have now gotten your hands dirty with HLS. If you’re a developer and this sounds neat to you too, then you might be interested in working with me at Mux, or checking out Demuxed, the conference for video engineers coming up in October!
Cover Photo by Kai Brown on Unsplash.