Here at Mux, we’re always trying to build products that help our customers understand their video and how to improve it. One of the most common questions we get is how folks can improve their accessibility - and stay on the right side of the FCC!
Captions and subtitles can be a tricky subject. I recently had the pleasure of going to the great FOMS workshop here in San Francisco, where I spent most of my time trying to unravel the current state of caption and subtitle technologies and regulations. Today I’m going to give you a run-down of what I learned about the FCC and captions, and some advice on how best to deliver captions.
Subtitles are a description of what’s being said in a video, often translated to a different language. Think about an English speaker watching a French movie - they can hear what’s going on in the movie, but they don’t necessarily understand the words the characters are saying.
Captions, on the other hand, are for users who are deaf or hard of hearing - they can’t necessarily hear what’s going on in the video, so they rely on captions to provide the information about what type of music might be playing, or that a loud crash or bang just happened and startled the characters. All the information that might be contained in the audio, not just the words spoken, needs to be in the captions.
Technologically, however, there is no difference - subtitles and captions can both be thought of as “Timed Text Tracks”, pieces of text with corresponding time codes, that can be shown on top of the video to enrich the user experience.
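To make the "Timed Text Track" idea concrete, here is a minimal sketch in Python. The `Cue` class and `active_cues` function are hypothetical names invented for this illustration, not part of any player API. The sketch only shows that subtitles and captions share the same shape: text plus a start and end time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One piece of timed text, shown from `start` up to (not including) `end`, in seconds."""
    start: float
    end: float
    text: str


def active_cues(track: list[Cue], t: float) -> list[Cue]:
    """Return the cues that should be on screen at playback time `t`."""
    return [cue for cue in track if cue.start <= t < cue.end]


# A caption track carries non-speech audio information as well as dialogue;
# a subtitle track would typically carry only the (translated) dialogue.
captions = [
    Cue(0.0, 2.5, "[dramatic music]"),
    Cue(1.0, 4.0, "Where were you last night?"),
    Cue(4.0, 5.0, "[loud crash]"),
]

for cue in active_cues(captions, 1.5):
    print(cue.text)
```

The difference between the two is in what the text says, not in how the track is structured.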
Folks who deliver video to users in the USA also need to be concerned about the FCC regulations around closed captions specifically.
The Twenty-First Century Communications and Video Accessibility Act requires that any content that was first broadcast over terrestrial television with closed captions must also have those captions available when subsequently made available over the internet.
The FCC regulations contain this critical segment: “Enable the rendering or pass through of all required captions to the end user, maintaining the quality of the captions provided by the video programming owner and transmitting captions in a format reasonably designed to reach the end user in that quality.”
To understand what “maintaining the quality” of captions means in practice, we first need to understand how captions are initially broadcast over television, and what kind of features they have. CEA-608 captions used to be the standard for closed captions on NTSC TV broadcasts in North America. CEA-708 is the next generation of closed captioning and all ATSC digital television streams in the US currently use it.
While the newer 708 standard has many more features than 608, it also specifies a mechanism for backward compatibility with 608 captions, and it is this backward-compatible mode that is still the most common form of captions delivered today.
608/708 captions use a command-based format that allows captions to be displayed with different background and foreground colors, as well as roll-up support. The caption data is embedded directly into the video stream, typically as MPEG-2 “picture user data.”
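To give a feel for what "command-based" means, here is a rough Python sketch that classifies 608 byte pairs. It assumes the data has already been extracted from the video stream. The function names are hypothetical, the table lists only a handful of channel-1 control codes, and printable characters are treated as plain ASCII, which is only approximately right for the 608 character set.

```python
# A few CEA-608 channel-1 control codes, keyed by the byte pair after the
# parity bit has been removed. This is an illustrative subset, not a full table.
CONTROL_CODES = {
    (0x14, 0x20): "RCL (resume caption loading, pop-on mode)",
    (0x14, 0x25): "RU2 (roll-up, 2 rows)",
    (0x14, 0x26): "RU3 (roll-up, 3 rows)",
    (0x14, 0x27): "RU4 (roll-up, 4 rows)",
    (0x14, 0x2C): "EDM (erase displayed memory)",
    (0x14, 0x2D): "CR (carriage return, rolls the rows up)",
    (0x14, 0x2F): "EOC (end of caption)",
}


def strip_parity(byte: int) -> int:
    """608 bytes carry an odd-parity bit in the top bit; keep the low 7 bits."""
    return byte & 0x7F


def decode_pair(b1: int, b2: int) -> str:
    """Classify one 608 byte pair as padding, a command, or text."""
    c1, c2 = strip_parity(b1), strip_parity(b2)
    if c1 == 0 and c2 == 0:
        return "padding"
    if (c1, c2) in CONTROL_CODES:
        return "command: " + CONTROL_CODES[(c1, c2)]
    if 0x10 <= c1 <= 0x1F:
        # Other control pairs (positioning, colors, special characters, ...).
        return f"command: other (0x{c1:02x}, 0x{c2:02x})"
    # Simplification: treat printable bytes as ASCII.
    text = "".join(chr(c) for c in (c1, c2) if c >= 0x20)
    return "text: " + repr(text)


# Start 2-row roll-up, write "He", then roll the rows up.
for pair in [(0x94, 0x25), (0xC8, 0xE5), (0x94, 0xAD)]:
    print(decode_pair(*pair))
```

A real decoder also has to track caption memory, cursor position and styling as these commands arrive, which is where the rendering behavior comes from.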
Here’s an example of 608/708 captions being drawn onto a video, taken from the latest VLC (3.0):
Notice how the captions draw onto the image from left to right, and how, as new rows appear, the previous row has a small “roll-up” animation. These features are critical for folks who are deaf or hard of hearing to follow along with the captions naturally.
So what are the options for displaying captions on the web? In short:
- 608/708 Embedded captions
- A web captions format such as WebVTT
Let's go through these in more detail.
The first option we might try is just to maintain the 608/708 captions from our initial source, and pass these through to the web browser. Unfortunately, the HTML5 video element doesn’t support displaying these captions, although some individual HTML5 video players do have support.
Open-source projects like hls.js do have support for 608/708 captions, although the exact formatting of these captions can often be different from that of a traditional 608/708 renderer:
For players that don’t support embedded captions, or content that has them as a separate file, we need to use a web captions format. The most popular is currently WebVTT, which has wide support across both desktop and mobile browsers.
It's also possible to translate 608/708 captions directly into WebVTT, and there’s an excellent spec available on just how to achieve this.
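As a rough illustration of the target format, the Python sketch below serializes a list of already-decoded cues into a basic WebVTT file. It does not implement the 608/708 mapping spec mentioned above: it covers only the file header, timestamps and minimal text escaping, and the function names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def escape_cue_text(text: str) -> str:
    """Escape the characters that have special meaning in WebVTT cue text."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def to_webvtt(cues: list[tuple[float, float, str]]) -> str:
    """Serialize (start, end, text) cues into a WebVTT document."""
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{escape_cue_text(text)}")
    # Blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


print(to_webvtt([
    (0.0, 2.5, "[dramatic music]"),
    (2.5, 5.0, "Where were you last night?"),
]))
```

Positioning, color and roll-up behavior are the parts a faithful conversion has to work hardest on, and they are left out here.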
While the WebVTT spec can absolutely be used to “maintain quality”, actually getting implementations to support features like roll-up captions can be very tricky. Projects like vtt.js (Firefox’s WebVTT renderer) do not currently support roll-up captions in a way that matches 608/708 in quality. WebVTT uses a “region” specification to determine how captions should be rolled up, but getting implementations to support these features has been slow going.
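For reference, here is a small Python sketch of what a roll-up style WebVTT file might look like. It is an illustration under assumptions: the region id, width and anchor values are arbitrary, and the helper names are hypothetical. The `REGION` block with `lines` and `scroll:up`, and the `region:` cue setting, follow the WebVTT region syntax.

```python
def rollup_region(region_id: str, rows: int) -> str:
    """Build a WebVTT REGION block that scrolls upward, in the spirit of 608 roll-up."""
    return "\n".join([
        "REGION",
        f"id:{region_id}",
        "width:80%",               # arbitrary choice for this sketch
        f"lines:{rows}",           # how many rows are visible at once
        "regionanchor:0%,100%",    # anchor the region by its bottom-left corner
        "viewportanchor:10%,90%",  # ...placed near the bottom of the video
        "scroll:up",               # new lines push older lines upward
    ])


def rollup_document(cues: list[tuple[str, str, str]],
                    region_id: str = "rollup", rows: int = 3) -> str:
    """Serialize (start, end, text) cues, with pre-formatted timestamps, into one region."""
    blocks = ["WEBVTT", rollup_region(region_id, rows)]
    for start, end, text in cues:
        blocks.append(f"{start} --> {end} region:{region_id}\n{text}")
    return "\n\n".join(blocks) + "\n"


print(rollup_document([
    ("00:00:00.000", "00:00:04.000", "Where were you last night?"),
    ("00:00:02.000", "00:00:06.000", "I was at the workshop."),
]))
```

Whether a given player actually honors the region and scrolls the lines is exactly the implementation gap described above, so output like this needs testing in each target player.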
Captions are now broadly supported across video players and browsers, but getting a good implementation that works everywhere can still be a challenge.
The current best practice for web captions is to preserve 608/708 captions wherever you can, and use video players that support them (or leave it to the device). If that’s not an option, WebVTT is currently the most popular format, with excellent support across many players. However, IMSC1 is also gaining ground, with imscJS making it easy to build player support.
With all these formats, keep an eye out for advanced captions features like color and roll-up, as getting these right on the web is a key part of maintaining caption quality.
If you're passionate about video technologies like captions, Mux is hiring!