You may not have noticed it, but watching videos on the internet these days is typically more than just watching videos.
It can be as simple as having a “Skip Recap” or “Skip Intro” button while watching a TV show. It might be Prime Video’s X-Ray feature, which shows actors, songs, or even trivia about what’s currently on screen. There are WWDC’s interactive transcripts (like this example). Educational video content might pause playback to present questions. Video marketplaces can display “buy it now” buttons. The list goes on.
These features all provide different experiences for a viewer. But if you take a step back, you might notice one shared, basic element: They all provide additional context for the media that is tied to the media’s timeline.
A feature might be shown all upfront (like the transcript), or it might be shown only when you’re playing that portion of the media (like the “Skip Intro” button), but in either case, it’s connecting the time in the media to some other related information. From a generic technical perspective, this is sometimes called timed metadata. As these examples show, timed metadata allows you to augment, contextualize, and enrich a user’s media-centric experience in whatever way makes sense for the product, application, or media in question.
The basic idea behind timed metadata is simple: Take some kind of information, give it a time or time range that corresponds to the media’s timeline, and keep track of that relationship. In Mux Player (and the more primitive Mux Video Component), you can add “out-of-band” timed metadata in the form of CuePoints. For now, we’ve kept our implementation fairly simple.
A CuePoint has two properties: a time for when the CuePoint begins in your media (more on this later), and a generic value that you can use to store any JSON-serializable information that makes sense for your use case. This could be the question you want to show, or a “skip” button, or details about the product currently being shown. CuePoints keep things simple and generic so you can use them for any number of use cases.
To give you a sense of how to work with CuePoints, we’re going to build a concrete example — an interactive video transcript. When we’re done, our simple application should do the following:
Show the entire transcript for the video
Indicate when each sentence of the transcript begins
Seek to the corresponding time in the video when clicking on a sentence
Indicate which sentence in the transcript is currently playing
If you’d rather just see the complete working example in action, you can check out our version on CodeSandbox (though I’d still recommend you take a look at the final section, where we also show a few other use cases for CuePoints).
Now that we have a working mini app with a player, let’s create some transcript data that conforms to CuePoints so we can build our transcript UI. Recall that a CuePoint needs two bits of data: a (start) time and a value. Because strings are JSON-serializable values, we can give each transcript CuePoint one or more sentences for its value and a time (in seconds) for when that sentence begins:
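As a sketch, the transcript data might look something like the array below. The sentences and timestamps here are placeholders; in a real app, this data would come from your transcription workflow.

```typescript
// A CuePoint pairs a start time (in seconds) with a JSON-serializable value.
// Here, each value is one or more sentences of the transcript.
type CuePoint = {
  time: number;
  value: string;
};

// Placeholder transcript data; swap in your own sentences and start times.
const cuePoints: CuePoint[] = [
  { time: 0, value: "Hi, and welcome to the video." },
  { time: 4, value: "Today, we're going to talk about timed metadata." },
  { time: 9, value: "Let's jump right in with an example." },
];
```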
Now that we know what our data will look like and have something to work with, we can start building our components to render the transcript, even before we hook the CuePoints up to Mux Player. For data-driven list UIs, I like to break the UI into at least two components: one that’s responsible for rendering each item in the list (i.e., each CuePoint) and one for rendering each of those components as a list (i.e., all of the CuePoints).
For now, our <TranscriptCuePointRenderer/> needs to take a cuePoint as a property and render its value — i.e., the transcript sentence(s). To make it a little fancy, we’ll also show a different background and show the time as a tooltip on :hover. Here’s the TSX for the component (plus a simple utility function to format the time):
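A sketch of what that component could look like (the component and prop names follow the ones used above; the exact formatting logic is just one reasonable choice):

```tsx
// Utility: format a time in seconds as m:ss (e.g., 75 -> "1:15") for the tooltip.
const formatTime = (timeInSeconds: number) => {
  const minutes = Math.floor(timeInSeconds / 60);
  const seconds = Math.floor(timeInSeconds % 60);
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
};

type CuePoint = { time: number; value: string };

// Renders one transcript CuePoint inline as a <span>; the title attribute
// gives us the time tooltip on :hover for free.
const TranscriptCuePointRenderer = ({ cuePoint }: { cuePoint: CuePoint }) => {
  return (
    <span className="transcript-cuepoint" title={formatTime(cuePoint.time)}>
      {cuePoint.value}
    </span>
  );
};
```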
And here’s the corresponding CSS you can add to your stylesheet to get the fancy hover behavior:
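A minimal version of that CSS, assuming the renderer sets a `transcript-cuepoint` class on each `<span>` (the colors are placeholders):

```css
/* Show a pointer and a highlight when hovering over a transcript sentence. */
.transcript-cuepoint:hover {
  background-color: #eee;
  cursor: pointer;
}
```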
Next, let’s make the <TranscriptRenderer/>, which will render all the transcript CuePoints using our <TranscriptCuePointRenderer/>. To make it read like a transcript, we’ll render each CuePoint/transcript sentence inline (which is why we used a <span> above) and add a space between each CuePoint rendered. It should look something like this:
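Here's a sketch of that component, assuming the `<TranscriptCuePointRenderer/>` built earlier is in scope:

```tsx
import { Fragment } from "react";

type CuePoint = { time: number; value: string };

// Renders every CuePoint inline, with a space after each sentence, so the
// whole thing reads like a single transcript paragraph.
const TranscriptRenderer = ({ cuePoints }: { cuePoints: CuePoint[] }) => {
  return (
    <p>
      {cuePoints.map((cuePoint) => (
        <Fragment key={cuePoint.time}>
          <TranscriptCuePointRenderer cuePoint={cuePoint} />{" "}
        </Fragment>
      ))}
    </p>
  );
};
```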
All that’s left to show our transcript is adding the <TranscriptRenderer/> to our <App/> and giving it the cuePoints:
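Something like this, where `YOUR_PLAYBACK_ID` is a placeholder for your own Mux playback ID and `cuePoints` is the transcript data from earlier:

```tsx
import MuxPlayer from "@mux/mux-player-react";

const App = () => {
  return (
    <>
      <MuxPlayer playbackId="YOUR_PLAYBACK_ID" streamType="on-demand" />
      {/* Render the full transcript below the player */}
      <TranscriptRenderer cuePoints={cuePoints} />
    </>
  );
};
```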
At this point, your app should look something like this:
Now that we have a UI to show the transcript CuePoints, let’s start wiring up some time-based interactivity. Remember, we want to do two things: seek when clicking on the transcript and show which sentence is currently playing. For the first bit, we don’t need to wire up our CuePoints to the player (yet!). For now, we need to add some code that will update the currentTime of <MuxPlayer/> when a Transcript CuePoint (i.e., sentence) is clicked.
To keep it simple, we can use the useState hook for our currentTime, initialized to 0 so when we load media, Mux Player’s playback will start at the beginning. It should look something like this:
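A sketch of the state wiring (the playback ID is still a placeholder):

```tsx
import { useState } from "react";
import MuxPlayer from "@mux/mux-player-react";

const App = () => {
  // currentTime drives Mux Player's playback position; 0 = start of media.
  const [currentTime, setCurrentTime] = useState(0);
  return (
    <>
      <MuxPlayer
        playbackId="YOUR_PLAYBACK_ID"
        streamType="on-demand"
        currentTime={currentTime}
      />
      <TranscriptRenderer cuePoints={cuePoints} />
    </>
  );
};
```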
Now, let’s add a callback to pass down so that clicking on a rendered CuePoint will call setCurrentTime. For lists, I tend to implement this sort of thing generically as “selecting” a CuePoint. For our <TranscriptRenderer/> and <TranscriptCuePointRenderer/>, our updated versions should look something like this:
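Here's one way those updated components could look, with an `onCuePointSelected` callback threaded down from the list to each item:

```tsx
import { Fragment } from "react";

type CuePoint = { time: number; value: string };
type CuePointCallback = (cuePoint: CuePoint) => void;

const TranscriptCuePointRenderer = ({
  cuePoint,
  onCuePointSelected,
}: {
  cuePoint: CuePoint;
  onCuePointSelected: CuePointCallback;
}) => {
  return (
    <span
      className="transcript-cuepoint"
      title={formatTime(cuePoint.time)}
      // Let the parent decide what "selecting" a CuePoint means.
      onClick={() => onCuePointSelected(cuePoint)}
    >
      {cuePoint.value}
    </span>
  );
};

const TranscriptRenderer = ({
  cuePoints,
  onCuePointSelected,
}: {
  cuePoints: CuePoint[];
  onCuePointSelected: CuePointCallback;
}) => {
  return (
    <p>
      {cuePoints.map((cuePoint) => (
        <Fragment key={cuePoint.time}>
          <TranscriptCuePointRenderer
            cuePoint={cuePoint}
            onCuePointSelected={onCuePointSelected}
          />{" "}
        </Fragment>
      ))}
    </p>
  );
};
```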
Now we can update the currentTime using the newly added callback in our simple app, which should look something like this:
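For example:

```tsx
const App = () => {
  const [currentTime, setCurrentTime] = useState(0);
  return (
    <>
      <MuxPlayer
        playbackId="YOUR_PLAYBACK_ID"
        streamType="on-demand"
        currentTime={currentTime}
      />
      {/* Clicking a sentence seeks the player to that sentence's start time. */}
      <TranscriptRenderer
        cuePoints={cuePoints}
        onCuePointSelected={(cuePoint) => setCurrentTime(cuePoint.time)}
      />
    </>
  );
};
```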
With everything put together, you’ll be able to seek in the video by clicking on the rendered transcript. For example, if you begin playback and then click another part of the transcript, you should see something like this:
We’re finally at a point where we’ll start integrating CuePoints into Mux Player. For now at least, we will have to rely on the addCuePoints() method on the under-the-hood <mux-player> web component that the React version is built on top of. Additionally, you’ll need to wait until the video has started loading in Mux Player. To do this, add an onLoadedMetadata callback, which corresponds to the native loadedmetadata Event. To get some type safety, we’ll go ahead and import the TypeScript definition of the <mux-player> web component instance while we’re at it.
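Putting those pieces together might look like this (the type import path is based on the `@mux/mux-player` package; check the current docs if it has moved):

```tsx
import { useRef } from "react";
import MuxPlayer from "@mux/mux-player-react";
import type MuxPlayerElement from "@mux/mux-player";

const App = () => {
  // A ref to the underlying <mux-player> web component instance.
  const playerRef = useRef<MuxPlayerElement>(null);
  return (
    <MuxPlayer
      ref={playerRef}
      playbackId="YOUR_PLAYBACK_ID"
      streamType="on-demand"
      // Only add CuePoints once the media has started loading.
      onLoadedMetadata={() => playerRef.current?.addCuePoints(cuePoints)}
    />
  );
};
```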
Now that we’ve got our CuePoints added, let’s get things in place for keeping track of and reacting to the currently playing CuePoint, called the “active” CuePoint. This will be similar to what we did for currentTime, only with the information flowing in the other direction. Like before, we’ll want to add some state to model the activeCuePoint.
And then we’ll want to monitor Mux Player for whenever the active CuePoint changes so we can update the state. This is as simple as adding an onCuePointChange callback, which conveniently provides the activeCuePoint as a detail of the event, so your Mux Player usage would look something like this:
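Something like this (the `CustomEvent<CuePoint>` annotation is approximate; adjust it to match the package's published event types):

```tsx
const App = () => {
  // The CuePoint whose time range contains the current playback position.
  const [activeCuePoint, setActiveCuePoint] = useState<CuePoint | undefined>(
    undefined
  );
  return (
    <MuxPlayer
      playbackId="YOUR_PLAYBACK_ID"
      streamType="on-demand"
      onCuePointChange={(event: CustomEvent<CuePoint>) =>
        setActiveCuePoint(event.detail)
      }
    />
  );
};
```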
On the transcript rendering side, like our callback before, we’ll need to pass down whether a given <TranscriptCuePointRenderer/> is the “active” CuePoint and then style it to indicate that it’s active, which we’ll do with a different background color. Your updated <TranscriptCuePointRenderer/> should look something like this:
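One way to sketch that, using an optional `isActive` prop that toggles an `active` class:

```tsx
const TranscriptCuePointRenderer = ({
  cuePoint,
  onCuePointSelected,
  isActive = false,
}: {
  cuePoint: CuePoint;
  onCuePointSelected: (cuePoint: CuePoint) => void;
  isActive?: boolean;
}) => {
  return (
    <span
      // Add an "active" class so CSS can highlight the current sentence.
      className={`transcript-cuepoint${isActive ? " active" : ""}`}
      title={formatTime(cuePoint.time)}
      onClick={() => onCuePointSelected(cuePoint)}
    >
      {cuePoint.value}
    </span>
  );
};
```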
And the simple CSS update will look like this:
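Assuming the renderer adds an `active` class when its CuePoint is the active one (the color is a placeholder):

```css
/* Highlight the currently playing sentence. */
.transcript-cuepoint.active {
  background-color: #fde68a;
}
```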
The updated <TranscriptRenderer/> needs to determine if the CuePoint matches the activeCuePoint and pass that along, so it will look something like this:
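For example:

```tsx
const TranscriptRenderer = ({
  cuePoints,
  onCuePointSelected,
  activeCuePoint,
}: {
  cuePoints: CuePoint[];
  onCuePointSelected: (cuePoint: CuePoint) => void;
  activeCuePoint?: CuePoint;
}) => {
  return (
    <p>
      {cuePoints.map((cuePoint) => (
        <Fragment key={cuePoint.time}>
          <TranscriptCuePointRenderer
            cuePoint={cuePoint}
            onCuePointSelected={onCuePointSelected}
            // Start times are unique per CuePoint, so comparing times is a
            // simple identity check.
            isActive={cuePoint.time === activeCuePoint?.time}
          />{" "}
        </Fragment>
      ))}
    </p>
  );
};
```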
And finally, you need to pass the activeCuePoint down. The final form of the App should look something like this:
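Pulling everything from this post together, a sketch of the full `<App/>` (playback ID and event typing are placeholders, as before):

```tsx
import { useRef, useState } from "react";
import MuxPlayer from "@mux/mux-player-react";
import type MuxPlayerElement from "@mux/mux-player";

const App = () => {
  const playerRef = useRef<MuxPlayerElement>(null);
  const [currentTime, setCurrentTime] = useState(0);
  const [activeCuePoint, setActiveCuePoint] = useState<CuePoint | undefined>();
  return (
    <>
      <MuxPlayer
        ref={playerRef}
        playbackId="YOUR_PLAYBACK_ID"
        streamType="on-demand"
        currentTime={currentTime}
        onLoadedMetadata={() => playerRef.current?.addCuePoints(cuePoints)}
        onCuePointChange={(event: CustomEvent<CuePoint>) =>
          setActiveCuePoint(event.detail)
        }
      />
      <TranscriptRenderer
        cuePoints={cuePoints}
        onCuePointSelected={(cuePoint) => setCurrentTime(cuePoint.time)}
        activeCuePoint={activeCuePoint}
      />
    </>
  );
};
```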
Once all of this is done, you should see something like this in the running application:
And that’s it! You now have a fully working, interactive transcript renderer.
Now that you’ve built a basic but (mostly) real-world use case for CuePoints, I hope your creative juices are flowing for the kinds of immersive, interactive, and contextualized media-centric experiences you can unlock with this functionality. In case you need just a bit more inspiration, here are a couple more examples:
The complete working transcript example from this blog post (found here)
A simple example of question prompts using CuePoints from Dave Kiss (found here)
Adjusting your meditation time using a recorded guided meditation with CuePoints (found here)
Note how different each use case is, yet they can all be built using this same generic functionality. Keep this in mind as you’re thinking about potential features or experiences you may want to build.
This feature is still fairly new and fairly basic, so if you feel like a certain feature or functionality is missing, or even if you have a suggestion for better “developer ergonomics,” send a request our way!
Started working on OTT streaming media & players over a decade ago for a bunch of folks back in the Adobe Flash/RTMP days as a consultant at Digital Primates. When I’m not working on video players, I’m spending my time doing deep dives in philosophy, stomping around the Chicago arts scene, and watching TV on the internet with my cat Grits.