The state of going live from a browser

Publishing a live stream directly from a browser feels like it must be one of those solved problems. Watching live video in a browser is so common these days it's hard to imagine a time when it required proprietary plugins to even have a chance of working. Even video communication feels trivial now thanks to modern browser features like WebRTC. The "trivial" part is only really true if you're using two browser windows on the same machine, but still, it's you on video! Twice!

So as a web developer looking at all this video successfully being sent and played back by the browser, it's totally reasonable to think that publishing a live broadcast directly from a browser would be easy. All the building blocks are here, so there's surely an npm package that ties it all together for publishing to destinations like Mux, Facebook, YouTube Live, Twitch, etc...

That's gonna be a no from browsers, dawg.

Unfortunately that's simply not the case. There's no reasonable way to publish a live broadcast directly from a browser. It's possible to capture the video and eventually get it there, but you're almost always going to need to get a server involved.

One of the big reasons for this is that the industry standard for publishing live streams is RTMP, which is a protocol browsers simply aren't able to natively speak. We've written about the options out there for native mobile applications, and the desktop has fantastic, open tools like the OBS project.

Why go live from the browser?

One of the most common reasons is simply due to friction. If you're building a live streaming solution and you want your customers to be able to go live as easily as possible, asking them to leave your service to go figure out some other piece of desktop software is a big ask.

On top of that, the tools out there for live streaming are complex in their own right. OBS Studio, for example, is an incredibly powerful and flexible tool, but that comes with the cost of being a daunting piece of software for the unfamiliar. Even with guides and tools out there to help users get set up, you're now supporting not only your service, but whatever tools your streamers end up using.

If you're already building a web app, there's a good chance your team is good at...well, building web apps. Building your go-live dashboard directly into your browser application would allow you to continue to utilize the expertise of your team, giving end-users a low-friction, branded experience that doesn't require them to learn anything but your application.

Before we go on...

Yes, for all of the reasons just mentioned, it's easy to see why it's so tempting, but going live directly from the browser is almost certainly going to be a worse experience for everyone involved. The quality will be worse, the stream less reliable, and the tooling more limited. Your streamers and your viewers are all probably better off if the broadcast is done from a native application.

Ok cool, now let's talk about our options.

We're going to talk about 3 high-level approaches to going live from the browser. By "going live," what we're specifically referring to is getting video from a streamer's browser to a broadcast endpoint via RTMP. Spoiler alert: all three of the approaches we're going to discuss are related, and two of them are essentially the same workflow with a twist. There are probably other options out there, but these are the closest to production ready you'll find.

WebRTC rebroadcasting

Most commonly, WebRTC is known as the technology that lets web developers build live video chat into the browser. That's true, but it actually goes much further than that. WebRTC is made up of standards that allow for peer-to-peer web applications that can transmit audio, video, or even just arbitrary data without the need for plug-ins or technically even servers¹.

A quick aside: a fellow Muxologist, Nick Chadwick, gave a talk on WebRTC → RTMP at AllThingsRTC in 2019. He goes much deeper into the underlying protocols in that talk than we do here, so if you're interested in the nitty-gritty details, that one's highly recommended.

Given the well-documented path to video teleconferencing that WebRTC provides, the most common solution that people immediately gravitate towards is what's called "rebroadcasting." A server implements the WebRTC API to become a peer, then takes the video feed and publishes it via RTMP.

This approach is, to put it simply, difficult. The good news is, that path has gotten a little easier in recent months, with projects like Pion maturing and higher level tools like node-webrtc adding support for accessing actual video frames.

Broadcasting headless Chrome

Nick also mentions this approach in his talk (and built an example), but another approach is to simply bypass server-side implementations altogether and use the one that's arguably the most battle-tested and has a wide selection of open-source tooling: Chrome. Yes, that one, the browser.

Thanks to projects like Puppeteer, the process of programmatically interacting with a headless Chrome instance is pretty straightforward. From there you can build a normal WebRTC experience and use ffmpeg to broadcast whatever's in your headless Chrome instance via RTMP.

The huge benefit of this approach is that it allows the developer to effectively build any experience in the user interface. Stream overlays, multiple speakers on a call, video effects, whatever you could build with canvas or the DOM would Just Work™ since it's...well, it's a browser. It's also not that much additional work on top of building out normal, peer-to-peer chat for that reason.

The downside of this approach is that you need to have a Chrome instance for every streamer. If you're just looking to stream yourself this isn't a huge issue, but if you're looking to support an arbitrary number of streamers this could become problematic.

Video over WebSockets

This one is the simplest and, in my opinion, the most fun to hack around on. Yes, as promised, this solution also uses at least one piece of the WebRTC toolchain, getUserMedia() (the way you request access to the browser's mic and camera). However, once you have the media, instead of delivering the media via WebRTC's protocols, you use the MediaRecorder API.

This allows for similar flexibility to the headless Chrome example: you can render the user's camera to a canvas element and manipulate the video however you'd like there. The MediaRecorder will fire an event every time it has a "chunk" of video data ready, at which point you send it to the server via the WebSocket as a binary blob. The server then listens for these data chunks and pipes them into a running ffmpeg command as they're received.

The benefit to this approach is that it's much closer to "traditional" applications in terms of running and scaling. You need a persistent WebSocket connection with each streamer, yes, but the requirements of each stream are actually pretty low since we've got ffmpeg doing as little as possible before publishing the RTMP stream. In fact, this example application using Next.js runs just fine on a Glitch server. Let's talk about how it works.

The Client

For the example we used a React framework called Next.js with a custom Node.js server.

Before the client can do anything, it needs to request access to the user's camera and microphone by calling getUserMedia with the requested constraints. Calling this function will prompt the browser to ask the end-user if they'd like to share the requested resources.

javascript


// This would just ask for access to audio and video, but you can also 
// specify what resolution you want from the video if you'd like.
const cameraStream = await navigator.mediaDevices.getUserMedia({
  audio: true,
  video: true
});

The call to getUserMedia returns a promise, which (if the user agrees) will resolve and return the camera stream. That camera stream can then be set as the srcObject of a video tag, at which point you've got the webcam playing back in the browser window!
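As a rough sketch of the resolution option mentioned in the code comment, the constraints object can carry `ideal` values for width, height, and frame rate. The `buildConstraints` and `showPreview` helpers and the 1280×720 defaults are illustrative assumptions, not part of the demo.

```javascript
// Hypothetical helper: builds a getUserMedia constraints object.
// `ideal` values are hints; the browser may hand back something different.
function buildConstraints({ width = 1280, height = 720, frameRate = 30 } = {}) {
  return {
    audio: true,
    video: {
      width: { ideal: width },
      height: { ideal: height },
      frameRate: { ideal: frameRate },
    },
  };
}

// Hypothetical helper: requests the camera and previews it in a <video> element.
// `mediaDevices` is a parameter only so the function can be exercised with a stub.
async function showPreview(videoEl, mediaDevices = navigator.mediaDevices) {
  const stream = await mediaDevices.getUserMedia(buildConstraints());
  videoEl.srcObject = stream;
  videoEl.muted = true; // avoid echoing the microphone back to the streamer
  await videoEl.play();
  return stream;
}
```

If the user declines the permission prompt, the promise rejects, so a real page would probably wrap `showPreview` in a `try`/`catch` and explain what went wrong.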

From here, what we're doing in the demo is rendering that video stream to a canvas element using a very similar technique to what we described in our blog post on manipulating video via the canvas element. Once we're copying the video over to the canvas element, we can capture that stream, and initialize a new MediaRecorder instance.

javascript


const mediaStream = canvasEl.captureStream(30); // 30 frames per second
const mediaRecorder = new MediaRecorder(mediaStream, {
  mimeType: 'video/webm',
  videoBitsPerSecond: 3000000
});
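The step of copying the video onto the canvas is not shown in the snippet. One way it might look is a small draw loop like the sketch below. `startCanvasCopy` is a hypothetical helper, and the demo's real loop may differ.

```javascript
// Hypothetical helper: copies the current video frame onto the canvas on every
// animation frame. `schedule` defaults to requestAnimationFrame and is only a
// parameter so the loop can be driven manually (for example in a test).
function startCanvasCopy(
  videoEl,
  canvasEl,
  schedule = (cb) => requestAnimationFrame(cb)
) {
  const ctx = canvasEl.getContext('2d');
  let running = true;

  function draw() {
    if (!running) return;
    ctx.drawImage(videoEl, 0, 0, canvasEl.width, canvasEl.height);
    // Overlays, text, or filters could be drawn here before the next frame.
    schedule(draw);
  }

  schedule(draw);

  // Return a function that stops the loop.
  return () => {
    running = false;
  };
}
```

Calling `const stop = startCanvasCopy(videoEl, canvasEl)` starts the loop, and `stop()` ends it. Keep in mind that browsers pause `requestAnimationFrame` callbacks in background tabs, so the canvas stops updating while the tab is hidden.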

The new MediaRecorder object will fire a dataavailable event every time a blob is ready. We can listen for that event, and when we receive it, send the data blob right down an open WebSocket connection.
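Two details are easy to miss here: not every browser can record every container and codec, and the recorder only emits periodic blobs if `start()` is given a timeslice. The sketch below shows one way to handle both. The helper names, the candidate list, and the one-second timeslice are assumptions for illustration.

```javascript
// Hypothetical helper: returns the first MIME type the browser reports it can
// record, or undefined if none of the candidates are supported.
function pickMimeType(
  candidates,
  isSupported = (type) => MediaRecorder.isTypeSupported(type)
) {
  return candidates.find((type) => isSupported(type));
}

// Assumed preference order; adjust for the codecs your server expects.
const CANDIDATES = [
  'video/webm;codecs=h264,opus',
  'video/webm;codecs=vp8,opus',
  'video/webm',
];

// Hypothetical helper: creates a recorder and starts it with a timeslice so
// that `dataavailable` fires roughly once per `timesliceMs` milliseconds.
function startRecorder(mediaStream, timesliceMs = 1000) {
  const recorder = new MediaRecorder(mediaStream, {
    mimeType: pickMimeType(CANDIDATES), // undefined lets the browser choose
    videoBitsPerSecond: 3000000,
  });
  recorder.start(timesliceMs);
  return recorder;
}
```

Without a timeslice, `start()` gathers everything into a single blob that is only delivered when the recorder stops, which is not much use for a live stream.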

javascript


// Listen for the dataavailable event on our mediaRecorder instance
mediaRecorder.addEventListener('dataavailable', e => {
  // Then send the binary data via the WebSocket connection!
  ws.send(e.data);
});
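In practice it is probably worth guarding the send, since the recorder can produce chunks while the socket is still connecting or after it has closed. Here is a minimal sketch, assuming a browser `WebSocket` named `ws`. `sendChunk` and `isCongested` are hypothetical helpers, and the 8 MB threshold is an arbitrary example value.

```javascript
const WS_OPEN = 1; // same value as WebSocket.OPEN

// Hypothetical helper: sends a recorded chunk only if it is non-empty and the
// socket is open. Returns true if the chunk was handed to the socket.
function sendChunk(ws, blob) {
  if (!blob || blob.size === 0) return false; // nothing to send
  if (ws.readyState !== WS_OPEN) return false; // connecting, closing, or closed
  ws.send(blob);
  return true;
}

// Hypothetical helper: reports whether queued-but-unsent bytes exceed a limit,
// which suggests the streamer's uplink cannot keep up with the bitrate.
function isCongested(ws, maxBufferedBytes = 8 * 1024 * 1024) {
  return ws.bufferedAmount > maxBufferedBytes;
}
```

The listener could then become `e => sendChunk(ws, e.data)`. What to do when `isCongested` returns true is a product decision; stopping the broadcast or lowering the bitrate is likely safer than silently skipping chunks, because the chunks are slices of one continuous WebM stream.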

The Server

The server listens for incoming WebSocket connections, and when a new one is created it initializes a new ffmpeg process that's streaming to the specified RTMP endpoint. Whenever a new chunk of video comes in via a message, the server pipes that received data to the ffmpeg process, which in turn broadcasts it via RTMP.


webSocketServer.on('connection', (ws) => {
  // When a new connection comes in, spawn 
  // a new `ffmpeg` process
  const ffmpeg = child_process.spawn('ffmpeg', [
    // ... ffmpeg settings ...

    // final argument should be the output, which in this case is our RTMP endpoint
    `rtmps://global-live.mux.com/app/${STREAM_KEY}`,
  ]);

  // If our ffmpeg process goes away, end the WebSocket connection
  ffmpeg.on('close', (code, signal) => {
    ws.terminate();
  });

  ws.on('message', (msg) => {
    // If we're using this WebSocket for other messages, check 
    // and make sure before piping it to our ffmpeg process
    if (Buffer.isBuffer(msg)) {
      ffmpeg.stdin.write(msg);
    }
  });

  // If the WebSocket connection goes away, clean up the ffmpeg process
  ws.on('close', (e) => {
    ffmpeg.kill('SIGINT');
  });
});
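The ffmpeg settings are elided above, and the demo's actual values are not shown here. Purely as an illustration, the sketch below builds one plausible argument list that reads WebM from stdin and transcodes it to H.264 and AAC in an FLV container, which is what RTMP endpoints generally expect. `buildFfmpegArgs` is a hypothetical helper, every value in it is an assumption, and it presumes an ffmpeg build that includes libx264.

```javascript
// Hypothetical helper: one possible set of ffmpeg arguments.
// These are NOT the demo's settings; treat every value as an assumption.
function buildFfmpegArgs(rtmpUrl) {
  return [
    '-i', '-',              // read the incoming WebM data from stdin
    '-c:v', 'libx264',      // encode video as H.264
    '-preset', 'veryfast',  // favor encoding speed over compression
    '-tune', 'zerolatency', // reduce encoder-side buffering
    '-g', '60',             // keyframe every 60 frames (2 seconds at 30 fps)
    '-c:a', 'aac',          // encode audio as AAC
    '-ar', '44100',         // audio sample rate in Hz
    '-b:a', '128k',         // audio bitrate
    '-f', 'flv',            // FLV container, used by RTMP
    rtmpUrl,                // the output must be the final argument
  ];
}
```

It would be used as `child_process.spawn('ffmpeg', buildFfmpegArgs(url))`. It is also worth attaching an `error` listener to `ffmpeg.stdin`, since writing to a pipe after ffmpeg has exited emits an error event that crashes the Node.js process if nothing handles it.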

Profit! Kinda.

It works! It's fun and fairly simple, with both server and client coming in at < 300 lines of code. It's got the advantage of being easy to interact with the outgoing stream, and it's quick and easy to hack on. You can give it a try now: just remix the Glitch, specify your own Mux stream key, and try it out.

However, there are huge drawbacks to the JavaScript side of things. For example, modern browsers will de-prioritize the timers on a tab that isn't front-and-center, meaning if the streamer switches to a different tab, the streaming page won't send chunks of video fast enough and eventually the stream will stall. There are ways to ensure that doesn't happen, but most of them will require at least some participation from your streamer.
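One low-effort form of that participation is simply telling the streamer when the tab has been hidden. The sketch below uses the Page Visibility API for that. `watchVisibility` is a hypothetical helper, and it only detects the problem rather than preventing the throttling.

```javascript
// Hypothetical helper: calls `onChange(isHidden)` whenever the page's
// visibility changes. `doc` defaults to the global document and is only a
// parameter so the function can be exercised with a stub.
function watchVisibility(onChange, doc = document) {
  const handler = () => onChange(doc.visibilityState === 'hidden');
  doc.addEventListener('visibilitychange', handler);

  // Return a function that removes the listener again.
  return () => doc.removeEventListener('visibilitychange', handler);
}
```

A page might call `watchVisibility((hidden) => { warningEl.hidden = !hidden; })`, where `warningEl` is an assumed banner element asking the streamer to keep the tab in the foreground.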

Let us help your users go live!

Unless you have a lot of resources to devote to building out an application around going live from the browser, we suggest providing your users with other tried-and-true native options or pointing them towards one of the fantastic paid browser options. That being said, we're here to help! If you want help figuring out the best way to let users go live in your application, please reach out.
