How to Live Stream from a Twilio Video Conference

This post describes how we (as software engineers) built a scalable live stream broadcasting solution on top of Twilio Video. Twilio offers programmable video conferencing support within the browser. They provide an SDK that lets your users authenticate and join a "room" (a video conference room). The SDK gives you control over how you want to display the video and play the audio.

Problem

We wanted a solution that would let us customize the video shown to viewers. In our case, we needed to shape each host's video into a circle and animate it larger or smaller based on who was speaking. We also needed to support thousands of viewers and multiple video conference participants (or hosts), and we did not want to rely on any processing on the hosts' side due to host browser limitations. The hosts would simply log into the website and be able to broadcast from their browser (desktop or mobile).

Introducing Videographer

Our solution: use a "ghost" recorder to record and broadcast the video conference. We chatted with Twilio, and they recommended this approach as well. At first we were hesitant to implement it because we didn't know whether we could trust the reliability of running a headless Chrome instance, or what the result would look like if we styled video through a canvas element. However, we found that it solved many of the problems above. Jitsi + Jibri is an open source project that implements the same "ghost" recorder idea.

Live Broadcasting Architecture
Architecture Flow
  1. Hosts join a Twilio video conference. (Twilio Video docs) Connecting to a room is sketched after this list.
  2. Start a headless Chrome instance on your server in order to broadcast. We used Puppeteer, spawned in a separate Node process (sketched below).
  3. With that headless Chrome, visit a page you create that joins the Twilio video conference as a participant without publishing video or audio. We call this the Videographer (the connect call is in the first sketch below).
  4. Videographer listens to Twilio room events for participants and their video/audio tracks (sketched below).
  5. Videographer uses one HTML canvas element and draws each host's video onto the canvas (with the help of requestAnimationFrame). This lets us lay out the video for viewers however we want and package it as one video (sketched below).
  6. Videographer captures the audio streams and the video stream (with canvas captureStream) and packages them with MediaRecorder (sketched below).
  7. Videographer then sends the video/audio stream data from MediaRecorder every x seconds to a video encoder (sketched below). For our use case, we built our own lightweight video encoder that runs FFmpeg and receives the data on the same server; we already had a server running Videographer, and we wanted some control over the video outputs. You can instead use a video encoding service that accepts the data, but many encoding services require an RTMP push, which is not possible directly from the browser.
  8. The video encoder encodes the video for live HLS and outputs the files to a storage solution (sketched below). We used AWS S3 and served the files through CloudFront.
  9. Viewers watch the live stream in an HLS-capable video player (sketched below).
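
Steps 1 and 3 boil down to the same twilio-video connect call. A minimal sketch, assuming you already serve access tokens from your own endpoint (fetchToken here is hypothetical); passing tracks: [] joins the room without publishing any media, which is what makes the Videographer a "ghost":

```js
import { connect } from 'twilio-video';

// Hosts connect with the SDK defaults (camera + microphone published).
// The Videographer connects with `tracks: []`, so it publishes nothing.
async function joinAsVideographer(roomName) {
  const token = await fetchToken('videographer'); // hypothetical token endpoint
  return connect(token, { name: roomName, tracks: [] });
}
```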
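
Step 2, as a sketch: the page URL is a placeholder, and the autoplay flag is there because Chrome otherwise blocks media playback without a user gesture.

```js
// videographer.js — run in its own Node process, e.g. `node videographer.js`.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    // Let audio/video elements play without a user gesture.
    args: ['--autoplay-policy=no-user-gesture-required'],
  });
  const page = await browser.newPage();
  // The page that joins the conference as the Videographer (placeholder URL).
  await page.goto('https://example.com/videographer?room=my-room');
})();
```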
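
Step 4: the Videographer page collects a video element per host and the raw audio tracks as hosts connect. A sketch that ignores disconnects and reconnections; hosts and audioTracks are the page-level state the later sketches read from:

```js
const hosts = new Map(); // participant SID -> <video> element
const audioTracks = [];  // raw MediaStreamTracks for the audio mix

function watchRoom(room) {
  const handleParticipant = (participant) => {
    participant.on('trackSubscribed', (track) => {
      if (track.kind === 'video') {
        // attach() returns a playing <video> element for this track.
        hosts.set(participant.sid, track.attach());
      } else if (track.kind === 'audio') {
        audioTracks.push(track.mediaStreamTrack);
      }
    });
  };
  room.participants.forEach(handleParticipant); // hosts already in the room
  room.on('participantConnected', handleParticipant); // hosts who join later
}
```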
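
Step 5: one canvas, one requestAnimationFrame loop. The circular clip is what produces the round video; the layout math here (fixed radius, hosts in a row) is only an example — animating the radius by active speaker goes in the same loop.

```js
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');

function draw() {
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  let i = 0;
  for (const videoEl of hosts.values()) {
    const r = 100;                    // radius — animate this for the speaker
    const x = 120 + i * 220;
    const y = canvas.height / 2;
    ctx.save();
    ctx.beginPath();
    ctx.arc(x, y, r, 0, 2 * Math.PI); // clip each host's video to a circle
    ctx.clip();
    ctx.drawImage(videoEl, x - r, y - r, 2 * r, 2 * r);
    ctx.restore();
    i += 1;
  }
  requestAnimationFrame(draw);
}
requestAnimationFrame(draw);
```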
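
Step 6: the canvas becomes a video track via captureStream, and the Web Audio API mixes every host's audio into a single track for MediaRecorder. A sketch; audio tracks that arrive after the recorder starts need extra handling not shown here.

```js
const canvasStream = canvas.captureStream(30); // 30 fps video from the canvas

// Mix all host audio into one track with the Web Audio API.
const audioCtx = new AudioContext();
const mixDest = audioCtx.createMediaStreamDestination();
for (const track of audioTracks) {
  audioCtx.createMediaStreamSource(new MediaStream([track])).connect(mixDest);
}

const recorder = new MediaRecorder(
  new MediaStream([
    ...canvasStream.getVideoTracks(),
    ...mixDest.stream.getAudioTracks(),
  ]),
  { mimeType: 'video/webm;codecs=vp8,opus' }
);
```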
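
Step 7, browser side: MediaRecorder's timeslice argument controls how often a chunk is emitted. Posting each chunk over HTTP is the simplest sketch; a WebSocket is a reasonable alternative if chunk ordering becomes a concern. The ingest URL is a placeholder.

```js
recorder.ondataavailable = (event) => {
  if (event.data.size > 0) {
    // Ship each WebM chunk to the encoder (placeholder endpoint).
    fetch('https://example.com/encoder/ingest', {
      method: 'POST',
      body: event.data,
    });
  }
};
recorder.start(1000); // emit a chunk every 1000 ms
```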
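
Steps 7–8, encoder side: a sketch in the spirit of our lightweight encoder, assuming an Express app on the same server. FFmpeg reads the WebM chunks from stdin and emits H.264/AAC HLS segments plus a rolling playlist; uploading those files to S3 is a separate step not shown.

```js
const express = require('express');
const { spawn } = require('child_process');

// WebM in on stdin, live HLS out.
const ffmpeg = spawn('ffmpeg', [
  '-i', 'pipe:0',
  '-c:v', 'libx264', '-preset', 'veryfast',
  '-c:a', 'aac',
  '-f', 'hls',
  '-hls_time', '4',      // target segment duration (seconds)
  '-hls_list_size', '5', // keep a rolling window in the playlist
  '-hls_flags', 'delete_segments',
  '/tmp/stream/live.m3u8',
]);

// Append every chunk the Videographer sends straight onto ffmpeg's stdin.
const app = express();
app.post('/encoder/ingest', (req, res) => {
  req.pipe(ffmpeg.stdin, { end: false });
  req.on('end', () => res.sendStatus(204));
});
app.listen(8080);
```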
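
Step 9: any HLS-capable player works. A sketch with hls.js and a placeholder CloudFront URL; Safari can play the playlist natively without hls.js.

```js
import Hls from 'hls.js';

const video = document.querySelector('video');
const src = 'https://dxxxxxxxx.cloudfront.net/live.m3u8'; // placeholder URL

if (Hls.isSupported()) {
  const hls = new Hls();
  hls.loadSource(src);
  hls.attachMedia(video);
} else if (video.canPlayType('application/vnd.apple.mpegurl')) {
  video.src = src; // Safari plays HLS natively
}
```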

If you have any questions or want to discuss this in more detail, we'd love to chat!