Streaming services are gaining popularity by the day. Some of the early pioneers in video streaming that we can think of are Netflix and YouTube, both of which gained popularity in the mid-2000s. At present, there are plenty of streaming services across the world, and in the next decade or so, cable network television could become passé.
Though Netflix doesn’t support live streaming, there are other platforms like YouTube and Facebook that support this feature. This article covers a brief overview of a Live Streaming Service, the various protocols that support streaming, and a high-level system design of the service itself.
Typically, live streaming involves a combination of sophisticated hardware and software components, which would be impossible to cover in this single article. But due consideration is given to them in the high-level design so that we can get a feel for the different components/services that are required to build a streaming service.
Note: there are differences between live streaming and a streaming service. Streaming of sport events, public functions, online gaming sessions, etc. fall under live streaming services. Pure content delivery platforms like Netflix, YouTube, Amazon Prime, etc. are Streaming Services. This article covers Live Streaming Services.
Before getting into the components that make up the live streaming design, we need to know some widely used protocols in streaming. For more details, a Wikipedia link is tagged in case you want to understand any specific protocol more in-depth.
- RTMP — Real-Time Messaging Protocol is based on TCP and was developed by Macromedia (now owned by Adobe). This protocol was intended to stream audio, video, and data over the internet to a Flash player from a server. With HTML5 providing native support for playing video/audio data, Flash player lost its popularity, and most web browsers are deprecating its support. Further reading about RTMP on wiki. (Most mid-1990s to 2000s internet users would recall the prompts or the pop-ups to install “Adobe Flash Player” to view content!)
- RTSP — Real-Time Streaming Protocol is based on TCP and was developed by RealNetworks, Netscape, and Columbia University. This protocol internally uses other protocols to transfer media content from the server to the client and vice-versa. Further reading on wiki.
- SDP — Session Description Protocol was a proposal by the IETF in the late 1990s. SDP does not deliver any media by itself but is used between endpoints for negotiation of media type, format, and all associated properties. Further reading on wiki.
- WebRTC — This is an open source project that provides web-based, real-time communication (RTC). It allows browsers to capture audio and video data from devices and transfer it to a server using SDP. It provides web browsers and mobile applications with real-time communication using APIs. It eliminates the need to install plugins or download native apps. Further reading on wiki.
- HLS — HTTP Live Streaming was developed by Apple and is an open standard. It uses HTTP-based adaptive bit-rate streaming. HLS is universal and supports most devices. HLS creates a playlist (.m3u8 format), which is an index to chunks of video files (~10-second chunks) of various formats.
Based on the network quality and bandwidth, the native player automatically requests a different chunk. HLS is widely popular, as it can stream to mobile devices and HTML5-based video players. Further reading on wiki. This protocol has a pretty good history with the iPhone gaining popularity in the mobile market, as Apple didn’t want to rely on Flash or Quicktime players to play content on their phones.
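To make the playlist idea concrete, here is what a hypothetical HLS master playlist (.m3u8) might look like; the rendition names, bandwidth values, and paths are made up for illustration:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=842x480
480p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
```

Each variant playlist listed here in turn indexes the ~10-second media segments that the player downloads in sequence, switching between variants as network conditions change.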
- DASH — Dynamic Adaptive Streaming over HTTP. This protocol is very similar to HLS and works by breaking content into a sequence of small, HTTP-based file segments. Each segment contains a short interval of playback time of content (such as a movie or any live broadcast of an event). Further reading on wiki.
The high-level design covers various components that build a live streaming service and is shown in the diagram below. Please note, this is not an industry standard. Rather, it gives you an idea of the design aspects of various software and hardware components that help in building a live streaming service.
Publishers form the very first input that generates the raw audio and video of the streaming service. Conventionally, the main audio, video, and graphics are generated using mics and video cameras. This data is mainly consumed by encoders (software- or hardware-based), which form the heart of the Publisher.
The primary role of an encoder is to consume the audio, video, and graphics that have to be streamed and convert them into data that can be sent across a network. Hardware encoders are dedicated physical equipment that encode and stream data with high reliability. These encoders can be attached to multiple audio and video devices. An example of an open source software encoder is OBS (Open Broadcaster Software).
The protocols used for sending encoded data by the publishers are RTMP, RTSP, and SDP (may not be limited to these in reality).
A Streaming Server receives encoded data from the software/hardware encoders. It creates multiple formats of the stream and can save it locally or re-stream to another service. It also supports multiple protocols.
The main feature of the Streaming Server is Adaptive Bit-rate Streaming. It is a method of video streaming over HTTP where the source content is encoded at multiple bit rates. Each of the different bit rate streams is segmented into smaller parts.
The segment size can vary between 2 and 10 seconds. First, the client downloads a manifest file that describes the available stream segments and their respective bit rates. During stream start-up, the client usually requests the segments from the lowest bit rate stream. If the client finds that the network throughput is greater than the bit rate of the downloaded segment, then it will request a higher bit rate segment.
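The switching logic described above can be sketched in a few lines of Python. The bit-rate ladder and throughput values below are illustrative assumptions, not taken from any real player:

```python
# Illustrative adaptive bit-rate selection: pick the highest rendition
# whose bit rate fits within the measured network throughput.
# The ladder values (bits per second) are made-up examples.
LADDER = [800_000, 1_400_000, 2_800_000, 5_000_000]  # 360p .. 1080p

def pick_bitrate(throughput_bps: int) -> int:
    """Return the highest bit rate not exceeding current throughput,
    falling back to the lowest rendition (also used at start-up)."""
    candidates = [b for b in LADDER if b <= throughput_bps]
    return max(candidates) if candidates else LADDER[0]

# Start-up: no throughput estimate yet, so request the lowest bit rate.
print(pick_bitrate(0))          # 800000
# Throughput improves after a few segments: step up to a higher rendition.
print(pick_bitrate(3_000_000))  # 2800000
```

Real players add smoothing, buffer-level heuristics, and hysteresis on top of this basic rule so that the stream does not oscillate between renditions.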
The Streaming Server can be directed to create various video formats. For example, if the incoming data has a resolution of 1080p, then the server can be directed to generate different resolutions of the same data, like 720p, 480p, 360p, etc. This allows for adaptive bit-rate streaming.
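As a sketch of how such a rendition ladder could be produced, the snippet below builds ffmpeg command lines that transcode a 1080p input into lower resolutions with HLS output. The specific renditions, bit rates, and file names are assumptions for illustration; the ffmpeg flags used (`-i`, `-s`, `-c:v`, `-b:v`, `-f hls`, `-hls_time`) are standard ones:

```python
# Sketch: build ffmpeg commands that transcode one 1080p input into
# several renditions for adaptive bit-rate streaming. The renditions,
# bit rates, and output paths are illustrative assumptions.
RENDITIONS = [("1280x720", "2800k"), ("842x480", "1400k"), ("640x360", "800k")]

def transcode_command(src: str, size: str, bitrate: str) -> list:
    """Return an ffmpeg argv list producing one HLS rendition."""
    name = size.split("x")[1] + "p"  # e.g. "720p"
    return [
        "ffmpeg", "-i", src,
        "-s", size,                      # target resolution
        "-c:v", "libx264", "-b:v", bitrate,
        "-c:a", "aac",
        "-f", "hls", "-hls_time", "10",  # ~10-second segments
        name + "/index.m3u8",
    ]

for size, bitrate in RENDITIONS:
    print(" ".join(transcode_command("input_1080p.mp4", size, bitrate)))
```

In practice a single ffmpeg invocation with multiple output mappings, or a dedicated transcoding service, would be used instead of one command per rendition.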
The Streaming Server can re-stream data to other streaming services like Facebook, YouTube, etc. Typically, when the traffic grows, the server cannot handle all the requests, and it might end up being very slow, resulting in buffering issues at the client end.
In order to mitigate this and to make the service highly scalable, the server can re-stream the content to edge servers or CDN providers, like AWS CloudFront, Akamai, etc. The server can also strategically push content to CDN providers that are geographically located where the majority of the usage is. The protocols supported in this case are HLS and DASH only, since they are based on HTTP; edge servers and CDNs do not support the other protocols.
The Clients or Viewers are usually media players embedded in the service provider's web pages or device-specific applications (e.g., the YouTube/Facebook apps for Android or iOS). To reach every client device, the most popular protocol used is HLS. HLS is universally supported and can play on all modern devices. Some devices still support RTMP (with embedded Flash players), but this has almost reached (or has already reached) end-of-life.
Each component mentioned in the Publisher and Streaming Components sections of the design is a very large topic in itself and falls outside the scope of this article. It is advisable to research the components in detail based on your needs. This article is a primer and gives an overview of live streaming, its components, and the various protocols used, along with Wikipedia references and a high-level system design. I hope you found this helpful!