How does Netflix Handle so many Requests?

A System Design Approach

Ankit Raj

Published on Sep 1, 2021

Imagine watching your favourite show on Netflix when playback stops because the system is overloaded. Not only would you be furious, but Netflix would also lose a lot of its user base. 📉

Netflix has about 200 million subscriptions and accounts for 6 billion collective watch hours per month. 🤯

A failure in any of Netflix's systems can cause unexpected load and hurt the users' viewing experience. Failures can occur for many reasons: misbehaving clients that trigger a retry storm, an under-scaled backend service, a bad deployment, a network blip, or issues with the cloud provider.

Despite getting tons of requests and unexpected load, how does Netflix ensure that you don't miss out on having a smooth watching experience?

It uses prioritized load shedding: it prioritizes requests and sorts traffic into buckets such as critical, non-critical, and degraded. An API gateway service computes a priority score for each request and places the request in the corresponding bucket.
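As a rough illustration of scoring and bucketing, here is a minimal Python sketch. It is not Netflix's implementation: the `Request` fields, the 0 to 100 score range (lower means more important), and the bucket cutoffs are all assumptions made up for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    CRITICAL = "critical"
    DEGRADED = "degraded"
    NON_CRITICAL = "non-critical"


@dataclass(frozen=True)
class Request:
    path: str
    affects_playback: bool = False  # hypothetical flag: needed to start or continue playback
    is_background: bool = False     # hypothetical flag: logging, telemetry, and similar


def priority_score(request: Request) -> int:
    """Return an assumed score from 0 (most important) to 100 (least important)."""
    if request.affects_playback:
        return 10
    if request.is_background:
        return 90
    return 50


def bucket_for(score: int) -> Bucket:
    """Map a score to a bucket using illustrative cutoffs (30 and 70)."""
    if score <= 30:
        return Bucket.CRITICAL
    if score <= 70:
        return Bucket.DEGRADED
    return Bucket.NON_CRITICAL
```

A real gateway would likely derive the score from many more signals; the point here is only that every request gets a number, and the number decides its bucket.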

So, when the system is in trouble, for example under high load or past a configured threshold, Netflix drops traffic, starting with the lowest priority. These are mainly logging and background requests, which are non-critical and can be recovered with a retry. This technique ensures that the playback experience remains uninterrupted and you enjoy your show. 😄

[Figure: requests throttled by priority during an incident. Image source: Netflix]

As you can see from the graph, the API gateway performs progressive load shedding based on request priority during the incident. The different colours in the graph represent requests with different priorities being throttled.
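Progressive shedding can be sketched as a cutoff that tightens as load rises. The snippet below is an illustrative assumption, not Netflix's actual algorithm: `load` is a hypothetical utilization value between 0 and 1, scores run from 0 (most important) to 100 (least important), shedding starts at 80% load, and requests scoring 30 or lower are never shed.

```python
def admit_cutoff(load: float, start: float = 0.8, floor: int = 30) -> int:
    """Return the highest (least important) score still admitted at this load.

    Below `start`, everything is admitted. Between `start` and full load the
    cutoff falls linearly from 100 to `floor`, so the lowest-priority
    requests are dropped first.
    """
    if load <= start:
        return 100
    overload = min((load - start) / (1.0 - start), 1.0)
    return round(100 - overload * (100 - floor))


def should_admit(score: int, load: float) -> bool:
    """Admit a request only if its score is within the current cutoff."""
    return score <= admit_cutoff(load)


# Example: at 90% load, a background request (score 90) is shed,
# while a playback request (score 10) still gets through.
background_admitted = should_admit(90, load=0.9)
playback_admitted = should_admit(10, load=0.9)
```

Keeping a floor means the sketch never sheds the most critical traffic, which mirrors the goal of protecting playback; a production system would need its own policy for what happens when even critical traffic exceeds capacity.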

However, Netflix changes quickly, and a request that is non-critical today can unexpectedly become critical. To make sure that dropping non-critical requests does not impact users, Netflix uses a platform that captures live use cases and measures the impact on users' playback experience. These checks are scheduled to run periodically.

It's interesting to see how all of this happens without disrupting your binge time! 🍿


Thanks for reading this article. I write about system design and break down how companies build their systems. Join my weekly newsletter to get more insights! Connect with me on LinkedIn and Twitter.
