How to avoid cascading failures in a distributed system 💣💥🔥
Summary
TL;DR: This system design video tackles the 'Thundering Herd' problem, where a surge of requests overwhelms servers. It explains rate-limiting as a solution, using per-server request queues to manage load and prevent cascading failures. The video also addresses challenges like viral traffic, job scheduling, and popular posts, suggesting strategies like pre-scaling, auto-scaling, batch processing, and approximate statistics. It concludes with best practices, including caching, gradual deployments, and cautious data coupling, to enhance performance and mitigate the impact of sudden traffic spikes.
Takeaways
- 🚦 The main problem addressed is the 'thundering herd' issue, which occurs when a large number of requests overwhelm the server, similar to a stampede of bison.
- 🔄 Rate limiting is introduced as a server-side solution to prevent server overload by controlling the rate at which users can send requests.
- 💡 The concept of load balancing is explained, where servers are assigned specific ranges of requests to handle, ensuring even distribution of load.
- 🔄 The script discusses the cascading failure problem, where the failure of one server can lead to additional load on others, potentially causing a system-wide crash.
- 🚫 A workaround mentioned is to stop serving requests for certain user IDs to prevent further overload, although not an ideal solution.
- 📈 The importance of having a smart load balancer or the ability to quickly bring in new servers during peak loads is highlighted.
- 🛑 The script suggests using request queues with each server having a defined capacity to handle requests, which helps in managing overloads.
- ⏱️ It emphasizes the need for clients to handle failure responses appropriately, possibly retrying after some time, to manage server load.
- 📈 Auto-scaling is presented as a solution for unpredictable traffic increases, such as during viral events or sales periods like Black Friday.
- 📅 Job scheduling is identified as a server-side problem, where tasks like sending new year wishes to all users should be batched to avoid overload.
- 📊 The script introduces the concept of approximate statistics, where displaying approximate numbers for metadata like views or likes can reduce database load.
- 💾 Caching is recommended as a best practice to handle common requests efficiently and reduce database queries.
- 📈 Gradual deployments are suggested to minimize server-side issues during updates, by deploying in increments and monitoring the impact.
- 🔗 The script ends with a cautionary note on coupling, where keeping sensitive data in cache can improve performance but also poses risks if not managed carefully.
Q & A
What is the main problem addressed in the system design video?
-The main problem addressed is the 'Thundering Herd' problem, which refers to a large number of requests overwhelming the server, potentially causing a cascading failure of the system.
What is rate limiting and why is it used?
-Rate limiting is a technique used on the server side to control the amount of incoming traffic to prevent the server from being overwhelmed by too many requests at once, thus avoiding system crashes.
How does the load balancing scenario with four servers work in the script?
-In the load balancing scenario, request IDs 1 to 400 are split across four servers so that each handles a block of 100. If one server crashes, the load balancer redistributes its block among the remaining servers, expanding their request ranges accordingly.
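To make the scenario concrete, here is a minimal Python sketch of range-based assignment with redistribution on failure. The `Server` class and `redistribute` helper are illustrative names, not anything from the video:

```python
# Hypothetical sketch of range-based load balancing with redistribution
# when a server fails, mirroring the video's four-server scenario.
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    ranges: list = field(default_factory=list)  # list of (lo, hi) request-ID ranges

def redistribute(failed: Server, survivors: list) -> None:
    """Split each range of the failed server roughly evenly across the survivors."""
    for lo, hi in failed.ranges:
        chunk = (hi - lo + 1) // len(survivors)
        start = lo
        for i, s in enumerate(survivors):
            # The last survivor absorbs any remainder.
            end = hi if i == len(survivors) - 1 else start + chunk - 1
            s.ranges.append((start, end))
            start = end + 1
    failed.ranges.clear()

servers = [Server(f"s{i}", [((i - 1) * 100 + 1, i * 100)]) for i in range(1, 5)]
redistribute(servers[0], servers[1:])   # s1 crashes; s2-s4 each pick up ~33 IDs
for s in servers[1:]:
    print(s.name, s.ranges)
```

The catch, as the video points out, is the implicit assumption that every survivor can actually absorb its extra share.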
What is the cascading failure problem mentioned in the script?
-The cascading failure problem occurs when one server's crash leads to an increased load on other servers, which may also become overwhelmed and crash, causing a chain reaction that can take down the entire system.
How can a server queue help in managing the load?
-A server queue can help manage the load by allowing each server to have a limit on the number of requests it can handle. If the queue reaches its capacity, additional requests are either ignored or the server returns a failure response, preventing overload.
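As a rough illustration of the idea (the video describes it on a whiteboard, not in code), a bounded per-server queue might look like the following sketch; the capacity of 300 mirrors the video's s4 example, and the status codes are assumptions:

```python
# Hedged sketch of a per-server request queue: capacity equals the server's
# compute budget; requests beyond it are rejected immediately instead of
# piling up until the server crashes.
import queue

CAPACITY = 300  # e.g., s4 can sustain 300 requests per second
request_queue: "queue.Queue[dict]" = queue.Queue(maxsize=CAPACITY)

def accept(request: dict) -> dict:
    try:
        request_queue.put_nowait(request)          # enqueue if there is room
        return {"status": 202, "detail": "accepted"}
    except queue.Full:
        # Queue is at capacity: shed load rather than overload the server.
        return {"status": 503, "detail": "temporarily overloaded, retry later"}
```

Rejecting the 301st request keeps this one server healthy while the system buys time to scale.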
What is the difference between temporary and permanent errors in the context of rate limiting?
-Temporary errors indicate that the request failure is due to a temporary issue, such as server load, and the client may try again later. Permanent errors suggest there is a logical error in the request that needs to be corrected by the client.
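A client honoring that distinction might look roughly like this sketch; the exact status codes (400 for permanent, everything else retried as temporary) are an assumption for illustration:

```python
# Sketch of a client that respects the temporary/permanent distinction and
# backs off with jitter so retries don't re-create the thundering herd.
import random
import time

def send_with_retry(send, request, max_attempts=5):
    for attempt in range(max_attempts):
        response = send(request)
        if response["status"] < 400:
            return response
        if response["status"] == 400:
            raise ValueError("permanent error: fix the request, do not retry")
        # Temporary error: exponential backoff plus jitter before retrying,
        # so a herd of clients does not stampede the server in lockstep.
        time.sleep((2 ** attempt) + random.random())
    raise TimeoutError("server still overloaded after retries")
```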
How can pre-scaling help with events like Black Friday?
-Pre-scaling involves increasing the server capacity in anticipation of high traffic during specific events like Black Friday. This proactive approach helps to handle the increased load without overloading the existing servers.
What is auto-scaling and how does it differ from pre-scaling?
-Auto-scaling is a feature provided by cloud services that automatically adjusts the number of servers based on the current load. Unlike pre-scaling, which is based on predictions, auto-scaling reacts to real-time demand.
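As a hedged sketch of the reactive idea, independent of any particular cloud provider's API, a target-tracking policy could look like this; the 60% target utilization and fleet bounds are illustrative assumptions:

```python
# Illustrative auto-scaling policy: size the fleet proportionally to load,
# clamped to sane minimum and maximum fleet sizes.
def desired_fleet_size(current_servers: int, avg_utilization: float,
                       target: float = 0.6, min_servers: int = 2,
                       max_servers: int = 20) -> int:
    # Classic target-tracking: grow when hot, shrink when idle.
    desired = round(current_servers * (avg_utilization / target))
    return max(min_servers, min(max_servers, desired))

print(desired_fleet_size(4, 0.9))   # -> 6: roughly the Black Friday pre-scale
print(desired_fleet_size(6, 0.3))   # -> 3: shrink when traffic subsides
```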
Why is job scheduling a server-side problem that needs to be addressed?
-Job scheduling is a problem because tasks like sending email notifications to a large number of users at once can create a sudden spike in load. It needs to be managed to avoid overwhelming the server.
What is batch processing and how does it help in job scheduling?
-Batch processing involves breaking down large tasks into smaller chunks and executing them over time. In job scheduling, this can help distribute the load evenly, preventing server overload.
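A minimal sketch of the chunking idea from the New Year example, assuming a placeholder `send_email` call in place of the real notification service:

```python
# Break one million sends into batches of 1,000, one batch per minute,
# instead of firing everything at midnight.
import time

def send_email(user_id, message):
    pass  # stand-in for the real notification service

def run_batched(user_ids, batch_size=1_000, interval_seconds=60):
    for i in range(0, len(user_ids), batch_size):
        for user_id in user_ids[i:i + batch_size]:
            send_email(user_id, "Happy New Year!")
        time.sleep(interval_seconds)   # spreads 1M users over ~1,000 minutes
```

If users do care about timing, the same knobs (batch size and interval) let you compress the window.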
How can approximate statistics be used to improve server performance?
-Approximate statistics involve displaying estimated or rounded numbers for metadata like views or likes on a post, rather than exact numbers. This can reduce the load on the server by avoiding unnecessary database queries for exact counts.
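One possible, purely hypothetical implementation of this idea: serve a cached count plus a growth estimate instead of querying the database on every page load. The 1.5x hourly growth factor mirrors the video's 1,000 to roughly 1,500 example:

```python
# Approximate view counter: extrapolate from the last exact snapshot and
# re-sync with the database only periodically.
import time

class ApproximateViews:
    def __init__(self, exact_count: int, hourly_growth: float = 1.5):
        self.base = exact_count
        self.growth = hourly_growth
        self.snapshot_time = time.time()

    def estimate(self) -> int:
        hours = (time.time() - self.snapshot_time) / 3600
        return int(self.base * (self.growth ** hours))   # may drift from reality

    def refresh(self, exact_count: int) -> None:
        # Periodically re-sync with the database, e.g., once per hour.
        self.base, self.snapshot_time = exact_count, time.time()
```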
What are some best practices mentioned in the script to avoid the Thundering Herd problem?
-The best practices include caching common requests, gradual deployments to minimize disruptions, and careful consideration of data coupling and caching sensitive data to improve performance without compromising security or accuracy.
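As a sketch of the caching practice, a minimal TTL key-value cache might look like this; real systems would typically reach for Redis or memcached, but the idea is the same:

```python
# Tiny TTL response cache: serve repeated requests from memory and only
# run the expensive query when the entry is missing or expired.
import time

_cache: dict = {}   # key -> (expiry_timestamp, response)

def cached(key, compute, ttl_seconds=60):
    now = time.time()
    hit = _cache.get(key)
    if hit and hit[0] > now:
        return hit[1]                       # cache hit: skip the database
    response = compute()                    # expensive query happens here
    _cache[key] = (now + ttl_seconds, response)
    return response
```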
Outlines
🚫 Rate Limiting to Prevent Server Overload
This paragraph introduces the concept of rate limiting as a solution to the 'thundering herd' problem, where a massive influx of requests can overwhelm server capacity. It uses the analogy of a bison stampede to describe the impact of too many requests hitting the server at once. The scenario of four servers handling a load-balanced request range is presented, with the failure of one server causing a cascading effect that could lead to a total system crash. The paragraph emphasizes the importance of avoiding such scenarios by implementing rate limiting to manage server load effectively.
🔄 Handling Server Overload with Queues and Scaling
The second paragraph delves into how to manage server load through the use of request queues and scaling. It explains that by assigning a compute capacity to each server and expanding the queue up to its limit, servers can avoid overloading. The paragraph also touches on the importance of client-side awareness when requests fail due to server limits, suggesting that temporary error messages can guide users to retry after some time. Additionally, it discusses the challenges of scaling in response to unpredictable viral traffic or planned events like Black Friday sales, highlighting pre-scaling and auto-scaling as potential strategies.
📅 Job Scheduling and Batch Processing
This paragraph addresses the issue of job scheduling, particularly the problem of running cron jobs that could potentially flood the server with tasks at a specific time, such as sending New Year greetings to all users simultaneously. The solution proposed is to break down the job into smaller, manageable chunks, using batch processing to distribute the load over time. This approach ensures that the server does not get overwhelmed and that the service remains reliable and responsive.
🔄 Batch Processing and Approximations for Popular Content
The fourth paragraph discusses the challenges of handling popular content, such as a viral post or a popular YouTube video, and how batch processing can mitigate the load. It also introduces the concept of adding 'jitter' to the notification process to spread out the load (see the sketch below). Furthermore, the paragraph explores the idea of using approximate statistics for metadata, such as view counts, to reduce the load on the database and improve performance, even if it means displaying slightly inaccurate numbers to users who are not overly concerned with exact figures.
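A minimal sketch of the jitter idea, assuming a hypothetical delayed-job queue consumes the output; `notify` delays are randomized so a popular upload's fan-out does not land all at once:

```python
# Jittered fan-out: each follower's notification gets a random delay so the
# resulting page hits are spread over a window instead of one spike.
import random

def schedule_fanout(follower_ids, max_delay_seconds=600):
    schedule = []
    for follower_id in follower_ids:
        delay = random.uniform(0, max_delay_seconds)   # the "jitter"
        schedule.append((delay, follower_id))
    # In practice, hand these (delay, follower) pairs to a delayed-job queue.
    return sorted(schedule)
```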
🛡️ Best Practices for Avoiding the Thundering Herd
The final paragraph wraps up the discussion by outlining best practices to avoid the thundering herd problem. It highlights caching as a means to reduce database queries and improve system performance. The paragraph also touches on the importance of gradual deployments to minimize the impact of new service updates. Lastly, it presents the controversial practice of coupling, where sensitive data might be cached to reduce load, but cautions that this approach must be used judiciously to avoid security risks.
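To illustrate the coupling trade-off, here is a hedged sketch of caching authentication results for a short window; `verify` stands in for the external authentication call, and the one-hour TTL mirrors the video's example. This is exactly the pattern the video flags as dangerous for sensitive systems:

```python
# Double-edged coupling: trust a recent successful authentication instead of
# calling the auth service on every request. Fine for low-risk data, risky
# where a revoked password must take effect immediately.
import time

_auth_cache: dict = {}   # username -> (expiry, token)

def is_authenticated(username, token, verify, ttl_seconds=3600):
    now = time.time()
    hit = _auth_cache.get(username)
    if hit and hit[0] > now and hit[1] == token:
        return True                       # trust the cached result for up to an hour
    if verify(username, token):           # fall back to the authentication service
        _auth_cache[username] = (now + ttl_seconds, token)
        return True
    return False
```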
Keywords
💡Rate-limiting
💡Cascading failure
💡Load balancing
💡Queue
💡Compute capacity
💡Auto-scaling
💡Job scheduling
💡Batch processing
💡Jitter
💡Approximate statistics
💡Caching
💡Gradual deployments
💡Coupling
Highlights
Introduction to the problem of the 'thundering herd' in server-side systems due to overwhelming request volumes.
Illustration of the server load balancing scenario with a range of requests per server and the impact of a server crash.
Explanation of cascading failure problem where the failure of one server leads to the overload and potential crash of others.
Discussion on the importance of server capacity and the danger of overloading with additional request ranges.
Proposing rate limiting as a solution to avoid server crashes by managing the number of requests handled.
Description of how request queues can be used to limit the load on servers and prevent overloading.
The concept of compute capacity per server and how it relates to the number of requests that can be processed.
The dilemma of serving some users versus no users in the event of server failures and the principle behind rate limiting.
Differentiating between temporary and permanent error messages to manage client expectations during request failures.
Strategies for handling sudden traffic spikes, such as pre-scaling and auto-scaling, especially during events like Black Friday.
The challenges of going viral and the role of rate limiting in managing unexpected surges in traffic.
Approaches to job scheduling to prevent server overload during tasks like sending new year wishes to all users.
The technique of batch processing to distribute workload evenly and prevent thundering herd problems.
Smart solutions like adding jitter to notifications to manage the load during popular posts or events.
The use of approximate statistics to reduce database load and improve performance without sacrificing user experience.
Best practices like caching to handle common requests efficiently and reduce database load.
The benefits of gradual deployments to minimize server-side issues during service updates.
Controversial practice of coupling systems for performance improvement with the associated risks.
Conclusion summarizing the importance of these strategies in mitigating the thundering herd problem and improving system design.
Transcripts
Hi everyone, we are back with a new system design video on rate limiting. Specifically, the problem that we are trying to solve is that of the thundering herd. If you can imagine a huge group of bison charging towards you, crushing everything in their path, that's what it feels like on the server side when there's a ton of requests coming in from users: they're just going to crush your servers and crush your system completely. To avoid this problem, what we do on the server side is something called rate limiting.
Let's just try to understand the scenario first. Let's say you have four servers and a request range from 1 to 400, so every server is load balanced to serve a range of 100 requests. Now let us assume that s1 crashes, maybe because of some internal issue, resulting in s2, s3, and s4 taking additional load. s1 had the range from 1 to 100, and the load balancer is going to be smart about this and assign them loads: let's say this one gets an additional range of 1 to 33, this one gets 34 to 67, and this one gets 68 to 100. Okay, so s1 crashing did not affect the rest of the servers, or rather did not affect the users, because the rest of the servers are now able to serve their requests.
However, there's an implicit assumption over here, and the assumption is that each of these servers can handle the new load. Let us assume that s4 did not have that much compute power; it was barely surviving with a request range of 100, and by adding to the range that it needs to serve, s4 is now completely exhausted. The requests are taking too much time, there are too many timeouts, and s4 crashes. So s1 was already dead, s4 is now dead, and as you can expect, somebody needs to take the requests from its range. So I'm going to split s4's ranges across the remaining servers: this one gets 301 to 350 plus part of the range s4 had taken over, and this one gets 351 to 400 plus the rest. You're probably getting an idea now: these ranges are pretty big. Initially s3 was serving half of what it is serving right now as a request range, and there's a good chance that s3 will also crash.
So if s3 crashes, it was serving around 200% of the load it could take, maybe it just had 50% headroom, and it crashes, which means s2 has to serve around 400% of its original load, approximately. There's a very good chance that s2 will also crash, resulting in the whole system crashing and all of your users being upset, and this is something that we really want to avoid. This problem is called the cascading failure problem, and that's the first problem that we try to mitigate.
As you can see, this cascading failure is a race against time. When s1 crashed, there's that delta, that small time gap that you have for bringing in a new server before s4 takes on that much load and crashes. One of the things you could do is have a really smart load balancer or some seamless way of bringing in a new server, but we should assume the worst. There are some possible workarounds and one real solution to this problem. One workaround, of course, is to just stop serving requests for all users having request IDs 1 to 100. That's not really a solution, but if you see that the other servers can't take in more load, it's better to be available to some users than to be available to none of the users. And now we are out of workarounds.
So, the real solution: what we should do is take a queue and put our requests in this queue. What's going to happen here is that every server can have a request queue, and it can decide on answering or not answering a request. So what I'm going to do is give each of these servers a particular compute capacity. s1 has 100 units of compute capacity, where for me one unit of compute capacity means it can handle one request per second: so 100 requests per second, 300 requests per second, 400 requests per second, 200 requests per second. Okay.
Now let's say s1 crashes. Looking at the node s4, what we need to see is that 300 is the maximum number of requests it can take, so its queue is going to keep expanding till it hits 300. If the 301st request comes in, we are going to ignore that request; we are going to just say no. When we return a failed response to the client, at least this server is not being overloaded, and the client is now aware that the request failed and that maybe it should try again after five minutes. The user experience is going to be bad, of course; the user whose request failed is not going to be happy. But again, going by the principle that serving some users is better than serving no users, we are going to start dropping requests.
One small thing to remember here is that the client shouldn't be stupid: if the request fails and the client starts bombarding the server with retries demanding an answer, that is going to be bad. So there are some types of errors that you can send the client: one is temporary and one is permanent. If you say permanent, it means that there's some serious mistake in the request that was sent, a logical error. Temporary means that the client should try again in some time; maybe there's some internal server issue going on, maybe the database is too slow, or maybe there's too much load, so try after some time. The client can display messages accordingly. But the general idea, of course, is to limit the number of requests you take on the server side so that you can handle the load till the scaling bit comes in, till you can bring in the new server.
All right, so this is the first problem. The second problem that you can face is if you go viral, or if there's some sort of an event, let's say Black Friday; you know sales go up on Black Friday, so that might be an issue. Well, when you have an event, one of the things you can do, because you have prior knowledge, is scale beforehand. If you have four servers and you assume that on Black Friday you're going to have 50% more users, get six servers. That's the first solution, which is pre-scaling. However, if you are not very sure about the number of servers you'll need during the event, one thing you could do is auto-scale, and please don't quote this video if you spend too much money auto-scaling, but auto-scaling is something that is provided as a solution by cloud services. If you host your service on the cloud, you can probably ask them to auto-scale your service, and auto-scaling is not a very bad idea usually, because the increased traffic probably means that you're going to make more money out of that traffic. So that's one solution.
How about if you go viral? If you go viral, you can just fall back to the old solution of rate limiting; with rate limiting you will be capping the maximum number of users that you can actually serve. And of course auto-scaling and pre-scaling are a good idea, but going viral is something that you can't predict, so pre-scaling is not really a solution there; those are the two other solutions. The third problem is a real server-side problem, and that is job scheduling.
More often than not, we write cron jobs which run at some point in time; I mean, we decide when they're going to run. But imagine a cron job which is supposed to send email notifications to all users, wishing them a happy new year on the 1st of January. What could happen, if you do this in a naive way, is that you send all of the emails together when the clock hits the 1st of January, which means that if you have a million users, you're going to send one million email notifications, and that's of course like a huge herd of bison coming towards you. The way you avoid this is to break the job into smaller pieces. Let's say you have 1 million users, so you have 1 million user IDs; the users are broken into chunks. The first thousand users are going to get the email in the first minute, the second thousand users are going to get it in the second minute, and so on and so forth. 1 million divided by 1,000 is going to take 1,000 minutes, and with this, what's happening is that you have divided the work that you had on the server into smaller chunks which it can consume. One thousand per minute is not a tremendous load, so it's going to survive, and your users don't really care; if they don't get the auto-generated email notification in the first minute of the new year, they don't really care. Of course, if they do care, then you have to bring this range down from 1,000 minutes to whatever you like, but you see that you get to decide. All right, so batch processing is something that you should definitely do.
The fourth problem is as interesting as the other problems, actually. It's when someone popular posts something, or if a post becomes really popular, with a lot of people liking it, sharing it, subscribing to it (like you guys should). Say a user like PewDiePie posts something on YouTube; then you need to send it to all of their followers. If you do it in a naive way, the same issue as in job scheduling will come in: there are too many users and a very small delta. So what you could do is batch processing over there, sending notifications in chunks of 1,000, but something that YouTube does really smartly is adding jitter. If you have a lot of followers, the notifications are going to go out to them in a batch-processing way, but then those followers start hitting the page, let's say the video page. There is some content, the video content, which is core to YouTube, but there is a lot of data which actually doesn't matter if you think about it: the number of views, the number of likes, the number of comments, and so on and so forth.
Now, if you have a very popular user like PewDiePie actually posting a video, the number of views is going to be changing dramatically. You could faithfully display that, or you could do it the smart way. Let's say in the first hour we get 1,000 views. Then in the second hour, if there are a lot of users asking for the number of views on this video, I'm going to be smart and just say 1,000 times 1.5 is the total number of views now, so that's 1,500. Maybe the total number of views in reality was 1,700, so there's a mismatch between reality and what is being displayed, but we don't care, because this is metadata; this is something which is not core to the video. We are going to display some number which may or may not be true; it's an approximation. And do the users really care? Not really; they want a general idea of what's going on. Of course, this seems like a really big difference, but YouTube can be smart about this: they can figure out how the views change over time, and given the first hour and the second hour, instead of finding out the total number of views on the video, they can just run through this graph and figure out where the count should lie. YouTube must be much, much smarter than this, but I am just giving the general idea of approximation: instead of showing people the exact truth, approximate and save a lot of load on your service. Potentially this could save a lot of the database queries that you are making to get the metadata of a post. Okay, so that was the fourth smart solution, and this is the fifth smart solution: approximate statistics.
Apart from these solutions, of course, there are some good practices on the server side to avoid a thundering herd. The first one is the most common one, which is caching. If you're getting a lot of common requests, then the response is going to be the same, and you can just cache those requests, I mean, basically cache the responses for those requests. Those are key-value pairs, and this is going to save a lot of the queries you'd be making on the database; in turn, that will improve the performance of your system, and you can also handle more load.
Another thing that you can do, of course, is gradual deployments. Most of the issues that people get on the server side come when they're deploying the service, so there are a lot of stories about site reliability engineers who are fighting deployments: the developers want to deploy more because they want to get more features out, and the reliability engineers want to slow deployments down as much as they can because that makes the system more stable. It's an interesting tug-of-war. What you want to do, essentially, is deploy gradually. With gradual deployments, if you have a total of 100 servers, you don't deploy to them all together: you deploy to the first ten, have a look at what's going on, then deploy to the next ten, and so on and so forth. This won't be possible in certain scenarios where there's a breaking change, so to speak, but we are getting into too much detail; gradual deployment is a good idea. Deploy ten at a time, unless there is absolutely no choice and you have to deploy in parallel.
And the final point that I'm going to make is, of course, going to be controversial; it comes with a star, and that's called coupling. To improve performance, sometimes what you need to do is store data, which is very similar to caching. Let us assume that you have a service which, for every request that it gets, asks an authentication service to authenticate the user first, to authenticate this request, and then serves the request. Let's say that this network call is too much for you; maybe it's an external service. What you could do is cache the user's username and token, or password, or whatever you like. Now you can say that if the username and password worked once in the past hour, then maybe the password hasn't changed, and we are going to assume that this user is authenticated to call this service, and we are going to go ahead instead of talking to the authentication service and verifying whether it's true or not. It's good in a way, because you're not querying the authentication service all the time, so you're reducing the load on the authentication service, improving performance, improving user experience. Except that if this is a financial system and the password has changed, and there's a person who has hacked into the account and is now using the old password, then you're in big trouble, aren't you?
So this is a double-edged sword; in fact, it seems like a really bad idea in most cases. Coupling systems by keeping sensitive or important data in the cache should be avoided. However, if you have some data like, you know, the profile picture or something, you can keep that for one or two hours. That's why there's a star over here: you want to take this case by case and understand whether keeping some data from an external service in your own service is a good idea or not. That will improve performance, and in turn that will help you handle more requests, so the problem of the thundering herd will be slightly mitigated; but really this is more of a performance improvement, and probably we should put another star over here just to be sure.
That's it for this discussion on the thundering herd. We often have discussions on system design, so if you want notifications for those, you can subscribe, and of course, if you have any doubts or comments on this discussion, you can leave them in the comments below. I'll see you next time.