How to avoid cascading failures in a distributed system 💣💥🔥

Gaurav Sen
2 Nov 2018 · 18:05

Summary

TL;DR: This system design video tackles the 'Thundering Herd' problem, where a surge of requests overwhelms servers. It explains rate-limiting as a solution, using server queues to manage load and prevent cascading failures. The video also addresses challenges like viral traffic, job scheduling, and popular posts, suggesting strategies like pre-scaling, auto-scaling, batch processing, and approximate statistics. It concludes with best practices, including caching, gradual deployments, and cautious data coupling, to enhance performance and mitigate the impact of sudden traffic spikes.

Takeaways

  • 🚦 The main problem addressed is the 'thundering herd' issue, which occurs when a large number of requests overwhelm the server, similar to a stampede of bison.
  • 🔄 Rate limiting is introduced as a server-side solution to prevent server overload by controlling the rate at which users can send requests.
  • 💡 The concept of load balancing is explained, where servers are assigned specific ranges of requests to handle, ensuring even distribution of load.
  • 🔄 The script discusses the cascading failure problem, where the failure of one server can lead to additional load on others, potentially causing a system-wide crash.
  • 🚫 A workaround mentioned is to stop serving requests for certain user IDs to prevent further overload, although not an ideal solution.
  • 📈 The importance of having a smart load balancer or the ability to quickly bring in new servers during peak loads is highlighted.
  • 🛑 The script suggests using request queues with each server having a defined capacity to handle requests, which helps in managing overloads.
  • ⏱️ It emphasizes the need for clients to handle failure responses appropriately, possibly retrying after some time, to manage server load.
  • 📈 Auto-scaling is presented as a solution for unpredictable traffic increases, such as during viral events or sales periods like Black Friday.
  • 📅 Job scheduling is identified as a server-side problem, where tasks like sending new year wishes to all users should be batched to avoid overload.
  • 📊 The script introduces the concept of approximate statistics, where displaying approximate numbers for metadata like views or likes can reduce database load.
  • 💾 Caching is recommended as a best practice to handle common requests efficiently and reduce database queries.
  • 📈 Gradual deployments are suggested to minimize server-side issues during updates, by deploying in increments and monitoring the impact.
  • 🔗 The script ends with a cautionary note on coupling, where keeping sensitive data in cache can improve performance but also poses risks if not managed carefully.

Q & A

  • What is the main problem addressed in the system design video?

    -The main problem addressed is the 'Thundering Herd' problem, which refers to a large number of requests overwhelming the server, potentially causing a cascading failure of the system.

  • What is rate limiting and why is it used?

    -Rate limiting is a technique used on the server side to control the amount of incoming traffic to prevent the server from being overwhelmed by too many requests at once, thus avoiding system crashes.

  • How does the load balancing scenario with four servers work in the script?

    -In the load balancing scenario, each of the four servers is assigned a 100-request slice of the overall range of 1 to 400. If one server crashes, the load balancer redistributes its slice among the remaining servers, increasing their request ranges accordingly.

  • What is the cascading failure problem mentioned in the script?

    -The cascading failure problem occurs when one server's crash leads to an increased load on other servers, which may also become overwhelmed and crash, causing a chain reaction that can take down the entire system.

  • How can a server queue help in managing the load?

    -A server queue can help manage the load by allowing each server to have a limit on the number of requests it can handle. If the queue reaches its capacity, additional requests are either ignored or the server returns a failure response, preventing overload.

  • What is the difference between temporary and permanent errors in the context of rate limiting?

    -Temporary errors indicate that the request failure is due to a temporary issue, such as server load, and the client may try again later (a minimal client-side retry sketch appears at the end of this Q&A list). Permanent errors suggest there is a logical error in the request that needs to be corrected by the client.

  • How can pre-scaling help with events like Black Friday?

    -Pre-scaling involves increasing the server capacity in anticipation of high traffic during specific events like Black Friday. This proactive approach helps to handle the increased load without overloading the existing servers.

  • What is auto-scaling and how does it differ from pre-scaling?

    -Auto-scaling is a feature provided by cloud services that automatically adjusts the number of servers based on the current load. Unlike pre-scaling, which is based on predictions, auto-scaling reacts to real-time demand.

  • Why is job scheduling a server-side problem that needs to be addressed?

    -Job scheduling is a problem because tasks like sending email notifications to a large number of users at once can create a sudden spike in load. It needs to be managed to avoid overwhelming the server.

  • What is batch processing and how does it help in job scheduling?

    -Batch processing involves breaking down large tasks into smaller chunks and executing them over time. In job scheduling, this can help distribute the load evenly, preventing server overload.

  • How can approximate statistics be used to improve server performance?

    -Approximate statistics involve displaying estimated or rounded numbers for metadata like views or likes on a post, rather than exact numbers. This can reduce the load on the server by avoiding unnecessary database queries for exact counts.

  • What are some best practices mentioned in the script to avoid the Thundering Herd problem?

    -The best practices include caching common requests, gradual deployments to minimize disruptions, and careful consideration of data coupling and caching sensitive data to improve performance without compromising security or accuracy.
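
To make the temporary-versus-permanent distinction concrete, here is a minimal client-side retry sketch in Python. It is not taken from the video: the status codes (429 and 503 treated as temporary overload, everything else as final) and the use of the third-party `requests` library are assumptions for illustration.

```python
import random
import time

import requests  # third-party HTTP client, assumed available for this sketch

RETRYABLE = {429, 503}   # "temporary" errors: the server is overloaded, try again later
MAX_ATTEMPTS = 5

def fetch_with_backoff(url: str) -> requests.Response:
    """Retry only temporary failures, backing off exponentially with a little jitter."""
    for attempt in range(MAX_ATTEMPTS):
        response = requests.get(url, timeout=5)
        if response.status_code not in RETRYABLE:
            return response          # success, or a permanent error the client must fix itself
        # temporary error: wait roughly 1s, 2s, 4s, ... plus random jitter, then retry
        time.sleep(2 ** attempt + random.random())
    return response                  # still failing after all attempts; give up for now
```

This keeps a well-behaved client from bombarding an already overloaded server with immediate retries.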

Outlines

00:00

🚫 Rate Limiting to Prevent Server Overload

This paragraph introduces the concept of rate limiting as a solution to the 'thundering herd' problem, where a massive influx of requests can overwhelm server capacity. It uses the analogy of a bison stampede to describe the impact of too many requests hitting the server at once. The scenario of four servers handling a load-balanced request range is presented, with the failure of one server causing a cascading effect that could lead to a total system crash. The paragraph emphasizes the importance of avoiding such scenarios by implementing rate limiting to manage server load effectively.

05:01

🔄 Handling Server Overload with Queues and Scaling

The second paragraph delves into how to manage server load through the use of request queues and scaling. It explains that by assigning a compute capacity to each server and expanding the queue up to its limit, servers can avoid overloading. The paragraph also touches on the importance of client-side awareness when requests fail due to server limits, suggesting that temporary error messages can guide users to retry after some time. Additionally, it discusses the challenges of scaling in response to unpredictable viral traffic or planned events like Black Friday sales, highlighting pre-scaling and auto-scaling as potential strategies.

10:02

📅 Job Scheduling and Batch Processing

This paragraph addresses the issue of job scheduling, particularly the problem of running cron jobs that could potentially flood the server with tasks at a specific time, such as sending New Year greetings to all users simultaneously. The solution proposed is to break down the job into smaller, manageable chunks, using batch processing to distribute the load over time. This approach ensures that the server does not get overwhelmed and that the service remains reliable and responsive.

15:04

🔄 Batch Processing and Approximations for Popular Content

The fourth paragraph discusses the challenges of handling popular content, such as a viral post or a popular YouTube video, and how batch processing can mitigate the load. It also introduces the concept of adding 'jitter' to the notification process to spread out the load. Furthermore, the paragraph explores the idea of using approximate statistics for metadata, such as view counts, to reduce the load on the database and improve performance, even if it means displaying slightly inaccurate numbers to users who are not overly concerned with exact figures.

🛡️ Best Practices for Avoiding the Thundering Herd

The final paragraph wraps up the discussion by outlining best practices to avoid the thundering herd problem. It highlights caching as a means to reduce database queries and improve system performance. The paragraph also touches on the importance of gradual deployments to minimize the impact of new service updates. Lastly, it presents the controversial practice of coupling, where sensitive data might be cached to reduce load, but cautions that this approach must be used judiciously to avoid security risks.

Keywords

💡Rate-limiting

Rate-limiting is a technique used to control the amount of incoming and outgoing traffic to or from a network or an endpoint. In the context of this video, it is a method to prevent server overload by limiting the number of requests a server can handle. The script uses the analogy of a 'herd of bison' to describe the overwhelming number of requests that could potentially crash servers, and rate-limiting is presented as a solution to manage this influx.
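
A minimal sketch of the idea in Python, assuming a simple fixed one-second window rather than any specific algorithm from the video; the class name, method names, and capacity value are made up for illustration.

```python
import time

class FixedWindowRateLimiter:
    """Allow at most `capacity` requests per one-second window; reject the rest."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= 1.0:   # a new window begins: reset the counter
            self.window_start = now
            self.count = 0
        if self.count < self.capacity:
            self.count += 1
            return True
        return False                          # over capacity: caller should return a temporary error

# limiter = FixedWindowRateLimiter(capacity=100)   # e.g. a server rated at 100 requests/second
```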

💡Cascading failure

Cascading failure refers to a situation where the failure of one system leads to the failure of others, causing a chain reaction. In the video, it is illustrated with a scenario where the crash of one server (s1) leads to additional load on other servers, which may also crash if they cannot handle the increased load. This concept is central to understanding the importance of rate-limiting and load balancing in system design.

💡Load balancing

Load balancing is the distribution of workload across multiple computing resources to ensure no single resource bears too much demand, which can lead to failure. The script describes a scenario with four servers, each handling a range of requests. When one server crashes, the load balancer redistributes the load among the remaining servers, preventing a single point of overload.
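
The range-splitting idea can be sketched in a few lines of Python. The server names and the 1-to-400 range mirror the video's example, but the even-split strategy is an assumption, since the video does not say exactly how the load balancer divides the dead server's slice.

```python
def redistribute(ranges: dict[str, tuple[int, int]], dead: str) -> dict[str, list[tuple[int, int]]]:
    """Split the dead server's request range into roughly equal slices for the survivors."""
    lo, hi = ranges[dead]
    survivors = [s for s in ranges if s != dead]
    new_ranges = {s: [ranges[s]] for s in survivors}   # every survivor keeps its own slice
    slice_size = (hi - lo + 1) // len(survivors)
    start = lo
    for i, s in enumerate(survivors):
        end = hi if i == len(survivors) - 1 else start + slice_size - 1
        new_ranges[s].append((start, end))             # plus one piece of the dead server's range
        start = end + 1
    return new_ranges

servers = {"s1": (1, 100), "s2": (101, 200), "s3": (201, 300), "s4": (301, 400)}
print(redistribute(servers, "s1"))
# s2, s3 and s4 each pick up about 33 extra request IDs: (1, 33), (34, 66), (67, 100)
```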

💡Queue

In the context of this video, a queue is a line or sequence of requests waiting to be processed by a server. When a server's capacity is reached, incoming requests are placed in a queue and processed as capacity allows. This concept is used as a solution to manage the overflow of requests when a server is at full capacity, as seen when s1 crashes and the load is redistributed with queues expanding until they hit the new capacity limit.
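
A minimal Python sketch of such a bounded queue, built on the standard library's `queue` module; the capacity of 300 in the comment mirrors the video's example, everything else is illustrative.

```python
import queue

class BoundedRequestQueue:
    """Accept requests only while the queue is below the server's compute capacity."""

    def __init__(self, capacity: int):
        self._queue = queue.Queue(maxsize=capacity)

    def submit(self, request) -> bool:
        try:
            self._queue.put_nowait(request)   # room left: enqueue the request for processing
            return True
        except queue.Full:
            return False                      # at capacity: reject, the client should retry later

# q = BoundedRequestQueue(capacity=300)   # the 301st outstanding request gets a failure response
```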

💡Compute capacity

Compute capacity refers to the maximum amount of work a server can handle at a given time. The video assigns a compute capacity to each server, measured in requests per second. This concept is crucial for understanding how servers manage incoming requests and when they might become overloaded, which is a central theme in the discussion of rate-limiting and system stability.

💡Auto-scaling

Auto-scaling is a cloud service feature that automatically adjusts the number of computing resources based on the demand. In the video, it is suggested as a solution for unexpected traffic surges, such as going viral or during events like Black Friday. Auto-scaling helps maintain service stability by adjusting to the load, although it comes with the risk of increased costs.
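
In practice auto-scaling is configured through the cloud provider rather than written by hand, but a toy control loop shows the shape of the idea. The metric hook, the scaling hooks, and the thresholds below are placeholders, not any provider's real API.

```python
import time

def current_load() -> float:
    """Placeholder: in reality, read average utilisation from your metrics system."""
    return 0.5

def add_server() -> None:
    """Placeholder: in reality, ask the cloud provider to start another instance."""

def remove_server() -> None:
    """Placeholder: in reality, drain and stop an instance."""

SCALE_UP_AT = 0.8     # scale out above 80% utilisation
SCALE_DOWN_AT = 0.3   # scale in below 30% utilisation

def autoscale_loop(poll_seconds: int = 60) -> None:
    """Check the load periodically and add or remove capacity to follow real-time demand."""
    while True:
        load = current_load()
        if load > SCALE_UP_AT:
            add_server()
        elif load < SCALE_DOWN_AT:
            remove_server()
        time.sleep(poll_seconds)
```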

💡Job scheduling

Job scheduling is the process of managing when tasks or jobs are executed on a server. The video discusses the problem of scheduling tasks like sending New Year's greetings to all users, which, if done simultaneously, could overload the server. The solution presented is to break the job into smaller pieces and execute them over time, spreading the load and avoiding a thundering herd problem.

💡Batch processing

Batch processing is a method of dividing large amounts of data into smaller manageable chunks for processing. In the video, it is used to mitigate the impact of sending notifications to a large number of users by sending them in batches over time, thus preventing server overload. This concept is key to understanding how to handle large-scale operations without compromising system performance.
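
A minimal Python sketch of the new-year-email example: one batch of 1,000 users per minute, so a million users take roughly 1,000 minutes. The `send_emails` helper is a placeholder for whatever actually talks to the mail service.

```python
import time
from typing import Iterator, List

def chunks(items: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def send_emails(batch: List[int]) -> None:
    """Placeholder for the real call to the email/notification service."""
    print(f"sending {len(batch)} emails")

def send_new_year_emails(user_ids: List[int], batch_size: int = 1000) -> None:
    """Spread the work over time instead of firing every email at the stroke of midnight."""
    for batch in chunks(user_ids, batch_size):
        send_emails(batch)
        time.sleep(60)   # one batch per minute keeps the load on the server manageable
```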

💡Jitter

Jitter, in the context of this video, refers to the intentional introduction of variability in the timing of operations to prevent simultaneous overload. For example, when a popular post goes viral, notifications to followers can be sent in a staggered manner to avoid a sudden surge in traffic. This technique helps in distributing the load evenly over time.
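
A small Python sketch of the idea: each follower's notification is scheduled with a random offset instead of firing at the same instant. The five-minute jitter window and the `schedule_notification` helper are assumptions for illustration.

```python
import random

def schedule_notification(follower_id: int, delay_seconds: float) -> None:
    """Placeholder: enqueue the notification to be delivered after `delay_seconds`."""
    print(f"notify {follower_id} in {delay_seconds:.0f}s")

def notify_followers(follower_ids: list[int], max_jitter_seconds: float = 300.0) -> None:
    """Stagger notifications so followers don't all hit the video page at the same moment."""
    for follower_id in follower_ids:
        delay = random.uniform(0, max_jitter_seconds)   # a different offset for each follower
        schedule_notification(follower_id, delay)
```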

💡Approximate statistics

Approximate statistics involve providing estimates or rounded figures for data that do not need to be exact, such as the number of views or likes on a video. The video suggests using approximate statistics to reduce the load on the server by avoiding frequent database queries for metadata. This approach can significantly lighten the server's workload, as demonstrated by YouTube's handling of view counts.
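
A rough Python sketch of the extrapolation idea behind the video's 1,000 × 1.5 example: refresh the exact count from the database occasionally and serve an estimate in between. Real systems are certainly smarter; this only illustrates the trade-off.

```python
import time

class ApproximateCounter:
    """Serve an estimated view count instead of querying the database on every page load."""

    def __init__(self):
        self.last_exact = 0            # last value actually read from the database
        self.growth_per_second = 0.0   # observed rate of change since the previous refresh
        self.last_refresh = time.monotonic()

    def refresh(self, exact_count: int) -> None:
        """Called occasionally (say, hourly) with the real count from the database."""
        now = time.monotonic()
        elapsed = max(now - self.last_refresh, 1.0)
        self.growth_per_second = (exact_count - self.last_exact) / elapsed
        self.last_exact = exact_count
        self.last_refresh = now

    def estimate(self) -> int:
        """Extrapolate from the last exact value; close enough for display-only metadata."""
        elapsed = time.monotonic() - self.last_refresh
        return int(self.last_exact + self.growth_per_second * elapsed)
```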

💡Caching

Caching is the process of storing copies of data in a temporary storage area to speed up future requests for that data. In the video, caching is presented as a best practice to improve system performance by reducing the need for repeated database queries. It is particularly useful for handling common requests, thus allowing the system to manage more load and maintain performance.
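
A minimal TTL cache sketch in Python; the 60-second default and the key/value shape are assumptions, not something prescribed in the video.

```python
import time
from typing import Any, Dict, Tuple

class TTLCache:
    """Cache responses for common requests so repeated queries skip the database."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:   # entry is stale: treat it as a miss
            del self._store[key]
            return None
        return value

    def put(self, key: str, value) -> None:
        self._store[key] = (time.monotonic(), value)

# cache = TTLCache(ttl_seconds=30)
# response = cache.get(request_key) or fetch_from_database(request_key)  # hypothetical helper
```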

💡Gradual deployments

Gradual deployments refer to the practice of rolling out updates or changes to a system in stages rather than all at once. The video suggests this as a best practice to avoid potential issues that can arise from sudden, large-scale changes. By deploying in increments, any negative impacts can be more easily identified and managed, contributing to system stability.
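
A toy Python sketch of a ten-at-a-time rollout with a health check between batches; the deploy, health-check, and rollback hooks are placeholders for whatever tooling a real pipeline would use.

```python
def deploy_new_version(server: str) -> None:
    """Placeholder: push the new build to a single server."""

def healthy(batch: list[str]) -> bool:
    """Placeholder: check error rates and latency on the freshly deployed batch."""
    return True

def rollback(batch: list[str]) -> None:
    """Placeholder: revert the batch to the previous version."""

def rolling_deploy(servers: list[str], batch_size: int = 10) -> None:
    """Deploy to a handful of servers at a time; stop if the new version misbehaves."""
    for i in range(0, len(servers), batch_size):
        batch = servers[i:i + batch_size]
        for server in batch:
            deploy_new_version(server)
        if not healthy(batch):
            rollback(batch)
            raise RuntimeError(f"deployment halted after {i + len(batch)} servers")
```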

💡Coupling

Coupling, in a software architecture context, refers to the degree of direct knowledge one module has of another. The video discusses 'coupling' as a controversial practice where data from one service, such as authentication details, might be stored in another service to reduce load. While this can improve performance, it also introduces risks, such as outdated or incorrect data being used, especially in sensitive applications.
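
A minimal sketch of the authentication example in Python: remember a credential that validated recently and skip the network call for up to an hour. As the video warns, this is a double-edged sword for sensitive data; the one-hour TTL and the `auth_service_validate` call are assumptions for illustration.

```python
import time

AUTH_CACHE_TTL = 3600.0                 # one hour, as in the video's example
_auth_cache: dict[str, float] = {}      # credential fingerprint -> time it last validated

def auth_service_validate(username: str, token: str) -> bool:
    """Placeholder: the real network call to the external authentication service."""
    return True

def is_authenticated(username: str, token: str) -> bool:
    """Trust a recently validated credential instead of calling the auth service every time.

    Risky for financial or other sensitive systems: a revoked password keeps working
    until its cache entry expires.
    """
    key = f"{username}:{token}"
    validated_at = _auth_cache.get(key)
    if validated_at is not None and time.time() - validated_at < AUTH_CACHE_TTL:
        return True                     # skip the network call entirely
    if auth_service_validate(username, token):
        _auth_cache[key] = time.time()
        return True
    return False
```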

Highlights

Introduction to the problem of the 'thundering herd' in server-side systems due to overwhelming request volumes.

Illustration of the server load balancing scenario with a range of requests per server and the impact of a server crash.

Explanation of cascading failure problem where the failure of one server leads to the overload and potential crash of others.

Discussion on the importance of server capacity and the danger of overloading with additional request ranges.

Proposing rate limiting as a solution to avoid server crashes by managing the number of requests handled.

Description of how request queues can be used to limit the load on servers and prevent overloading.

The concept of compute capacity per server and how it relates to the number of requests that can be processed.

The dilemma of serving some users versus no users in the event of server failures and the principle behind rate limiting.

Differentiating between temporary and permanent error messages to manage client expectations during request failures.

Strategies for handling sudden traffic spikes, such as pre-scaling and auto-scaling, especially during events like Black Friday.

The challenges of going viral and the role of rate limiting in managing unexpected surges in traffic.

Approaches to job scheduling to prevent server overload during tasks like sending new year wishes to all users.

The technique of batch processing to distribute workload evenly and prevent thundering herd problems.

Smart solutions like adding jitter to notifications to manage the load during popular posts or events.

The use of approximate statistics to reduce database load and improve performance without sacrificing user experience.

Best practices like caching to handle common requests efficiently and reduce database load.

The benefits of gradual deployments to minimize server-side issues during service updates.

Controversial practice of coupling systems for performance improvement with the associated risks.

Conclusion summarizing the importance of these strategies in mitigating the thundering herd problem and improving system design.

Transcripts

play00:00

hi everyone we are back with a new

play00:01

system design video on rate-limiting

play00:03

specifically the problem that we are

play00:05

trying to solve is that of the thundering

play00:06

herd so if you can imagine a huge

play00:08

group of bison charging towards you

play00:10

crushing everything in their path that's

play00:12

what it feels like when on the server

play00:14

side and there's a ton of requests

play00:15

coming in from users they're just going

play00:18

to crush our servers and crush your

play00:19

system completely so to avoid this

play00:22

problem what we do on the server side is

play00:25

something called rate limiting and let's

play00:27

just try to understand the scenario

play00:29

first let's say you have four servers

play00:31

and let's say you have a request range

play00:34

from 1 to 400 so every server is load

play00:36

balanced to serve a range of 100

play00:39

requests now let us assume that you have

play00:44

s1 crashing because for some reason

play00:47

maybe some internal issue s1 crashed

play00:51

resulting in s4 s3 and s2 taking

play00:54

additional load so s1 had the range from

play00:58

1 to 100 the load balancer is going to be

play01:00

smart about this and assign them loads

play01:03

let's say 1 to 33 this is going

play01:07

to get an additional request range from

play01:11

34 to 67 and this is going to get an

play01:14

additional range from 68 to 100 ok so s1

play01:20

crashing did not affect the rest of the

play01:23

servers because or rather did not affect

play01:25

the users because the rest of the

play01:27

servers are now able to serve their

play01:28

requests however there's an implicit

play01:32

assumption over here and the assumption

play01:35

is that each of these servers can handle

play01:36

the new load ok let us assume that s4

play01:40

did not have that much compute power it

play01:41

was barely surviving with request range

play01:44

of 100 and by adding to that range that

play01:48

it needs to serve now s4 is completely

play01:50

exhausted and the requests are taking

play01:53

too much time there are too many timeouts s4

play01:55

crashes so s1 was already dead s4 is now

play02:01

dead and as you can expect somebody

play02:04

needs to actually answer so somebody

play02:06

needs to take these requests from this

play02:09

range so I'm going to give these

play02:12

additional ranges now for s4 which is 301

play02:16

to 350 and also the additional range

play02:20

over here which is let's say 1 to 17 and

play02:24

now we have this server also serving some

play02:28

range 351 - 418 - 23 you're probably

play02:36

getting an idea now these ranges are

play02:38

pretty big initially s3 was serving half

play02:42

of what it is serving right now as a

play02:43

request range and there's a good chance

play02:45

that s3 will also crash

play02:49

so if s3 crashes it was serving 200% of

play02:53

the load that it could handle maybe it just had

play02:55

50% additional and it crashes which

play02:58

means s2 has to serve around 400 percent

play03:01

of its original load approximately and

play03:04

there's a very good chance that s2 will

play03:05

also crash resulting in the whole system

play03:08

crashing all of your users being upset

play03:10

and this is something that we really

play03:12

want to avoid so this problem is called

play03:15

the cascading failure problem and that's

play03:19

the first problem that we try to

play03:21

mitigate as you can see this cascading

play03:30

failure is a race against time when s1

play03:34

had crashed there's that Delta that

play03:36

small time gap that you have for

play03:38

bringing in the new server before s4

play03:40

takes that much load and crashes so it

play03:43

is a race against time one of the things

play03:46

that you could do is have a really smart

play03:48

load balancer or have some seamless sort

play03:51

of new server bring in but we should

play03:55

assume the worst and there are some

play03:57

possible let's say workarounds and one

play04:00

real solution to this problem one

play04:02

workaround of course is to just stop

play04:05

serving requests for all users having

play04:06

request IDs 1 to 100 yeah that's that's

play04:09

not really a solution but if you see

play04:12

that the other servers can't take in

play04:14

more load it's better to be available to

play04:17

some users than to be available to none

play04:19

of the users and now we are out of

play04:22

workarounds so the real solution

play04:26

what we should do is take a queue and put

play04:31

our requests in this queue what's going

play04:35

to happen here is that every server can

play04:37

have a request queue and they can decide

play04:40

on answering or not answering a request

play04:42

so what I'm going to do is give each of

play04:46

these servers a particular capacity

play04:48

right compute capacity so s1 has 100

play04:53

compute capacity which for me one unit

play04:56

of compute capacity means it can handle

play04:57

one request per second so 100 requests

play05:00

per second

play05:01

300 requests per second 400 requests per

play05:04

second 200 requests per second okay

play05:07

now let's say s1 crashes looking at the

play05:11

node s4 what we need to do is we need to

play05:14

see 300 is the maximum number of

play05:16

requests it can take so this queue is

play05:18

going to keep expanding till it hits 300

play05:21

if the 301st request comes in what

play05:25

we are going to do is we are going to

play05:26

ignore that request we are going to just

play05:28

say no so when we return a failed

play05:34

response to the client at least this

play05:38

server is not being overloaded and also

play05:41

the client is now aware that ok this

play05:43

request failed

play05:44

maybe after five minutes I should try

play05:46

again so the user experience is going to

play05:48

be bad of course this user who made that

play05:50

request and fail is not going to be

play05:51

happy but again going by the principle

play05:54

that serving some users is better than

play05:56

serving no users we are going to start

play05:59

dumping requests one small thing to

play06:02

remember here is that the client

play06:03

shouldn't be stupid if the request fails

play06:06

and if the client is bombarding the

play06:08

server asking why are you not accepting my

play06:10

request this is going to be bad so there

play06:12

are some types of errors that you can

play06:14

send the client one is temporary and one

play06:17

is permanent right these are just types

play06:20

of errors that you can send if you say

play06:22

permanent it means that there's some

play06:23

serious mistake in the request you sent

play06:25

and there's a logical error temporary

play06:28

means that you should try in some time

play06:29

maybe there's some internal server issue

play06:31

going on maybe the database is too slow

play06:33

or maybe there's too much load so try

play06:35

after some time and the client can

play06:38

display messages accordingly but the

play06:40

general idea of course is to limit the

play06:42

number of requests you can take on the

play06:44

server side so that you can handle the

play06:46

load till the scaling bit comes in till

play06:49

you can bring in the new servers all

play06:52

right so this is the first problem the

play06:55

second problem that you can face is if

play06:58

you go viral or if there's some sort of

play07:01

an event let's say Black Friday you know

play07:05

sales go up on Black Friday so might be

play07:08

an issue well when you have an event one

play07:12

of the things you can do is because you

play07:14

have prior knowledge you can scale

play07:16

beforehand you know if you have four

play07:19

servers and you assume that on Black

play07:21

Friday you're going to have 50% more

play07:22

users get 6 servers so that's the first

play07:26

solution which is pre scale however if

play07:32

you are not very sure about the number

play07:33

of servers you'll need during the during

play07:36

the event one thing that you could do is

play07:39

auto scale and please don't quote this

play07:42

video if you spend too much money auto

play07:44

scaling but yeah I mean auto scaling is

play07:48

something that is provided as a solution

play07:49

by cloud services you know if you if you

play07:53

host your service on the cloud you can

play07:55

probably ask them to auto scale your

play07:57

service and you know auto scaling is

play08:00

not a very bad idea usually because the

play08:02

increased number of traffic is probably

play08:06

meaning that you're going to make more

play08:07

money out of that traffic so yeah that's

play08:10

one solution how about if you go viral

play08:14

but if you go viral you can just fall

play08:17

back to the old solution of rate

play08:19

limiting so if you do rate limiting you

play08:28

will be serving the maximum number of

play08:29

users that you can actually serve and

play08:31

yeah

play08:32

and of course auto scaling and pre

play08:34

scaling is a good idea but going viral

play08:36

is something that you can't predict so

play08:38

pre scaling is not really a solution

play08:39

these two other solutions the third

play08:42

problem is a real server-side problem

play08:44

and that is job scheduling

play08:48

so more often than not we write cron jobs which

play08:52

run on some point in time I mean we we

play08:55

decide when it's going to run but

play08:57

imagine a cron job which is supposed to

play08:59

send email notifications to all users

play09:01

wishing them a happy new year on the 1st

play09:03

of January what could happen if you do

play09:06

this in a naive way is that you send all

play09:08

of the emails together when the clock

play09:11

hits first of January which means that

play09:13

if you have a million users you're going

play09:15

to send one million email notifications

play09:17

and that's of course like a huge herd of

play09:20

bison coming towards you so the way you

play09:22

avoid this is to break the job into

play09:26

smaller pieces let's say 1 million users

play09:29

so you have 1 million user IDs the first

play09:32

thousand users are broken I mean the

play09:35

users are broken into chunks the first

play09:37

thousand users are going to get the

play09:38

email in the first minute the second

play09:40

thousand users are going to get in the

play09:42

second minute

play09:42

and so on and so forth 1 million by

play09:44

1,000 is going to take 1000 minutes and

play09:48

with this what's happening is that you

play09:51

have divided the work that you had on

play09:53

the server into smaller chunks which it

play09:55

can consume you know 1000 a minute is

play09:58

something which is not a tremendous load

play09:59

so it's going to survive and your users

play10:02

don't really care I mean if they don't

play10:04

get the email notification

play10:05

auto-generated email notification at the

play10:07

first minute of new year they don't

play10:09

really care so yeah it's something that

play10:12

you can do of course if they do care

play10:13

then you have to bring it down you have

play10:15

to bring this range down from 1,000

play10:17

minutes to whatever you like but you see

play10:19

that you can decide all right so batch

play10:22

processing is something that you should

play10:23

definitely do the fourth problem is as

play10:27

interesting as the other problems

play10:28

actually it's when someone popular posts

play10:32

something or if a post goes like if a

play10:37

post becomes really popular if a lot of

play10:38

people liking it sharing it subscribing

play10:41

to it like you guys should but popular

play10:45

post let's say a user like PewDiePie

play10:49

posts something on YouTube then you need

play10:51

to send it to all of their followers if

play10:53

you do it a naive way the same issue of

play10:55

you know job scheduling will come in

play10:58

there's too many users and a very small

play11:00

Delta so what you could do is batch

play11:03

processing over there you know send

play11:04

users in chunks of 1,000 but something

play11:08

that YouTube does really smartly is

play11:10

adding jitter in which case if you have

play11:15

a lot of followers what's going to

play11:17

happen is the notifications are going to

play11:19

go to them in the in a batch processing

play11:21

way but if they start hitting the page

play11:23

let's say the video page then there is

play11:26

some content the video content which is

play11:29

core to YouTube but there is a lot of

play11:32

data which actually doesn't matter if

play11:34

you think of it that's the number of

play11:36

views the number of likes number of

play11:41

comments and so on and so forth now if

play11:43

you have a very popular user like

play11:45

PewDiePie actually posting a video the

play11:48

number of views are going to be changing

play11:49

dramatically so what you could do is to

play11:53

faithfully display that or you could do

play11:56

it in the smart way so let's say in the

play11:57

first hour we get 1,000 views then in

play12:03

the second hour if there's a lot of

play12:06

users who are asking the number of views

play12:07

in this video I'm going to be smart and

play12:10

I'm just going to say 1,000 into 1.5 is

play12:14

the total number of views now so that's

play12:16

1,500 yeah but maybe the total number of

play12:20

views in reality was 1700 so there's a

play12:23

mismatch between reality and what is

play12:25

being displayed but we don't care

play12:28

because this is metadata this is

play12:31

something which is not core to the video

play12:32

so we are going to display some number

play12:34

which may or may not be true it's an

play12:37

approximation and do the users really

play12:39

care not really they want a general idea

play12:41

of what's going on and of course this

play12:44

seems like a really big difference but

play12:46

YouTube can be smart about this they can

play12:48

figure out how the views change over

play12:51

time and if it's this is the first hour

play12:54

and this is the second hour instead of

play12:56

finding out the total number of views in

play12:58

this video they can just run through

play13:01

this graph and figure out where it

play13:02

should lie okay

play13:05

YouTube must be much much smarter than

play13:07

this but I am just giving the general

play13:08

area of approximation instead of showing

play13:11

people the truth approximate and save a

play13:15

lot of load on your service

play13:17

potentially this could save a lot of

play13:18

database queries that you are making to

play13:20

get the metadata of a post right okay so

play13:25

that's the fourth smart solution and

play13:26

that is now the fifth smart solution

play13:29

which is approximate statistics

play13:32

apart from these solutions of course

play13:34

there's some good practices in the

play13:37

server side to avoid a thundering herd

play13:39

the first one is the most common one

play13:41

which is caching so if you're getting a

play13:46

lot of common requests then the response

play13:48

is going to be the same and you can just

play13:51

cache those requests and I mean basically

play13:54

cache the responses for those requests

play13:56

so those are key value pairs and this is

play14:01

going to save a lot of queries that

play14:03

you'll be making on the database in turn

play14:05

that will be improving the performance

play14:07

of your system and also you can handle

play14:09

more load then another thing that you

play14:11

can do is of course gradual deployments

play14:14

most of the issues that people get in

play14:18

the server side is when they're

play14:19

deploying the service so there's a lot

play14:21

of stories about you know site

play14:24

reliability engineers who are fighting

play14:26

deployments and the developers want to

play14:28

deploy more because they want to get

play14:29

more features out and the reliability

play14:32

engineers want to stop the deployments as

play14:33

much as they can because that makes the

play14:35

system more stable so it's it's an

play14:37

interesting tug-of-war and what you want

play14:40

to do essentially is deploy so gradual

play14:43

deployments in this what happens is you

play14:45

don't deploy let's say if you have a

play14:47

total of 100 servers you don't deploy them

play14:49

together you deploy the first ten you

play14:51

have a look at what's going on and then

play14:54

you deploy the next 10 and so on and so

play14:56

forth this won't be possible in certain

play14:59

scenarios when there's a breaking change

play15:01

so to speak but we are getting into too

play15:03

much detail and gradual deployment is a

play15:06

good idea

play15:07

deploy 10 10 10 10 together unless there

play15:10

is absolutely no choice and

play15:12

you have to deploy in parallel and the

play15:14

final point that I'm going to make of

play15:15

course is going to be controversial it's

play15:18

with a

play15:19

star and that's called coupling so to

play15:24

improve performance sometimes what you

play15:26

need to do is you need to store data

play15:27

which is very similar to caching but let

play15:32

us assume that you have a service which

play15:34

for every request that it gets asks an

play15:37

authentication service to authenticate

play15:39

the user first to authenticate this

play15:41

request and then serves the request

play15:44

let's say that this network call is too

play15:47

much for you

play15:47

maybe it's an external service what you

play15:50

could do is you could cache the users

play15:53

username and token or password or

play15:57

whatever you like and now what you can

play16:00

do is you can see that if the username

play16:02

password worked once in the past one

play16:05

hour then maybe the password hasn't

play16:07

changed and we are going to assume that

play16:09

this user is authenticated to make that

play16:10

like you know to call this service and

play16:13

we are going to go ahead instead of

play16:14

talking to authentication service and

play16:16

verifying whether it's true or not it's

play16:19

good in a way that it's not you know

play16:22

querying the authentication service all

play16:24

the time so reducing the load

play16:25

authentication service improving

play16:27

performance improving user experience

play16:31

except that if this is a financial

play16:34

system and the password has changed and

play16:36

if there's a person who's hacked into

play16:39

their account and then using this

play16:40

password then you're in big trouble aren't

play16:42

you

play16:42

so this is a double-edged sword in

play16:45

fact it seems like a really bad idea in

play16:47

most cases so you should only couple

play16:50

systems which is actually keeping data

play16:52

in the cache sensitive data or

play16:56

important data in the cache that should

play16:57

be avoided however if you have some data

play17:01

like you know the profile picture or

play17:03

something you can keep that for one hour

play17:05

or two hours that's why there's a star

play17:07

over here you want to take this

play17:10

case-by-case and understand that keeping

play17:12

some data for an external service in

play17:15

your own service whether that's a good

play17:17

idea or not and that will improve

play17:19

performance and in turn that will help

play17:22

you handle more requests so the problem

play17:25

of the Thundering Herd will be slightly

play17:27

mitigated but really this is

play17:30

like performance improvement and

play17:31

probably we should put another star over

play17:34

here just to be sure that's it for this

play17:36

discussion on the thundering herd

play17:38

we often have discussions on system

play17:40

design so if you want notifications for

play17:41

that you can subscribe and of course if

play17:43

you have any doubts or comments on this

play17:45

discussion then you can leave them in

play17:47

the comments below I'll see you next

play17:49

time

play17:54

stuff like the views the number of views

play17:57

but the number of likes number of

play18:00

comments these things are not critical

play18:03

to a video
