System Design Primer ⭐️: How to start with distributed systems?

Gaurav Sen
23 Nov 2019 · 09:21

Summary

TLDR: This video script uses the analogy of a growing pizza parlor to illustrate key concepts in system design and scalability. It explains vertical scaling with a single chef, introduces the idea of process optimization, and highlights the importance of redundancy to avoid single points of failure. The script further discusses horizontal scaling, specialization, microservices, and the benefits of a distributed system for fault tolerance and efficiency. It concludes with the importance of system extensibility and the distinction between high-level and low-level system design, offering insights for aspiring senior engineers.

Takeaways

  • 🍕 The script uses the analogy of a pizza parlor to explain system engineering concepts, starting with a single chef (resource) handling orders.
  • 📈 When one resource can't handle the load, vertical scaling is introduced as a solution, akin to upgrading a computer's capabilities.
  • 🛠️ The concept of optimizing processes is discussed, such as pre-making pizza dough during non-peak hours to increase efficiency.
  • 👨‍🍳 The importance of resilience is highlighted by introducing a backup chef to avoid a single point of failure, similar to a master-slave architecture in computing.
  • 👥 Horizontal scaling is presented as the solution to further growth, which involves hiring more chefs (resources) to handle more orders.
  • 🔄 The script introduces the idea of routing orders based on chef expertise to optimize the workflow, like a microservice architecture in software systems.
  • 🏢 The need for a microservice architecture is discussed, where different teams handle different responsibilities, making the system more manageable and scalable.
  • 🌳 The script touches on the concept of fault tolerance by suggesting opening a new shop as a backup, which is a step towards a distributed system.
  • 🌐 It explains the idea of a distributed system, where multiple locations can serve orders, improving response times and fault tolerance.
  • 🤖 The role of a load balancer is introduced as a central authority that intelligently routes requests to optimize for time and efficiency.
  • 🔧 The importance of system decoupling is discussed, where different parts of the system have clear, separate responsibilities for better efficiency and flexibility.
  • 📊 The script emphasizes the need for logging and metrics to understand and improve the system's performance over time.
  • 🛠️ The takeaway on system extensibility is highlighted, emphasizing the need for a system design that can adapt to new purposes without major rewrites.

Q & A

  • What is the primary example used in the script to illustrate system engineering principles?

    -The primary example used in the script is opening and scaling a pizza parlor.

  • What is the initial challenge faced by the pizza parlor in the script?

    -The initial challenge is that one chef cannot handle all the orders brought in by new customers.

  • What is vertical scaling in the context of the pizza parlor example?

    -Vertical scaling refers to optimizing processes and increasing throughput by asking the chef to work harder, potentially by paying them more, which is analogous to upgrading a computer's resources.

  • Why is it beneficial to prepare pizza bases during non-peak hours?

    -Preparing pizza bases during non-peak hours ensures that the chef is not occupied with this task when regular orders come in, thus optimizing the process and increasing efficiency.

  • What is the term used in the script to describe having a backup chef in case the primary chef is unavailable?

    -The script calls a lone chef a 'single point of failure'. The solution is to hire a backup chef, which the script compares to a master-slave architecture.

  • How does hiring more chefs relate to the concept of horizontal scaling?

    -Hiring more chefs is an example of horizontal scaling, which involves adding more resources of the same type to increase the system's capacity.

  • What is the strategy suggested in the script for routing orders based on the chefs' specialties?

    -The strategy is to route all garlic bread orders to the chef who specializes in garlic bread, and all pizza orders to the chefs who specialize in making pizzas, thus optimizing the use of each chef's skills.

  • What is the term used in the script to describe the architecture where responsibilities are well-defined and separated?

    -The term used is 'microservice architecture', which involves dividing the system into smaller, independent services that handle specific tasks.

  • Why is it important to distribute the pizza parlor business to different locations, as suggested in the script?

    -Distributing the business to different locations helps in creating a fault-tolerant system, where issues at one location do not affect the entire operation, and it can also improve response times for customers.

  • What is the role of a load balancer in the context of the distributed pizza parlor system described in the script?

    -A load balancer is responsible for routing customer requests to the most appropriate pizza parlor based on factors like wait time and delivery time, ensuring efficient service and maximizing profits.

  • What is the significance of decoupling in system design as discussed in the script?

    -Decoupling in system design allows for flexibility and scalability. It separates concerns so that changes in one part of the system do not affect others, making it easier to manage and extend the system.

  • How does the script differentiate between high-level and low-level system design?

    -High-level system design focuses on the overall architecture, deployment, and interaction between systems, while low-level system design deals with the actual coding, including class structures, object creation, and function definitions.

Outlines

00:00

🍕 Scaling a Restaurant: From One Chef to Multiple Chefs

This paragraph discusses the challenges and strategies involved in scaling a restaurant business, using the analogy of a pizza parlor. It starts with the scenario of a single chef unable to handle all orders, leading to the concept of vertical scaling where the chef is asked to work harder for more pay. The speaker then introduces the idea of optimizing processes, such as pre-making pizza bases during non-peak hours, to increase efficiency. The concept of resilience is introduced by suggesting the hiring of a backup chef to avoid business loss due to the main chef's absence. The paragraph concludes with the idea of horizontal scaling, where more chefs are hired to handle increased demand, and the importance of utilizing each chef's expertise for maximum efficiency.

05:01

🌐 Distributed Systems and Load Balancing in Scalability

The second paragraph delves into the concept of distributed systems, using the example of a pizza shop expanding to multiple locations. It highlights the importance of having local servers, akin to opening new shops, to improve fault tolerance and response times. The speaker discusses the role of a central authority or load balancer in routing customer orders to the most efficient shop based on wait times and delivery times. The paragraph also touches on the idea of decoupling, where different parts of the system (e.g., pizza shops and delivery agents) operate independently but are coordinated by a central system. The importance of logging and metrics for understanding system performance and maintaining extensibility is emphasized, concluding with the distinction between high-level and low-level system design in the context of scalability and efficiency.
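The logging and metrics idea can be sketched in a few lines. This is only an illustration: the event tuple layout `(shop, order_id, event, minute)`, the event names `started` and `finished`, and the function name `average_prep_minutes` are assumptions made for this example, not something the video specifies. The log records what happened and when; the metric condenses those events into one number per shop, which is how a faulty oven would show up as a rising average.

```python
from collections import defaultdict
from statistics import mean


def average_prep_minutes(events):
    """Condense raw log events into one metric: average preparation time per shop.

    `events` is a list of (shop, order_id, event, minute) tuples (assumed layout).
    """
    started = {}                    # (shop, order_id) -> minute the order was started
    durations = defaultdict(list)   # shop -> list of preparation times in minutes
    for shop, order_id, event, minute in events:
        key = (shop, order_id)
        if event == "started":
            started[key] = minute
        elif event == "finished" and key in started:
            durations[shop].append(minute - started.pop(key))
    return {shop: mean(times) for shop, times in durations.items()}


# Hypothetical log: shop 1 takes much longer on its second order.
log = [
    ("shop 1", 1, "started", 0),
    ("shop 1", 1, "finished", 5),
    ("shop 1", 2, "started", 10),
    ("shop 1", 2, "finished", 25),
    ("shop 2", 3, "started", 0),
    ("shop 2", 3, "finished", 5),
]
metrics = average_prep_minutes(log)
```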

Keywords

💡Vertical Scaling

Vertical scaling refers to the process of increasing the capacity of a system by adding more resources to a single machine, such as increasing CPU power, memory, or storage. In the video's context, it is used to describe the initial approach to handling increased demand in a pizza parlor by asking the chef to work harder, potentially for more pay, which is analogous to upgrading a computer's hardware to handle more tasks.

💡Optimizing Processes

Optimizing processes involves making changes to the way tasks are performed to improve efficiency and throughput. In the video, this is demonstrated by preparing pizza bases in advance during non-peak hours, thus freeing up the chef to focus on other tasks when orders come in. This strategy is crucial for managing resources effectively and maintaining service quality.
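In software terms this is pre-computation: do the slow, order-independent step before requests arrive. The sketch below is a rough illustration only; the `Kitchen` class, its method names, and the step labels are hypothetical and not taken from the video.

```python
from collections import deque


class Kitchen:
    """Hypothetical kitchen that can prepare the slow step ahead of time."""

    def __init__(self):
        self.ready_bases = deque()  # bases prepared during non-peak hours

    def prepare_bases(self, count):
        """Run during non-peak hours, before any orders arrive."""
        for i in range(count):
            self.ready_bases.append(f"base-{i}")

    def make_pizza(self, topping):
        """Return (pizza, steps). Fewer steps are needed when a base is ready."""
        if self.ready_bases:
            base = self.ready_bases.popleft()
            steps = ["add topping", "bake"]
        else:
            base = "fresh-base"
            steps = ["make base", "add topping", "bake"]
        return f"{topping} pizza on {base}", steps


kitchen = Kitchen()
kitchen.prepare_bases(1)                                   # off-peak work
first_pizza, first_steps = kitchen.make_pizza("margherita")  # uses the prepared base
second_pizza, second_steps = kitchen.make_pizza("pepperoni") # no base left, full work
```

The second order shows the cost of running out of prepared bases: the chef has to do the slow step while the customer waits.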

💡Resilience

Resilience in a system refers to its ability to withstand or recover from failures. The video uses the example of a chef calling in sick, which could halt business operations. To build resilience, the video suggests hiring a backup chef, ensuring that the business can continue operating even if the primary chef is unavailable.

💡Single Point of Failure

A single point of failure is a part of a system that, if it fails, will stop the entire system from working. In the video, the chef is initially the single point of failure in the pizza parlor. The solution proposed is to hire a backup chef, thus distributing the risk and ensuring that the business is not entirely dependent on one individual.
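A minimal failover sketch, assuming each chef is modelled as a callable that raises when unavailable. The names `ChefUnavailable`, `make_chef`, and `cook_with_failover` are invented for this example.

```python
class ChefUnavailable(Exception):
    """Raised when a chef cannot take an order (for example, called in sick)."""


def make_chef(name, available=True):
    """Build a hypothetical chef as a function that cooks or raises."""
    def cook(order):
        if not available:
            raise ChefUnavailable(name)
        return f"{order} made by {name}"
    return cook


def cook_with_failover(order, chefs):
    """Try the chefs in priority order; the first available one handles the order."""
    for chef in chefs:
        try:
            return chef(order)
        except ChefUnavailable:
            continue  # fall through to the next backup
    raise ChefUnavailable("no chef available")


primary = make_chef("primary chef", available=False)  # the primary is out today
backup = make_chef("backup chef")
result = cook_with_failover("pizza", [primary, backup])
```

With only the primary in the list, the same call would raise, which is exactly the single point of failure the video warns about.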

💡Horizontal Scaling

Horizontal scaling is the practice of adding more machines or nodes to a system to increase its capacity. The video illustrates this concept by suggesting the hiring of more chefs to handle the growing number of orders, which is akin to adding more servers to a computer system to handle more workload.
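One simple way to spread work over identical workers is round-robin assignment. The video does not name a specific strategy at this point, so round-robin is an assumption here, as is the function name `assign_round_robin`.

```python
from collections import Counter
from itertools import cycle


def assign_round_robin(orders, chefs):
    """Pair each order with the next chef in rotation (assumed strategy)."""
    if not chefs:
        raise ValueError("at least one chef is required")
    rotation = cycle(chefs)
    return [(order, next(rotation)) for order in orders]


orders = [f"order-{n}" for n in range(6)]
assignments = assign_round_robin(orders, ["chef 1", "chef 2", "chef 3"])
load_per_chef = Counter(chef for _, chef in assignments)
```

Adding a fourth chef to the list is the whole scaling step: no existing chef has to change.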

💡Microservice Architecture

A microservice architecture is a way of designing software systems as a suite of small services, each running in its own process and communicating with lightweight protocols. In the video, this concept is applied to the pizza parlor by creating specialized teams of chefs for different tasks, such as making pizzas or garlic bread, which simplifies management and allows for more efficient scaling.
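The routing-by-specialty idea can be sketched as a small router that keeps one independent rotation per team. The chef assignments (chefs 1 and 3 for pizza, chef 2 for garlic bread) follow the video; the `OrderRouter` class and its API are hypothetical.

```python
from itertools import cycle


class OrderRouter:
    """Route each order type to the team that specialises in it."""

    def __init__(self, teams):
        # One rotation per team, so each team can be sized and scaled separately.
        self._rotations = {kind: cycle(members) for kind, members in teams.items()}

    def route(self, order_type):
        if order_type not in self._rotations:
            raise ValueError(f"no team handles {order_type!r}")
        return next(self._rotations[order_type])


router = OrderRouter({
    "pizza": ["chef 1", "chef 3"],
    "garlic bread": ["chef 2"],
})
garlic_bread_chef = router.route("garlic bread")
pizza_chefs = [router.route("pizza") for _ in range(3)]
```

A recipe change for garlic bread only touches the garlic bread team's entry, which mirrors the point about notifying only chef two.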

💡Fault Tolerance

Fault tolerance is the ability of a system to continue operating properly in the event of the failure of some of its components. The video discusses the importance of fault tolerance by suggesting the opening of a second pizza shop to ensure that if one shop encounters issues, the other can still serve customers.

💡Distributed System

A distributed system is a network of computers that work together to perform tasks. The video introduces the concept of a distributed system by discussing the idea of having multiple pizza shops that can handle orders independently but also communicate with each other to ensure efficient order processing.

💡Load Balancer

A load balancer is a device or service that distributes network or application traffic across multiple servers to ensure no single server bears too much demand. In the video, the concept is used to describe how a central authority can intelligently route customer orders to different pizza shops based on factors like wait times and delivery times.
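The routing decision from the video can be written as a small worked example. Shop one's numbers (60 minutes in queue, 5 to make, 10 to deliver) and shop two's total of 65 minutes come from the transcript. The breakdown of shop two's 65 minutes is not given in the video, so the split below is an assumption, as are the `ShopEstimate` and `pick_shop` names.

```python
from dataclasses import dataclass


@dataclass
class ShopEstimate:
    name: str
    queue_minutes: int
    cooking_minutes: int
    delivery_minutes: int

    @property
    def total_minutes(self):
        """Time until the customer has the pizza: the single routing parameter."""
        return self.queue_minutes + self.cooking_minutes + self.delivery_minutes


def pick_shop(estimates):
    """Send the order to the shop with the lowest total time."""
    return min(estimates, key=lambda shop: shop.total_minutes)


shop_one = ShopEstimate("pizza shop 1", queue_minutes=60, cooking_minutes=5, delivery_minutes=10)
# Assumed split: only the 65-minute total is stated in the video.
shop_two = ShopEstimate("pizza shop 2", queue_minutes=5, cooking_minutes=5, delivery_minutes=55)
chosen = pick_shop([shop_one, shop_two])
```

The estimates are only as good as the updates behind them, which is why the video stresses that the central authority needs real-time information.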

💡Decoupling

Decoupling is the process of separating the components of a system to make them independent of each other, which can improve flexibility and maintainability. The video uses the example of separating the responsibilities of pizza shops and delivery agents, allowing each to operate independently and efficiently.

💡Extensibility

Extensibility refers to the ability of a system to be extended or modified to accommodate new requirements or features. The video emphasizes the importance of designing systems to be extensible, allowing for easy adaptation to new business needs, such as changing from delivering pizzas to burgers.
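A small sketch of the delivery agent example, assuming a `Package` type whose contents are opaque to the agent. Both class names and the `deliver` method are hypothetical; the point is only that the agent's code reads the destination and nothing else.

```python
from dataclasses import dataclass


@dataclass
class Package:
    contents: str      # never inspected by the delivery agent
    destination: str


class DeliveryAgent:
    """Delivers any package; it does not know or care what is inside."""

    def deliver(self, package):
        return f"delivered to {package.destination}"


agent = DeliveryAgent()
pizza_receipt = agent.deliver(Package("pizza", "12 Main Street"))
burger_receipt = agent.deliver(Package("burger", "12 Main Street"))
```

Switching the business from pizzas to burgers changes what the shop puts in the package, while the delivery code stays untouched.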

Highlights

Introduction to the concept of system engineering with a real-world example of opening a restaurant.

The issue of a single chef being unable to handle increased customer orders, leading to the concept of vertical scaling.

Optimizing processes by preparing pizza bases during non-peak hours to increase throughput.

The importance of making systems resilient by avoiding single points of failure, such as having a backup chef.

The transition from a single chef to multiple chefs as a strategy for horizontal scaling.

Specialization of chefs based on their expertise to optimize order routing and system efficiency.

The introduction of microservice architecture to define clear responsibilities within the system.

The scalability of the pizza shop business model and its ability to handle all orders within time.

Addressing external risks such as electricity outages and the need for a distributed system.

The concept of opening a new shop as a step towards a more fault-tolerant and responsive system.

The role of a central authority or load balancer in intelligently routing customer requests.

The benefits of decoupling the system to increase flexibility and handle changes more efficiently.

The importance of logging and metrics for understanding system events and performance.

The need for system extensibility to avoid rewriting code for different purposes, illustrated by the delivery agent example.

Mapping business scenarios to technical solutions and the process of high-level design.

The distinction between high-level and low-level system design and their respective focuses.

The significance of writing efficient and clean code for a senior engineering level.

Encouragement to subscribe for future videos on system design and related topics.

Transcripts

play00:00

Hey everyone, today we'd be talking. I'm sorry.

play00:03

Is it fine if I record a video? No, no problem. Oh, thank you so much.

play00:17

Usually when you're building a system, an engineering system,

play00:19

there's actually some sort of background behind it.

play00:21

We'll be taking a real world example of opening a restaurant.

play00:24

Let's see how that happens.

play00:30

Let's take an example of a pizza parlor and we have just one chef.

play00:35

There comes a point though that one chef cannot handle all the orders that all

play00:39

the new customers are bringing in.

play00:44

If you think like a manager,

play00:45

the first thing that you're going to do is ask the chef to work harder and you

play00:49

can pay them more, put in more money. They give you more output.

play00:52

You want to optimize processes and increase throughput

play00:56

using the same resource. When you think of the chef as a computer,

play01:00

and put this in technical terms, it's called vertical scaling.

play01:04

Speaking of optimizing processes, you can do some things beforehand.

play01:07

When you get an order, you don't need to actually make the pizza base.

play01:10

That can be pre-made, prepared beforehand at non-peak hours.

play01:15

The reason you want to do this at non-peak hours is because you don't want a

play01:18

regular order to come in and your chef being busy making the pizza bases.

play01:23

Somewhere around 4:00 AM in the night is really good because you surely won't

play01:27

have any pizza orders that time. Now that the system is set up,

play01:30

let's make it resilient.

play01:33

Let's say that the chef calls in sick one day. At this point,

play01:36

your business is in trouble because there won't be any business that day.

play01:39

This person is a single point of failure.

play01:41

So what you can do then is hire a backup chef in case the chef doesn't come.

play01:46

You employ them for that day only and you pay them. Of course, in this case,

play01:51

the chance of you losing out on business is really low because you have not just

play01:55

one chef, but also the backup.

play01:56

Keep backups and avoid single points of failure. For

play02:01

computers, it's something like a master-slave architecture: the master chef,

play02:05

and you have a slave chef, which is a little odd to say. So that's what we need.

play02:09

Now, if your business keeps growing every time,

play02:11

then you better make that backup chef a full-time chef. In fact,

play02:14

hire more chefs. Let's say instead of one chef,

play02:16

you have now 10 chefs and a few in backup also, just in case.

play02:20

Hire more resources, which maps to horizontal scaling.

play02:26

Horizontal scaling is buying more machines of similar types to get more work

play02:30

done.

play02:34

Let's say we have three of our chefs over here, one, two, and three.

play02:37

They have some specialties. Here's a question.

play02:41

You have chefs one and three who are experts at making pizzas and chef two's

play02:45

expertise is garlic bread. If you have two types of incoming orders,

play02:48

which are pizza and garlic bread, how would you route them?

play02:55

What you can do is randomly assign the orders. So if you have garlic bread,

play02:58

it can go to chef one; you can take pizza and send it to chef two,

play03:01

but this is not the most efficient way to use your employees.

play03:04

You can build on their strengths and route all garlic bread orders to chef two

play03:09

and all pizza orders to chef one and three.

play03:11

This makes the system a little simpler because anytime you need to make a change

play03:14

in the recipe for garlic bread, you just need to notify chef two.

play03:18

Anytime you need the status of any order on garlic bread,

play03:21

chef two is the person you ask. You can actually make a team

play03:25

like a team of chefs over here who are specialists in garlic bread.

play03:28

Maybe you just need three chefs over here for garlic bread because the number of

play03:31

orders is going to be a little less. So for pizzas,

play03:33

you need the remaining seven chefs distributed into teams of three and four.

play03:37

They're good at making pizzas and they're getting all the pizza orders.

play03:40

What you're doing is you're scaling this team at a different rate compared to

play03:42

these two teams and also dividing responsibilities.

play03:46

So we have something called a microservice architecture.

play03:50

You have all your responsibilities well-defined over here.

play03:53

There's nothing outside your business use case that you handle,

play03:56

so that is point number five. At this point,

play03:58

the pizza shop is actually doing really well because it's able to handle all

play04:01

orders within time,

play04:02

and it also has specialists for everything which you can scale easily.

play04:05

This business is scalable to a large extent,

play04:08

but what if there is an electricity outage in this pizza shop?

play04:12

You won't have business that day. What if you lose your license for a day?

play04:16

You won't have business that day.

play04:18

So what you want to do is you want to distribute. I mean,

play04:21

you don't wanna put all your eggs in one basket, not even in one shop.

play04:24

You wanna buy a separate shop in a different place,

play04:26

which can also deliver pizzas. Maybe it takes more time.

play04:29

Maybe the number of chefs there is smaller, but at least you have a backup.

play04:33

So we take backup to a different level over here and open a new shop.

play04:39

This is probably the biggest step where we introduce a lot of complexity to the

play04:42

system because there sometimes needs to be communication between these shops.

play04:46

You need to be able to route your requests. I mean,

play04:49

you get a request for a pizza. You need to be able to tell that,

play04:52

should I order it to this or should I send the order over here?

play04:55

A distributed system.

play04:58

And one very clear advantage that we can have here is that any orders which are

play05:01

very close to this, which are local to its range, can be served by this shop.

play05:06

In a large-scale distributed system, let's say Facebook,

play05:08

you get requests from all around the world. To give quick response times,

play05:12

you need some sort of local servers everywhere, and that's what we are doing.

play05:15

We are distributing our system so that it's more fault tolerant and also gives

play05:19

quicker response times. Let's say you have the old shops pizza shop one and two,

play05:23

and you have delivery agents and you have customers.

play05:26

Every time a customer makes a request,

play05:28

they need to either send it to one or two,

play05:31

but the customer is not going to be taking that responsibility.

play05:34

So you want to send it to somebody else,

play05:36

maybe a central place which just routes requests,

play05:40

and you don't just want to send these requests randomly.

play05:44

You have a very clear parameter.

play05:46

How much time does it take for the customer to get the pizza? That's it.

play05:50

That's your parameter. If you send it to pizza shop one,

play05:53

it's a really popular shop.

play05:54

Maybe it takes one hour for it to wait in queue plus five minutes to make

play06:01

plus 10 minutes to deliver from pizza shop one to the customer. Over here,

play06:05

pizza shop two has a really short wait time.

play06:09

The total time required here is one hour five minutes,

play06:11

which is less than the one hour 15 minutes required over here.

play06:13

So the central authority should actually send it over here. So,

play06:16

and as long as it's getting real time updates,

play06:18

it can make intelligent business decisions, which means more money.

play06:22

This thing that routes requests in a smart way is called a load balancer,

play06:26

and you can assume why the system is now fault tolerant,

play06:30

but how do you make it flexible to change?

play06:35

At this point,

play06:35

you can almost tell that the delivery agent and the pizza shop have nothing in

play06:38

common. I mean, it could be a pizza shop,

play06:40

it could be a burger shop for the delivery agent.

play06:42

They just want to deliver their goods as quickly as possible to the customer.

play06:47

And similarly,

play06:47

the pizza shop doesn't care whether it's a delivery agent or the customer

play06:50

themselves who come and pick it up.

play06:52

So we are seeing some sort of separation of responsibilities.

play06:55

Instead of having the same managers managing the pizza shop and the delivery

play06:58

agents, you want to separate that out. It's called decoupling the system,

play07:03

separating out concerns so that you can handle

play07:07

separate systems more efficiently.

play07:10

Let's say pizza shop one has a faulty oven, their churning rate goes down.

play07:16

If you have a faulty bike,

play07:17

maybe that particular delivery agent's order times increase. So at this point,

play07:21

what you want is you want to log everything.

play07:23

You want to see at what time something happened and what is the next event,

play07:26

and so on and so forth. And also you want to be taking those events,

play07:29

condensing them, finding sense out of those events. So that's metrics.

play07:36

The final and most important point is to keep your system extensible.

play07:39

As a backend engineer,

play07:40

you don't want to rewrite all this code again and again to serve a

play07:45

different purpose. For example,

play07:46

this delivery agent doesn't need to know that they're delivering a pizza.

play07:50

It can be a burger tomorrow. And if you think about Amazon earlier,

play07:53

they used to deliver only parcels.

play07:55

And the reason why you can scale out your business is because you want to

play07:58

decouple everything to make sure that your system is extensible.

play08:02

What we have done is taken a business scenario,

play08:04

tried to find solutions to all the problems that it came up with,

play08:07

and then just map them into technical terms. Now, if you think of these,

play08:10

they are solutions in themselves for the technical counterparts of these

play08:15

problems. Finally, we have managed to scale our restaurant at a high level.

play08:18

We can now define what kind of problems we face and how we'll be solving them.

play08:22

This is known as high level design. There's a counterpart to this,

play08:26

which is called low level design. Let's briefly talk about that,

play08:30

the difference between high level system design and low level system design.

play08:32

So high level is what we talk about on this channel. You know,

play08:35

deploying on servers, figuring out how two systems will be interacting with each

play08:39

other.

play08:40

Low level system design has a lot more to do with how you're actually going to code

play08:43

this stuff, like making classes, making objects, the functions, the signatures.

play08:47

These things are pretty important if you are a senior engineer.

play08:50

And even if you're not, if you want to go to the senior engineering level,

play08:54

you need to know how to write efficient and clean code.

play08:58

The load balancer,

play08:59

microservice architecture and a few other videos are there in the

play09:02

description. And if you want notifications for the future videos,

play09:04

you can hit the subscribe button. Until next time, then I'll see you.
