Back-Of-The-Envelope Estimation / Capacity Planning

ByteByteGo
13 Sept 2022 · 08:31

Summary

TLDR: This video script introduces back-of-the-envelope math as a valuable tool for system design, emphasizing its utility in quickly estimating and sanity-checking designs without the need for extreme precision. It explains how to estimate requests per second and queries per second using metrics like DAU and scaling factors. The script provides an example of calculating Twitter's peak tweet creation rate and offers tips for simplifying calculations with scientific notation. It also demonstrates how to estimate storage requirements for multimedia files in tweets, illustrating the process with made-up numbers. The video encourages a practical approach to system design, focusing on order-of-magnitude accuracy rather than exact figures.

Takeaways

  • 📝 Back-of-the-envelope math is a valuable tool for system design, used to quickly sanity-check a design without needing absolute accuracy.
  • 🔱 It's sufficient to be within an order of magnitude or two of the actual numbers when using this method for preliminary calculations.
  • 💡 This math helps in understanding the scale of infrastructure needed, like the number of web servers or the necessity of database sharding or caching.
  • ⚙ The most common metric to estimate is requests per second at the service level or queries per second at the database level.
  • 👥 Daily Active Users (DAU) is a key input, often estimated as a percentage of Monthly Active Users (MAU) if only MAU is available.
  • 📈 Usage per DAU must be estimated, taking into account that not all active users will engage in the same way with the service.
  • 📊 A scaling factor is necessary to account for peak usage times, which can be significantly higher than average traffic.
  • ✂ Simplifying calculations involves converting large numbers into scientific notation to minimize errors and facilitate quick mental math.
  • 🧠 Memorizing certain large number conversions, like 10^12 representing a trillion or a terabyte, can expedite the back-of-the-envelope math process.
  • 🗃️ Storage requirements can be estimated by calculating the volume of data generated, the proportion containing multimedia, average file sizes, replication factors, and retention periods.
  • 📚 The script emphasizes that precision is less critical than being within an order of magnitude, which is typically adequate for design validation.

Q & A

  • What is back-of-the-envelope math used for in system design?

    -Back-of-the-envelope math is used to quickly sanity-check a system design, where absolute accuracy matters less than getting within an order of magnitude or two of the actual numbers.

  • Why is absolute accuracy not crucial when using back-of-the-envelope math?

    -Absolute accuracy is not crucial because it's usually sufficient to be within an order of magnitude or two of the actual numbers to make informed decisions about system design.

  • What are the two key insights gained from the example of a web service that needs to handle 1M requests per second?

    -The two key insights are that a cluster of web servers with a load balancer is needed, and approximately 100 web servers would be required to handle the load.

  • Why might a single database server be sufficient to handle the load for a while without sharding or caching?

    -A single database server might be sufficient if the math shows that it only needs to handle a few queries per second at peak, indicating that it can manage the load without additional optimizations for a while.

  • What is the most useful metric to estimate when using back-of-the-envelope math for system design?

    -The most useful metric to estimate is requests per second at the service level or queries per second at the database level.

  • How is Daily Active Users (DAU) typically obtained and related to Monthly Active Users (MAU)?

    -DAU should be easy to obtain, but if only MAU is available, DAU is estimated as a percentage of MAU.

  • What is the significance of the usage per DAU estimate in system design calculations?

    -The usage per DAU estimate captures how much each active user actually uses the feature being designed (for example, only a fraction of active Twitter users post), which is crucial for calculating the load on the system.

  • Can you explain the concept of a scaling factor in the context of back-of-the-envelope math?

    -A scaling factor is used to estimate how much higher the traffic would peak compared to the average, reflecting the potential requests-per-second peak where the design could break.

  • How is the example of estimating Tweets created per second on Twitter calculated?

    -The calculation involves multiplying the number of DAU by the average tweets per DAU, applying a scaling factor for peak times, and then dividing by the number of seconds in a day.

  • What technique is suggested for simplifying calculations in back-of-the-envelope math?

    -Converting all big numbers to scientific notation is suggested, as it simplifies multiplication to addition and division to subtraction.

  • Why is it important to memorize certain conversions when performing back-of-the-envelope math?

    -Memorizing certain conversions, like 10^12 being a trillion or a terabyte, helps in quickly converting and simplifying large numbers during calculations.

  • How does the example calculate the required storage for multimedia files in tweets?

    -The calculation involves estimating the percentage of tweets containing multimedia, the average size of those files, the replication factor, the duration of storage, and then applying the appropriate mathematical operations.

  • What is the conclusion about back-of-the-envelope math in system design?

    -Back-of-the-envelope math is a very useful tool in system design; it's important not to over-index on precision, as being within an order of magnitude is usually enough to inform and validate design decisions.

Outlines

00:00

📊 Back-of-the-Envelope Math for System Design

This paragraph introduces the concept of back-of-the-envelope math, a tool used by developers for quick sanity checks in system design. It emphasizes that the goal is not absolute accuracy but rather to be within an order of magnitude. The paragraph provides examples, such as determining the number of web servers needed based on request rates, and the decision-making process regarding database load. It also outlines the importance of estimating requests per second and queries per second, and discusses the common inputs for these calculations, including Daily Active Users (DAU), usage per DAU, and scaling factors for peak traffic. An example calculation for the number of tweets created per second on Twitter is given, illustrating the process of estimation using made-up numbers.

05:04

🔱 Simplifying Calculations with Scientific Notation

The second paragraph focuses on the technique of using scientific notation to simplify large number calculations in back-of-the-envelope math. It provides a method for quickly converting large numbers and emphasizes the importance of memorizing certain conversions, such as 10^12 representing a trillion or a terabyte. The paragraph also touches on the practical approach of ignoring the exact byte count in kilobytes for the sake of simplicity in this context. It concludes with an example calculation estimating the storage required for multimedia files in tweets, considering the number of tweets per day, the percentage containing multimedia, average file sizes, replication, and retention period. The summary includes the mathematical process and the resulting storage estimates in petabytes for pictures and videos.

Keywords

💡Back-of-the-envelope math

Back-of-the-envelope math refers to a quick, rough estimation technique used to make approximate calculations, often on the spot, without the need for precise tools or methods. In the context of the video, it is a tool used by experienced developers to quickly sanity-check a design's feasibility. The script mentions that absolute accuracy is not necessary; rather, it's about getting within an order of magnitude or two of the actual numbers, which helps in making informed decisions about system design.

💡System design

System design is the process of planning and constructing a system or a set of components to achieve a particular purpose. In the video, system design is the overarching theme, and back-of-the-envelope math is presented as a crucial tool within this process. The script discusses how this math can be used to estimate the number of web servers needed or whether a database can handle the load, which are both critical aspects of designing a scalable and efficient system.

💡Requests per second (RPS)

Requests per second (RPS) is a metric used to measure the number of requests a server can handle in one second, which is vital for assessing the capacity and performance of web services. The video emphasizes RPS as one of the most useful numbers for estimation, particularly when considering the load a web service must handle. For example, if a web service needs to handle 1 million RPS and each server can handle 10K RPS, the math quickly indicates the need for a cluster of web servers.

💡Queries per second

Queries per second is similar to requests per second but is specific to database operations. It measures how many queries a database can process in a second, which is crucial for database performance and scalability. The script uses this concept to illustrate that if a database only needs to handle about 10 queries per second at peak, a single database server might suffice, delaying the need for more complex solutions like sharding or caching.

💡Daily Active Users (DAU)

Daily Active Users (DAU) is a metric that represents the number of unique users who have interacted with a service within a day. In the video, DAU is identified as a key input for calculating requests per second, as it helps estimate the potential load on a service. The script provides an example of estimating DAU as a percentage of Monthly Active Users (MAU), which is essential for back-of-the-envelope calculations.

💡Scaling factor

A scaling factor is used to estimate the peak load on a system compared to its average load. It accounts for fluctuations in usage throughout the day or week. In the context of the video, the scaling factor is crucial for estimating the peak requests per second that a service might experience, such as during commute hours for a service like Google Maps or on weekend nights for a ride-sharing service like Uber.

💡Scientific notation

Scientific notation is a way of expressing numbers that are too large or too small to be conveniently written in decimal form. It is typically written as the product of a number between 1 and 10 and a power of 10. In the video, scientific notation is recommended for simplifying calculations, especially when dealing with very large numbers, such as the number of daily active users or seconds in a day.

💡Estimation

Estimation in the context of the video refers to the process of approximating values to make informed decisions during system design. It is not about achieving exact numbers but getting close enough to understand the scale and requirements of the system. The script provides examples of estimating the percentage of active users who make posts or the percentage of tweets containing multimedia content.

💡Sharding

Sharding is a database architecture practice where a large database is partitioned into smaller, faster, more easily managed pieces called shards. In the video, sharding is mentioned as a potential solution for handling a database load, but it is suggested that for some scenarios, a single database server might be sufficient for a while, delaying the need for sharding.

💡Caching

Caching is the process of storing frequently accessed data in a temporary storage area called a cache to speed up future access to that data. In the video, caching is presented as another potential solution for database load management, but it is suggested that for some peak loads, a single server could handle the load for some time before caching becomes necessary.

💡Storage estimation

Storage estimation is the process of calculating the amount of storage space required for a particular purpose, such as storing multimedia files for tweets. The video provides an example of how to estimate the storage needed for pictures and videos in tweets, taking into account the number of tweets, the percentage containing multimedia, file sizes, replication, and retention period.

Highlights

Back-of-the-envelope math is a useful tool for system design, not requiring absolute accuracy but rather a rough estimate within an order of magnitude.

Experienced developers use this method to quickly sanity-check a design, understanding that precise numbers are less important than the order of magnitude.

An example provided shows how to estimate the number of web servers needed based on request rates, suggesting a need for clustering and load balancing.

Another example explains how to estimate database load and the potential need for sharding or caching based on queries per second.

Requests per second at the service level and queries per second at the database level are identified as the most useful metrics for estimation.

Daily Active Users (DAU) and Monthly Active Users (MAU) are key inputs for estimating service usage, with DAU often estimated as a percentage of MAU.

The usage rate per DAU is essential, with a suggested 10%-25% usage rate for services like Twitter.

A scaling factor is needed to account for peak traffic times, such as during commute hours for services like Google Maps.

An example calculation is provided for estimating the number of Tweets created per second on Twitter, using made-up numbers for illustration.

Techniques for simplifying calculations include converting large numbers into scientific notation to reduce errors and simplify the process.

Grouping powers of ten together and performing simple addition and subtraction instead of complex multiplication and division is recommended.

Memorizing handy conversions, such as 10^12 representing a trillion or a terabyte, is suggested for quick calculations.

An example is given for estimating storage requirements for multimedia files in tweets, considering replication and storage duration.

The example shows calculations for both pictures and videos in tweets, highlighting the difference in storage needs based on file size and popularity.

The conclusion emphasizes the importance of back-of-the-envelope math in system design, advocating for an order-of-magnitude approach over precision.

The video encourages viewers to learn more about system design through books and a weekly newsletter, offering resources for further education.

Transcripts

play00:07

Back-of-the-envelope math is a very  useful tool in our system design toolbox.

play00:12

In this video, we will go over how and when to use  it, and share some tips on using it effectively.

play00:18

Let’s dive right in.

play00:19

Experienced developers use back-of-the-envelope  math to quickly sanity-check a design.

play00:25

In these cases, absolute  accuracy is not that important.

play00:28

Usually, it is good enough to get  within an order of magnitude or two  

play00:32

of the actual numbers we are looking for.

play00:35

For example, if the math says at our scale our web  service needs to handle 1M requests per second,  

play00:42

and each web server could only handle about 10K  requests per second, we learn two things quickly:

play00:49

One, we learn that we will need a cluster of web servers, with a load balancer in front of them.

play00:54

Two, we will need about 100 web servers.
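
The arithmetic behind this first example can be sketched in a few lines of Python. The function name `servers_needed` is a hypothetical helper introduced here for illustration; the 1M and 10K figures come from the transcript.

```python
import math


def servers_needed(peak_rps: float, rps_per_server: float) -> int:
    """Estimate the server count, rounding up because a fractional
    server still means provisioning one more machine."""
    return math.ceil(peak_rps / rps_per_server)


# 1M requests per second at roughly 10K requests per second per server
print(servers_needed(1_000_000, 10_000))  # 100
```

Anything above one server already implies a cluster behind a load balancer, which is the first of the two insights.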

play00:57

Another example is if the math shows that the  database needs to handle about 10 queries per  

play01:02

second at peak, it means that a single  database server could handle the load  

play01:07

for a while, and there is no need to  consider sharding or caching for a while.

play01:13

Now let’s go over some of the  most popular numbers to estimate.

play01:17

The most useful by far is requests per second  

play01:21

at the service level or queries  per second at the database level.

play01:26

Let’s go over the common inputs in  a requests-per-second calculation.

play01:30

The first input is DAU, or Daily Active Users.

play01:35

This number should be easy to obtain.

play01:37

Sometimes, the only available number  would be Monthly Active Users.  

play01:41

In that case, estimate the  DAU as a percentage of MAU.

play01:46

The second input is the estimate of the usage  per DAU of the service we are designing for.

play01:53

For example, not everyone  active on Twitter makes a post.

play01:58

Only a percentage does that.

play02:00

10%-25% seems to be reasonable.

play02:03

Again, it doesn’t have to be exact.

play02:05

Getting within an order of  magnitude is usually fine.

play02:09

The third input is a scaling factor.

play02:11

The usage rate for a service usually has  peaks and valleys throughout the day.

play02:16

We need to estimate how much higher the  traffic would peak compared to the average.

play02:22

This would reflect the estimated  requests-per-second peak where  

play02:25

the design could potentially break.

play02:28

For example, for a service like Google Maps,  

play02:30

the usage rate during commute hours  could be 5 times higher than average.

play02:36

Another example is a ride-sharing  service like Uber, where weekend  

play02:40

nights could have twice as many rides as average.

play02:44

Now, let’s go over an example.

play02:46

We will estimate the number of  Tweets created per second on Twitter.

play02:51

Note that these numbers are made up, and  they are not official numbers from Twitter.

play02:55

Let’s assume Twitter has 300 million MAU,  and 50% of the MAU use Twitter daily.

play03:02

So that’s 150 million DAU.

play03:05

Next, we estimate that about  25% of Twitter DAU make tweets.

play03:10

And each one on average makes 2 tweets.

play03:14

That is 25% * 2 = 0.5 tweets per DAU.

play03:20

For the scaling factor, we estimate  that most people tweet in the morning  

play03:24

when they get up and can’t wait to share  what they dreamed about the night before.

play03:29

And that spikes the tweet creation traffic to  twice the average when the US east coast wakes up.

play03:36

Now we have enough to calculate  the peak tweets created per second.

play03:40

We have:

play03:41

150 million DAU times 0.5 tweet per DAU, times 2x  scaling factor divided by 86,400 seconds in a day.

play03:53

That is roughly about 1,500  tweets created per second.
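
As a rough check, the same estimate can be written as a small Python sketch. The function and parameter names are assumptions made for illustration; the inputs are the made-up numbers from the transcript. Computed with the exact 86,400 seconds, the result is about 1,736, which is the same order of magnitude as the rounded 1,500 quoted in the video.

```python
SECONDS_PER_DAY = 86_400


def peak_per_second(mau, dau_ratio, actions_per_dau, peak_factor):
    """Peak actions per second from MAU, the DAU share of MAU,
    average actions per DAU, and a peak-over-average scaling factor."""
    dau = mau * dau_ratio
    average_per_second = dau * actions_per_dau / SECONDS_PER_DAY
    return average_per_second * peak_factor


# 300M MAU, 50% daily, 25% of DAU tweet 2 times each, 2x peak
peak = peak_per_second(300e6, 0.5, 0.25 * 2, 2)
print(round(peak))  # 1736
```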

play03:57

Let’s go over the techniques  to simplify the calculations.

play04:01

First, we convert all big  numbers to scientific notation.

play04:06

Doing the math on really big  numbers is very error-prone.

play04:09

By converting big numbers to scientific notation,  

play04:12

part of the multiplication becomes simple  addition, and division becomes subtraction.

play04:19

In the example above, 150 million  DAU becomes 150 times 10 to the sixth  

play04:25

or 1.5 times 10 to the eighth.

play04:28

There are 86,400 seconds in a day,  

play04:31

we round it up to 100,000 seconds, and  that becomes 10 to the fifth seconds.

play04:37

And since it’s a division, 10 to the fifth becomes 10 to the minus fifth.

play04:43

Next, we group all the powers of ten together, and then all the other numbers together.

play04:48

So the math becomes:

play04:50

1.5 times 0.5 times 2

play04:54

And

play04:55

10^8 * 10 ^(-5) = 10^(8-5) = 10^3

play05:04

Putting it all together, it’s 1.5x10^3, or 1,500.
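
The grouping trick can be mirrored in code: keep each number as a (mantissa, exponent) pair, multiply the mantissas, and add the exponents. This is a minimal sketch; `to_sci` and `multiply_sci` are hypothetical helper names, not something from the video.

```python
import math


def to_sci(x):
    """Split a positive number into (mantissa, exponent), 1 <= mantissa < 10."""
    exponent = math.floor(math.log10(x))
    return x / 10**exponent, exponent


def multiply_sci(*factors):
    """Multiply (mantissa, exponent) pairs: mantissas multiply, exponents add."""
    mantissa, exponent = 1.0, 0
    for m, e in factors:
        mantissa *= m
        exponent += e
    return mantissa, exponent


print(to_sci(150e6))  # (1.5, 8)

# 1.5x10^8 DAU x 0.5 tweets x 2 peak factor x 10^-5 (dividing by ~10^5 seconds)
m, e = multiply_sci((1.5, 8), (0.5, 0), (2, 0), (1, -5))
print(m, e)  # 1.5 3
```

Division shows up as a negative exponent, so the whole calculation reduces to one small product and one sum.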

play05:10

Now with practice, we should be able to convert  a large number to scientific notation in seconds.

play05:16

And here are some handy  conversions we should memorize:

play05:19

As an example, we should know by heart that  10^12 is a trillion or a TB, and when we see a  

play05:27

number like 50TB, we should be able to convert  it quickly to 5x10^1x10^12, which is 5x10^13.

play05:36

We are going to ignore the fact  that 1KB is actually 2^10 bytes,  

play05:40

or 1,024 bytes, and not a thousand bytes.

play05:44

We don’t need that degree of accuracy  for back-of-the-envelope math.
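
The conversion table shown in the video is not part of this transcript. As a stand-in, the sketch below uses the standard decimal prefixes (10^3 = thousand = KB, 10^6 = million = MB, 10^9 = billion = GB, 10^12 = trillion = TB, 10^15 = quadrillion = PB); the helper name `humanize_bytes` is an assumption made for illustration.

```python
# Powers of ten and their byte units, largest first (decimal prefixes,
# deliberately ignoring the 1,024 vs 1,000 distinction).
UNITS = [(15, "PB"), (12, "TB"), (9, "GB"), (6, "MB"), (3, "KB")]


def humanize_bytes(n):
    """Express a byte count in the largest unit that fits."""
    for exponent, unit in UNITS:
        if n >= 10**exponent:
            return f"{n / 10**exponent:g} {unit}"
    return f"{n:g} B"


print(humanize_bytes(5e13))  # 50 TB
print(humanize_bytes(9e15))  # 9 PB
```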

play05:49

Let’s wrap up by going through one last example.

play05:52

We’ll estimate how much storage is required  for storing multimedia files for tweets.

play05:58

We know from the previous example that  there are about 150M tweets per day.

play06:03

Now we need an estimate on a percentage of  tweets that could contain multimedia content,  

play06:09

and how large those files are on average.

play06:13

With our meticulous research, we estimate that  10% of tweets contain pictures, and they are about  

play06:20

100KB each, and 1% of all the tweets  contain videos, and they are 100MB each.

play06:27

We further assume that the files  are replicated, with 3 copies each,  

play06:31

and that Twitter would keep the media for 5 years.

play06:35

Now here is the math.

play06:37

For storing pictures, we have the following:

play06:40

150M tweets x 0.1 in pictures x 100KB per picture x 400 days in a year x 5 years x 3 copies

play06:52

So, that turns into:

play06:54

1.5x10^8 x 10^(-1) x 10^5 x 4x10^2 x 5 x 3

play07:06

Again, we group the powers of tens together.

play07:08

This becomes:

play07:09

1.5 times 4 times 5 times 3,

play07:13

which is 90

play07:15

and 10 to the (8-1+5+2), which is 10^14

play07:22

And that becomes 9x10^15,  which is, from the table, 9 PB.

play07:29

For storing videos, we take yet another shortcut.

play07:32

Since videos on average are 100MB  each while pictures are 100KB,  

play07:38

a video is 1000 times bigger  than a picture on average.

play07:43

Second, only 1% of tweets contain a video,  while pictures appear in 10% of all the tweets.

play07:49

So videos are one-tenth as popular.

play07:52

Putting the math together, the total  video storage is 1000 x 1/10 of picture  

play07:59

storage, which is 100 x 9PB, or 900 PB.
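
Both storage estimates can be reproduced with one small function. This is a sketch under the transcript's assumptions (150M tweets per day, a year rounded to 400 days, 5 years of retention, 3 copies); the function and parameter names are hypothetical.

```python
def media_storage_bytes(tweets_per_day, share_with_media, bytes_per_file,
                        days_per_year=400, years=5, copies=3):
    """Total bytes stored over the retention period, including replicas."""
    files_per_day = tweets_per_day * share_with_media
    return files_per_day * bytes_per_file * days_per_year * years * copies


pictures = media_storage_bytes(150e6, 0.10, 100e3)  # 10% of tweets, 100KB each
videos = media_storage_bytes(150e6, 0.01, 100e6)    # 1% of tweets, 100MB each

print(f"{pictures / 1e15:.0f} PB")  # 9 PB
print(f"{videos / 1e15:.0f} PB")    # 900 PB
```

The ratio `videos / pictures` comes out to about 100, matching the shortcut of 1000 times the size at one-tenth the popularity.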

play08:06

In conclusion, back-of-the-envelope math is a  very useful tool in our system design toolbox.

play08:12

Don’t over-index on precision.

play08:14

Getting within an order of magnitude is usually  enough to inform and validate our design.

play08:20

If you would like to learn more about system  design, check out our books and weekly newsletter.

play08:25

Please subscribe if you learned something new.

play08:27

Thank you so much, and we’ll see you next time.

