Why Computers Can't Count Sometimes

Tom Scott
12 Nov 2018 · 08:44

Summary

TL;DR: The video explains why view and subscriber counts can be inconsistent on sites like YouTube and Twitter. It comes down to race conditions when requests collide, eventual consistency where updates are batched, and caching where different servers serve slightly outdated copies of the same data. Counting accurately at scale is difficult, so while the final counts will be correct, there can be lag and fluctuation before the dust settles.

Q & A

  • Why can view counts and subscriber counts sometimes fluctuate or be inaccurate on sites like YouTube?

    -This can happen due to race conditions when requests collide, and due to caching, where different caches hold slightly different view counts and requests are routed to them at random. It takes time for eventual consistency to resolve across the distributed system.

  • What is a race condition and how does it cause issues with accuracy?

    -A race condition happens when multiple threads try to read, update and write data at the same time, and the final result depends on the order those operations occur in. This can result in lost updates.
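
A minimal Python sketch of that lost-update problem (illustrative names only, not anything from YouTube's real code): each thread reads the count, pauses briefly to stand in for the calculation, then writes its result back, so most increments overwrite each other.

```python
# Lost-update race: many threads read the same value, add one, and write
# back the same result, so views go missing.
import threading
import time

view_count = 0  # stands in for a row in the database

def bump_view():
    global view_count
    current = view_count        # 1. read the count
    time.sleep(0.001)           # 2. the "add one" takes a few ticks
    view_count = current + 1    # 3. write the result back

threads = [threading.Thread(target=bump_view) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(view_count)  # expected 100, but usually far less: updates were lost
```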

  • How does queueing requests help accuracy?

    -Putting all requests in a queue and processing them one by one ensures updates cannot collide. However, this does not scale well to sites with enormous traffic volumes.
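
A sketch of that queue approach in Python (illustrative, not any site's real implementation): all increments go into a queue and a single worker applies them one at a time, so they can never collide, at the cost of everything waiting in line.

```python
# Serializing updates through a queue: one worker drains the queue, so no
# two increments can ever interleave.
import queue
import threading

view_count = 0
updates = queue.Queue()

def worker():
    global view_count
    while True:
        item = updates.get()
        if item is None:          # sentinel: no more updates, stop the worker
            break
        view_count += item        # only this one thread ever touches the count

w = threading.Thread(target=worker)
w.start()

for _ in range(100):              # 100 "views" arrive from many clients
    updates.put(1)
updates.put(None)

w.join()
print(view_count)                 # always exactly 100
```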

  • What is eventual consistency?

    -Eventual consistency is a model where distributed databases sync up less frequently to reduce load, allowing counts to be slightly wrong for a while before they resolve. This scales better, but accuracy lags.
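
As a rough illustration (the class and method names here are invented, not a real API), an edge server can keep its own tally and only flush a bundled total to the central store now and then; the central count lags, then catches up.

```python
# Eventual consistency sketch: edge servers tally views locally and flush
# the batch to the central store every few seconds.
import time

class CentralStore:
    def __init__(self):
        self.view_count = 0

    def apply_batch(self, n):
        self.view_count += n      # one bundled update instead of n requests

class EdgeServer:
    def __init__(self, central, flush_every=5.0):
        self.central = central
        self.pending = 0          # local tally not yet reported centrally
        self.flush_every = flush_every
        self.last_flush = time.monotonic()

    def record_view(self):
        self.pending += 1         # cheap local increment, no central round-trip
        if time.monotonic() - self.last_flush >= self.flush_every:
            self.flush()

    def flush(self):
        if self.pending:
            self.central.apply_batch(self.pending)
            self.pending = 0
        self.last_flush = time.monotonic()

central = CentralStore()
edge = EdgeServer(central, flush_every=5.0)
for _ in range(3):
    edge.record_view()

print(central.view_count)  # 0 for now: the views only exist in the edge tally
edge.flush()
print(central.view_count)  # 3: the central count eventually catches up
```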

  • What is caching and how does it help sites handle traffic?

    -Caching is keeping commonly requested data in fast memory to avoid hitting the main database every time. This improves performance dramatically but can cause inconsistent results.
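
A minimal sketch of a read-through cache with a short time-to-live (fetch_from_database is a hypothetical stand-in for the real lookup): only the first request in each window reaches the database, and answers served from memory may be a few seconds stale.

```python
# Read-through cache with a TTL: the first request per window hits the
# database, everything else is answered from memory.
import time

def fetch_from_database(key):
    # Hypothetical slow lookup against the central database.
    print(f"database hit for {key}")
    return 12345

class TtlCache:
    def __init__(self, fetch, ttl=5.0):
        self.fetch = fetch        # callable that hits the real database
        self.ttl = ttl            # how long a cached answer stays "fresh"
        self.store = {}           # key -> (value, time it was fetched)

    def get(self, key):
        hit = self.store.get(key)
        if hit is not None:
            value, fetched_at = hit
            if time.monotonic() - fetched_at < self.ttl:
                return value      # served from memory, possibly a little stale
        value = self.fetch(key)   # missing or expired: go to the database
        self.store[key] = (value, time.monotonic())
        return value

cache = TtlCache(fetch_from_database, ttl=5.0)
for _ in range(3):
    cache.get("tweet:likes")      # only the first call prints "database hit"
```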

  • Why can buying something like concert tickets require stronger consistency?

    -For buying tickets, it would be very problematic if race conditions allowed the same seat to be sold twice, so stronger consistency guarantees are needed even if that reduces scalability.

  • Why is scaling database systems to handle massive traffic difficult?

    -Every request takes time to process. At large scale the volume can overwhelm a single database. Solutions like caching and eventual consistency help but reduce accuracy.

  • How could the issues explained impact something like tracking ad revenue?

    -Inaccurate counts could lead to incorrect ad revenue tracking, so important metrics like revenue use stronger consistency. Caching and lag mostly affect secondary data like view counts.

  • What are some ways sites balance consistency and scalability?

    -They prioritize stronger guarantees for important things like financial data, while tolerating temporary lag and inaccuracy for secondary metrics. Each metric gets its own tradeoff.

  • Why is accurate counting harder than it may appear at first glance?

    -Even simple incrementing relies on separate read, calculate and write steps. At scale these collide across threads and servers, with caching adding further complexity on large systems.
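
One common way to close that gap is to make the read, calculate and write steps a single atomic operation, for example by holding a lock around the increment (real databases get the same effect by pushing the arithmetic into the database itself). A sketch with illustrative names, mirroring the race-condition example earlier:

```python
# Closing the race: a lock makes read-add-write one atomic step, so no
# updates are lost, but every increment now waits its turn.
import threading
import time

view_count = 0
count_lock = threading.Lock()

def bump_view():
    global view_count
    with count_lock:              # read, add one, and write as a single unit
        current = view_count
        time.sleep(0.001)         # the calculation still takes time...
        view_count = current + 1  # ...but no other thread can interleave

threads = [threading.Thread(target=bump_view) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(view_count)  # always 100; correctness paid for with serialization
```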

Keywords

💡race condition

A race condition occurs when two or more processes or threads access a shared resource in an unpredictable order. This can lead to inconsistent outputs or data corruption. The video explains how race conditions can happen when multiple requests try to update a database count at the same time, causing some updates to get lost.

💡eventual consistency

Eventual consistency is a consistency model used in distributed computing where replicas of a database are allowed to temporarily have different data. The data will eventually become consistent once it has been replicated across all database copies. The video explains how YouTube uses eventual consistency to update view counts without overloading its central databases.

💡caching

Caching refers to storing data in a temporary storage location that allows for fast retrieval. The video explains how Twitter and other sites use caching servers to quickly serve commonly requested data instead of hitting backend databases every time.

💡scaling

Scaling refers to the ability of a system to handle increasing loads. The video notes that scaling systems to deal with large traffic volumes introduces challenges like race conditions and consistency issues.

💡queue

A queue is a linear data structure where elements are inserted at one end (the back) and removed from the other end (the front). The video mentions how queues can be used to serialize database updates and prevent race conditions, but they don't scale well.

💡thread

A thread is a sequence of instructions that can be executed concurrently with other threads. Multi-threaded systems introduce challenges like race conditions because thread execution order can vary.

💡latency

Latency refers to delays in processing data or transactions. The video explains why view counts can be inconsistent: updates lag while data is replicated across databases.

💡bottleneck

A bottleneck is a point in a system that limits overall throughput. The video implies that central databases can become bottlenecks if directly hit with large traffic volumes, motivating caching and queues.

💡horizontal scaling

Horizontal scaling involves adding more servers to share the load, in contrast to vertical scaling, which increases the power of a single server. The video notes how additional cache servers can be spun up to handle increased loads.

💡consistency

Consistency refers to all replicas in a distributed system reflecting the same data. The video contrasts strong consistency models with eventual consistency.

Highlights

Counting things accurately is really difficult at scale due to race conditions, caching, and eventual consistency.

Race conditions happen when code tries to do multiple things at once and the order affects the result in an uncontrolled way.

Putting all requests in a queue and processing them one by one avoids race conditions but does not scale up.

Eventual consistency bundles updates over time rather than reporting every single change right away to relieve pressure on the system.

YouTube view counts may lag because updates are bundled and sent less frequently from local servers to the central database.

Buying tickets uses a queue system to avoid selling the same ticket to multiple people, ensuring 100% consistency.

Caching stores commonly requested information in fast memory to avoid hitting slower databases for every single request.

Different caches pull data at different times, so requests routed randomly between them can show inconsistent results until consistency is eventually reached.

Adding cache servers lets sites absorb sudden traffic spikes without putting extra load on the central database.

Old single-threaded computers execute instructions one by one without accepting new inputs until finished.

Modern systems break large tasks like databases into distributed pieces that eventually reconcile.

Core databases run on multiple computers that keep each other up to date, for redundancy and scale.

Short-term accuracy is traded for availability and scalability in large eventually consistent systems.

Updates like video privacy settings require immediate consistency everywhere, so other operations are paused while they propagate.

Given enough time, eventually consistent systems converge on accurate counts despite short-term noise.

Transcripts

play00:00

This is a brilliant tweet.

play00:03

But I don't want you to pay attention to the tweet.

play00:05

It's good, sure, but I want you to watch the numbers that are underneath it.

play00:07

That's a screen recording,

play00:09

and the numbers are going up and down, all over the place.

play00:13

They should steadily rise, but they don’t.

play00:15

There aren't that many people tapping ‘like’ by mistake and then taking it back.

play00:20

So why can't Twitter just count?

play00:26

You'll see examples like this all over the place.

play00:28

On YouTube, subscriber and view counts sometimes rise and drop seemingly at random,

play00:32

or they change depending on which device you're checking on.

play00:36

Computers should be good at counting, right?

play00:39

They're basically just overgrown calculators.

play00:41

This video that you're watching,

play00:43

whether it's on a tiny little phone screen or on a massive desktop display,

play00:46

it is all just the result of huge amounts of math that turns

play00:50

a compressed stream of binary numbers into amounts of electricity

play00:54

that get sent to either a grid of coloured pixels or a speaker,

play00:58

all in perfect time.

play01:00

Just counting should be easy.

play01:02

But sometimes it seems to fall apart.

play01:05

And that's usually when there's a big, complicated system

play01:07

with lots of inputs and outputs,

play01:09

when something has to be done at scale.

play01:12

Scaling makes things difficult. And to explain why,

play01:15

we have to talk about race conditions, caching, and eventual consistency.

play01:20

All the code that I've talked about in The Basics so far has been single-threaded,

play01:24

because, well, we’re talking about the basics.

play01:27

Single-threaded means that it looks like a set of instructions

play01:29

that the computer steps through one after the other.

play01:32

It starts at the top, it works its way through, ignoring everything else,

play01:36

and at the end it has Done A Thing.

play01:39

Which is fine, as long as that's the only thread,

play01:41

the only thing that the computer's doing,

play01:43

and that it's the only computer doing it.

play01:45

Fine for old machines like this,

play01:47

but for complicated, modern systems, that’s never going to be the case.

play01:51

Most web sites are, at their heart, just a fancy front end to a database.

play01:55

YouTube is a database of videos and comments.

play01:57

Twitter is a database of small messages.

play01:58

Your phone company's billing site is a database of customers and bank accounts.

play02:04

But the trouble is that a single computer holding a single database can only deal with

play02:08

so much input at once.

play02:10

Receiving a request, understanding it, making the change, and sending the response back:

play02:15

all of those take time,

play02:17

so there are only so many requests that can fit in each second.

play02:21

And if you try and handle multiple requests at once,

play02:24

there are subtle problems that can show up.

play02:27

Let's say that YouTube wants to count one view of a video.

play02:30

It just has the job of adding one to the view count.

play02:33

Which seems really simple, but it's actually three separate smaller jobs.

play02:37

You have to read the view count,

play02:39

you have to add one to it,

play02:40

and then you have to write that view count back into the database.

play02:43

If two requests come along very close to each other,

play02:45

and they’re assigned to separate threads,

play02:47

it is entirely possible that the second thread

play02:52

could read the view count

play02:53

while the first thread is still doing its calculation.

play02:57

And yeah, that's a really simple calculation, it's just adding one,

play03:02

but it still takes a few ticks of a processor.

play03:04

So both of those write processes would put the same number back into the database,

play03:09

and we've missed a view.

play03:11

On popular videos, there'll be collisions like that all the time.

play03:15

Worst case, you've got ten or a hundred of those requests all coming in at once,

play03:18

and one gets stuck for a while for some reason.

play03:21

It'll still add just one to the original number that it read,

play03:24

and then, much later,

play03:25

it'll finally write its result back into the database.

play03:27

And we've lost any number of views.

play03:30

In early databases, having updates that collided like that could corrupt the entire system,

play03:34

but these days things will generally at least keep working,

play03:37

even if they're not quite accurate.

play03:40

And given that YouTube has to work out not just views,

play03:42

but ad revenue and money,

play03:44

it has got to be accurate.

play03:46

Anyway, that’s a basic race condition:

play03:49

when the code’s trying to do two or more things at once,

play03:52

and the result changes depending on the order they occur in,

play03:56

an order that you cannot control.

play03:59

One solution is to put all the requests in a queue,

play04:01

and refuse to answer any requests until the previous one is completed.

play04:06

That's how that single-threaded, single-computer programming works.

play04:08

It's how these old machines work.

play04:10

Until the code finishes its task and says "okay, I'm ready for more now",

play04:14

it just doesn't accept anything else.

play04:17

Fine for simple stuff, does not scale up.

play04:19

A million-strong queue to watch a YouTube video doesn't sound like a great user experience.

play04:24

But that still happens somewhere, for things like buying tickets to a show,

play04:28

where it'd be an extremely bad idea to accidentally sell the same seat to two people.

play04:34

Those databases have to be 100% consistent, so for big shows,

play04:37

ticket sites will sometimes start a queue,

play04:39

and limit the number of people accessing the booking site at once.

play04:43

If you absolutely must count everything accurately, in real time, that’s the best approach.

play04:49

But for sites dealing with Big Data, like YouTube and Twitter,

play04:52

there is a different solution called eventual consistency.

play04:56

They have lots of servers all over the world,

play04:58

and rather than reporting every view or every retweet right away,

play05:03

each individual server will keep its own count,

play05:05

bundle up all the viewcounts and statistics that it’s dealing with,

play05:08

and it will just update the central system when there's time to do so.

play05:12

Updates don't have to be hours apart,

play05:14

they can just be minutes or even just seconds,

play05:16

but having a few bundled updates that can be queued and dealt with individually

play05:21

is a lot easier on the central system

play05:23

than having millions of requests all being shouted at once.

play05:26

Actually, for something on YouTube’s scale,

play05:28

that central database won't just be one computer:

play05:31

it'll be several, and they'll all be keeping each other up to date,

play05:34

but that is a mess we really don't want to get into right now.

play05:39

Eventual consistency isn't right for everything.

play05:42

On YouTube, if you're updating something like the privacy settings of a video,

play05:46

it's important that it's updated immediately everywhere.

play05:50

But compared to views, likes and comments, that's a really rare thing to happen,

play05:54

so it's OK to stop everything, put everything else on hold,

play05:57

spend some time sorting out that important change, and come back later.

play06:02

But views and comments, they can wait for a little while.

play06:04

Just tell the servers around the world to write them down somewhere, keep a log,

play06:07

then every few seconds, or minutes, or maybe even hours for some places,

play06:12

those systems can run through their logs,

play06:13

do the calculations and update the central system once everyone has time.

play06:19

All that explains why viewcounts and subscriber counts lag sometimes on YouTube,

play06:23

why it can take a while to get the numbers sorted out in the end,

play06:26

but it doesn't explain the up-and-down numbers you saw at the start in that tweet.

play06:32

That's down to another thing: caching.

play06:34

It's not just writing into the database that's bundled up. Reading is too.

play06:38

If you have thousands of people requesting the same thing,

play06:41

it really doesn't make sense to have them all hit the central system

play06:44

and have it do the calculations every single time.

play06:46

So if Twitter are getting 10,000 requests a second for information on that one tweet,

play06:52

which is actually a pretty reasonable amount for them,

play06:54

it'd be ridiculous for the central database to look up all the details and do the numbers every time.

play06:59

So the requests are actually going to a cache,

play07:02

one of thousands, or maybe tens of thousands of caches

play07:04

sitting between the end users and the central system.

play07:07

Each cache looks up the details in the central system once,

play07:10

and then it keeps the details in its memory.

play07:13

For Twitter, each cache might only keep them for a few seconds,

play07:16

so it feels live but isn't actually.

play07:18

But it means only a tiny fraction of that huge amount of traffic

play07:22

actually has to bother the central database:

play07:24

the rest comes straight out of memory on a system that is built

play07:27

just for serving those requests,

play07:29

which is orders of magnitude faster.

play07:31

And if there's a sudden spike in traffic,

play07:33

Twitter can just spin up some more cache servers,

play07:35

put them into the pool that's answering everyone's requests,

play07:37

and it all just keeps working without any worry for the database.

play07:42

But each of those caches will pull that information at a slightly different time,

play07:46

all out of sync with each other.

play07:47

When your request comes in, it's routed to any of those available caches,

play07:51

and crucially it is not going to be the same one every time.

play07:54

They've all got slightly different answers,

play07:56

and each time you're asking a different one.

play08:00

Eventual consistency means that everything will be sorted out at some point.

play08:05

We won't lose any data, but it might take a while before it's all in place.

play08:10

Sooner or later the flood of retweets will stop, or your viewcount will settle down,

play08:14

and once the dust has settled everything can finally get counted up.

play08:17

But until then: give YouTube and Twitter a little leeway.

play08:22

Counting things accurately is really difficult.

play08:26

Thank you very much to the Centre for Computing History here in Cambridge,

play08:29

who've let me film with all this wonderful old equipment,

play08:31

and to all my proofreading team who made sure my script's right.