DynamoDB: Under the hood, managing throughput, advanced design patterns | Jason Hunter | AWS Events

AWS Events
26 Aug 2022 · 50:59

Summary

TL;DR: This video delves into the inner workings of Amazon DynamoDB, exploring its partitioning scheme, data replication across availability zones, and the mechanics of read and write operations. It discusses consistency models, Global Secondary Indexes, and throughput capacity, offering practical use cases and design patterns for optimizing NoSQL database performance. The talk also covers advanced features like DAX, transactions, parallel scans, and the new Standard-Infrequent Access table class for cost-effective data storage.

Takeaways

  • 🔍 **Deep Dive into DynamoDB**: The talk focuses on understanding the inner workings of Amazon DynamoDB, exploring how it delivers value and operates under the hood.
  • 📦 **Data Partitioning**: DynamoDB uses a partitioning system where data is distributed across multiple partitions based on the hash of the partition key, ensuring efficient data retrieval and storage.
  • 🔑 **Consistent Hashing**: The hashing of partition keys determines the allocation of items to specific partitions, which is crucial for data distribution and load balancing.
  • 💾 **Physical Storage**: Behind the logical partitions, DynamoDB replicates data across multiple servers and availability zones to ensure high availability and durability.
  • 🌐 **DynamoDB's Backend Infrastructure**: The service utilizes a large number of request routers and storage nodes that communicate via heartbeats to maintain data consistency and handle leader election.
  • 🔍 **GetItem Operations**: Retrieving items can be done through either strong consistency, which always goes to the leader node, or eventual consistency, which can use any available node.
  • 🔄 **Global Secondary Indexes (GSIs)**: GSIs are implemented as separate tables that require their own provisioned capacity and can be used to efficiently query data based on non-primary key attributes.
  • ⚡ **Performance Considerations**: The design of the table schema, such as the use of sort keys and GSIs, can significantly impact performance and cost, especially for large-scale databases.
  • 💡 **Optimizing Data Access**: The script suggests using hierarchical sort keys and sparse GSIs to optimize access patterns and reduce costs, catering to different query requirements.
  • 🛠️ **Handling High Traffic**: For extremely high-traffic scenarios, such as during Amazon Prime Day, DynamoDB Accelerator (DAX) can be used to cache and serve popular items quickly.
  • 🔄 **Auto Scaling and Partition Splitting**: DynamoDB automatically scales and splits partitions to handle increased traffic and data size, ensuring consistent performance without manual intervention.

Q & A

  • What is the main focus of the second part of the DynamoDB talk?

    -The main focus of the second part of the DynamoDB talk is to explore how DynamoDB operates under the hood, providing insights into the database's internal workings and explaining how the value described in the first part is actually delivered.

  • How does DynamoDB handle partitioning of data?

    -DynamoDB uses a partition key to hash data items and distribute them across different physical partitions. Each partition is responsible for a section of the key space and can store an arbitrary number of items.

  • What is the purpose of hashing in DynamoDB's partitioning scheme?

    -Hashing is used to determine the partition in which a data item should be stored. It takes the partition key, runs it through a mathematical process, and produces a fixed-length output that falls within a specific partition's range.
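The routing described above can be sketched in a few lines. This is a toy model only: DynamoDB's internal hash function and key-space metadata are not public, so we stand in MD5 and the talk's three example ranges (00-55, 55-AA, AA-FF) for them.

```python
import hashlib

# Illustrative only: mimic the talk's example by hashing the partition
# key and routing on the first byte of the digest into three ranges.
PARTITIONS = [
    ("A", 0x00, 0x55),   # digests starting 00-54
    ("B", 0x55, 0xAA),   # digests starting 55-A9
    ("C", 0xAA, 0x100),  # digests starting AA-FF
]

def route(partition_key: str) -> str:
    """Return the partition responsible for this key."""
    first_byte = hashlib.md5(partition_key.encode()).digest()[0]
    for name, lo, hi in PARTITIONS:
        if lo <= first_byte < hi:
            return name
    raise AssertionError("ranges cover 00-FF, so this is unreachable")
```

The essential property is determinism: the same key always hashes to the same partition, so a later GetItem finds the item where the earlier PutItem stored it.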

  • How does DynamoDB ensure high availability and durability of data?

    -DynamoDB replicates each partition across multiple availability zones and hosts. This replication ensures that if an issue occurs with an availability zone or a host, there are still working copies of the data, making DynamoDB resilient to failures.

  • What is the difference between strong consistency and eventual consistency in DynamoDB reads?

    -Strong consistency ensures that the most recent data is always read by directing the request to the leader node. Eventual consistency, on the other hand, can direct the request to any of the nodes, which may result in slightly stale data but at a lower cost.
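The trade-off can be made concrete with a toy replica group: a leader and two followers, one of which lags by one write. This is a hypothetical model, not DynamoDB's real replication protocol.

```python
import random

# Toy replica group: leader plus two followers; follower2 is one write
# behind, imitating propagation delay across availability zones.
replicas = {"leader": {"v": 2}, "follower1": {"v": 2}, "follower2": {"v": 1}}

def get_item(consistent: bool) -> int:
    if consistent:
        node = "leader"                          # always the latest data
    else:
        node = random.choice(list(replicas))     # any of the three nodes
    return replicas[node]["v"]
```

A strongly consistent read always returns version 2; an eventually consistent read returns 2 about two thirds of the time, matching the "statistically 2/3" point made in the talk.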

  • How are Global Secondary Indexes (GSIs) implemented in DynamoDB?

    -GSIs are implemented as a separate table that is automatically maintained by DynamoDB. A log propagator moves data from the base table to the GSI, and GSIs have their own provisioned capacity or can be in on-demand mode.

  • What are Read Capacity Units (RCUs) and Write Capacity Units (WCUs) in DynamoDB?

    -RCUs and WCUs are the units of measure for the throughput of DynamoDB tables. RCUs represent the capacity to read data, while WCUs represent the capacity to write data. They can be provisioned or on-demand, with different pricing and performance implications.
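The unit arithmetic follows published rounding rules: one RCU covers a strongly consistent read of up to 4 KB (an eventually consistent read costs half), and one WCU covers a write of up to 1 KB. A small sketch:

```python
import math

def rcus_for_read(item_kb: float, strongly_consistent: bool = True) -> float:
    """RCUs for one read: rounded up to 4 KB units, halved if eventual."""
    units = math.ceil(item_kb / 4)
    return units if strongly_consistent else units / 2

def wcus_for_write(item_kb: float) -> int:
    """WCUs for one write: rounded up to 1 KB units."""
    return math.ceil(item_kb)
```

So a 6 KB item costs 2 RCUs to read strongly, 1 RCU eventually, but 6 WCUs to write, which is why write-heavy designs care so much about item size.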

  • What is the maximum size of an item that can be stored in a DynamoDB table?

    -The maximum size of an item in a DynamoDB table is 400 KB. If more data needs to be stored, it should be split into multiple items.

  • How does DynamoDB handle scaling and partition splitting?

    -DynamoDB scales by partitioning. Each physical partition supports up to 1,000 WCUs or 3,000 RCUs (these units are already per-second rates). If a partition grows beyond 10 GB or experiences sustained high traffic, it automatically splits to maintain performance.
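Those per-partition limits let you estimate the minimum partition count for a table with simple arithmetic. This is a back-of-envelope sketch using the numbers from the talk; the real split logic is internal to DynamoDB.

```python
import math

def min_partitions(size_gb: float, rcu: float, wcu: float) -> int:
    """Lower bound on partitions, from the per-partition limits:
    10 GB of storage, 3,000 RCUs, 1,000 WCUs."""
    return max(
        math.ceil(size_gb / 10),
        math.ceil(rcu / 3000),
        math.ceil(wcu / 1000),
        1,
    )
```

For example, a 55 GB table provisioned at 6,000 RCUs and 500 WCUs needs at least six partitions, driven by its storage size rather than its throughput.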

  • What is the role of the burst bucket in DynamoDB's provisioned capacity mode?

    -The burst bucket in DynamoDB's provisioned capacity mode allows for temporary spikes in traffic above the provisioned capacity by storing unused capacity tokens from previous time intervals, enabling higher throughput for short durations without incurring additional costs.
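The burst bucket behaves like a token bucket: unused capacity is banked (AWS retains roughly the last five minutes of unused capacity) and a later spike can draw it down. A toy simulation, with the retention window as a stated assumption:

```python
def simulate(provisioned: float, demand_per_sec: list) -> float:
    """Token-bucket sketch of burst capacity. Each second adds
    `provisioned` tokens; the bucket caps at ~300 seconds of unused
    capacity (assumed retention window). Returns total throttled units."""
    max_burst = provisioned * 300
    bucket, throttled = 0.0, 0.0
    for demand in demand_per_sec:
        bucket = min(bucket + provisioned, max_burst)
        if demand <= bucket:
            bucket -= demand
        else:
            throttled += demand - bucket
            bucket = 0.0
    return throttled
```

With 100 provisioned units, nine quiet seconds bank 900 unused units, so a 1,000-unit spike in second ten sails through; the same spike with no banked capacity throttles 900 units.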

  • What is the significance of having a good dispersion of partition keys in DynamoDB?

    -A good dispersion of partition keys ensures that the read and write load is evenly distributed across different storage nodes, preventing any single node from becoming a bottleneck and maintaining overall performance and scalability.

  • How can DynamoDB's auto scaling feature be beneficial for handling varying workloads?

    -Auto scaling in DynamoDB adjusts the provisioned capacity up and down based on actual traffic, within specified minimum and maximum limits. This ensures that the performance is maintained without the need for manual intervention and can handle varying workloads efficiently.

  • What is the purpose of using a hierarchical sort key in DynamoDB?

    -A hierarchical sort key in DynamoDB allows for querying data at different granularities. It enables efficient retrieval of related items, such as all offices in a specific country or city, by structuring the sort key to reflect the hierarchy of the data.
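The mechanism is a prefix match on the sort key, which in a real Query would be a `begins_with` key condition. A minimal in-memory sketch with hypothetical `COUNTRY#CITY#OFFICE` keys:

```python
# One item collection (same partition key); sort keys encode a hierarchy.
offices = [
    "USA#Seattle#HQ",
    "USA#Seattle#Annex",
    "USA#NewYork#Midtown",
    "Germany#Berlin#Mitte",
]

def query_begins_with(prefix: str) -> list:
    """Mimic Query with KeyConditionExpression begins_with(sk, prefix)."""
    return sorted(sk for sk in offices if sk.startswith(prefix))
```

Querying with prefix `"USA#"` returns every US office, while `"USA#Seattle#"` narrows to Seattle only, all against the same table with no extra index.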

  • Can you provide an example of how to optimize data storage for a shopping cart in DynamoDB?

    -Instead of storing the entire shopping cart as a single large item, it's more optimal to store each attribute, such as cart items, address, and order history, as separate items within the same partition key. This approach allows for more efficient retrieval, update, and cost management.
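The resulting item collection might look like the sketch below (sort key names are hypothetical). The point is that a cart update now touches one small item instead of rewriting a near-400 KB blob.

```python
# One user's data as separate items sharing a partition key, instead of
# one large nested document.
items = [
    {"pk": "USER#123", "sk": "CART#item-9",       "qty": 2},
    {"pk": "USER#123", "sk": "CART#item-14",      "qty": 1},
    {"pk": "USER#123", "sk": "ADDRESS#home",      "city": "Seattle"},
    {"pk": "USER#123", "sk": "ORDER#2022-08-01",  "total": 40},
]

def fetch(pk: str, sk_prefix: str = "") -> list:
    """Fetch one slice of the collection, e.g. just the cart lines."""
    return [i for i in items if i["pk"] == pk and i["sk"].startswith(sk_prefix)]
```

Fetching only the cart reads two tiny items; fetching everything for the user is still a single-partition query.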

  • What is a sparse Global Secondary Index (GSI) in DynamoDB?

    -A sparse GSI in DynamoDB is a design pattern where the GSI is not populated with every item from the base table. It's used for specific access patterns where only a subset of items are relevant, reducing the cost and storage overhead of the GSI.
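The sparseness falls out of how GSI propagation works: items missing the index key simply never reach the index. A toy version of that filter, with a hypothetical `needs_review` flag as the GSI key:

```python
base_table = [
    {"order_id": "1", "total": 10},
    {"order_id": "2", "total": 99, "needs_review": "2022-08-26"},
    {"order_id": "3", "total": 12},
]

def project_to_gsi(table: list, gsi_key: str) -> list:
    """Mimic the log propagator: items without the GSI key are skipped."""
    return [item for item in table if gsi_key in item]
```

Only the flagged order lands in the index, so scanning the GSI finds all review candidates without paying to store or scan the full base table.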

  • How can DynamoDB Streams be utilized to update real-time aggregations?

    -DynamoDB Streams can trigger a Lambda function upon a mutation event, such as an insert or update. The Lambda can then perform actions like incrementing a count attribute, providing real-time aggregations that can be queried efficiently.
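A Lambda handler for such a stream might look like the sketch below. The event shape follows the documented stream record format, trimmed down; the in-memory counter stands in for a real `UpdateItem` with an `ADD` expression against an aggregate item.

```python
# Toy aggregation state; in production this would be a DynamoDB item
# updated atomically with UpdateItem ADD.
counters = {}

def handler(event, context=None):
    """Count INSERT events per partition key from a DynamoDB Stream batch."""
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            pk = record["dynamodb"]["Keys"]["pk"]["S"]
            counters[pk] = counters.get(pk, 0) + 1
    return counters
```

Two inserts for the same key bump its count to 2, while MODIFY and REMOVE events are ignored by this particular aggregation.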

  • What is the significance of the Time-To-Live (TTL) feature in DynamoDB?

    -The TTL feature in DynamoDB allows items to be automatically deleted after a specified duration, without incurring any write capacity unit (WCU) charges. This is useful for data that is meant to be temporary, such as session data, reducing storage costs for expired data.
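Mechanically, TTL is just a numeric attribute holding an epoch-seconds expiry time; a background process deletes items past that time at no WCU cost. A sketch (the attribute name `expires_at` is whatever you configure on the table):

```python
import time

def session_item(session_id: str, ttl_hours: int, now=None) -> dict:
    """Build a session item with an epoch-seconds TTL attribute."""
    now = time.time() if now is None else now
    return {"pk": f"SESSION#{session_id}",
            "expires_at": int(now + ttl_hours * 3600)}

def is_expired(item: dict, now: float) -> bool:
    """What the background sweeper effectively checks."""
    return item["expires_at"] <= now

item = session_item("abc", ttl_hours=1, now=1_000_000)
```

One caveat worth knowing: expired items are deleted in the background, typically within a day or two, so reads should still filter on the TTL attribute if stale items must never be returned.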

  • What is the DynamoDB Standard-Infrequent Access (Standard-IA) table class, and how does it differ from the standard table class?

    -The Standard-IA table class is a cost-effective option for storing data that is infrequently accessed. It offers about 60% lower storage costs than the standard table class, with roughly 25% higher read and write (throughput) costs. There is no performance trade-off, making it a suitable choice for large tables where storage dominates the bill.
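The decision reduces to arithmetic on your bill. A back-of-envelope comparison using the talk's relative numbers (about 60% cheaper storage, about 25% pricier throughput); the absolute unit costs below are placeholders, not AWS list prices:

```python
STORAGE_PER_GB = 1.0    # hypothetical monthly unit cost, Standard class
THROUGHPUT_UNIT = 1.0   # hypothetical monthly unit cost, Standard class

def monthly_cost(gb: float, throughput_units: float, infrequent_access: bool) -> float:
    """Relative monthly cost: IA pays 40% for storage, 125% for throughput."""
    storage = gb * STORAGE_PER_GB * (0.40 if infrequent_access else 1.0)
    rw = throughput_units * THROUGHPUT_UNIT * (1.25 if infrequent_access else 1.0)
    return storage + rw
```

A storage-heavy table (say 1,000 GB with little traffic) comes out well ahead on Standard-IA, while a small but hot table stays cheaper on Standard, which is exactly the analysis the talk recommends.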

  • How can the new DynamoDB feature introduced in November 2021 help in managing costs for large tables?

    -The introduction of the Standard-Infrequent Access (Standard-IA) table class in November 2021 allows for significant cost savings on storage for large tables that are infrequently accessed. It provides a 60% reduction in storage costs while maintaining the same performance levels as the standard table class.

Outlines

00:00

🚀 Introduction to DynamoDB's Internals

The speaker welcomes the audience to the second part of the DynamoDB discussion, expressing enthusiasm for exploring the database's operations. The session aims to clarify how the value discussed in the first part is delivered. The talk will include a deep dive into DynamoDB's functioning, addressing partitioning, hashing of partition keys, and data storage in physical partitions. It will also cover server replication for data resilience, request routing, and the application of learned techniques to solve complex problems presented as puzzlers.

05:02

🔑 Understanding DynamoDB's Partitioning and Replication

This paragraph delves into the mechanics of DynamoDB's partitioning system, explaining how data is distributed across different partitions based on hashed partition keys. It details the replication process across multiple availability zones for fault tolerance and data integrity. The explanation includes how write and read operations are handled, the concept of leaders and followers in data replication, and the use of the Paxos algorithm for leader election. The paragraph also discusses different read consistency models—strong consistency and eventual consistency—and their impact on performance and cost.

10:03

📚 Deep Dive into Storage Nodes and Request Handling

The speaker provides an in-depth look at the backend processes of DynamoDB, focusing on storage nodes and how they communicate through heartbeats to maintain health checks and leader election. The paragraph explains the write process to the leader node and the subsequent propagation to follower nodes. It also covers the retrieval of items through the load balancer and request router, highlighting the distinction between strong and eventual consistency reads. The implementation of Global Secondary Indexes (GSIs) as separate tables with their own capacity is also discussed, along with the impact of GSIs on throughput during data operations.

15:05

🔄 Throughput Management and Auto-Scaling in DynamoDB

This section discusses the management of throughput in DynamoDB, explaining the difference between on-demand and provisioned capacity modes. It describes how Read Capacity Units (RCUs) and Write Capacity Units (WCUs) function in both modes, and the importance of a good partition key dispersion to avoid throttling. The paragraph also covers the auto-scaling feature, which adjusts capacity based on live traffic, and the use of burst buckets to handle temporary traffic spikes without exceeding provisioned capacity.

20:06

🌐 Auto-Administration and Partition Splitting in DynamoDB

The speaker explains the auto-administration feature of DynamoDB, which automatically splits partitions when they receive excessive traffic or grow beyond a certain size. This process is designed to maintain performance and is carried out without user intervention. The paragraph also introduces CloudWatch Contributor Insights, a tool that helps monitor hot keys and throttled items, providing visibility into the database's partition access patterns and load distribution.

25:07

🛍️ Optimizing Data Storage for a Shopping Cart Use Case

The paragraph presents a real-life example of a shopping cart stored in DynamoDB, discussing the limitations of storing a shopping cart as a single data item due to the 400 KB size limit and the inability to perform index-driven retrievals on nested attributes. It suggests an optimized approach where different attributes of a user are stored as separate items within the same collection, each with a unique sort key. This method improves performance, reduces costs, and allows for more granular updates and retrievals.

30:11

🔍 Efficient Data Retrieval Using Indexes and GSIs

The speaker discusses the importance of using indexes and Global Secondary Indexes (GSIs) for efficient data retrieval. The paragraph explains how to optimize queries by using the sort key and how GSIs can be used to retrieve data based on different attributes. It also covers the concept of a sparse GSI, which is useful for specific access patterns where the attribute may not always be present, and the benefits of hierarchical sort keys for querying different granularities of data.

35:14

🎉 Handling High Traffic Scenarios with Partitioning and DAX

This section addresses strategies for handling high traffic and large-scale data scenarios in DynamoDB. It introduces the concept of partitioning to distribute write load across multiple partitions and the use of DynamoDB Accelerator (DAX) for managing read-heavy loads with in-memory caching. The paragraph also discusses the use of multiple GSIs to handle large volumes of events and the importance of designing data models that avoid hot partition keys.

40:15

🔐 Transactions, Parallel Scans, and Stream Processing

The paragraph covers advanced features of DynamoDB, including transactions that ensure atomic operations across multiple items, parallel scans that allow for efficient and rapid scanning of the table using multiple threads, and stream processing that enables real-time monitoring of table changes. It also discusses the use of Lambda functions to perform actions based on stream events, such as updating aggregations or triggering notifications.

45:16

⏱️ Time-To-Live and Cost-Effective Data Management

The speaker introduces the Time-To-Live (TTL) feature, which allows for the automatic deletion of expired items in the background without incurring write costs. This feature is beneficial for data that has a natural expiration, such as session data. The paragraph also highlights the cost savings achieved by using TTL, especially for large tables where delete operations were a significant portion of the workload.

50:17

🌐 Introducing DynamoDB Standard-Infrequent Access

The final paragraph introduces a new feature, DynamoDB Standard-Infrequent Access (Standard-IA), designed to reduce storage costs for data that is infrequently accessed. It explains the cost benefits of Standard-IA, which offers 60% lower storage costs compared to the standard table class, without compromising performance or availability. The speaker also suggests strategies for determining whether Standard-IA is suitable for a particular use case, such as analyzing cost structures and considering table partitioning based on access patterns.


Keywords

💡DynamoDB

DynamoDB is a fully managed NoSQL database service provided by Amazon Web Services (AWS). It supports key-value and document data structures and is known for its scalability, performance, and low latency. In the video, the speaker discusses the inner workings of DynamoDB, explaining how it operates under the hood and its use cases, which is central to the video's theme of understanding NoSQL database management.

💡Partition Key

A partition key in DynamoDB is a unique identifier for data storage within a table. It is used to distribute data across different partitions for efficient storage and retrieval. In the script, the speaker uses the concept of a partition key to explain how data is organized and accessed within DynamoDB, such as when they discuss the hashing of OrderId to determine the partition it belongs to.

💡Hash Function

A hash function is a mathematical process that converts an input (or 'message') into a fixed-size string of characters, which is typically used for indexing data in a database. In the context of the video, the hash function is used to determine the partition in which a particular item should be stored in DynamoDB, ensuring an even distribution of data across the database.

💡Global Secondary Indexes (GSIs)

Global Secondary Indexes in DynamoDB are secondary indexes that allow for fast query access to data based on alternate keys. They are implemented as separate tables that are automatically maintained by DynamoDB. The script discusses GSIs as a way to enhance data retrieval and how they have their own provisioned capacity, which is crucial for optimizing performance and cost.

💡Provisioned Capacity

Provisioned capacity in DynamoDB refers to the amount of throughput capacity that a user specifies for their table or index. It is measured in Read Capacity Units (RCUs) and Write Capacity Units (WCUs). The speaker explains the concept of provisioned capacity and how it differs from on-demand capacity, emphasizing the importance of choosing the right mode based on traffic patterns.

💡On-Demand Capacity

On-Demand Capacity is a feature in DynamoDB that allows the database to automatically adjust its capacity based on the actual traffic patterns, without the need for the user to manage capacity units. The script mentions on-demand capacity as an alternative to provisioned capacity, suitable for unpredictable workloads where traffic can spike or drop significantly.

💡Read Capacity Units (RCUs)

Read Capacity Units in DynamoDB represent the amount of throughput for reading data from a table. One RCU is the capacity equivalent to one strongly consistent read per second for items up to 4 KB in size. The video script discusses RCUs in the context of on-demand and provisioned modes, highlighting the cost and performance implications of read operations.

💡Write Capacity Units (WCUs)

Write Capacity Units in DynamoDB are the throughput capacity dedicated to writing data to a table. Similar to RCUs, one WCU allows for one write per second for items up to 1 KB in size. The script explains how WCUs are crucial for understanding the cost associated with write operations and the importance of managing them for performance optimization.

💡Auto Scaling

Auto Scaling in the context of DynamoDB is a feature that automatically adjusts the provisioned throughput of a table based on the specified utilization rate. The speaker recommends using auto-scaling with provisioned capacity to handle varying traffic loads efficiently, ensuring that the table can scale up and down according to demand without manual intervention.

💡Strong Consistency

Strong consistency in DynamoDB is a read consistency model where every read operation receives the most recent written data. The video script contrasts strong consistency with eventual consistency, explaining that while strong consistency ensures up-to-date data, it may come at a higher cost in terms of read capacity units.

💡Eventual Consistency

Eventual consistency is a read consistency model in DynamoDB where the data returned from a read operation might not be the most recent, but it will 'eventually' become consistent. The script mentions that reads with eventual consistency are less costly than strongly consistent reads, trading off the most up-to-date data for cost savings.

💡DynamoDB Accelerator (DAX)

DynamoDB Accelerator (DAX) is an in-memory cache for DynamoDB that provides fast read performance for frequently accessed items. The script discusses DAX as a solution for handling high-traffic read scenarios, such as during events like Amazon Prime Day, where it can significantly offload traffic from the main DynamoDB tables.

💡Transactions

Transactions in DynamoDB allow multiple actions, such as puts, updates, and deletes, to be executed as a single atomic operation across one or more items. The script explains the use of TransactWriteItems to perform transactions, emphasizing their importance when operations need to be completed in an all-or-nothing manner.
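The all-or-nothing semantics can be shown in miniature. `TransactWriteItems` gives you this server-side (a failed condition raises `TransactionCanceledException` and no item changes); this toy version just validates conditions before applying any write, so it never leaves a half-finished state.

```python
# Toy two-item transfer with transaction-like semantics: either both
# balances change, or neither does.
table = {"acct#A": {"balance": 100}, "acct#B": {"balance": 0}}

def transact_transfer(src: str, dst: str, amount: int) -> None:
    # Condition check first, mirroring a ConditionExpression on the source.
    if table[src]["balance"] < amount:
        raise ValueError("transaction canceled: condition failed")
    table[src]["balance"] -= amount
    table[dst]["balance"] += amount
```

A transfer of 30 updates both accounts together; a transfer of 500 raises before touching anything, so neither balance moves.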

💡Parallel Scan

A parallel scan in DynamoDB is a method of scanning a table using multiple threads or segments to improve performance and speed. The video script describes how a parallel scan works by dividing the table into segments and processing each segment in parallel, which is useful for quickly retrieving large amounts of data.
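The segmentation idea can be sketched with threads. In boto3 you would pass `Segment` and `TotalSegments` to `scan()`; here we partition an in-memory list by hash so each worker sees a disjoint slice.

```python
from concurrent.futures import ThreadPoolExecutor

items = [f"order-{i}" for i in range(100)]

def scan_segment(segment: int, total: int) -> list:
    """One worker's slice of the table: a disjoint subset of all items."""
    return [it for it in items if hash(it) % total == segment]

def parallel_scan(total: int) -> set:
    """Run all segments concurrently and union the results."""
    with ThreadPoolExecutor(max_workers=total) as pool:
        parts = pool.map(scan_segment, range(total), [total] * total)
    return set().union(*parts)
```

The segments are disjoint and together cover the whole table, so four workers retrieve every item exactly once, roughly four times faster than a single sequential scan.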

💡Time-To-Live (TTL)

Time-To-Live is a feature in DynamoDB that enables items to expire and be automatically deleted from the database after a specified duration. The script mentions TTL as a way to manage the lifecycle of data, such as session information, without incurring write capacity unit charges for deletions.

💡Standard-Infrequent Access (Standard-IA)

Standard-Infrequent Access is a table class in DynamoDB introduced to reduce storage costs for data that is infrequently accessed. The script explains that this feature offers a 60% reduction in storage costs compared to the standard table class, making it an attractive option for archiving old data while keeping it readily accessible in DynamoDB.

Highlights

Introduction to the second part of a DynamoDB deep dive, focusing on its internal operations and practical use case scenarios.

Explanation of how DynamoDB uses partition keys and hash functions to distribute data across physical partitions for efficient storage and retrieval.

The concept of data replication across multiple availability zones in DynamoDB for high availability and fault tolerance.

DynamoDB's architecture involving load balancers, request routers, and storage nodes, and how they interact to handle data requests.

The difference between strong consistency and eventual consistency in DynamoDB reads, and their impact on performance and cost.

How Global Secondary Indexes (GSIs) are implemented as separate tables with their own provisioned capacity in DynamoDB.

The distinction between provisioned and on-demand capacity modes in DynamoDB, and their implications for performance scaling and cost management.

The importance of partition key design for achieving even data distribution and avoiding hotspots in DynamoDB.

Auto-scaling feature in DynamoDB that adjusts provisioned capacity based on real-time traffic without manual intervention.

DynamoDB's burst bucket mechanism that allows temporary spikes in traffic above the provisioned capacity by reusing unused capacity.

The use of transactions in DynamoDB to ensure atomicity across multiple items, with the consideration of increased write costs.

Optimization strategies for handling high-traffic scenarios, such as distributing writes across multiple partitions to avoid throttling.

The introduction of DynamoDB Accelerator (DAX) as an in-memory cache to offload traffic for read-heavy workloads and improve performance.

Practical examples of DynamoDB use cases, including shopping cart storage optimization and device log management.

Design patterns for efficient querying in DynamoDB, such as using hierarchical sort keys and sparse Global Secondary Indexes.

Advanced topics like DynamoDB Streams for real-time event watching and triggering actions, and Time-To-Live (TTL) feature for automatic data expiration.

Announcement and explanation of the new DynamoDB Standard-Infrequent Access (IA) table class for cost-effective storage of less frequently accessed data.

Recommendations for choosing between Standard and Standard-IA table classes based on storage and throughput costs analysis.

Transcripts

play00:00

(bright music)

play00:09

- Hello, and welcome to part two of our DynamoDB talk.

play00:12

Thanks for joining me back.

play00:14

Hopefully, you either liked the first part,

play00:15

and wanna join in, or you're an expert,

play00:17

who already knew everything in the first part

play00:19

and wanna dig a little deeper with me.

play00:21

In this talk, we're going to look under the hood

play00:24

of how DynamoDB actually operates,

play00:26

and get a sense of when I explain the value

play00:28

in the first part, how is that value actually delivered?

play00:31

So this is my favorite kind of stuff,

play00:33

how does a database really work underneath,

play00:35

so hope you enjoy the ride with me.

play00:37

We'll end with some kind of puzzlers.

play00:39

Here's a use case.

play00:41

What would you do about that use case?

play00:43

We get to apply what we learned,

play00:44

and learn some new techniques

play00:46

for being able to handle problems

play00:48

that maybe aren't obvious at the beginning

play00:50

what the solution is.

play00:52

All right, let's dig under the hood.

play00:54

In this case, we have three items

play00:57

that we need to put inside of a DynamoDB table.

play01:00

They have an OrderId.

play01:01

That's their partition key.

play01:03

They have other attributes,

play01:05

which are in a sense just payload

play01:08

when it comes to the partitioning aspects of DynamoDB.

play01:11

On the right-hand side, we have a DynamoDB table,

play01:13

a logical representation of our orders.

play01:16

Inside of a table, there are partitions.

play01:19

I'll oftentimes call 'em physical partitions.

play01:21

Sometimes, you hear virtual partitions.

play01:24

This is the actual bucket in which the data goes,

play01:26

and each one is responsible for a section of the key space.

play01:30

So Partition A here is responsible for 00 to 55.

play01:34

So it's kind of like housing addresses between 00

play01:37

and 55 go to this partition.

play01:39

Partition B is 55 to AA,

play01:41

and Partition C is AA to FF.

play01:44

In reality, these are longer than two digits,

play01:46

but let's, for simplicity, we'll just do two digits here.

play01:49

So now let's look at an OrderId 1.

play01:52

Where should we put it?

play01:54

What you do is you take the partition key,

play01:56

which is sometimes you see it called a hash key,

play01:58

and this is why.

play01:59

We hash it.

play02:01

Remember a hash, you take an arbitrary input value.

play02:04

You run it through a mathematical process,

play02:06

and you get out a fixed length string, or value,

play02:08

MD5, for example, SHA-256, things like that.

play02:12

So you run a hash on this,

play02:13

and if we assume in our hash function a 1 input produces

play02:17

a 7B output every time, then we can say, all right,

play02:20

well, where should this item go?

play02:22

Well, it should go into Partition B,

play02:25

because 7B is between 55 and AA in hex.

play02:30

All right, OrderId 2, we're going to hash the 2.

play02:33

Produces a 48.

play02:34

It goes into Partition A,

play02:37

because 48 is between 00 and 55.

play02:40

And then we'll hash 3, which equals CD in this case,

play02:43

and it goes into Partition C.

play02:46

Okay, so this is how it works.

play02:48

Now, each physical partition can store

play02:50

an arbitrary number of items.

play02:51

It doesn't just store, for example,

play02:53

the first partition there doesn't only store OrderId 2.

play02:56

It stores any OrderId that hashes between 00 and 55.

play03:01

And as is common with NoSQL databases,

play03:05

all the data items together,

play03:07

all those attributes are stored continuously,

play03:09

so that they're easy to retrieve

play03:11

as a singular unit, right?

play03:13

So it's not scattering all the attributes out

play03:15

into different locations that has to be joined back.

play03:18

The items are stored contiguously.

play03:21

So that's the logical representation.

play03:24

Let's look a little bit more at what is really going on,

play03:27

because on the backend there are servers.

play03:30

And so, when we hash 1,

play03:32

it goes to Partition B.

play03:34

How's Partition B really represented?

play03:36

We have multiple servers within the AWS region

play03:41

that this table is a part of,

play03:43

and each partition is going to be on a particular host,

play03:46

and there are multiple availability zones,

play03:48

three availability zones responsible for every table.

play03:51

So in this case, Order 1 is hashed to Partition B,

play03:54

which can go to three different availability zones,

play03:58

each Partition B.

play03:59

So it goes on Host 2, Host 5, and Host 8

play04:02

replicated across all three.

play04:05

And why do we replicate?

play04:06

We replicate because should there be an issue

play04:08

with an availability zone, or with Host 2, or 5, or 8,

play04:12

we still have two working copies,

play04:14

and you don't notice as the user of DynamoDB.

play04:17

It's resilient to the failures.

play04:20

So now, let's step back to about a 10,000 foot,

play04:22

and understand what's really going on,

play04:24

because behind the scenes, there aren't just three servers

play04:26

responsible in each availability zone.

play04:28

There's actually thousands.

play04:30

So you as the user on the left make a request.

play04:33

It goes to a load balancer.

play04:34

That gets sent to a request router,

play04:37

and there are thousands of request routers in each AZ.

play04:40

That request router knows for each request coming in,

play04:44

because of the partition key, sometimes called a hash key,

play04:47

that it goes to a certain storage node.

play04:49

There are thousands of storage nodes on the right,

play04:51

and it will route that request to the right storage node,

play04:54

and because there are three storage nodes involved

play04:57

in every particular item,

play04:59

it will go to the other storage nodes

play05:01

in the other availability zones.

play05:06

The storage nodes communicate with each other,

play05:09

doing a heartbeat that goes on continuously,

play05:12

saying, "Hey, are you still there?

play05:14

"Are you still healthy?"

play05:15

And also, "Who's the leader?"

play05:17

Because when you do a write, you always write to the leader,

play05:20

and the other nodes are followers.

play05:22

When you do a request, do an insert,

play05:25

it goes to the leader, the leader writes it,

play05:27

it propagates that to the other storage nodes.

play05:29

When one of them replies that they've got it,

play05:32

then you can get the response back acknowledged.

play05:35

You don't have to wait for the third one.

play05:36

That's a performance improvement,

play05:38

but you always have it written to at least two locations,

play05:41

and if one of the machines should die,

play05:43

we use a Paxos algorithm to elect a new leader,

play05:45

or actually, periodically we elect new leaders.

play05:49

Makes sense?

play05:51

So how do you get an item?

play05:54

When you do a GetItem, it goes, again,

play05:56

through the load balancer to the request router,

play05:58

and that request router knows which storage nodes

play06:00

are responsible for that particular item

play06:02

by looking at the partition key and knowing the metadata.

play06:05

All right, so now, there are two different ways

play06:07

to do a request.

play06:08

Remember, you could do strong consistency,

play06:11

or eventual consistency.

play06:13

So here's what's really going on when you do that.

play06:15

When you say strong consistency,

play06:17

it always goes to the leader,

play06:19

because the leader always has the very latest data.

play06:23

If you say eventual consistency,

play06:25

then it can pick any of the nodes to go to.

play06:30

Statistically, 2/3 of the time,

play06:33

you will be getting the very latest version of that item,

play06:35

but potentially, you could be going to that third node

play06:37

that is a millisecond behind the other two,

play06:41

and that's why it's eventually consistent.

play06:42

It is going to get that data item,

play06:44

but you might ask so quickly that the data hadn't propagated

play06:47

from the leader to the follower, and that's the,

play06:51

I'd say downside of eventually consistent,

play06:53

but the advantage is that it's half the cost, right?

play06:57

You use half as many RCUs for an eventually consistent read

play07:00

as a strongly consistent read.

play07:02

What's nice is when you do a strongly consistent read,

play07:04

you're not polling numerous machines.

play07:07

You're just going to the leader,

play07:08

so it's still very fast when you do

play07:10

a strongly consistent read.
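The two read paths can be sketched like this. The replica names and the helper function are invented for illustration, not DynamoDB API: with three replicas, a random pick lands on one of the two up-to-date copies roughly 2/3 of the time, which is exactly the eventual-consistency behavior described above.

```python
import random

def pick_storage_node(leader: str, replicas: list, strongly_consistent: bool) -> str:
    """Strongly consistent reads always go to the leader, which has the
    latest data; eventually consistent reads may land on any replica,
    including one that is a moment behind."""
    if strongly_consistent:
        return leader
    return random.choice(replicas)
```

For example, with replicas `["az1", "az2", "az3"]` and leader `"az1"`, a strongly consistent read always returns `"az1"`, while an eventual read may return any of the three.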

play07:15

We talked in the first section about GSIs,

play07:18

Global Secondary Indexes.

play07:19

So now, let's look at how they're actually implemented.

play07:22

They're essentially a second table

play07:26

that is automatically maintained by the DynamoDB system

play07:30

with a log propagator that knows how to move data

play07:32

from the base table into the GSI table.

play07:36

And this is why GSIs actually have

play07:37

their own provisioned capacity,

play07:39

and you can put them in on-demand mode,

play07:40

and things like that.

play07:42

So when you do an insert of some data item

play07:44

in the base table, the log propagator says,

play07:47

"All right, is that relevant for my GSI?"

play07:49

Not every item will have the attributes

play07:52

that are responsible in the GSI,

play07:54

might not have the partition key, and sort key.

play07:56

And if so, nothing gets propagated,

play07:58

which is an interesting way for the GSI

play08:01

to have less throughput load than the base table,

play08:04

because if it's a sparse GSI,

play08:06

and I'll give you some examples later,

play08:08

maybe the partition key needed for the GSI doesn't exist

play08:12

in the item as an attribute,

play08:14

and therefore doesn't propagate.

play08:16

There are also cases, though,

play08:17

where the GSI might get more throughput,

play08:20

because if you in the base table update the attribute

play08:23

that is used as the partition key in the GSI,

play08:27

then you have to delete one partition,

play08:30

and insert into another partition.

play08:32

So this is an amplification effect,

play08:34

where if you're continuously changing the partition key

play08:38

used by the GSI in the base table,

play08:40

even if it's not the same partition key

play08:41

as in the base table, you're going to get a delete

play08:44

and an insert in the GSI.

play08:46

So be aware that the GSI in some cases will have

play08:48

less throughput than the base table.

play08:50

In some cases, it might have more throughput.

play08:51

That's why it's independently provisioned.
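That write-amplification rule can be written down directly. This is a hypothetical helper, assuming one write per GSI mutation, just to make the delete-plus-insert cost concrete:

```python
def gsi_writes_for_update(old_item: dict, new_item: dict, gsi_pk: str) -> int:
    """Count the GSI writes caused by updating a base-table item.

    If the attribute used as the GSI partition key changes, the old GSI
    entry is deleted and a new one inserted: two writes instead of one.
    Items without the attribute never reach the GSI (the sparse case).
    """
    old, new = old_item.get(gsi_pk), new_item.get(gsi_pk)
    if old is None and new is None:
        return 0  # sparse: nothing propagated
    if old == new:
        return 1  # in-place update of the existing GSI entry
    return (old is not None) + (new is not None)  # delete + insert
```

So repeatedly changing the attribute that serves as the GSI's partition key doubles the GSI's write load relative to the base table.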

play08:54

Let's look next at the throughput.

play08:57

I covered this in part one.

play08:59

Remember provisioned capacity and on-demand capacity?

play09:02

Let's look a little bit about what's going on

play09:03

on the backend when you make these choices.

play09:05

So throughput, it comes in Read Capacity Units

play09:09

and Write Capacity Units,

play09:11

shorthand, RCU and WCU.

play09:14

In on-demand mode, a Read Capacity Unit is a 4 KB request.

play09:18

If you do a request for 7 KB to retrieve, that's two RCUs.

play09:23

If you do 100 KB, that's 25 RCUs.

play09:28

And Write Capacity Units, if you write a 10 KB item,

play09:31

that's 10 Write Capacity Units in on-demand.

play09:34

When you're in provisioned mode, it's a little different.

play09:36

They're called the same thing, but the units are different.

play09:38

It's that amount per second.

play09:40

So if you provision at 1,000 RCUs per second,

play09:44

what you're basically saying is every second,

play09:46

I expect to be consuming about 1,000 RCUs,

play09:50

and over time, so long as you stay around there,

play09:53

that's what the table's going to give you,

play09:55

similar with Write Capacity Units.

play09:56

So it can be confusing for people to think,

play09:59

"Well, isn't WCU a singular request, or is it a rate?"

play10:02

And it depends on your mode.

play10:04

In on-demand, it's a singular request.

play10:05

With provisioned, it's a rate,

play10:07

and they are independent.

play10:08

So a table can have really high read, low write,

play10:10

or the reverse, or high, or low of both.

play10:13

Now, remember, eventually consistent reads are

play10:16

at half the rate.

play10:17

So if you do a 100 KB item retrieval eventually consistent,

play10:21

then instead of 25 RCUs, it would be half that.
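The sizing arithmetic from the last few lines can be captured in a small sketch of the billed-unit math, matching the examples in the talk (4 KB per RCU, 1 KB per WCU, half cost for eventual consistency):

```python
import math

def read_capacity_units(item_size_bytes: int, eventually_consistent: bool = False) -> float:
    """One RCU covers up to 4 KB read with strong consistency;
    eventual consistency halves the cost."""
    units = math.ceil(item_size_bytes / 4096)
    return units / 2 if eventually_consistent else float(units)

def write_capacity_units(item_size_bytes: int) -> int:
    """One WCU covers up to 1 KB written."""
    return math.ceil(item_size_bytes / 1024)
```

So a 100 KB strongly consistent read is 25 RCUs, the same read eventually consistent is 12.5, and a 10 KB write is 10 WCUs.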

play10:26

There are some other things to think about.

play10:27

The max size of an item in a table is 400 KB.

play10:31

That is somewhat

play10:33

to keep you using DynamoDB

play10:36

the way it was intended to be used.

play10:37

It's not meant to store megabytes of data

play10:39

as a singular item.

play10:41

So 400 KB real limit.

play10:43

If you wanna store more than that,

play10:44

you wanna split your item into multiple pieces,

play10:46

and store them as separate items.

play10:48

And on the backend, scaling is achieved with partitioning.

play10:51

We just reviewed the physical partitioning.

play10:53

Each one of those virtual partitions,

play10:55

or physical partitions, supports 1,000 WCUs per second,

play10:59

or 3,000 RCUs per second, or a mix.

play11:02

So if you're using 500 WCUs,

play11:04

then you'd get 1,500 RCUs, for example.

play11:06

It's one, or the other, or a mix of the two,

play11:09

and you know, when you start getting hot,

play11:11

it starts splitting for you.

play11:12

So if your traffic grows, it can split.

play11:15

When your size gets large,

play11:16

specifically when the data under whatever partition keys

play11:22

sort into that physical partition,

play11:23

gets bigger than 10 gigabytes,

play11:25

it'll split that into two different partitions.

play11:30

You get to pick your capacity mode

play11:33

when you create the table.

play11:34

On-demand, really straightforward.

play11:36

With provisioned, I would recommend you turn on auto-scaling

play11:39

in almost all cases.

play11:41

What this says is instead of saying

play11:42

I want a particular level steady,

play11:45

you say please notice the live traffic

play11:48

that is going on with the table right now,

play11:50

and adjust up and down accordingly.

play11:52

So you can specify the minimum, don't go below this,

play11:56

the maximum, don't go above this, and a target utilization.

play12:00

And so, this is a representation

play12:03

of how the provisioned capacity is enforcing those limits.

play12:07

So step one, you make a request over the network

play12:11

to a load balancer, which gets sent to a request router.

play12:13

The request router authenticates.

play12:15

That's number three on the slide.

play12:18

It looks at the partition metadata system to understand

play12:20

the topology of where that partition key

play12:22

would actually be assigned to a storage node.

play12:25

But then before sending to the storage node,

play12:27

it keeps track of how much traffic you've been sending,

play12:31

and it has basically a bucket of tokens.

play12:34

So every time you do a 20 RCU request, or a 20 WCU request,

play12:38

it subtracts that from the table bucket,

play12:42

and that bucket fills every second

play12:43

with whatever your provisioned capacity is.

play12:45

So if you've provisioned 1,000, it fills with 1,000 tokens,

play12:48

and every second you can be pulling out.

play12:50

So you pull out 20, you pull out 50, you pull out one,

play12:52

you pull out 50, you pull out 100,

play12:54

so long as you have tokens, it lets the traffic go forward

play12:57

onto the storage node.

play12:59

All right, what happens if you ask

play13:01

for more than the provisioned capacity?

play13:03

So you've provisioned 1,000, and for a little while,

play13:06

you'd like to drive 2,000.

play13:08

That's actually okay for short durations,

play13:11

thanks to a burst bucket.

play13:12

The burst bucket takes any tokens that weren't used

play13:16

during a particular second, and keeps those for you,

play13:19

kind of like roll-over minutes on old style cellphone plans,

play13:22

and it is able to keep track of all the unused capacity

play13:26

over the last five minutes.

play13:28

So if you are running at exactly your provisioned capacity,

play13:32

the burst bucket would be empty,

play13:33

but typically, you're running below capacity,

play13:36

especially if you've set utilization at 70%.

play13:38

Then you've got a certain amount of unused tokens

play13:42

from the table bucket that can go into the burst bucket.

play13:45

That's what allows temporary spikes

play13:47

above the provisioned capacity, and that's just fine.

play13:49

It's still allowed forward to the storage node.
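The table bucket and burst bucket just described can be modeled as a toy token bucket. Every detail here is a simplified assumption; the real admission control is internal to DynamoDB. The refill and five-minute burst cap follow the talk's description:

```python
class TableBucket:
    """Toy model of the table-level token bucket plus its burst reserve.

    Each second the bucket refills with the provisioned capacity; tokens
    left unused roll into a burst reserve capped at five minutes' worth
    (like roll-over minutes on old cellphone plans)."""

    def __init__(self, provisioned_per_second: int):
        self.rate = provisioned_per_second
        self.tokens = provisioned_per_second  # current second's allowance
        self.burst = 0                        # banked unused capacity
        self.burst_cap = provisioned_per_second * 300  # five minutes' worth

    def next_second(self) -> None:
        # Bank whatever was left unused, then refill for the new second.
        self.burst = min(self.burst_cap, self.burst + self.tokens)
        self.tokens = self.rate

    def consume(self, units: int) -> bool:
        # Spend the current second's tokens first, then the burst reserve.
        if units <= self.tokens:
            self.tokens -= units
            return True
        if units <= self.tokens + self.burst:
            self.burst -= units - self.tokens
            self.tokens = 0
            return True
        return False  # throttled at the table level
```

A table running below its provisioned rate banks tokens, which is what lets it briefly spike above the provisioned capacity without throttling.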

play13:53

There is another set of buckets.

play13:55

Each particular storage node has its own bucket,

play13:59

which is enforcing that 3,000 RCU, 1,000 WCU limit.

play14:04

All right?

play14:05

So just because the table has capacity,

play14:08

if you've only used one partition key for the table,

play14:11

and all traffic is going to the same item,

play14:14

it's still going to be throttled on the storage node side,

play14:16

because of the buckets that are associated

play14:17

with that particular storage node.

play14:19

So that's a good explanation, I think,

play14:22

of why you want to have a pretty good dispersion

play14:26

of partition keys.

play14:27

You want to have a good number of storage nodes involved

play14:29

with your table, not too few.

play14:33

In on-demand capacity mode,

play14:36

this is what you set up when you say,

play14:38

"You know, my traffic isn't really smooth.

play14:40

"I might go down to zero for a while.

play14:42

"I might spike up rapidly.

play14:44

"I might be doing a bulk load without any advance notice,

play14:47

"and I just want the table to not enforce

play14:50

"the table bucket and the burst bucket."

play14:52

And so, with on-demand, those buckets don't exist.

play14:55

There's no limit at that level saying that,

play14:58

"Oh, this request shouldn't go forward."

play15:02

There is, however, still the limit

play15:04

on each particular storage node.

play15:06

So that's, again, why it's important to have

play15:08

good dispersion of your partition keys.

play15:10

But in this case, it's really easy, set and forget.

play15:13

You just say this table,

play15:15

I don't know what the traffic will be.

play15:17

If a request comes in,

play15:18

I want you to do your best to process it,

play15:20

and so long as you have good partition key dispersion,

play15:22

it'll work great.

play15:24

When you create a new table in DynamoDB

play15:26

and put it into on-demand mode,

play15:28

it has to have a certain expectation of how much capacity

play15:31

it should provision on the backend,

play15:33

and this is what it provisions.

play15:34

It expects up to 4,000 write request units,

play15:37

which would be 4,000 writes per second,

play15:40

or up to 12,000 read request units,

play15:42

which, if it's eventually consistent,

play15:44

would be 24,000 reads per second,

play15:46

or any combination of the two.

play15:48

That's just the default throughput.

play15:50

If you send traffic, and your traffic is greater than that,

play15:53

it will adjust the table up,

play15:55

and in fact, there's no maximum throughput.

play15:57

You can send, you know, and people do,

play16:00

million write request units per second,

play16:02

multi-million request units per second.

play16:05

If you remember in the first section, amazon.com was sending

play16:08

over 80 million requests per second to DynamoDB,

play16:12

and an on-demand table could definitely support that.

play16:14

It won't support it out-of-the-box necessarily,

play16:17

but there is a way to do it.

play16:19

If you provision the table to a certain level above this,

play16:23

the table will be able to support

play16:24

that level of traffic, and then if you switch to on-demand,

play16:27

it will keep that ability.

play16:31

If you don't do that, it will always keep track

play16:33

of what the request rate is,

play16:34

and here we see a synthetic trace of requests per second,

play16:38

and each time there's a new peak,

play16:40

the table on the backend will auto-adjust.

play16:42

So, oh, there's a lot of traffic.

play16:44

Let me grow the table to twice that.

play16:46

Oh, new peak, let me grow the table to twice that.

play16:48

Auto admin on the backend is noticing the traffic,

play16:51

and continuously adjusting the table's capabilities,

play16:54

so that you're not close to hitting the max

play16:56

that the table can support,

play16:58

always up to twice the previous peak.

play17:01

And in fact, these on-demand tables don't scale down.

play17:03

So if you hit a peak,

play17:05

and then you go silent over the weekend,

play17:07

when you come back Monday morning with a lot of traffic,

play17:09

it'll still be capable of supporting that level of traffic

play17:12

that had been the previous peak.

play17:14

And in fact, that red line should probably be

play17:15

double the green line, right?

play17:17

Because we just learned twice the previous peak.
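That adaptive behavior can be sketched as a tiny state machine. The 4,000-write floor and the double-the-peak, never-scale-down rules come from the talk; the class itself is an invented simplification:

```python
class OnDemandTable:
    """Sketch of how an on-demand table's backend capacity evolves:
    it starts at a floor, grows to twice each new peak, and never
    scales down, even through idle periods."""

    def __init__(self, floor_writes_per_sec: int = 4000):
        self.capacity = floor_writes_per_sec

    def observe(self, writes_per_sec: int) -> int:
        # A new peak grows the table to double that peak;
        # quiet traffic never shrinks it.
        if 2 * writes_per_sec > self.capacity:
            self.capacity = 2 * writes_per_sec
        return self.capacity
```

That is why a table that peaked on Friday can absorb the same load again on Monday morning with no warm-up.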

play17:21

So which one do you wanna pick?

play17:24

You know, you can actually pick one, and then change,

play17:27

but in advance, provisioned mode is when you have

play17:29

steady workloads, you have an expectation

play17:32

that I have kind of a sine wave throughout the day.

play17:34

It doesn't grow from 10 to 1,000 in a second.

play17:39

I have events maybe where I know the traffic is coming,

play17:41

and I wanna provision at that certain level.

play17:44

On-demand is when you don't know in advance what's coming.

play17:48

Maybe you open up a new region,

play17:49

and you're not sure what the traffic will be.

play17:50

Go in on-demand mode, DynamoDB will handle the traffic

play17:53

that's coming at it.

play17:54

When it's idle, if it's completely idle,

play17:57

there is no throughput cost.

play17:59

It's great when you don't know what's coming.

play18:01

You just set it and forget it.

play18:03

What we see on this slide is a certain partition,

play18:06

our favorite Partition A,

play18:08

and A is interesting, because it has two different items

play18:11

that are getting quite a lot of traffic,

play18:13

item foo and bar.

play18:15

If it gets too much traffic, either write traffic,

play18:17

or read traffic, we know that there's a certain amount

play18:19

of capacity that a certain partition can support.

play18:22

So what should we do?

play18:23

What should auto admin do?

play18:24

What auto admin does is it splits the partition,

play18:28

sometimes in half, not always perfectly in half,

play18:30

and will try to separate the two hot items,

play18:32

so that item foo is in a different partition than item bar,

play18:35

and this happens in the background.

play18:36

You don't see it.

play18:37

You're not aware of it,

play18:38

except it's happening to make sure

play18:39

that your DynamoDB performance is always maintained.

play18:43

And in fact, that foo item is still really red.

play18:47

And so, eventually, auto admin might separate foo

play18:50

into its own partition.

play18:51

One partition might, at the end of the day,

play18:53

be one singular item if it gets quite a lot of traffic.

play18:56

This is always done automatically in the background,

play18:58

detecting what's the traffic, should there be more splits?

play19:03

If you're curious what's going on with hotkeys,

play19:06

you can turn on contributor insights,

play19:09

just a push button in the console,

play19:11

and you get a screen like this.

play19:12

It shows on the left the partition keys

play19:15

that are being most accessed.

play19:17

Top left is partition keys most accessed.

play19:20

Top right is the partition key, sort key combination

play19:23

that is most accessed.

play19:25

On the bottom left, you can see the most throttled items,

play19:28

so this is the partition keys that are most often throttled.

play19:31

Throttling is what you would get

play19:32

when a partition is getting more traffic

play19:35

than its limits of 3,000 RCUs, 1,000 WCUs.

play19:38

But you see how the throttling very quickly goes away?

play19:42

That's because auto admin's doing the job on the backend,

play19:44

saying, "Whoa, this is very hot, let me split it.

play19:46

"This is still hot, let me split it,"

play19:48

and then very quickly, no more throttling.

play19:50

And on the bottom right you see

play19:51

the most throttled particular items

play19:53

of the partition key, sort key combination.

play19:56

So this one item was throttled,

play19:58

and then it was no more throttling.

play20:00

Why? Because it was the foo in this case.

play20:02

It was the one that was a particularly hot individual item,

play20:06

probably went into its own physical partition.

play20:09

At this point, you've seen what's going on underneath

play20:12

the covers of DynamoDB, and let's put it into practice.

play20:16

Let's look at some real-life situations,

play20:19

and think how could this be improved?

play20:21

What's the best approach?

play20:23

Here's a real-life example.

play20:25

This is a shopping cart that was being stored

play20:27

inside DynamoDB, and it was stored as a singular data item,

play20:31

basically just a JSON document held in DynamoDB.

play20:35

And that's okay, you can do that,

play20:37

but it's not the most optimal way to store the data,

play20:40

and why not?

play20:42

Well, you only get 400 kilobytes per item.

play20:45

And so, putting everything into the same item

play20:47

starts to eventually grow beyond the 400 KB limit.

play20:51

You can't query on any of these nested attributes.

play20:54

You can filter on them,

play20:55

but you can't put them into the sort key

play20:57

to be able to do index-driven retrievals.

play21:00

So it starts to be just a very simple key value store,

play21:02

as opposed to the full power of DynamoDB,

play21:05

where you can query and retrieve based on attributes

play21:08

that are held.

play21:09

But I think maybe even more so is that anytime you change

play21:12

an item, anytime you update an item,

play21:15

the cost of that is the larger of the before, or after size.

play21:20

So if you update a 300 KB item and make it a 350 KB item,

play21:25

you're charged, in this case, 350 write units,

play21:28

Write Capacity Units.

play21:30

That's quite a lot if you're only updating

play21:32

a fraction of the item.

play21:34

So what's a better way to do this?

play21:37

Look at this. This is a screenshot from NoSQL Workbench.

play21:40

I have as a partition key the certain user ID,

play21:43

and I've separated out all the different attributes

play21:45

about this user into different individual items

play21:49

in the same item collection of the user,

play21:51

and in the sort key, the category of the data is prepended

play21:57

to the particular unique value.

play21:59

So I have their address.

play22:01

I have items in their cart.

play22:03

I have their order history,

play22:05

their profile name, the store that they prefer.

play22:08

What's nice about this, instead of one singular large item,

play22:12

I can retrieve all their cart items

play22:14

by saying give me this user ID,

play22:16

all the sort key that begins with Cart#.

play22:18

I can retrieve their address with a singular

play22:20

give me the sort key that starts Address#.

play22:25

I can update the address, and only update the address item,

play22:28

not update their cart, and their order history,

play22:30

and their profile name, and all the rest.

play22:32

So this is better performance,

play22:33

better selectivity, lower cost.

play22:36

That's why you see this whole general pattern

play22:39

of a customer oftentimes as a partition key

play22:42

and different aspects of that customer

play22:44

that are remembered are in the sort key,

play22:46

oftentimes prepended with the category of the data.
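The split-item layout can be mocked in a few lines to show why those prefix retrievals work. The user, item names, and attributes here are invented for illustration; the dict stands in for the table:

```python
# Hypothetical item collection for one user, keyed by (partition key, sort key).
table = {
    ("user#1", "Address#Home"): {"city": "Seattle"},
    ("user#1", "Cart#sku42"):   {"qty": 2},
    ("user#1", "Cart#sku99"):   {"qty": 1},
    ("user#1", "Profile#Name"): {"name": "Jan"},
}

def query_prefix(pk: str, prefix: str) -> list:
    """Mimic Query(PK = :pk AND begins_with(SK, :prefix))."""
    return [v for (p, s), v in table.items() if p == pk and s.startswith(prefix)]
```

Fetching the cart touches only the `Cart#` items, and updating the address rewrites only the small `Address#Home` item instead of one 300 KB blob.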

play22:51

All right, here, we have a device log.

play22:53

The DeviceID is the partition key.

play22:56

And for the sort key, we have the Date

play22:59

of some sort of warning, or state adjustment here.

play23:03

This is pretty good.

play23:05

We like the high cardinality of the DeviceID.

play23:08

We can scale to an unlimited number of devices.

play23:11

All right, so what do we wanna retrieve?

play23:17

Let's say we want to retrieve all warnings.

play23:22

Here's pseudocode of what it would look like.

play23:23

Select everything from DeviceLog,

play23:26

WHERE the device is equal to whatever,

play23:28

and FILTER ON warning, WARNING1.

play23:32

The bottom left is the command line invocation

play23:34

that makes this happen.

play23:36

So you say aws dynamodb query.

play23:38

You specify the table-name as DeviceLog,

play23:41

and you do a key-condition-expression,

play23:43

where the dID is equal to dID.

play23:45

Now, this is the first time I've shown this hash and colon.

play23:50

It's for substitution,

play23:51

because the IDs might have special characters.

play23:54

You do the substitution where you say #dID,

play23:58

and expression-attribute-names #dID is equal to DeviceID.

play24:06

:dID is substituted as d#12345

play24:09

under expression-attribute-values, okay?

play24:11

So we're saying DeviceID is equal to d#12345.

play24:15

filter-expression #s, which is State,

play24:18

is equal to :s, which is WARNING1.

play24:21

And no-scan-index-forward is a way of saying scan backwards.

play24:25

So this is a way to specify

play24:27

that what the pseudocode SQL on the top left says,

play24:30

and say go to this, find me where the dID is equal

play24:32

to DeviceID of 12345,

play24:36

and the State is WARNING1.

play24:38

All right, sounds pretty good.

play24:40

I'm filtering for the state, however,

play24:43

which means that when I do the request, on the backend,

play24:49

the state retrieval is not index optimized.

play24:52

Is that an issue?

play24:56

Well, a little bit,

play24:59

because if there's 99% of the time normal,

play25:03

1% of the time warning, then I'm going to be filtering away

play25:07

99% of the items I'd like to look at,

play25:09

which will mean that I'm basically expending 100X more RCUs

play25:14

than I would otherwise want to do.

play25:15

So what I'd really like is for that state

play25:17

to be index-driven.

play25:20

If we wanted index-driven, we'd put it into the sort key,

play25:22

the common pattern.

play25:24

So what we've done here is instead of a sort key

play25:27

of simply the date,

play25:28

the sort key is the State# and then the Date.

play25:36

Now, we've adjusted the command line

play25:39

to say look at the DeviceLog backwards.

play25:42

Find me under key expression

play25:44

where the DeviceID is 12345,

play25:47

and the State#Date sort key begins with WARNING1.

play25:51

This is completely index optimized.

play25:54

If there was one warning out of 10,000 normal,

play25:57

it will retrieve just that singular item,

play25:59

because it's index optimized to find sort keys

play26:01

that begin with a certain value.

play26:03

So this is a good design pattern,

play26:05

because it reduces cost and improves performance.
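The cost difference between filtering and keying can be made concrete with a toy model. In-memory lists stand in for the table, and the returned count mimics the items consumed as reads; the field names are the ones from the example:

```python
def query_with_filter(items, device, state):
    """Filter expression: every item under the device is read (and billed)
    before the filter discards the non-matching ones."""
    read = [i for i in items if i["DeviceID"] == device]
    return [i for i in read if i["State"] == state], len(read)

def query_with_composite_sk(items, device, state):
    """State folded into the sort key: the index seeks straight to the
    matching items, so only those are read."""
    read = [i for i in items
            if i["DeviceID"] == device and i["SK"].startswith(state + "#")]
    return read, len(read)
```

With 99 normal entries and one warning, the filter version reads 100 items to return one, while the composite-key version reads exactly one.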

play26:12

All right, let's continue with the puzzlers.

play26:15

We have the same data here, now with the new sort key.

play26:19

What we want is to fetch all the device logs

play26:21

for a given operator between two dates.

play26:24

And we see that the Operator is an attribute.

play26:27

We have Liz and Sue.

play26:30

Hmm.

play26:31

Well, what's the right way to, for a given table,

play26:33

where you want to retrieve the data using

play26:35

different attributes than the base table has

play26:39

in the partition key and the sort key?

play26:41

We use a GSI.

play26:42

All right, so all device logs for given operator

play26:45

between two dates.

play26:46

We want to have a GSI,

play26:47

where Operator would be the partition key,

play26:50

and the Date would be the sort key,

play26:52

so that I can retrieve by dates.

play26:53

So here's my partition key.

play26:56

Here's my GSI representation inside NoSQL Workbench,

play26:59

and the sort key here is Date,

play27:01

and this is very straightforward then to find

play27:03

all device logs for a certain operator between two dates,

play27:05

because I can do a sort key index-driven between two dates.

play27:10

All right, good use of a GSI.

play27:14

Here's what the CLI would look like.

play27:17

You notice that we specify the table-name, DeviceLog,

play27:20

and the index-name as well, because the index is associated

play27:23

with the table, but has its own name.

play27:24

So indexes in DynamoDB aren't implicitly used.

play27:27

They're explicitly used.

play27:28

You say, "Get this from this index."

play27:32

And so, against that index,

play27:33

you look for the key-condition-expression

play27:35

where the Operator is Liz,

play27:36

and the Date is between March 20th and 25th.

play27:42

Index-driven, very efficient.

play27:46

All right, new access pattern,

play27:47

fetch all escalated device logs for a given supervisor.

play27:52

All right, what we see here, supervisor is Sarah.

play27:56

But only sometimes.

play27:58

It's not always present.

play27:59

Not every one is escalated.

play28:01

Does that matter?

play28:03

Can I still do a GSI if the attributes don't exist?

play28:07

Yes, you can, and in fact, it's called a sparse GSI.

play28:11

It's a great pattern when you have

play28:14

a needle in a haystack type query.

play28:17

When you create a sparse GSI,

play28:21

if the item in the base table doesn't match the needs

play28:23

for that GSI, it's not propagated.

play28:25

Therefore, there's no cost of the reads,

play28:27

or writes for the GSI.

play28:28

Well, no writes, no write cost for the GSI,

play28:31

and there's no storage cost on the GSI.

play28:33

So this is a very low cost GSI to maintain,

play28:36

and yet it's really much better than doing,

play28:38

say, a scan against the whole table to try to find

play28:40

this needle in the haystack type query.

play28:42

So good thing to remember, sparse GSI.

play28:44

There's no formal definition of a sparse GSI.

play28:46

It's just something we think of.

play28:48

It's a GSI where I don't expect most items to match,

play28:51

and then we can consider it a sparse GSI

play28:53

just as nomenclature to share with each other,

play28:56

the general design pattern.
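The log propagator's skip behavior can be sketched as follows. This is an assumed simplification of the internal mechanism, using the Supervisor attribute from the example:

```python
def propagate_to_gsi(item: dict, gsi_partition_key: str):
    """Log-propagator sketch: a base-table item that lacks the GSI's
    partition key attribute is simply skipped, so it costs the GSI
    no write and no storage."""
    if gsi_partition_key not in item:
        return None
    return dict(item)  # in reality, only projected attributes are copied
```

Only the escalated items (the ones carrying a Supervisor attribute) ever land in the GSI, which is what keeps the needle-in-a-haystack query cheap.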

play29:00

And then here's an example from Amazon's Phone Tool.

play29:04

Phone Tool is how you can look up another employee,

play29:06

and see where their office is, what their time zone is,

play29:08

what their phone number is, and things like that,

play29:11

and it shows a common pattern of making

play29:13

a hierarchical sort key,

play29:15

where the partition key in this case is Country,

play29:17

and the sort key is the state hash the airport code hash

play29:22

the office location.

play29:24

So this would be how you'd look up metadata about

play29:27

every office inside of Amazon.

play29:30

And why would you do this?

play29:32

Because it's very easy, therefore, to do a query that says,

play29:35

"Give me all the offices in the USA,"

play29:37

'cause you just do a query where you don't specify

play29:39

a sort key condition, and you specify the Country.

play29:42

You can say, "Give me all the New York locations,"

play29:45

by saying the sort key has to start with NY#,

play29:48

all the New York City locations,

play29:50

where the sort key starts with NY#NYC#,

play29:53

or just a particular item by adding in the airport code

play29:57

that we use for, for example, JFK14,

play30:00

which is a certain office building.

play30:02

So this hierarchical data in the sort key

play30:04

is a common pattern.

play30:07

It lets you query different granularities.
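Those granularities can be demonstrated with a small mock. The office rows are invented except for the JFK14 building mentioned in the talk; each extra `#`-separated level in the prefix narrows the query:

```python
# (Country partition key, hierarchical sort key) pairs; rows are illustrative.
offices = [
    ("USA", "NY#JFK#JFK14"),
    ("USA", "NY#JFK#JFK27"),
    ("USA", "WA#SEA#SEA33"),
    ("DEU", "BE#BER#BER01"),
]

def offices_in(country: str, prefix: str = "") -> list:
    """Mimic Query(PK = country AND begins_with(SK, prefix))."""
    return [sk for pk, sk in offices if pk == country and sk.startswith(prefix)]
```

An empty prefix returns every office in the country, `"NY#"` narrows to a state, `"NY#JFK#"` to a city, and the full key selects one building.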

play30:11

Would this be a good design for public-facing access?

play30:20

Actually not, and the reason is that

play30:22

that partition key of Country,

play30:24

I expect that is going to be a very hot partition key,

play30:27

and you generally want to create designs

play30:30

where you don't have too much going on

play30:31

under a certain partition key

play30:33

relative to the whole table access.

play30:35

So you might wanna do something where the Country is

play30:38

USA-NY, USA-WA as the partition key.

play30:43

The pros and cons of that,

play30:44

you have more dispersion of your partition key,

play30:46

but if you did want every USA office,

play30:48

then you would have to do a BatchGetItem, or multiple queries,

play30:52

which goes and retrieves all the items

play30:55

for every particular state,

play30:57

and aggregate it yourself in your code.

play31:01

All right, at this point,

play31:03

we've covered sort of the common scenarios.

play31:06

Now, let's get into some

play31:07

that are a little bit more interesting and unique,

play31:09

a little bit more likely to be stumpers,

play31:10

even for people who have used DynamoDB for a little while.

play31:14

The first one I'll call "American Idol."

play31:18

In this case, we want to keep track of votes

play31:20

for certain numbers of candidates.

play31:22

The natural way to represent that is as shown here

play31:25

in the screenshot that we have a certain candidate

play31:26

with a certain number of votes,

play31:28

and the top two candidates get most of the votes.

play31:31

So Candidate A gets a bunch of votes,

play31:32

Candidate B gets a bunch of votes.

play31:36

The challenge with this design is

play31:37

that there's that single item limit of 1,000 WCUs,

play31:40

which means we would get

play31:42

1,000 writes per second per candidate.

play31:45

I think during "American Idol" final week

play31:47

we were getting more votes than that per second.

play31:50

So what do you do about it?

play31:53

Hmm.

play31:55

I'll let you think for a second.

play31:57

All right, that's long enough.

play32:00

Let's assume that we have 20,000 votes going to Candidate A

play32:02

and 10,000 votes going to Candidate B.

play32:05

Maybe that's a good hint.

play32:07

What will we do?

play32:08

What we will do is create different partitions

play32:13

for each candidate, so that instead of having

play32:14

one giant ballot box per candidate,

play32:16

we will separate, and give maybe 20,

play32:18

or 50 ballot boxes per candidate,

play32:21

and people can go to that particular ballot box randomly,

play32:24

and place their vote in that ballot box.

play32:26

Maybe another analogy is a buffet line at a conference.

play32:29

I hate it when they have one buffet line.

play32:31

Let's do 10 buffet lines, much shorter lines.

play32:33

I get my food faster.

play32:35

That's kind of what we're doing here.

play32:36

So when you update an item,

play32:37

you just randomly pick the partition,

play32:40

Candidate A 1, 2, 3, 4,

play32:42

and you add your vote there.

play32:43

Each one of those can get 1,000 WCUs,

play32:46

and therefore with 20, you could have 20,000 WCUs.

play32:49

With 100, you would have 100,000 WCUs, right?

play32:52

So one vote goes here.

play32:54

A vote goes there.

play32:55

Vote, vote, vote, vote, vote, vote.

play32:59

At some point, maybe periodically, every second, at the end,

play33:03

you do an aggregation.

play33:05

So you do a parallel collection.

play33:08

You go to each of these partitions.

play33:09

You do a GetItem on the votes,

play33:12

and you insert as Candidate A total

play33:15

the total number of votes.

play33:17

That's what your user interface,

play33:18

that's what your application goes against.

play33:21

So you keep track of how many people

play33:23

went through the buffet line by keeping a separate counter

play33:25

per buffet line, and keeping one vote in one,

play33:28

one metric in one location for the number of people

play33:31

that have enjoyed the beautiful food at the conference,

play33:34

and similar with Candidate B.

play33:37

Okay, so in the application, just retrieve from the total.

play33:40

Your application doesn't have to be aware of the fact

play33:42

that there are actually multiple ballot boxes

play33:43

on the backend.
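The sharded "ballot box" pattern above can be sketched in a few lines. This is a minimal illustration using an in-memory dict to stand in for the DynamoDB table; in real code, `cast_vote` would be an UpdateItem with an `ADD` expression and `aggregate` would be parallel GetItem calls.

```python
import random

# Hypothetical in-memory stand-in for the DynamoDB table: keys are
# partition keys like "CandidateA-7", values are vote counts.
table = {}

NUM_SHARDS = 20  # one "ballot box" per shard; 20 shards ~= 20,000 WCUs

def cast_vote(candidate):
    # Randomly pick one of the candidate's shards and increment it --
    # the equivalent of UpdateItem with "ADD votes 1".
    shard = random.randrange(NUM_SHARDS)
    key = f"{candidate}-{shard}"
    table[key] = table.get(key, 0) + 1

def aggregate(candidate):
    # Periodic aggregation: GetItem on every shard, sum the counts,
    # and store the total under one well-known key the application reads.
    total = sum(table.get(f"{candidate}-{s}", 0) for s in range(NUM_SHARDS))
    table[f"{candidate}-total"] = total
    return total

for _ in range(1000):
    cast_vote("CandidateA")
print(aggregate("CandidateA"))  # 1000
```

The application only ever reads `CandidateA-total`, so it never needs to know the shards exist.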

play33:46

Here's a puzzler.

play33:48

All right, I've got different events.

play33:52

Each event is the partition key

play33:53

and each event gets a timestamp.

play33:56

These are possibly events that need to be handled,

play34:01

and I need to query for find me all the events

play34:03

that are still in the database

play34:05

that are older than four hours.

play34:08

All right, well, this partition key is good,

play34:12

because it's great dispersion.

play34:13

Every event gets its own partition key.

play34:14

That's fabulous.

play34:15

The timestamp is underneath, good,

play34:17

but I need to fetch all the events older than four hours.

play34:19

So what do I do with that?

play34:21

Well, I think your first instinct is create a GSI,

play34:24

and that's good, but the natural way to create the GSI

play34:28

would be to have some partition key,

play34:30

which aggregates all of the timestamps underneath it,

play34:32

to make it easy to fetch events

play34:34

that are older than four hours, and that'll work.

play34:36

It'll only work up to a certain limit,

play34:38

because, again, the GSI has the same 1,000 WCU limits.

play34:44

And so, that would limit how many events.

play34:45

Now, you know, a small scale database,

play34:48

in fact, even a normal scale database,

play34:50

this wouldn't be an issue.

play34:51

But we're thinking massive scale here.

play34:53

So how do you do this when the number of events

play34:55

is potentially, you know, multi-million, or a billion?

play34:59

All right, what do you do with that?

play35:01

What you do is the same idea I just discussed,

play35:03

where you have a GSI,

play35:06

and you put all the timestamps

play35:07

under that same partition key,

play35:09

except in this case, we create more than one partition key,

play35:13

as many as you need, N many.

play35:15

So GSI-PartitionKey 1, 2, and 3.

play35:18

What you're doing here is creating three, or N, or 50,

play35:21

or 100, however many separate partitions in the GSI,

play35:24

separate item collections,

play35:25

each of which will keep its own sorted list of timestamps,

play35:29

and then you can go to each of those individual items,

play35:31

and get the ones that are older than four hours,

play35:32

and aggregate it together, right?

play35:35

So you insert the event with a timestamp,

play35:37

and all you have to do is pick a random number

play35:39

between zero and N, and say,

play35:40

"That's gonna be my GSI-PartitionKey."

play35:44

And then this is the GSI point of view.

play35:47

You will have a bunch of different partitions,

play35:49

as many as you picked,

play35:50

each of which has its own set of timestamps,

play35:53

and then you can do a direct query against that, and say,

play35:55

"Find me those which have a timestamp

play35:56

"older than four hours."

play35:58

Do that in parallel.

play36:00

When it finds one, it alerts, and says,

play36:02

"I found it, here you go," okay?

play36:05

So again, this is what you do when you have

play36:08

scale above typical.

play36:10

It's a good technique.
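The GSI write-sharding idea can be sketched the same way. Here a dict keyed by shard number stands in for the GSI's item collections; names like `insert_event` are illustrative, and in practice the per-shard queries would run in parallel.

```python
import random

# In-memory sketch of the GSI view: shard number -> list of
# (timestamp, event_id) pairs, one list per GSI partition key.
N = 4
gsi = {s: [] for s in range(N)}

def insert_event(event_id, ts):
    # On insert, pick a random GSI partition key between 0 and N-1.
    shard = random.randrange(N)
    gsi[shard].append((ts, event_id))

def events_older_than(cutoff):
    # Query each shard (in real life, in parallel) for timestamps
    # older than the cutoff, then merge the results.
    found = []
    for shard in range(N):
        found.extend(eid for ts, eid in gsi[shard] if ts < cutoff)
    return sorted(found)

now = 1_000_000
insert_event("stale-1", now - 5 * 3600)   # 5 hours old
insert_event("stale-2", now - 6 * 3600)   # 6 hours old
insert_event("fresh-1", now - 600)        # 10 minutes old
print(events_older_than(now - 4 * 3600))  # ['stale-1', 'stale-2']
```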

play36:13

Those were all write examples,

play36:15

where we were thinking about WCU limits.

play36:17

Here's a read example.

play36:19

This is, say, Amazon Prime Day,

play36:21

and different items have different popularity

play36:23

in the product catalog.

play36:25

The Instant Pot, that is the hit of the year,

play36:27

70,000 requests per second,

play36:29

need to retrieve the metadata about the Instant Pot.

play36:32

"Childhood's End," science fiction on the right,

play36:34

a lot less traffic.

play36:36

What do you do if you need to get the same item

play36:38

70,000 times per second?

play36:43

Is your instinct to do a read shard?

play36:46

It's almost a trick question.

play36:48

That's not what you have to do.

play36:51

Here's a typical access pattern of almost anything.

play36:54

What is this, the Pareto curve?

play36:56

Some items get most of the traffic.

play37:01

DAX is the solution.

play37:03

Remember DAX, the DynamoDB accelerator?

play37:05

This is an in-memory cache that can go in front of DynamoDB,

play37:09

and when you do a GetItem against it,

play37:10

it can do millions per second.

play37:12

So it's the same API as DynamoDB itself.

play37:15

If you do a GetItem, and it doesn't have it in its cache,

play37:17

it will retrieve it from DynamoDB, and send it on,

play37:20

and remember it for the next request.

play37:22

If you do a PutItem, you write an item,

play37:24

it'll write it, and remember for the next request as well.

play37:27

So it's a great way to offload traffic for hot items,

play37:32

and you just put as many servers.

play37:35

This is not serverless.

play37:36

This is, there are servers.

play37:37

You can pick as many servers up to 10 read replicas

play37:39

in front of DynamoDB.

play37:41

So that's something to think about when you think,

play37:42

"You know what, this item,

play37:45

"even if it's in its own physical partition,

play37:46

"I'm going to need to do more than 3,000 RCUs

play37:48

"against the same individual item."

play37:51

The in-memory cache will come to the rescue.
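The read-through/write-through behavior described for DAX looks roughly like this. It's a toy sketch, not the DAX client: the dict stands in for the table, and the class names are made up for illustration.

```python
# Toy read-through cache illustrating the DAX behavior described above:
# GetItem serves from cache when possible; on a miss it reads the
# backing table and remembers the result for the next request.
class ReadThroughCache:
    def __init__(self, table):
        self.table = table      # stand-in for the DynamoDB table
        self.cache = {}
        self.misses = 0

    def get_item(self, key):
        if key not in self.cache:
            self.misses += 1                 # cache miss: go to the table
            self.cache[key] = self.table[key]
        return self.cache[key]

    def put_item(self, key, value):
        self.table[key] = value              # write through to the table
        self.cache[key] = value              # and remember it for reads

dax = ReadThroughCache({"instant-pot": {"price": 79}})
dax.get_item("instant-pot")   # first read hits the table
dax.get_item("instant-pot")   # repeat reads are served from memory
print(dax.misses)             # 1
```

The hot Instant Pot item costs the table one read, no matter how many times the cache serves it afterward.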

play37:54

All right, that's enough puzzlers.

play37:56

Let me give you a couple just more advanced topics

play37:59

I wanted to cover today.

play38:00

So transactions we did mention before.

play38:02

The way that you implement that is you issue

play38:04

a TransactWriteItems command.

play38:06

It batches up a set of items,

play38:08

and says, "I want these done atomically."

play38:11

And you can, you know, do a put in there,

play38:13

an update in there, a delete in there,

play38:14

any number of those together

play38:16

up to 25 items within a transaction.

play38:19

And you can do it across tables,

play38:21

and you can do it in some conditional checks as well

play38:24

that would say if this condition fails,

play38:25

I want this transaction to end.

play38:28

Why not do everything with a TransactWriteItems?

play38:29

Well, it consumes 2X the WCU

play38:33

as if it weren't a TransactWriteItems, okay?

play38:36

So that's basically because there's a two-phase commit,

play38:39

so it's literally twice as much work,

play38:40

therefore twice as much cost.

play38:42

So this is good when you have to make changes across items,

play38:45

or things like that.

play38:46

You say this needs to happen either together,

play38:50

or not at all.

play38:53

Side note, individual item updates are always transactional.

play38:57

This is only when you need to do updates across items.

play39:00

So if you have multiple different clients competing

play39:03

to update the same item, it is always serialized.

play39:06

This is for cross-item transactions.

play39:09

And here's an example.

play39:10

This is a game state, where Hammer57 is going to buy

play39:15

health with coins, and we need to make sure

play39:18

that if we deduct the coins, we add the health.

play39:22

You don't wanna do this halfway.

play39:24

So this is what would be sent over the wire essentially.

play39:26

You say update the Gamers table,

play39:29

where the GamerID is Hammer57.

play39:32

I wanna set the health to 100.

play39:36

All right, but also update the Gamers table

play39:38

where the partition key is Hammer57,

play39:43

and I want to say set the coins equal to coins minus 400.

play39:48

So subtract 400 coins, add 100 health,

play39:51

or set 100 health, not add,

play39:53

but there is a condition expression.

play39:54

Only do it if they have enough coins,

play39:57

which is a good way, good technique to remember.

play39:58

So if they have at least 400 coins,

play40:00

and that probably should've been

play40:01

a greater than, or equal, shouldn't it?

play40:03

If they have at least 400 coins,

play40:06

subtract the coins, and add the health.

play40:09

And now, you've seen the bug.

play40:10

If they have exactly 400 coins, they can't buy health.

play40:12

They'd need at least 401, due to the condition expression

play40:15

being a greater than instead of greater than equals.

play40:18

This is a good use case for transactions.
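The all-or-nothing purchase logic from the slide can be sketched as plain code, with the condition fixed to "greater than or equal" so exactly 400 coins is enough. The dict stands in for the Gamers table item; in DynamoDB this would be one TransactWriteItems call carrying both updates and the ConditionExpression.

```python
# Sketch of the atomic "buy health with coins" transaction. Either the
# condition passes and both changes happen, or nothing changes at all.
def buy_health(gamer, cost=400, health=100):
    if gamer["coins"] >= cost:        # ConditionExpression: coins >= :cost
        gamer["coins"] -= cost        # SET coins = coins - :cost
        gamer["health"] = health      # SET health = :health
        return True
    return False                      # condition failed: nothing changes

hammer = {"gamer_id": "Hammer57", "coins": 400, "health": 12}
print(buy_health(hammer))   # True: exactly 400 coins now works
print(hammer["coins"], hammer["health"])  # 0 100
```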

play40:21

Another thing to keep in mind,

play40:23

let's say you have found a case

play40:25

where you need to read all the items from the table

play40:27

as quickly as possible.

play40:28

You have a certain analytical style query

play40:31

that you just wanna drive.

play40:33

You wanna say, "Give me all the offices

play40:35

"from that Phone Tool example,

play40:36

"and I don't want it limited by country, or anything."

play40:38

I just wanna list of all the offices.

play40:40

Just scan it.

play40:42

One way to do it, you'd do a scan,

play40:44

and you retrieve it serially,

play40:46

but if you want it fast, you probably want a parallel scan.

play40:49

The way a parallel scan works is you specify

play40:51

on the scan request a total segment count that you want.

play40:55

So 10 threads, 100 threads, you name it.

play40:58

You say, "I want 100, and I am worker five,"

play41:02

and I will get allocated as worker five that 1% of the table

play41:07

and every other worker gets their 1% of the table.

play41:10

So here's all the data items.

play41:11

Remember, the 00 to FF in the key space.

play41:14

I say four segments here, so it's 0 through 3.

play41:18

My main thread creates four workers.

play41:21

Every worker says, "Hey, please do this request.

play41:23

"I want four segments, and I am Worker 0, 1, 2, 3,"

play41:27

and they can in parallel process the key space.

play41:31

They each get their own portion

play41:32

of the key space to process through.

play41:34

With this, you can scan a table

play41:36

almost as quickly as you need, right?

play41:39

So don't think you can only scan with one thread.

play41:41

You can scan with N many threads.
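The worker/segment split can be sketched with a thread pool. Here the "table" is a list and the segment split is simulated with a modulo; with the real API, you'd pass `TotalSegments` and `Segment` on each Scan request and DynamoDB would do the key-range split for you.

```python
import concurrent.futures

# Sketch of a parallel scan: each worker asks for its own segment of
# the key space, and all segments together cover the whole table.
items = [{"office": f"office-{i}"} for i in range(1000)]
TOTAL_SEGMENTS = 4

def scan_segment(segment):
    # Worker `segment` of TOTAL_SEGMENTS processes only its share.
    return [it for i, it in enumerate(items) if i % TOTAL_SEGMENTS == segment]

with concurrent.futures.ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    chunks = pool.map(scan_segment, range(TOTAL_SEGMENTS))

results = [it for chunk in chunks for it in chunk]
print(len(results))  # 1000: every item scanned exactly once
```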

play41:45

Something else that's important to know about DynamoDB is

play41:47

that you do have a stream capability.

play41:50

This live watches for events

play41:52

that happen on the DynamoDB table,

play41:55

and when there's a mutation event, it gets updated.

play41:57

And with that, you can invoke a Lambda,

play42:00

and make some choices.

play42:01

What do you wanna do when you notice that the table moves?

play42:04

Well, one common usage pattern for this is

play42:07

to update the table with some real-time aggregations.

play42:11

So you wanna keep a count of how many Xs a user has.

play42:15

Well, the code can just insert the item,

play42:17

and then the Lambda notices that you did the insert,

play42:19

and it will increment by one the number of items

play42:21

that are under there, so like, you know,

play42:22

how many bug reports are on this code base?

play42:26

Just add the bug report.

play42:27

Have the Lambda insert the count,

play42:29

so that later on, you can say,

play42:30

"Well, which items have the most bug reports,"

play42:32

and make that an efficient query.

play42:34

That's a good example.
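A Lambda handler doing that real-time aggregation might look like the sketch below. It's a minimal illustration, assuming the DynamoDB-JSON stream record shape; the `counts` dict stands in for an UpdateItem with `ADD report_count 1` against the parent item, and the attribute names are made up.

```python
# Hypothetical stream handler: each INSERT of a bug report bumps a
# per-codebase counter, so "most bug reports" becomes a cheap lookup.
counts = {}

def handler(event, context=None):
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            # Stream records carry the new item image in DynamoDB JSON.
            codebase = record["dynamodb"]["NewImage"]["codebase"]["S"]
            counts[codebase] = counts.get(codebase, 0) + 1

stream_event = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"codebase": {"S": "frontend"}}}},
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"codebase": {"S": "frontend"}}}},
    {"eventName": "MODIFY",   # not an insert: ignored
     "dynamodb": {"NewImage": {"codebase": {"S": "frontend"}}}},
]}
handler(stream_event)
print(counts)  # {'frontend': 2}
```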

play42:36

You can also feed OpenSearch.

play42:38

So hey, every time there's a change,

play42:40

put that change into OpenSearch,

play42:41

so that you can have a search,

play42:43

text search oriented version of the data inside DynamoDB,

play42:47

or you can, for example, feed it to Kinesis Firehose,

play42:49

which can write to S3 in Parquet format,

play42:51

and you can run Athena to do a deep analytical query

play42:55

against the data, without actually having

play42:57

to do an export in this case.

play42:59

You can keep it real-time updated based

play43:01

on the mutation events that are happening inside DynamoDB.

play43:05

So something to keep in mind,

play43:07

that you can watch for the events in DynamoDB,

play43:11

or anything else you wanna do.

play43:11

Email somebody.

play43:13

"Hey, we noticed that you just created an account, welcome,"

play43:15

and you can fire off that, put it into a queue

play43:18

based on a Lambda that notices when new users are added.

play43:21

And in fact, there's a feature added in November of 2021,

play43:26

which was that you can filter to have

play43:29

only certain items invoke the Lambda.

play43:31

It used to be the Lambda would be invoked,

play43:33

and the Lambda could decide if it cared.

play43:34

Now, you can have a pre-filter,

play43:36

where the Lambda won't even be invoked

play43:37

if the item doesn't match the constraints.

play43:39

So you can watch for just inserts of new users,

play43:42

and not any other kind of inserts,

play43:43

and have that particular Lambda be the one

play43:46

that fires off the welcome message.

play43:49

Something else to keep in mind as a feature is TTL,

play43:53

Time-To-Live.

play43:55

If you create an item in DynamoDB,

play43:57

you can create an attribute,

play43:59

and tell the database this attribute is my TTL attribute,

play44:03

and its value is seconds since epoch.

play44:06

Once the time has moved forward from that point,

play44:09

it is eligible for deletion.

play44:11

What's cool is that deletion happens in the background

play44:13

without incurring any WCU charges.

play44:16

So it's a way to freely delete the data in the database

play44:19

if it's naturally going to expire,

play44:21

like, say, it's a session.

play44:22

You might say this session is good for three days,

play44:25

and then after three days, just let the database delete it.
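Writing that three-day session item is just epoch arithmetic. The attribute name `expires_at` here is illustrative; you tell DynamoDB which attribute is the TTL attribute when you enable the feature on the table.

```python
import time

# Build a session item whose TTL attribute is seconds since epoch,
# three days from "now", per the pattern described above.
THREE_DAYS = 3 * 24 * 60 * 60

def session_item(session_id, now=None):
    now = int(time.time()) if now is None else now
    return {
        "session_id": session_id,
        "expires_at": now + THREE_DAYS,  # eligible for free deletion after this
    }

item = session_item("abc123", now=1_700_000_000)
print(item["expires_at"] - 1_700_000_000)  # 259200 seconds = 3 days
```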

play44:28

It's not guaranteed to delete that second.

play44:30

It generally gets deleted as a background process

play44:33

within 48 hours of that second.

play44:35

And I mention it here with streams,

play44:37

because something that's a common pattern is

play44:39

to have streams watch for those deletes,

play44:41

and possibly export that data,

play44:42

say, to S3 for posterity.

play44:44

All right, it's in the database.

play44:46

Three days later, I'm gonna let it delete.

play44:48

When it deletes, I'll put a copy in S3

play44:50

just to remember that it once existed.

play44:52

And because it doesn't incur WCUs,

play44:55

this is a real-life example,

play44:56

where TTL was enabled on a table that was,

play44:59

you see the WCUs, very large table,

play45:02

but a lot of it was deletes.

play45:03

So as soon as the TTL went on,

play45:05

then the delete stopped consuming WCUs,

play45:08

and we were pleased that the customer could save

play45:09

quite a lot of cost using the TTL feature.

play45:13

All right, so here's what we've covered.

play45:16

We've come a long way.

play45:19

High SLA, DAX, global tables.

play45:23

I won't read it all.

play45:25

If you have questions, I'll stick around for questions

play45:28

at the end of this,

play45:29

but there is one more thing I wanted to cover,

play45:31

and that is a new feature introduced in November of 2021,

play45:35

DynamoDB Standard-Infrequent Access table type, table class,

play45:39

and this is a great way to reduce costs.

play45:43

So the reason this was introduced is that, you know,

play45:47

you, as you add more data to a DynamoDB table,

play45:50

the older data, it stays,

play45:53

but it maybe isn't as valuable to you,

play45:55

because you don't access it as much.

play45:58

Yet, the cost from a storage,

play46:00

a monthly per gigabyte storage, remains.

play46:03

So people were saying, you know,

play46:04

"Is there a way to keep the new data readily available,

play46:09

"but the old data available as well,

play46:10

"but maybe reduce my cost, so that I don't have to pay

play46:13

"as much on a per gigabyte basis for the data

play46:15

"that's a little bit more archive-oriented?"

play46:19

And you see this in a bunch of different situations.

play46:21

Social media, you've posted last year.

play46:24

You wanna keep it.

play46:25

It's great for your application if you don't have to,

play46:27

you know, what did people do?

play46:29

They would maybe export the old data to S3,

play46:31

but then the application would have to have a path

play46:34

where the new data would be available in DynamoDB,

play46:35

and the old data pulled out of S3 with a higher latency,

play46:39

or data analytics similarly.

play46:40

The recent data, commonly used,

play46:43

do they wanna pull the old data out of S3,

play46:45

or would you rather keep the old data in DynamoDB

play46:48

readily available at a lower cost?

play46:50

Or retail, you know,

play46:52

Amazon remembers all your old orders,

play46:54

and it's kind of a fun game to see

play46:56

who made their first amazon.com order, and what was it.

play47:00

And so, you can go into your own order history,

play47:03

and see back, for me, I think it was about '97, or '96,

play47:07

and like what was the first thing I bought?

play47:09

It was a book, of course.

play47:12

So what's the right way on the backend to keep this data?

play47:15

Do you want to export it?

play47:18

Most customers have told us, you know,

play47:20

they'd like to keep the data in DynamoDB,

play47:22

make it as quick to access old data as new data,

play47:26

but come up with a way where that old data

play47:27

could be more cost-effectively stored.

play47:30

And so, for that was introduced this feature,

play47:33

which is the Standard-Infrequent Access table class,

play47:35

same name as you see in S3,

play47:37

where a similar design pattern is available

play47:39

for objects that are infrequently accessed.

play47:42

It is 60% lower cost on a per gigabyte basis

play47:47

for the table than the standard.

play47:50

It's a good savings for large tables,

play47:52

and there's not a performance trade-off.

play47:54

It is just as fast to retrieve any particular item,

play47:56

to do a query.

play47:58

It's just as durable, just as available.

play48:01

You won't notice anything different at the application level.

play48:03

It's only a billing-oriented change,

play48:05

and you as a developer,

play48:06

you don't have to make any code changes.

play48:08

You click in the console, issue a CLI command, or make an SDK call.

play48:11

You just say, "Please change this table

play48:13

"from one to the other," and it will do it.

play48:16

The trade-off is that the storage costs are lower,

play48:19

but as with S3 Infrequent Access,

play48:21

the retrieval is a little bit elevated,

play48:23

25% more on a throughput cost.

play48:26

So therefore it's a mathematical game

play48:28

of is this table large enough

play48:30

where the 60% storage savings is going to benefit me?

play48:35

And so, majority of cases,

play48:36

we think the Standard table class is going to be

play48:38

the right choice.

play48:39

So when throughput is the dominant cost, keep Standard.

play48:43

But when the data is large,

play48:46

you may wanna look at the Standard-IA table class,

play48:48

because then you get a 60% reduction in the storage costs,

play48:52

and that can be an actual large reduction

play48:54

in the overall cost that you're having

play48:57

on this DynamoDB table.

play48:59

It is not on a per item basis.

play49:01

It is on a per table basis.

play49:03

This is a situation where you also might wanna think about

play49:07

what if I had maybe a table per month,

play49:10

and I converted the older months into Standard-IA,

play49:14

and kept the newer, the latest month,

play49:16

or the last three months, or something, in Standard?

play49:19

And so, if you have a model where you can create

play49:23

maybe a table per month,

play49:24

and your application can handle that,

play49:25

this is a way to save 60% on the storage cost of that,

play49:30

those older tables that are rarely accessed.

play49:36

To pick which one's right for your use case,

play49:39

we do have an open source application

play49:42

that you can run that looks at your history,

play49:43

and will give you some advice.

play49:45

But basically, what you need to do is this.

play49:47

You log in to the Cost and Usage Reports,

play49:51

or the AWS Cost Explorer,

play49:52

and look at your table cost structure.

play49:54

It's good to tag each table with its name,

play49:56

which lets you isolate to that particular table.

play49:59

And if the storage cost exceeds 50% of the throughput cost,

play50:04

Standard-IA will probably be cost-effective for you,

play50:08

and that's just mathematical.

play50:09

So it's not 50% of the overall cost.

play50:11

It's actually 50% of the throughput cost.

play50:13

So if the throughput cost is a dollar,

play50:16

and your storage cost is 50 cents,

play50:19

it's probably good to switch to Standard-IA.
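The break-even math can be written out as code. A sketch under the two multipliers stated here (storage down 60%, throughput up 25%); the exact break-even works out to storage being about 42% of throughput cost, so the 50% rule of thumb leaves a comfortable margin.

```python
# Compare monthly cost under the two table classes, using the
# Standard-IA multipliers described above: storage x0.40, throughput x1.25.
def monthly_cost(throughput, storage, table_class="STANDARD"):
    if table_class == "STANDARD_IA":
        return throughput * 1.25 + storage * 0.40
    return throughput + storage

def ia_is_cheaper(throughput, storage):
    return (monthly_cost(throughput, storage, "STANDARD_IA")
            < monthly_cost(throughput, storage))

# The example from the talk: $1.00 throughput, $0.50 storage.
print(monthly_cost(1.00, 0.50))                  # 1.50 on Standard
print(monthly_cost(1.00, 0.50, "STANDARD_IA"))   # 1.45 on Standard-IA
print(ia_is_cheaper(1.00, 0.50))                 # True
```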

play50:22

And that's because the storage is reduced by 60%,

play50:26

and the throughput is a 25% uplift,

play50:30

and you can get started today.

play50:31

It's just as easy as a button click.

play50:33

You can also turn it on for a day, and see what the cost is,

play50:38

and compare to the previous day.

play50:39

So if your traffic is the same on both days,

play50:41

experimentation is one way

play50:43

to just actually look at the cost between the two.

play50:46

And you can switch back and forth as well as needed.

play50:48

If you say, "You know what, I wanna switch back,"

play50:51

you can do that, too.

play50:52

And so, with that, I'll end this.

play50:53

Thank you so much for coming,

play50:54

and I will stick around for any questions.
