How do NoSQL databases work? Simply Explained!

Simply Explained
8 Dec 2020, 07:38

Summary

TL;DR: NoSQL databases are gaining popularity for their scalability and flexibility, especially in handling large volumes of data. Unlike traditional relational databases, which struggle to scale because they must maintain relationships between data, NoSQL databases such as DynamoDB, BigTable, and CosmosDB can scale both vertically and horizontally. They function as key-value stores, which simplifies data management and lets data be distributed efficiently across multiple servers. Despite these advantages, NoSQL databases have limitations in data retrieval and consistency, which makes them a complement to relational databases for specific use cases rather than a replacement.

Takeaways

  • 📈 NoSQL databases are popular for their ability to handle large volumes of data and high query loads, as seen in big companies.
  • 🔄 Traditional relational databases like MySQL and SQL Server are efficient for relational data but struggle with scaling due to the need to maintain data relationships.
  • 🔄 NoSQL databases excel in scalability, offering both vertical and horizontal scaling, compared to the vertical-only scaling of relational databases.
  • 🏢 The concept of scaling can be visualized as adding floors to a building for vertical scaling and constructing new buildings for horizontal scaling.
  • 🔑 NoSQL databases function as key-value stores, simplifying data storage without the overhead of maintaining relationships between data items.
  • 📦 Items in NoSQL databases are stored based on partitions determined by a hash function applied to the primary key, allowing for efficient distribution across servers.
  • 📈 Apple's use of a NoSQL database with 75,000 servers exemplifies the massive scalability of these databases.
  • 🗂 NoSQL databases are schemaless, allowing for flexible and evolving data structures without the constraints of a predefined schema.
  • 🔍 NoSQL databases have limitations in data retrieval, primarily allowing access via primary keys, which can be inefficient for certain queries.
  • 🔄 The eventual consistency model of NoSQL databases means that data may not immediately reflect writes due to replication delays across servers.
  • 🌐 Major cloud providers offer NoSQL databases like AWS DynamoDB, Google BigTable, and Azure CosmosDB, highlighting their importance in cloud infrastructure.
  • 🤔 The term 'NoSQL' can be interpreted as 'not only SQL', indicating some support for SQL queries, or 'non-relational', emphasizing the difference from traditional relational databases.

Q & A

  • What is the primary reason NoSQL databases have become popular?

    -NoSQL databases have become popular due to their ability to handle large volumes of data and run a high number of queries per second, which traditional relational databases struggle with.

  • Why do relational databases have difficulty scaling?

    -Relational databases have difficulty scaling because they need to maintain relationships between data, which is an intensive process requiring a lot of memory and compute power.

  • What is the difference between vertical and horizontal scaling in the context of databases?

    -Vertical scaling refers to adding more resources to an existing server, like adding floors to a building. Horizontal scaling refers to adding more servers to distribute the workload, similar to adding more buildings.

  • How do NoSQL databases eliminate the need for maintaining relationships between data?

    -NoSQL databases work as key-value stores, where each item stands on its own without the need for maintaining relationships, simplifying the data structure and allowing for better scalability.

  • What is a partition in the context of NoSQL databases?

    -In NoSQL databases, a partition refers to a part of the database that is managed by a single server, helping to distribute the workload across multiple servers.

  • How does a NoSQL database determine where to store an item?

    -A NoSQL database uses a hash function to convert an item's primary key into a number that falls within a certain range, which then determines the partition or server where the item will be stored.

  • What is a keyspace in the context of NoSQL databases?

    -A keyspace is the range of numbers generated by the hash function, which is used to determine the distribution of data across different partitions or servers in a NoSQL database.

  • How does the schemaless nature of NoSQL databases provide an advantage?

    -The schemaless nature allows items in the database to have different structures, providing flexibility and making it easier to adapt to evolving application and data structures without the risk of data loss.

  • What is the trade-off when using NoSQL databases for data retrieval?

    -NoSQL databases are optimized for retrieving data by primary key, making operations like finding all orders above a certain amount inefficient compared to relational databases that can handle complex queries more effectively.

  • What does 'eventually consistent' mean in the context of NoSQL databases?

    -Eventual consistency means that after writing a new item to the database, it might not immediately be available when read back due to the replication process across multiple servers, although this is typically resolved within milliseconds.

  • Why are both NoSQL and relational databases expected to coexist?

    -Both NoSQL and relational databases have their own strengths and weaknesses, and they are suited to different types of data and use cases, which is why they are expected to coexist in the foreseeable future.

  • Can you name some popular NoSQL databases provided by cloud providers?

    -AWS offers DynamoDB, Google Cloud offers BigTable, and Azure offers CosmosDB, which are all examples of NoSQL databases provided by major cloud providers.

  • What is the significance of the name 'NoSQL' and how can it be interpreted?

    -The name 'NoSQL' can be interpreted in two ways: as 'not only SQL', indicating that some NoSQL databases partially understand the SQL query language, or as 'non-relational', highlighting that they cannot easily store relational data.

Outlines

00:00

📈 Understanding NoSQL Databases

NoSQL databases have gained popularity among large companies for storing massive amounts of data and handling numerous queries efficiently. Unlike relational databases, NoSQL databases do not rely on complex relationships between data, allowing them to scale both vertically and horizontally. Relational databases, such as MySQL and SQL Server, struggle with scaling due to their need to maintain these relationships, which consumes significant memory and computing power. NoSQL databases, being key-value stores, simplify data storage and retrieval, enhancing scalability. This design allows for splitting workloads across multiple servers, each handling a portion of the database. This method, called partitioning, uses a hash function to distribute data evenly. NoSQL databases also offer flexibility with their schemaless structure, accommodating evolving data formats without requiring predefined schemas. Despite their advantages, NoSQL databases have limitations, such as inefficient data retrieval beyond primary keys and eventual consistency issues. However, they remain a powerful tool for scalable and flexible data management.

05:03

🔍 Comparing NoSQL and Relational Databases

NoSQL databases offer scalability and flexibility but have limitations compared to relational databases. While NoSQL excels in handling large-scale data and evolving structures, it struggles with complex queries and data consistency. Relational databases efficiently retrieve data using complex queries and maintain strong consistency. NoSQL databases can have eventual consistency issues due to data replication delays across partitions, although this is often resolved within milliseconds. Both NoSQL and relational databases have their strengths and weaknesses, making them suitable for different use cases. Cloud providers like AWS, Google Cloud, and Azure promote NoSQL for its scalability, evidenced by Amazon's use of DynamoDB handling 45 million requests per second during peak times. NoSQL databases like Cassandra, Scylla, CouchDB, and MongoDB offer scalable solutions for self-hosted environments. The term 'NoSQL' can mean 'not only SQL,' indicating some compatibility with SQL queries, or 'non-relational,' highlighting the lack of traditional relational data storage.
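The retrieval limitation described above can be illustrated with a small sketch. It assumes a plain Python dict as a stand-in for a key-value store; the order IDs, field names, and helper functions are hypothetical and not taken from the video. Looking up an order by its key is a single operation, while finding orders above an amount has to inspect every item.

```python
# Hypothetical data: a dict stands in for a key-value store.
orders = {
    "order-1": {"customer": "alice", "amount": 20},
    "order-2": {"customer": "bob", "amount": 150},
    "order-3": {"customer": "alice", "amount": 300},
}

def get_order(order_id):
    """Retrieve by primary key: a single direct lookup."""
    return orders.get(order_id)

def orders_above(threshold):
    """No index on 'amount' exists, so every item must be inspected."""
    return sorted(
        order_id
        for order_id, order in orders.items()
        if order["amount"] > threshold
    )

by_key = get_order("order-2")
expensive = orders_above(100)
```

In a database split across many partitions, the second kind of query would have to visit every partition, which is why it is considered inefficient.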

Keywords

💡NoSQL databases

NoSQL databases are non-relational databases: each item is stored on its own instead of in tables linked by relationships. They are known for their flexibility and scalability, which is why they are popular among big companies dealing with large volumes of data. In the video, NoSQL databases are contrasted with traditional relational databases, highlighting their ability to scale both vertically and horizontally, which is crucial for handling the data volumes and query loads mentioned.

💡Scalability

Scalability refers to the capability of a system to handle a growing amount of work, or its potential to be enlarged to accommodate that growth. The video emphasizes the superior scalability of NoSQL databases compared to relational databases, explaining that while relational databases can only scale vertically (by adding more power to the existing server), NoSQL databases can scale both vertically and horizontally (by adding more servers), which is likened to adding more floors to a building versus adding more buildings.

💡Relational databases

Relational databases are a type of database that stores data in a structured format, using rows and columns in tables that are linked by relationships. The video script discusses the limitations of these databases, such as their difficulty in scaling due to the need to maintain data relationships, which is resource-intensive. Examples of relational databases given in the script include MySQL, MariaDB, and SQL Server.

💡Key-value stores

Key-value stores are a type of data storage that pairs a unique key with a value, which can be a simple piece of data or a complex data structure like a JSON document. The video explains that NoSQL databases function as key-value stores, which simplifies the database structure and contributes to their scalability. The script uses the example of using a product's barcode as a key and the product name as the value.
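As a rough illustration of the barcode example, here is a minimal sketch that uses a Python dict as a stand-in for a key-value store. The `put` and `get` helpers, the barcode, and the product fields are made up for this example.

```python
import json
from typing import Optional

# Hypothetical in-memory stand-in for a key-value store.
store = {}

def put(key: str, value: dict) -> None:
    """Serialize the value to a JSON document and store it under the key."""
    store[key] = json.dumps(value)

def get(key: str) -> Optional[dict]:
    """Return the decoded document for the key, or None if the key is absent."""
    raw = store.get(key)
    return json.loads(raw) if raw is not None else None

# The barcode is the key; the JSON document carries name, price, and description.
put("0123456789012", {
    "name": "Sparkling water 500ml",
    "price": 0.89,
    "description": "Bottled sparkling water",
})
product = get("0123456789012")
```

The store itself only ever sees two fields, a key and an opaque value; any structure inside the value is up to the application.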

💡Partitions

In the context of NoSQL databases, partitions refer to the division of the database into separate parts, each managed by a different server. The video script explains that this partitioning is a method to distribute the workload and enhance scalability. It is mentioned that Apple's NoSQL database consists of 75,000 servers, which manage different partitions of the database.
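The bookkeeping behind partitions can be sketched as follows, using the video's toy range of 0 to 100. The `Keyspace` class and the server names are hypothetical, and the sketch only tracks which server owns which slice; it does not model moving existing data when a slice is split.

```python
from bisect import bisect_right

class Keyspace:
    """Tracks which server owns which slice of a hash range [0, size)."""

    def __init__(self, size: int, first_server: str) -> None:
        self.size = size
        self.starts = [0]              # lower bound of each slice, kept sorted
        self.servers = [first_server]  # servers[i] owns the slice starting at starts[i]

    def owner(self, hash_value: int) -> str:
        """Find the server responsible for a given hash value."""
        if not 0 <= hash_value < self.size:
            raise ValueError("hash value outside the keyspace")
        return self.servers[bisect_right(self.starts, hash_value) - 1]

    def split(self, index: int, new_server: str) -> None:
        """Hand the upper half of slice `index` to a new server."""
        lo = self.starts[index]
        hi = self.starts[index + 1] if index + 1 < len(self.starts) else self.size
        mid = (lo + hi) // 2
        if mid == lo:
            raise ValueError("slice too small to split")
        self.starts.insert(index + 1, mid)
        self.servers.insert(index + 1, new_server)

ks = Keyspace(100, "server-1")   # one server owns the whole range
ks.split(0, "server-2")          # server-1 keeps 0..49, server-2 takes 50..99
```

With a range of only 100 values, this table can never hold more than 100 slices, which is why real systems use far larger ranges.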

💡Primary key

A primary key is a unique identifier for a record in a database. In NoSQL databases, the primary key is used to determine the partition where an item will be stored. The video script describes how NoSQL databases use a hash function to convert the primary key into a number that helps decide the storage location, which is essential for the database's operation when split across multiple partitions.

💡Hash function

A hash function is a mathematical algorithm that converts an input (or 'message') into a fixed-size string of characters, which is typically used for indexing data in a hash table. In the video, the hash function is described as a method used by NoSQL databases to map the primary key of an item to a specific partition, which is crucial for the distribution and retrieval of data across the database.
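A minimal sketch of the hashing step, assuming SHA-256 from Python's standard library and the toy range of 0 to 99. The function name and the choice of SHA-256 are assumptions for illustration; actual NoSQL databases use their own hash functions and much larger ranges.

```python
import hashlib

KEYSPACE_SIZE = 100  # toy range; real databases use far larger keyspaces

def key_hash(primary_key: str, size: int = KEYSPACE_SIZE) -> int:
    """Map a primary key to a number in the fixed range [0, size)."""
    digest = hashlib.sha256(primary_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then fold it into the range.
    return int.from_bytes(digest[:8], "big") % size

position = key_hash("order-1001")
```

The same key always produces the same number, so the value computed when storing an item can be recomputed later to find it again.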

💡Schemaless

Schemaless databases do not require a predefined structure for the data they store. This is in contrast to relational databases, which have a fixed schema that all data must conform to. The video script highlights the advantage of NoSQL databases being schemaless, allowing for flexibility and the ability to store items with different structures, which can be beneficial for applications with evolving data needs.
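To make the idea concrete, the sketch below stores three items with different shapes in the same collection. The keys, fields, and the `display_name` helper are hypothetical. Because the database does not enforce a structure, the reading code has to cope with fields that may be missing.

```python
# Items in the same collection with different structures (hypothetical data).
items = {
    "product:1": {"name": "Keyboard", "price": 49.0},
    "product:2": {"name": "E-book", "price": 9.0, "download_url": "https://example.com/book"},
    "product:3": {"title": "Gift card", "amounts": [10, 25, 50]},
}

def display_name(item: dict) -> str:
    """Tolerate older or differently shaped items instead of assuming one schema."""
    return item.get("name") or item.get("title") or "(unnamed)"

names = [display_name(item) for item in items.values()]
```

The flexibility comes with a trade-off: checks that a relational schema would enforce move into application code.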

💡Eventual consistency

Eventual consistency is a property of data storage systems that guarantees that, if no new updates are made to a given data item, all reads will eventually return the last updated value. The video script explains that NoSQL databases are eventually consistent, meaning that after writing a new item, it might not immediately be available for reading because it is still being replicated across servers. However, the script also notes that most NoSQL databases offer an option for consistent reads if needed.
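The replication delay can be simulated with a small, deterministic sketch. The `MirroredPartition` class is hypothetical; replication here is an explicit method call rather than a background process, and the "consistent" read simply goes to the mirror that accepted the write, which is one possible way to offer that option and not necessarily how any particular database does it.

```python
class MirroredPartition:
    """One partition mirrored across several servers (simulated with dicts)."""

    def __init__(self, mirror_count: int = 3) -> None:
        self.mirrors = [{} for _ in range(mirror_count)]
        self.pending = []  # writes not yet copied to the other mirrors

    def write(self, key, value) -> None:
        """Store the item on one mirror and queue it for replication."""
        self.mirrors[0][key] = value
        self.pending.append((key, value))

    def replicate(self) -> None:
        """Copy queued writes to the remaining mirrors (stands in for background copying)."""
        for key, value in self.pending:
            for mirror in self.mirrors[1:]:
                mirror[key] = value
        self.pending.clear()

    def read(self, key, mirror: int = 0, consistent: bool = False):
        """Read from a given mirror, or from the writing mirror if consistent=True."""
        if consistent:
            mirror = 0
        return self.mirrors[mirror].get(key)

partition = MirroredPartition()
partition.write("order-1", {"total": 20})
stale = partition.read("order-1", mirror=2)                    # mirror 2 has no copy yet
strong = partition.read("order-1", mirror=2, consistent=True)  # forced to the writing mirror
partition.replicate()
fresh = partition.read("order-1", mirror=2)                    # now present on every mirror
```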

💡Cloud providers

Cloud providers are companies that offer various subscription-based services, infrastructure, and software over the internet. The video script mentions that cloud providers like AWS, Google Cloud, and Azure promote NoSQL databases due to their scalability. It provides examples of NoSQL databases offered by these providers, such as DynamoDB, BigTable, and CosmosDB, and highlights the impressive scalability demonstrated by Amazon's NoSQL database during Amazon Prime Day in 2019.

💡Cassandra

Cassandra is an open-source NoSQL database management system known for its ability to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. The video script mentions Cassandra, which was originally developed at Facebook, as an example of a NoSQL database that you can run yourself.

Highlights

NoSQL databases have become very popular, storing hundreds of petabytes of data and handling millions of queries per second.

Relational databases, such as MySQL and SQL Server, are efficient for storing relational data but struggle with scaling.

Relational databases scale vertically, adding more resources to a single server, while NoSQL databases scale both vertically and horizontally.

NoSQL databases eliminate costly relationships between data items, making them essentially key-value stores.

NoSQL databases use a unique key and value for each item, where the value can be a complex data structure like a JSON document.

Horizontal scaling in NoSQL databases means spreading data across multiple servers, each responsible for a part of the database.

Apple's NoSQL database spans 75,000 servers, illustrating the scalability of NoSQL systems.

NoSQL databases use hash functions to distribute data items across different servers based on their primary keys.

The range of hash values is called the keyspace; real NoSQL databases use very large keyspaces, so they can be split into many partitions and scale almost without restrictions.

NoSQL databases are schemaless, meaning each item can have a different structure, which is advantageous for evolving applications.

Relational databases offer more flexible querying capabilities compared to NoSQL, which typically retrieves items by primary key.

NoSQL databases are eventually consistent, meaning there can be a slight delay before a newly written item is available across all mirrors.

Cloud providers like AWS, Google Cloud, and Azure promote NoSQL databases for their scalability; examples include DynamoDB, BigTable, and CosmosDB.

Amazon's NoSQL database handled a peak of 45 million requests per second during Prime Day in 2019.

NoSQL databases like Cassandra, Scylla, CouchDB, and MongoDB can be run independently.

The term 'NoSQL' can mean 'not only SQL' or 'non-relational', reflecting its versatility and differences from traditional SQL databases.

Transcripts

play00:00

NoSQL databases have become very popular.

play00:03

Big companies rely on them to store hundreds of petabytes of data and run millions of queries

play00:08

per second.

play00:09

But what is a NoSQL database?

play00:11

How does it work, and why does it scale so much better than traditional, relational databases?

play00:17

Let's start by quickly explaining the problem with relational databases like MySQL, MariaDB,

play00:22

SQL Server, and the like.

play00:25

These are built to store relational data as efficiently as possible.

play00:29

You can have a table for customers, orders, and products, linking together logically:

play00:34

customers place orders and orders contain products.

play00:38

This tight organization is great for managing your data, but it comes at a cost: relational

play00:43

databases have a hard time scaling.

play00:46

They have to maintain these relationships, and that's an intensive process, requiring

play00:50

a lot of memory and compute power.

play00:53

So for a while, you can keep upgrading your database server, but at some point, it won't

play00:58

be able to handle the load.

play01:00

In technical terms, we say that relational databases can scale vertically, but not horizontally,

play01:06

whereas NoSQL databases can scale both vertically and horizontally.

play01:10

You can compare this to a building: vertically scaling means adding more floors to an existing

play01:15

building, while horizontal scaling means adding more buildings.

play01:20

You intuitively understand that vertical scaling is only possible to a certain extent, while

play01:25

horizontal scaling is much more powerful.

play01:29

Why do NoSQL databases scale so well?

play01:31

Well, first of all, they do away with these costly relationships.

play01:35

In NoSQL, every item in the database stands on its own.

play01:40

This simple modification means that they're essentially key-value stores.

play01:44

Each item in the database only has two fields: a unique key and a value.

play01:50

For instance: when you want to store product information, you can use the product's bar

play01:54

code as the key and the product name as the value.

play01:58

This seems restrictive, but the value can be something like a JSON document containing

play02:02

more data, like price and description.

play02:05

This simpler design is why NoSQL databases scale better.

play02:10

If a single database server is not enough to store all your data or handle all the queries,

play02:15

you can split the workload across two or more servers.

play02:18

Each server will then be responsible for only a part of your database.

play02:23

To give an example: Apple runs a NoSQL database that consists of 75,000 servers.

play02:27

In NoSQL terms, these parts of your database are called partitions, and it brings up a

play02:35

question.

play02:36

If your database is split across potentially thousands of partitions, how do you know where

play02:41

an item is stored?

play02:43

That's where the primary key comes in.

play02:44

Remember, NoSQL databases are key-value stores, and the key determines on what partition an

play02:51

item will be stored.

play02:52

Behind the scenes, NoSQL databases use a hash function to convert each item's primary key

play02:58

into a number that falls into a fixed range.

play03:02

Say between 0 and 100.

play03:04

This hash value and the range are then used to determine where to store an item.

play03:09

If your database is small enough or doesn't get many requests, you can put everything

play03:14

on a single server.

play03:16

This one will then be responsible for the entire range.

play03:20

If that server is becoming overloaded, you can add a secondary server, which means that

play03:24

the range will be split in half.

play03:26

Server 1 will be responsible for all items with a hash between 0 and 50, while server

play03:31

2 will store everything between 50 and 100.

play03:35

Theoretically, you've now doubled your database capacity: both in terms of storage and in

play03:40

the number of queries you can execute.

play03:43

This range is also called a keyspace.

play03:46

It's a simple system that solves two problems: where to store new items and where to find

play03:51

existing ones.

play03:53

All you have to do is calculate the hash of an item's key and keep track of which server

play03:57

is responsible for which part of the keyspace.

play04:00

Now, in this example, the range of 0 to 100 is a bit small.

play04:05

It would only allow you to split up your database into 100 pieces at most.

play04:10

So, real NoSQL databases have much bigger keyspaces, allowing them to scale almost

play04:15

without restrictions.

play04:18

Besides great scalability, NoSQL is schemaless, which means that items in the database don't

play04:24

need to have the same structure.

play04:26

Each one can be completely different.

play04:29

In a relational database, you have to define your table's structure, and then each item

play04:33

must conform to it.

play04:35

Changing this structure isn't straightforward and could even lead to loss of data.

play04:40

Not having a schema can be a big advantage if your application and data structure are

play04:44

constantly evolving.

play04:47

At this point, it's clear that NoSQL databases have certain advantages over relational ones.

play04:52

But that's not to say that relational databases are obsolete, far from it.

play04:57

NoSQL is more limited in the way you can retrieve your data, only allowing you to retrieve items

play05:03

by their primary key.

play05:05

Finding orders by ID is no problem, but finding all orders above a certain amount would be

play05:10

very inefficient.

play05:12

Relational databases, on the other hand, have no trouble with this.

play05:16

There are workarounds for this issue, but only if you know how you're going to access

play05:21

your data.

play05:22

And that might not always be the case.

play05:24

Another downside is that NoSQL databases are eventually consistent.

play05:29

When you write a new item to the database and try to read it back straight away, it

play05:33

might not be returned.

play05:35

As I've explained, NoSQL splits your database into partitions.

play05:39

But each partition is mirrored across multiple servers.

play05:43

That way, a server can go down without much impact.

play05:47

When you write a new item to the database, one of these mirrors will store the new item

play05:51

and then copy it to the others in the background.

play05:54

This process might take a little bit of time.

play05:57

So when you read that item, the NoSQL database might try to read it from a mirror that doesn't

play06:02

have it yet.

play06:04

This is not a big issue in practice because data is replicated in just a few milliseconds.

play06:09

And if you want consistency, most NoSQL databases do have that option.

play06:14

So, in summary: both NoSQL and relational databases will be around for the foreseeable

play06:20

future.

play06:21

Each with their own strengths and weaknesses.

play06:24

So now you know how NoSQL works, let's look at a few examples.

play06:28

Cloud providers heavily promote NoSQL because they can scale it more easily.

play06:33

AWS has DynamoDB, Google Cloud has BigTable, and Azure has CosmosDB.

play06:39

To give you another example of their scalability: during Amazon Prime Day in 2019, Amazon's

play06:45

NoSQL database peaked at 45 million requests per second.

play06:50

That's mind-boggling!

play06:52

But you can also run NoSQL databases yourself with software like Cassandra (which was developed

play06:58

by Facebook), Scylla, CouchDB, MongoDB, and more.

play07:01

Before ending this video, let's quickly talk about the name "NoSQL."

play07:06

It's a bit confusing as it can be interpreted in two ways.

play07:09

First up: "NoSQL" can mean "not only SQL," pointing to the fact that some NoSQL databases

play07:15

partially understand the SQL query language, on top of their own query capabilities.

play07:21

And secondly, it's often called "NoSQL" in the sense of "non-relational" because it can't

play07:27

easily store relational data.

play07:29

So that was it for this video.

play07:31

Please subscribe if you learned something from it, and I hope to see you in the next

play07:34

video!

Related Tags
NoSQL, Relational, Scalability, Databases, Big Data, Horizontal Scaling, Vertical Scaling, Key-Value Store, Schemaless, Consistency