How do NoSQL databases work? Simply Explained!
Summary
TLDR: NoSQL databases are gaining popularity for their scalability and flexibility, especially in handling large volumes of data. Unlike traditional relational databases which struggle with scaling due to their complex relational structures, NoSQL databases, such as DynamoDB, BigTable, and CosmosDB, offer both vertical and horizontal scaling. They function as key-value stores, simplifying data management and enabling efficient distribution across multiple servers. Despite their advantages, NoSQL databases have limitations in data retrieval and consistency, making them a complementary choice to relational databases for specific use cases.
Takeaways
- 📈 NoSQL databases are popular for their ability to handle large volumes of data and high query loads, as seen in big companies.
- 🔄 Traditional relational databases like MySQL and SQL Server are efficient for relational data but struggle with scaling due to the need to maintain data relationships.
- 🔄 NoSQL databases excel in scalability, offering both vertical and horizontal scaling, compared to the vertical-only scaling of relational databases.
- 🏢 The concept of scaling can be visualized as adding floors to a building for vertical scaling and constructing new buildings for horizontal scaling.
- 🔑 NoSQL databases function as key-value stores, simplifying data storage without the overhead of maintaining relationships between data items.
- 📦 Items in NoSQL databases are stored based on partitions determined by a hash function applied to the primary key, allowing for efficient distribution across servers.
- 📈 Apple's use of a NoSQL database with 75,000 servers exemplifies the massive scalability of these databases.
- 🗂 NoSQL databases are schemaless, allowing for flexible and evolving data structures without the constraints of a predefined schema.
- 🔍 NoSQL databases have limitations in data retrieval, primarily allowing access via primary keys, which can be inefficient for certain queries.
- 🔄 The eventual consistency model of NoSQL databases means that data may not immediately reflect writes due to replication delays across servers.
- 🌐 Major cloud providers offer NoSQL databases like AWS DynamoDB, Google BigTable, and Azure CosmosDB, highlighting their importance in cloud infrastructure.
- 🤔 The term 'NoSQL' can be interpreted as 'not only SQL', indicating some support for SQL queries, or 'non-relational', emphasizing the difference from traditional relational databases.
Q & A
What is the primary reason NoSQL databases have become popular?
-NoSQL databases have become popular due to their ability to handle large volumes of data and run a high number of queries per second, which traditional relational databases struggle with.
Why do relational databases have difficulty scaling?
-Relational databases have difficulty scaling because they need to maintain relationships between data, which is an intensive process requiring a lot of memory and compute power.
What is the difference between vertical and horizontal scaling in the context of databases?
-Vertical scaling refers to adding more resources to an existing server, like adding floors to a building. Horizontal scaling refers to adding more servers to distribute the workload, similar to adding more buildings.
How do NoSQL databases eliminate the need for maintaining relationships between data?
-NoSQL databases work as key-value stores, where each item stands on its own without the need for maintaining relationships, simplifying the data structure and allowing for better scalability.
What is a partition in the context of NoSQL databases?
-In NoSQL databases, a partition refers to a part of the database that is managed by a single server, helping to distribute the workload across multiple servers.
How does a NoSQL database determine where to store an item?
-A NoSQL database uses a hash function to convert an item's primary key into a number that falls within a certain range, which then determines the partition or server where the item will be stored.
What is a keyspace in the context of NoSQL databases?
-A keyspace is the range of numbers generated by the hash function, which is used to determine the distribution of data across different partitions or servers in a NoSQL database.
How does the schemaless nature of NoSQL databases provide an advantage?
-The schemaless nature allows items in the database to have different structures, providing flexibility and making it easier to adapt to evolving application and data structures without the risk of data loss.
What is the trade-off when using NoSQL databases for data retrieval?
-NoSQL databases are optimized for retrieving data by primary key, making operations like finding all orders above a certain amount inefficient compared to relational databases that can handle complex queries more effectively.
What does 'eventually consistent' mean in the context of NoSQL databases?
-Eventual consistency means that after writing a new item to the database, it might not immediately be available when read back due to the replication process across multiple servers, although this is typically resolved within milliseconds.
Why are both NoSQL and relational databases expected to coexist?
-Both NoSQL and relational databases have their own strengths and weaknesses, and they are suited to different types of data and use cases, which is why they are expected to coexist in the foreseeable future.
Can you name some popular NoSQL databases provided by cloud providers?
-AWS offers DynamoDB, Google Cloud offers BigTable, and Azure offers CosmosDB, which are all examples of NoSQL databases provided by major cloud providers.
What is the significance of the name 'NoSQL' and how can it be interpreted?
-The name 'NoSQL' can be interpreted as 'not only SQL', indicating that some NoSQL databases have partial SQL query language understanding, and 'non-relational', highlighting their inability to easily store relational data.
Outlines
📈 Understanding NoSQL Databases
NoSQL databases have gained popularity among large companies for storing massive amounts of data and handling numerous queries efficiently. Unlike relational databases, NoSQL databases do not rely on complex relationships between data, allowing them to scale both vertically and horizontally. Relational databases, such as MySQL and SQL Server, struggle with scaling due to their need to maintain these relationships, which consumes significant memory and computing power. NoSQL databases, being key-value stores, simplify data storage and retrieval, enhancing scalability. This design allows for splitting workloads across multiple servers, each handling a portion of the database. This method, called partitioning, uses a hash function to distribute data evenly. NoSQL databases also offer flexibility with their schemaless structure, accommodating evolving data formats without requiring predefined schemas. Despite their advantages, NoSQL databases have limitations, such as inefficient data retrieval beyond primary keys and eventual consistency issues. However, they remain a powerful tool for scalable and flexible data management.
🔍 Comparing NoSQL and Relational Databases
NoSQL databases offer scalability and flexibility but have limitations compared to relational databases. While NoSQL excels in handling large-scale data and evolving structures, it struggles with complex queries and data consistency. Relational databases efficiently retrieve data using complex queries and maintain strong consistency. NoSQL databases can have eventual consistency issues because each partition is mirrored across multiple servers and copying a write to every mirror takes time, although this is often resolved within milliseconds. Both NoSQL and relational databases have their strengths and weaknesses, making them suitable for different use cases. Cloud providers like AWS, Google Cloud, and Azure promote NoSQL for its scalability, evidenced by Amazon's NoSQL database handling a peak of 45 million requests per second during Prime Day in 2019. NoSQL databases like Cassandra, Scylla, CouchDB, and MongoDB offer scalable solutions for self-hosted environments. The term 'NoSQL' can mean 'not only SQL,' indicating some compatibility with SQL queries, or 'non-relational,' highlighting the lack of traditional relational data storage.
Keywords
💡NoSQL databases
💡Scalability
💡Relational databases
💡Key-value stores
💡Partitions
💡Primary key
💡Hash function
💡Schemaless
💡Eventual consistency
💡Cloud providers
💡Cassandra
Highlights
NoSQL databases have become very popular, storing hundreds of petabytes of data and handling millions of queries per second.
Relational databases, such as MySQL and SQL Server, are efficient for storing relational data but struggle with scaling.
Relational databases scale vertically, adding more resources to a single server, while NoSQL databases scale both vertically and horizontally.
NoSQL databases eliminate costly relationships between data items, making them essentially key-value stores.
NoSQL databases use a unique key and value for each item, where the value can be a complex data structure like a JSON document.
Horizontal scaling in NoSQL databases means spreading data across multiple servers, each responsible for a part of the database.
Apple's NoSQL database spans 75,000 servers, illustrating the scalability of NoSQL systems.
NoSQL databases use hash functions to distribute data items across different servers based on their primary keys.
The hash range is called the 'keyspace'; real NoSQL databases use very large keyspaces, which lets them scale almost without restrictions.
NoSQL databases are schemaless, meaning each item can have a different structure, which is advantageous for evolving applications.
Relational databases offer more flexible querying capabilities compared to NoSQL, which typically retrieves items by primary key.
NoSQL databases are eventually consistent, meaning there can be a slight delay before a newly written item is available across all mirrors.
Cloud providers like AWS, Google Cloud, and Azure promote NoSQL databases for their scalability; examples include DynamoDB, BigTable, and CosmosDB.
Amazon's NoSQL database handled a peak of 45 million requests per second during Prime Day in 2019.
NoSQL databases like Cassandra, Scylla, CouchDB, and MongoDB can be run independently.
The term 'NoSQL' can mean 'not only SQL' or 'non-relational', reflecting its versatility and differences from traditional SQL databases.
Transcripts
NoSQL databases have become very popular.
Big companies rely on them to store hundreds of petabytes of data and run millions of queries
per second.
But what is a NoSQL database?
How does it work, and why does it scale so much better than traditional, relational databases?
Let's start by quickly explaining the problem with relational databases like MySQL, MariaDB,
SQL Server, and the like.
These are built to store relational data as efficiently as possible.
You can have a table for customers, orders, and products, linking together logically:
customers place orders and orders contain products.
This tight organization is great for managing your data, but it comes at a cost: relational
databases have a hard time scaling.
They have to maintain these relationships, and that's an intensive process, requiring
a lot of memory and compute power.
So for a while, you can keep upgrading your database server, but at some point, it won't
be able to handle the load.
In technical terms, we say that relational databases can scale vertically, but not horizontally,
whereas NoSQL databases can scale both vertically and horizontally.
You can compare this to a building: vertically scaling means adding more floors to an existing
building, while horizontal scaling means adding more buildings.
You intuitively understand that vertical scaling is only possible to a certain extent, while
horizontal scaling is much more powerful.
Why do NoSQL databases scale so well?
Well, first of all, they do away with these costly relationships.
In NoSQL, every item in the database stands on its own.
This simple modification means that they're essentially key-value stores.
Each item in the database only has two fields: a unique key and a value.
For instance: when you want to store product information, you can use the product's bar
code as the key and the product name as the value.
This seems restrictive, but the value can be something like a JSON document containing
more data, like price and description.
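To make the key-value idea concrete, here is a minimal sketch in Python. It assumes a plain in-memory dict as a stand-in for the database; the bar code, the product fields, and the `put`/`get` helper names are made up for illustration and are not any real database's API.

```python
import json
from typing import Optional

# A plain dict stands in for the key-value store (illustration only).
store = {}

def put(key: str, value: dict) -> None:
    """Serialize the value as a JSON document and store it under its key."""
    store[key] = json.dumps(value)

def get(key: str) -> Optional[dict]:
    """Fetch an item by its key; returns None when the key does not exist."""
    raw = store.get(key)
    return json.loads(raw) if raw is not None else None

# Hypothetical product: the bar code is the key, the JSON document is the value.
put("4006381333931", {"name": "Pencil", "price": 1.20, "description": "HB pencil"})
product = get("4006381333931")
print(product["name"], product["price"])
```

Note that the store itself only ever sees one key and one opaque value per item; everything inside the JSON document is the application's business.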
This simpler design is why NoSQL databases scale better.
If a single database server is not enough to store all your data or handle all the queries,
you can split the workload across two or more servers.
Each server will then be responsible for only a part of your database.
To give an example: Apple runs a NoSQL database that consists of 75,000 servers.
In NoSQL terms, these parts of your database are called partitions, and it brings up a
question.
If your database is split across potentially thousands of partitions, how do you know where
an item is stored?
That's where the primary key comes in.
Remember, NoSQL databases are key-value stores, and the key determines on what partition an
item will be stored.
Behind-the-scenes, NoSQL databases use a hash function to convert each item's primary key
into a number that falls into a fixed range.
Say between 0 and 100.
This hash value and the range are then used to determine where to store an item.
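As a rough sketch of that hashing step in Python: the function below maps any primary key to a number from 0 up to (but not including) 100. Using SHA-256 and a modulo is an assumption made for this example; real databases pick their own hash functions and much larger ranges. Python's built-in `hash()` is avoided because it is randomized per process for strings.

```python
import hashlib

KEYSPACE_SIZE = 100  # toy range that mirrors the 0-to-100 example

def key_hash(primary_key: str) -> int:
    """Map a primary key to a stable number in the range [0, KEYSPACE_SIZE)."""
    digest = hashlib.sha256(primary_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % KEYSPACE_SIZE

# The same key always yields the same number, on every machine.
for key in ["order-1", "order-2", "customer-42"]:
    print(key, "->", key_hash(key))
```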
If your database is small enough or doesn't get many requests, you can put everything
on a single server.
This one will then be responsible for the entire range.
If that server is becoming overloaded, you can add a second server, which means that
the range will be split in half.
Server 1 will be responsible for all items with a hash below 50, while server
2 will store everything from 50 to 100.
Theoretically, you've now doubled your database capacity: both in terms of storage and in
the number of queries you can execute.
This range is also called a keyspace.
It's a simple system that solves two problems: where to store new items and where to find
existing ones.
All you have to do is calculate the hash of an item's key and keep track of which server
is responsible for which part of the keyspace.
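The bookkeeping described here can be sketched as a small routing table. This is an illustration under assumptions, not how any particular database does it: the `Router` class, the server names, and the SHA-256-based `key_hash` are all hypothetical, and each server owns a range with an exclusive upper bound.

```python
import bisect
import hashlib

KEYSPACE_SIZE = 100  # toy keyspace, as in the example

def key_hash(primary_key: str) -> int:
    """Stable hash of a key into [0, KEYSPACE_SIZE). SHA-256 is an assumption."""
    digest = hashlib.sha256(primary_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % KEYSPACE_SIZE

class Router:
    """Keeps track of which server owns which part of the keyspace."""

    def __init__(self, upper_bounds, servers):
        # upper_bounds[i] is the exclusive upper bound of servers[i]'s range.
        self.upper_bounds = upper_bounds
        self.servers = servers

    def server_for_hash(self, h: int) -> str:
        # Find the first range whose upper bound is greater than h.
        return self.servers[bisect.bisect_right(self.upper_bounds, h)]

    def server_for(self, primary_key: str) -> str:
        return self.server_for_hash(key_hash(primary_key))

# Two servers splitting the keyspace in half: 0-49 and 50-99.
router = Router([50, 100], ["server-1", "server-2"])
print(router.server_for_hash(49))  # server-1
print(router.server_for_hash(50))  # server-2
print(router.server_for("order-1"))
```

The same lookup answers both questions from the text: where to write a new item and where to read an existing one.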
Now, in this example, the range of 0 to 100 is a bit small.
It would only allow you to split up your database into 100 pieces at most.
So, real NoSQL databases have much bigger key spaces, allowing them to scale almost
without restrictions.
Besides great scalability, NoSQL is schemaless, which means that items in the database don't
need to have the same structure.
Each one can be completely different.
In a relational database, you have to define your table's structure, and then each item
must conform to it.
Changing this structure isn't straightforward and could even lead to loss of data.
Not having a schema can be a big advantage if your application and data structure are
constantly evolving.
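A small Python sketch of what schemaless means in practice. The customer records and the `contact_for` helper are invented for this example. The point is that two items in the same store can carry different fields, and the reading code has to cope with whichever fields are present.

```python
# Two items in the same "table" with different structures (hypothetical data).
items = {
    "customer-1": {"name": "Ada", "email": "ada@example.com"},
    "customer-2": {"name": "Grace", "phone": "+1-555-0100", "tags": ["vip"]},
}

def contact_for(key: str) -> str:
    """Return the best available contact detail; fields may be missing."""
    item = items[key]
    return item.get("email") or item.get("phone") or "unknown"

print(contact_for("customer-1"))
print(contact_for("customer-2"))
```

The flexibility comes with a trade-off: since the database enforces no structure, checks like this move into the application.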
At this point, it's clear that NoSQL databases have certain advantages over relational ones.
But that's not to say that relational databases are obsolete, far from it.
NoSQL is more limited in the way you can retrieve your data, only allowing you to retrieve items
by their primary key.
Finding orders by ID is no problem, but finding all orders above a certain amount would be
very inefficient.
Relational databases, on the other hand, have no trouble with this.
There are workarounds for this issue, but only if you know how you're going to access
your data.
And that might not always be the case.
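The difference between the two kinds of lookup, and one possible workaround, can be sketched like this. It assumes an in-memory dict as the store; the order data and the helper names are hypothetical. The second map acts as a hand-maintained index, which only helps if you knew in advance that you would look up orders by customer.

```python
from collections import defaultdict

orders = {}                             # primary store: order ID -> order
orders_by_customer = defaultdict(list)  # extra index kept up to date on write

def put_order(order_id: str, order: dict) -> None:
    orders[order_id] = order
    # Workaround: maintain a second key-value mapping for a known access pattern.
    orders_by_customer[order["customer"]].append(order_id)

def get_order(order_id: str):
    """Lookup by primary key: a single step, no matter how many orders exist."""
    return orders.get(order_id)

def orders_above(amount: float) -> list:
    """No index on amount, so every single item has to be examined."""
    return sorted(oid for oid, o in orders.items() if o["amount"] > amount)

put_order("order-1", {"customer": "alice", "amount": 20})
put_order("order-2", {"customer": "bob", "amount": 250})
put_order("order-3", {"customer": "alice", "amount": 400})

print(get_order("order-2"))
print(orders_above(100))
print(orders_by_customer["alice"])
```

In a database spread over many partitions, the full scan in `orders_above` would have to visit every server, which is why it is so much more expensive than a key lookup.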
Another downside is that NoSQL databases are eventually consistent.
When you write a new item to the database and try to read it back straight away, it
might not be returned.
As I've explained, NoSQL splits your database into partitions.
But each partition is mirrored across multiple servers.
That way, a server can go down without much impact.
When you write a new item to the database, one of these mirrors will store the new item
and then copy it to the others in the background.
This process might take a little bit of time.
So when you read that item, the NoSQL database might try to read it from a mirror that doesn't
have it yet.
This is not a big issue in practice because data is replicated in just a few milliseconds.
And if you want consistency, most NoSQL databases do have that option.
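Here is a toy simulation of that behaviour in Python. It is a deliberately simplified model: the `ReplicatedPartition` class is hypothetical, replication is triggered by hand instead of running in the background, and the "consistent" read simply goes to the mirror that accepted the write. Real databases use more involved mechanisms for consistent reads.

```python
class ReplicatedPartition:
    """One partition mirrored across several servers (toy model)."""

    def __init__(self, replica_count: int = 3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet copied: (replica index, key, value)

    def write(self, key, value) -> None:
        # Mirror 0 stores the item right away; the others get it later.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def replicate(self) -> None:
        # Stands in for the background copy to the other mirrors.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()

    def read(self, key, replica: int = 0, consistent: bool = False):
        # A consistent read is forced onto the mirror that took the write.
        index = 0 if consistent else replica
        return self.replicas[index].get(key)

partition = ReplicatedPartition()
partition.write("order-1", {"amount": 20})
print(partition.read("order-1", replica=1))        # None: not copied yet
print(partition.read("order-1", consistent=True))  # {'amount': 20}
partition.replicate()
print(partition.read("order-1", replica=1))        # {'amount': 20}
```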
So, in summary: both NoSQL and relational databases will be around for the foreseeable
future.
Each with their own strengths and weaknesses.
So now that you know how NoSQL works, let's look at a few examples.
Cloud providers heavily promote NoSQL because they can scale it more easily.
AWS has DynamoDB, Google Cloud has BigTable, and Azure has CosmosDB.
To give you another example of their scalability: during Amazon Prime Day in 2019, Amazon's
NoSQL database peaked at 45 million requests per second.
That's mind-boggling!
But you can also run NoSQL databases yourself with software like Cassandra (which was developed
by Facebook), Scylla, CouchDB, MongoDB, and more.
Before ending this video, let's quickly talk about the name "NoSQL."
It's a bit confusing as it can be interpreted in two ways.
First up: "NoSQL" can mean "not only SQL," pointing to the fact that some NoSQL databases
partially understand the SQL query language, on top of their own query capabilities.
And secondly, it's often called "NoSQL" in the sense of "non-relational" because it can't
easily store relational data.
So that was it for this video.
Please subscribe if you learned something from it, and I hope to see you in the next
video!