Database Design Tips | Choosing the Best Database in a System Design Interview
Summary
TLDRThis video script delves into the critical role of database selection in system design, emphasizing its impact on scalability and performance. It outlines common use cases, such as caching with Redis, file storage with Amazon S3, text search with Elastic Search or Solr, and time series data with InfluxDB. The script guides through choosing between SQL and NoSQL databases based on data structure, query patterns, and scale, highlighting the importance of ACID properties for transactional systems and suggesting combinations of databases to meet diverse requirements.
Takeaways
- 🔍 The choice of database is crucial for system scalability and is influenced by non-functional requirements (NFRs).
- 📚 Databases do not impact functional requirements but can affect performance based on query patterns and data structures.
- 📈 The suitability of a database depends on factors such as data structure, query patterns, and the scale of data to be handled.
- 💾 Caching solutions like Redis, Memcached, etcd, and Hazelcast are essential for reducing database load and improving latency.
- 🖼️ For file storage, particularly for images and videos, Blob Storage solutions like Amazon S3 are commonly used.
- 🌐 Content Delivery Networks (CDNs) complement Blob Storage by distributing content geographically for faster access.
- 🔎 Text Search Engines like Elastic Search and Solr, built on Apache Lucene, are used for implementing search capabilities with support for fuzzy search.
- 📊 Time Series Databases such as InfluxDB and openTSDB are optimized for sequential data writes and time-range queries, ideal for metrics tracking.
- 📊 Data Warehouses like Hadoop are used for offline reporting and analytics by aggregating data from various transactional systems.
- 🔑 The decision between relational and non-relational databases often hinges on the structure of the data and the need for ACID transactions.
- 🔄 For ever-increasing data with a finite number of query types, columnar databases like Cassandra and HBase are recommended.
Q & A
What is the primary factor that determines the choice of database in a system design?
-The primary factor that determines the choice of database in a system design is the non-functional requirements (NFRs) such as query patterns, data structure, and scale to handle.
How do databases impact the functional requirements of a system?
-Databases generally do not impact the functional requirements of a system. Any database can be used to satisfy the functional requirements.
What are the three main factors that influence the choice of database?
-The three main factors that influence the choice of database are the structure of the data, the query pattern, and the amount of scale that needs to be handled.
Why is caching important in system design?
-Caching is important in system design to reduce the number of times a database is queried, to improve response times, and to handle high-latency remote calls by storing responses locally.
What are some common caching solutions mentioned in the script?
-Some common caching solutions mentioned in the script are Redis, Memcached, etcd, and Hazelcast.
How does caching work in terms of key-value stores?
-In caching, the key is typically the query parameter or request parameter, and the value is the response expected from the system. This key-value pair is stored in key-value stores.
What is Blob Storage and why is it used?
-Blob Storage is used to store large binary objects like images and videos. It is not a database but a data store where files are served as they are, without querying.
Why is Amazon S3 a popular choice for Blob Storage?
-Amazon S3 is a popular choice for Blob Storage because it is cost-effective and widely used by many companies, making it a reliable and efficient solution for storing images and videos.
What is the purpose of a Content Delivery Network (CDN) in the context of Blob Storage?
-A CDN is used to distribute the same image or video geographically across various locations, allowing users to access the content faster by querying servers closer to their location.
What is a Text Search Engine and why is it used?
-A Text Search Engine is used to provide text searching capabilities on textual data, such as product titles and descriptions. It supports fuzzy search, which helps in handling misspellings and provides a better user experience.
What are some common implementations of Text Search Engines?
-Some common implementations of Text Search Engines are Elastic Search and Solr, both of which are built on top of Apache Lucene.
Why are Time Series Databases used and what are some examples?
-Time Series Databases are used to store and manage time-stamped data, such as application metrics. They are optimized for sequential updates and bulk read queries. Examples include InfluxDB and openTSDB.
What is the role of a Data Warehouse in system design?
-A Data Warehouse is used for storing large amounts of data from various transactional systems and providing querying capabilities for offline reporting and analytics.
How does the choice between a relational and non-relational database depend on the data structure?
-If the data is structured and can be easily modeled in tables with rows and columns, a relational database is suitable. For unstructured data or data with complex queries, a non-relational database like a document DB is more appropriate.
What are some scenarios where a combination of databases is used?
-In real-world scenarios like an e-commerce platform, a combination of databases is often used. For example, an RDBMS can be used for inventory management, while Cassandra can be used for storing historical order data. Document DBs can be used for complex querying needs.
What are some common providers of relational databases?
-Some common providers of relational databases include MySQL, Oracle, SQL Server, and Postgres.
What are the characteristics of ever-increasing data and why is a columnar DB like Cassandra suitable for this?
-Ever-increasing data refers to data that grows at a rate that is more than linear, such as location pings from drivers in a service like Uber. Columnar DBs like Cassandra are suitable because they handle large volumes of data and support high write and read throughput.
Outlines
🗂️ Database Choice in System Design
The paragraph discusses the importance of database selection in system design, particularly for scaling. It emphasizes that while databases don't affect functional requirements, they significantly impact non-functional requirements (NFRs) such as query patterns, data structure, and scalability. The choice of database is influenced by data structure, query patterns, and scale. The video will cover common use cases and potential database solutions, including caching, file storage, text search engines, and time series databases. It also touches on SQL-NoSQL comparisons and the combination of databases for solving real-world problems.
🔍 Caching and File Storage Solutions
This paragraph focuses on caching solutions and their importance in system design to reduce database load and latency. It explains the concept of caching with keys and values, where keys are query parameters and values are the expected responses. Popular caching solutions like Redis, Memcached, etcd, and Hazelcast are mentioned, with Redis highlighted as a battle-tested and widely used solution. The paragraph also addresses file storage options, particularly for images and videos, recommending Amazon S3 for Blob Storage and Content Delivery Networks (CDN) for geographical distribution.
📚 Text Search Engines and Time Series Databases
The paragraph introduces text search engines, necessary for capabilities like searching product titles and descriptions, and mentions Elastic Search and Solr as common implementations built on Apache Lucene. It also covers fuzzy search, which allows for misspelled queries to return relevant results based on edit distance. Time series databases are then discussed as an extension of relational databases, optimized for sequential data updates and bulk read queries over time ranges, with InfluxDB and openTSDB as examples.
📊 Data Warehousing and Analytical Storage
This section delves into data warehousing for analytical purposes, such as transactional analytics in companies like Amazon or Uber. It contrasts transactional systems with data warehouses, which are used for offline reporting and require large databases to accommodate extensive data dumping and querying. Hadoop is suggested as a suitable solution for such use cases. The paragraph also explores the decision-making process for choosing between relational and non-relational databases based on data structure, transactional guarantees, and the nature of queries.
📈 Deciding Database Types for Varying Data Needs
The paragraph guides through the decision-making process for selecting the appropriate database type based on the data's structure and query requirements. It outlines scenarios for using relational databases when structured data and ACID transactions are needed, like in payment or inventory systems. For unstructured data with complex queries, document databases like MongoDB or Couchbase are recommended. The paragraph also discusses columnar databases for ever-increasing data with finite query types, citing Cassandra and HBase as stable options.
🛒 Combining Databases for Complex E-commerce Systems
The final paragraph illustrates the use of multiple databases in complex systems, such as an e-commerce platform like Amazon. It suggests using a combination of RDBMS for recent orders and Cassandra for permanent storage, leveraging the strengths of both for inventory management and data scalability. The paragraph also discusses using document databases for complex querying needs, like finding users who bought specific products, and emphasizes the importance of combining different databases to meet both functional and non-functional requirements in real-world scenarios.
Mindmap
Keywords
💡Database
💡Non-functional requirements (NFRs)
💡Caching Solutions
💡File Storage Options
💡Text Search Engine
💡Fuzzy Search
💡Time Series Database
💡Data Warehouse
💡Relational Database
💡NoSQL
💡Columnar DB
Highlights
The choice of database significantly impacts the scalability of a system design, particularly in relation to non-functional requirements.
Databases do not generally affect functional requirements, allowing any database to be used to meet these needs.
The structure of data, query patterns, and scale requirements are key factors in choosing a database.
Caching solutions like Redis, Memcached, etcd, and Hazelcast are essential for reducing database load and improving latency.
Redis is a popular and stable caching solution often used in system designs.
File Storage Options, such as Amazon S3, are crucial for storing images and videos in systems like Amazon or Netflix.
Content Delivery Networks (CDNs) complement Blob Storages like Amazon S3 by geographically distributing content for faster access.
Text Search Engines like Elastic Search and Solr, built on Apache Lucene, are vital for implementing search functionalities in systems.
Fuzzy search capabilities in search engines allow for user-friendly search results even with misspelled queries.
Time Series Databases, such as InfluxDB and openTSDB, are optimized for sequential data updates and bulk read queries in metrics monitoring systems.
Data Warehouses, like Hadoop, are used for offline reporting and analytics by aggregating data from various transactional systems.
The decision between relational and non-relational databases often hinges on the data structure and the need for transactional guarantees.
Relational databases like MySQL, Oracle, and SQL Server are suitable for structured data that requires ACID properties.
Document databases, such as MongoDB and Couchbase, are ideal for unstructured data and complex queries.
Columnar databases like Cassandra and HBase are best for handling ever-increasing data with a limited number of query types.
In real-world scenarios, a combination of databases is often necessary to meet both functional and non-functional requirements.
The transcript provides a cheat sheet for database selection based on different use cases, which can be a helpful guide in system design interviews.
Transcripts
Hi everyone. Let's talk about databases now. So in your system design interview
how good your design is and how well it can scale depends very much on the
choice of database that you've used. Now databases generally do not impact your
functional requirements, so whatever your functional requirement is you can
satisfy that by using any database that you want. But normally the
non-functional requirements are the ones which are impacted by the choice of
database. So let's just say you have certain query patterns, or certain kind
of a data structure or certain kind of a scale to handle, there are a different
set of databases optimized for different kinds of such things, right. So based on
the choice of database it would impact how well your design can scale up to the
requirements that are given as part of your non-functional requirements (NFRs) so in
this video what we'll do is we'll go over some very common use cases that
normally you might come across in your System Design Interview and then look at
some of the potential solutions or possible set of databases that you can
use to handle those scenarios.
Now normally the choice of database depends
on a couple of factors. The very first thing is the structure of the data that
you have. Whether it's a very structured data or a totally non structured data
that is one of the factors. The other factor that impacts the choice of
database is the query pattern that you have. And the next obvious one is the
amount of scale that you need to handle. So all these three factors would make a
difference in to the choice of database that you want. Now in this video we'll
look at some common obvious ones in the beginning and then towards the end we'll
look at the SQL-NoSQL comparison and a combination of certain kinds of
databases that can help you solve a real-world problem.
Let's talk about
Caching Solutions. So whichever system you are designing in an interview, you would
definitely have to use some caching solution there. Now there are a lot of
use cases for caching. Let's say you are querying a database and you do not want
to query the database a lot of times, you could cache the value in a cache, right.
Alternatively, if you are making a remote call to a different service and that is
having a high enough latency, you might want to cache the response of that
system locally at your end in a caching solution
And there could be a lot of use-cases for caching. Now generally
the way caching works is you have a key and you have a value. So key
normally is whatever your where clause is in the query or whatever your query param
or request params are when you're making an API call
and value is basically the response that
you are expecting from the other system, right. So all of these kind of values
would be stored in normally key value stores. Now very common used solutions
are "Redis". You could also use "Memcached", you could use "etcd". "Hazelcast" is
picking up a lot these days. So you could use any of them. Normally in the
videos that I make I tend to use Redis because that is a very battle-tested
solution and it is used by all the big companies in the world and it is fairly
stable. So in whatever the System Design you have, where you have to use a caching
solution, you could use a Redis. I'll just note it down so that you can refer it
Redis is the solution for caching. Now again this does not mean that you cannot
use anything else, you could use any similar solution and that would work just fine.
Next let's look at some File Storage Options. So let's just say you
are designing a system like Amazon where you are having various products that you
are selling. Now the sellers would be uploading product images, maybe products
videos, right. You need a data store to store those
images and videos. Similarly you could be potentially building a system like
Netflix which has videos all together. And you need a storage to support videos
So wherever you have a image/video kind of a thing, there we'll use something
called as a Blob Storage. Now these are not really databases. Databases are
fundamentally meant for things where you can query on. Now a file is not
something which you normally query on. You will just serve it as it is, right.
So then is when Blob Storages come in. Now there are a lot of providers for that
one of the most common ones and the and
a fairly cheaper (Cost Effective) one is Amazon S3 and it
is a fairly good system used by a lot of company, so whenever you have a
requirement of storing images/videos and anything of that sort, you could use
Amazon S3 as a data store. Now along with S3 you might want to use something
called as a Content Delivery Network which is often called as a CDN.
Now CDN is generally used for distributing the same image
geographically in a lot of locations. So let's take a simple example that you
have a product image that is stored in Amazon S3 as a primary data source and
there are a lot of people coming from throughout the globe who are accessing
that product. So you might want to distribute that image into various
servers across the globe so that individual people can query them in a
much faster way as compared to querying an S3 which is probably located in a
couple of locations. So for all Blob kind of content you would then be
using something like a for all blob you would be using something like the S3
plus a content delivery network and that should be fairly fine on all your
interview questions. Now the next very common use case that you might encounter
is for example if you're building a product like Amazon and you need to
provide text searching capabilities on various products. So the seller has uploaded
a product with a particular title some description and you want users to search
that, right. Now that search thing would be provided on a text of title and on
the text of description. Very similar use cases would be when you have a product
like Netflix to build and you want to... give the option for the users to
search or movie name, movie titles, genres, maybe cast and crew names and all of that.
Or alternatively you might be designing something like an uber or
Google Maps kind of a thing where you want to provide text searching
capability with support for fuzzy search. So for all of these kind of use cases
you would be using something called as a Text Search Engine. Now a very common
implementation of a Text Search Engine is provided by Elastic Search and Solr,
and both of them are built on top of something called an Apache Lucene
now Lucene fundamentally provides these text searching capabilities and that is
then being used by both of these products Elastic Search (ES) and Solr. So the next
use case and quickly write it down is your text search which is Elastic Search
and Solr. Now one more thing that they support is something called as a Fuzzy Search.
So what is that is.. I will quickly go over that. Let's say if you are searching
for the word Airport and let's say is the user typed in
"AIRPROT" with the wrong spelling, okay. this O and R are interchanged right. Now, If a
user searches for this and you do not return back any result, then that's a bad
user experience, right. So you need your database to be able to figure out that
the user did
not really meant this thing [AIRPROT], the user actually meant AIRPORT, right?
Now how does that database identify, so this word can be converted into the
correct spelling of airport by changing two characters, right.
R needs to be converted into O and O needs to be converted into R, right? so this is at
an edit distance of two. So you can provide a level of fuzziness that your
search engine needs to support. This has a fuzziness factor of two, which is the
Edit Distance. Normally there are a lot of other factors also that come in but this
is roughly how fuzzy searching is implemented in most of the solutions.
So wherever you have any search capabilities there you use either Elastic Search
or Solr. Now one very important thing about both
of these are, these are not databases. These are search engines. So the
difference between a Search Engine and a Database is whenever you write something
in a Database, Database gives you a guarantee that that data wouldn't be
lost. Now both of these data stores don't
give you any such guarantee. They claim that they are giving a good enough
availability and a redundancy and all of that but potentially data could be lost
so you should not keep any of these as your primary source of truth
Your primary data store should be somewhere else and it you could load that data
into either of these systems to provide the searching capabilities and these two
are very efficient at search. Next, let's look at what do you do if you want to
store some metrics kind of a data. So let's say if you are building a system
like Graphite, Grafana or Prometheus which is basically an application
metrics tracking system. So let's say if the use case that you are given is a lot of
applications are pushing metrics related to their throughput, their CPU
utilizations, their latencies, and all of that. And you want to build a system
to support that. Then is when comes something called as a
Time Series Database. Now think of Time Series Database as an extension of
relational databases but with not all the functionalities and certain
additional functionalities. So regular relational databases that you have would
have the ability to update a lot of records right or they would also give
you the ability to very random records but whenever you are building a metrics
monitoring kind of a system you would never do random updates. You would always
do sequential update in append-only mode. So if you have put an entry at time T1
the next entry would be a time T 2 which is greater than T1. The next entry would
be at time T3 which is greater than T1 and T2 right. So it's an append only
write mode. Also the read queries that you do, they are kind of bulk read
queries with the time range, right. You query for last few minutes of data or
few hours of data or few days of data, right. But you don't do a random read or
a random update. Now time series databases are optimized for this kind of
a query pattern and input pattern. So there are a lot of time series databases
InfluxDB is one of them, openTSDB is one of them so you could use
either of them if you have that kind of a use case. I will quickly write this
thing also. For time series, you use something called as an openTSDB, it stands for
Open Time Series Database. The next use case is when you have lot of information and you
want to store all of that information of a company in a certain kind of a data
store for various kind of analytics requirements. So let's just take an
example of something like an Amazon or Uber or any system design where you want
to provide analytics on all the transactions. You might want to provide
analytics like how many orders I'm having, what geographies are giving me
what revenues, which is the most sought after item, stuff like that. So where you
want to do analytics on the data of the whole company, there you need something
like a Data Warehouse. That basically is a large database in which you can dump
all of that data and provide various querying capabilities on top of the data
to serve a lot of reports. Now these are generally not used for transactional
systems, these are generally used for offline reporting.
So if you have that kind of use case then you can use something like a Hadoop
which can very well sit in for that purpose, where you put in a lot of data
from various transactional systems and then build a lot of systems that can
provide reporting on top of that data.
Now let's look at slightly tricky
scenarios where you might want to choose
between a relational and a non-relational database.
So the very first thing that helps you decide what kind of a database
do you want to use, is the structure of the data. So if you have a very
structured information then possibly a relational database makes sense. Now what
is a structured information? it would be an information that you can easily model
in form of tables and tables would have rows and columns of information.
For example if you want to store a user information something like a user
profile on any social network it would have name, email address, city, phone number,
and bunch of very standard information that each user will have
That would be a structured information. So if we try to make a flowchart of how
to decide which database to use this is how it would look like.
The very first choice that we need to make is whether we have a structured data or an
unstructured data. Now let's say if we have a structured data the next question
that we need to ask ourselves is, do we need any atomicity or transactional
guarantees from the database or not? So, let's just say that you're building a
payment system which supports a feature like somebody can transact money from
their account to somebody else's account, right. Now fundamentally at the very core
of it, it will have two queries, one of them will reduce the amount from the
person's account and the other query would add the amount into the
beneficiary's account, right. Basically reducing from account A and adding into
account B. Now your database should be able to provide you certain guarantees
which wrap both these query into transactional boundaries, saying either
both of them would execute or both of them would not execute. But it should
never happen that amount has been debited from account A but not credited
in into account B or credited into account A but not debited from account B.
Something of that sort should not happen. Also it should provide you some
consistency saying if you have done a transaction the next call that you made
to fetch the account balances it should reflect the amount. It should not be that
sometimes it is reflecting the correct amount
and sometimes it is not reflecting the correct amount.
If you have that kind of
requirement where in your building of payment system or an inventory
management system where you have the count of number of products that you
have while people are buying them and the count needs to reduce, for all of
those kind of scenarios, that would fall into that yes you need an atomicity
you need consistent, you need transactions, then basically you need to
use a relational database, okay. Now there are multiple providers of relational
database you could use any of them. Some of the very common ones are MySQL,
Oracle, SQL Server, Postgres and there are a lot of them. I've just mentioned
the common ones it does not mean that you cannot use any other database of
your choice. Feel free to use any database which provides you ACID Guarantees.
Now let's say you have relational data but you do not need
ACID Guarantee. Let's say you are just storing user information which does
not have any use case of these atomicity requirements. You could still choose
to use a relational database or you could choose to use a non relational
database, it wouldn't make too much of a difference, because normally you would be
easily be able to map a structured data into a NoSQL model, so either of
these scenarios would be fine. Now let's say you do not have structured
data - so what do you do? So there are a bunch of scenarios in which your use case
might fit in. So maybe you are trying to build a catalogue kind of a system for an
e-commerce platform like Amazon which has information of all the items that
are available on that platform, right. So let's just say if you are building that
catalogue for Amazon and if we take certain examples, so let's say there is
the item like shirt. Now each item would have certain attributes. A shirt would
have an attribute like a size: could be large, a color: could be red, something of
that sort. If you have a refrigerator, it would have a volume like
200 litres/400 litres or whatever. It would also have a power saving mode like
3-star/5-star/what not. So those are the attributes of a refrigerator, right.
Similarly, something like a milk, would have quantity and an expiry date, right.
Now normally when you are on an e-commerce platform you not only need to
see these attributes if it is just about seeing them you could
kind of dump it as a Json and store it in any databases. But normally you would
also want to query that. Now querying on a JSON or random attributes
is a bit tricky on the relational databases, but there are certain kinds of
databases that are optimized for those kind of queries. So these are the
databases where you have a lot of data not just data in terms of volume but in
terms of structure. So if you have lot of attributes that can come in and a wide
variety of queries that can come in, then you fall into this category, and if that
is the case then you need to use something called as a document DB.
Now there are a lot of providers of document DBs, MongoDB, Couchbase are some of them.
Earlier we looked at Elastic Search and Solr for text searching those are
also special cases of Document Database. Now let's just say your your data is not
relational and you do not have complex queries, you have a couple of
straightforward queries you could still use a document DB if it doesn't fall
into the third category. Now what is the third category? Let's say if you have an
ever-increasing data. What do I mean by ever-increasing data? So let us take an
example of Uber. So all the drivers of Uber are continuously sending location pings.
so let's stay there are some number of drivers and they kind of translate
into X number of location records per day. So there would be X number of
records inputted per day. But this X would not be a constant, it would be a
growing number why because the number of drivers of uber are increasing day by
day right so this data would become probably X one day one. Additively 1.1X
on day two, 1.2x on date three, so on and so forth.
So it would not be increasing in a linear fashion it would be increasing but in a
more than a linear fashion. So that is what I am calling an ever increasing data.
Plus if you have finite number of type of queries. So let's if you want to track
location pings of drivers the most important query that you will do is find
all the locations of a driver whose driver ID is something right. So if you
have less number of type of queries (maybe high volume) but
a large amount of data then these kind of
databases would be the best choice for you. This is something called as a
columnar DB or a Column oriented DB and Cassandra
and HBase are the most used and most stable options out there for these
kind of scenarios. Now there would be a lot of other alternate as well but these
are the most stable ones and have been battle tested again for a lot of years I
would recommend to use either of these two in such kind of scenario. Now in my
videos I generally prefer using Cassandra over HBase, the only reason being
Cassandra is not very heavy to deploy HBase generally tends to have a lot of
components that it comes with, but performance wise both of them are
roughly similar. Each of them have their pros and cons but in a design interview
it wouldn't matter which one do you use. Now, what if you don't have any of these what
if you don't need ACID, you don't need these wide variety of data types and
query types and you don't have an inner ever-increasing data? Then you can use
any data base of your choice. Then it would basically be a low scale system
having a very small number of queries on a very small number of attributes on a
very small size of a data set. You would generally not get these kind of
requirements in a system design interview but if you get you could use
any of these unless it's something that we've already talked about which are one
of the special cases in some other scenario. So if it's not that you
could use any of these data bases and that should be fine. But normally when
you are in a system design you would not get a design that you will be satisfied
by just one of these databases. So let's look at slightly more trickier
real-world scenarios. Now let's take an example of us building an e-commerce
platform something like an Amazon. Now when you are managing inventory on that
side you want to make sure that you are not
over selling items. So let's say there is just one quantity of a particular item
left and there are 10 users who want to buy that you want to have this ACID
properties there to make sure that only one of the users should be able to
commit the transaction other users should not be able to commit the
transaction, right. You can put constraints and all of that on there.
It would make sense to use a RDBMS database for the inventory management
system of a system like Amazon or maybe an order management system right
But if you look at the data of Amazon it is an ever interesting data. Why? because
the number of orders are additive each day new orders are coming in they cannot
purge the data due to lot of legal reasons plus the number of orders are
also increasing so it naturally fits into this model right with you
where I am recommending together Cassandra. So what you could use is a
combination of both of these databases. You could use MySQL or any other
RDBMS alternate for storing data about the orders it has just placed and not
yet deliver to the customer. Once the item is delivered to the customer you
can then remove from this RDBMS, and put it into Cassandra as a permanent store.
I walked through this implementation in much more detail in
the Amazon system Design, so I would recommend to have a look on how a
combination of database can become a very power choice there.
Now let's look at another example again taking an example of Amazon, let's say if you want
to build a reporting kind of a thing which lets you query something like
get me all the users who have bought sugar in last five days
now sugar is not just a product. There are a lot of sellers selling different
sugar alternates of different companies maybe, of different qualities maybe, right.
so sugar would then be like a lot of item IDs right and on top of those a lot
of item IDs there must be a lot of orders. Now again I am saying that
orders would either be in Cassandra or in this (RDBMS) right but if you are doing
random queries where some you might want to query on who bought sugar, who bought
TV, who bought TV of certain quality, who bought a fridge of certain quality, that
kind of logically goes into a model where I was recommending to use a Document DB.
now what you could do is you could store the querying part over
here (Document DB). You could basically say that I'll store a subset of order information
into a MongoDB which says that user ID so and so, had an order ID so-and-so, on
some particular date, which has these ten item ID in the certain quantity, right
On this database you could run a query which will return you a list of users
and list of order right and then you could take those Order IDs and query
on both of these systems (RDBMS+Cassandra) right so here we are using all the three systems in
combination to provide various querying capabilities like with user bought what
kind of use case, right. So in any real world scenario you would have to use a
combination of such databases to fulfill the functional and non-functional
requirements that you have. Now all of these being said this is just an
indicative cheat sheet of what database fits well in what kind of a scenario.
when you are in an interview you could use this to kind of get done with it but
if you are actually using it in a real-world system I would recommend to
read a bit more about it to choose which out of these few do you want to
actually use, right. Also there are a lot of other databases that I have not talked
about which are basically examples of RDBMS's or example of columnar
databases. It does not mean that these are the only options. There are a lot of
other options out there you could explore them and use those as well.
so yeah I think
this should be a good enough place for a database choice given a certain use-case.
関連動画をさらに表示
SQL vs NoSQL in 2024 Make the Right Choice (Difference Explained)
Choosing a Database for Systems Design: All you need to know in one video
What is a Database?
The Problem With UUIDs
Types of Databases | Criteria to choose the best database in the System Design Interview
Introdução a bancos de dados SQLite com Python e módulo sqlite3
5.0 / 5 (0 votes)