Types of Databases | Criteria to choose the best database in the System Design Interview
Summary
TLDRThis video script offers a comprehensive guide on selecting the right database for distributed services. It emphasizes the importance of considering various criteria such as ease of learning, product maturity, data model, scalability, and cost. The script discusses different types of databases, including relational, NoSQL, document, column family, graph, time series, and NewSQL databases, highlighting their pros and cons. It also provides scenarios to illustrate when each type is most suitable, ultimately guiding viewers on making informed decisions based on their service's specific requirements.
Takeaways
- 📚 Choosing the right database for a distributed service is crucial as it involves trade-offs and depends on specific service requirements.
- 🔍 The decision is not as simple as choosing a relational database for structured data or a NoSQL database for semi-structured or unstructured data.
- 📈 Multiple criteria must be considered when selecting a database, including learning curve, product maturity, technical support, data model/schema, scalability, and cost.
- 🤔 Sometimes a choice must be made between two databases, neither of which is fully appropriate, and the more suitable option must be selected based on the given requirements.
- 🛠️ Different database types have their pros and cons, and the script discusses relational, key-value, document, column family, graph, time series, and NewSQL databases.
- 🔑 Key-value stores are simple NoSQL databases suitable for scenarios with primary key access and where scalability and flexible schema are needed.
- 📄 Document databases are a superset of key-value stores, allowing for a flexible schema and the ability to create secondary indexes on document attributes.
- 🗂️ Column family databases are optimized for accessing data based on columns and are suitable for scenarios requiring aggregation queries on large datasets.
- 🌐 Graph databases excel in scenarios involving complex relationships and multi-hop queries, making them ideal for fraud detection and recommendation systems.
- ⏱️ Time series databases are specialized for storing and retrieving timestamped data, which is useful for IoT and financial systems.
- 🔄 NewSQL or distributed SQL databases combine the ACID properties of relational databases with the scalability and sharding support needed for large-scale systems.
Q & A
What is the primary consideration when selecting a database for a distributed service?
-The primary consideration is choosing the most appropriate database based on the requirements of the service, as migrating databases later can be costly and risky.
Why is it not as simple as choosing a relational database for structured data or a NoSQL database for semi-structured or unstructured data?
-Because the decision is not binary and depends on multiple criteria that need to be weighed in parallel, such as the specific needs and trade-offs of the service.
What are some of the criteria used to select a database?
-Criteria include ease of learning curve, database product maturity and technical support, data model or schema, scalability, QD and data access pattern, cost, and ACID or BASE requirements.
What does ACID stand for in the context of database transactions?
-ACID stands for Atomicity, Consistency, Isolation, and Durability, which are properties of database transactions intended to guarantee data validity.
What is the difference between ACID and BASE systems?
-ACID systems prioritize consistency and guarantee data validity through a set of transaction properties, while BASE systems prioritize availability and are eventually consistent, often used in distributed databases.
Why are relational databases considered intuitive and straightforward for data representation?
-Relational databases represent data in the form of tables with rows as records and columns as attributes, making it easy to establish relationships between different data points.
What are some limitations of relational databases?
-Limitations include difficulty in scaling horizontally, manual sharding, rigid schema that is hard to change, and not optimized for queries that require graph traversal.
What scenarios are key-value data stores a good fit for?
-Key-value data stores are suitable for scenarios where all data access is through a primary key, scalability is important, and there is no need for complex queries like joins.
What are some examples of NoSQL databases?
-Examples of NoSQL databases include MongoDB, CouchDB, Azure Cosmos DB, Apache Cassandra, and Amazon DynamoDB.
Why are document databases considered a superset of key-value databases?
-Document databases are considered a superset of key-value databases because they allow the database to examine the internal data of the value, unlike key-value stores where the value is opaque.
What scenarios are column family databases or column-oriented databases suitable for?
-Column family databases are suitable for scenarios where you only need to query a subset of columns, and column-oriented databases are efficient for analytic scenarios due to better compression and faster access to a subset of columns.
What are some use cases for graph databases?
-Use cases for graph databases include fraud detection in payment systems, recommendation systems in social networks, and any scenario where complex multi-hop queries are required.
Why are time series databases optimized for storing and serving time series data?
-Time series databases are optimized for storing and serving time series data because they allow fast insertion and retrieval of large amounts of timestamped data, supporting complex analysis on this type of data.
What is the difference between NewSQL or distributed SQL databases and traditional relational databases?
-NewSQL or distributed SQL databases are relational databases with implicit support for sharding and scalability, making them suitable for systems that require ACID compliance and handle large volumes of data and high throughput.
How should one decide whether to use a relational database for semi-structured data?
-One should decide based on whether the data can be accessed using an ID and stored in a BLOB or CLOB field within a relational database table, considering factors like performance and the need for complex queries.
What are the implications of updating a count or similar attribute in a database system?
-Updating a count or similar attribute can change the system from read-heavy to write-heavy, affecting the choice of database and its ability to handle the increased write load efficiently.
Outlines
🔍 Selecting a Database for Distributed Services
The video begins by addressing the complexity of choosing a database for distributed services, given the multitude of technologies available. It emphasizes the importance of making informed trade-off decisions, as each database is optimized for specific scenarios. The role of software architects and senior engineers in selecting an appropriate database is highlighted, considering the costly and risky nature of migrating databases later in the development process. The speaker outlines various criteria for database selection, including the type of data (structured, semi-structured, or unstructured) and the required consistency model (ACID or eventual consistency). The paragraph stresses the need to weigh different database options against multiple criteria and service requirements before making a decision.
📈 Criteria for Database Selection
This paragraph delves into the specific criteria used to select a database, starting with the ease of learning curve, where the familiarity and adaptability of developers to a database play a significant role. The maturity of the database product and the availability of technical support are also crucial, with examples like Oracle database highlighting the importance of community size and language support. The data model or schema, which could range from structured SQL to unstructured data like media files, is another key factor. The paragraph also touches on scalability, the balance between read and write operations (QD), and cost considerations. It discusses the necessity of choosing between ACID compliance and eventual consistency based on system requirements, introducing the CAP theorem and the trade-offs between consistency, availability, and partition tolerance.
🗃️ Types of Databases and Their Use Cases
The speaker provides an overview of different database types, starting with relational databases like Oracle, MySQL, and Microsoft SQL Server, which are based on the relational model and are intuitive for data representation in tables. The limitations of relational databases, such as vertical scaling and the complexity of sharding, are discussed. The paragraph then introduces key-value data stores, which are simple NoSQL databases similar to hash tables, and are ideal for scenarios with primary key access and flexible database schemas. Examples of key-value stores like Redis, Memcached, and Amazon DynamoDB are given, along with scenarios where they are suitable, such as session information storage and shopping cart management.
📄 Document Databases and Their Applications
The paragraph discusses document databases, where the basic unit of storage is a document, often in JSON or XML format. Unlike relational databases, documents can have varying schemas, and secondary indexes can be created on document attributes. Document databases are beneficial for content management systems, e-commerce applications with flexible schemas, and any application requiring the storage of user-generated content. Examples of document databases include MongoDB, CouchDB, and Azure Cosmos DB, which are suitable for scenarios where a flexible data model is necessary and where data can be self-describing.
📊 Column Family and Column-Oriented Databases
The speaker explains column family databases, also known as wide-column databases or BigTable databases, which store data in columns rather than rows, allowing for efficient access when querying a subset of columns. Column-oriented databases are a subset that stores all data column-wise, offering benefits in data compression and access efficiency. The main difference between document and column family databases lies in the query model and the ability to address individual columns for updates. Suitable scenarios for column family databases include systems requiring aggregation queries on large datasets. Examples include Apache HBase, Cassandra, and Google's BigTable.
🌐 Graph and Time Series Databases
This paragraph introduces graph databases, which represent data as a network of nodes and edges, ideal for scenarios requiring traversal of complex relationships and multi-hop queries. The speaker contrasts this with the use of relational databases or document stores for simpler, single-hop relationships. Graph databases are particularly useful in fraud detection, recommendation systems, and social networks. The paragraph also mentions time series databases, optimized for storing and querying timestamped data, which is common in IoT and financial markets. These databases enable fast insertion and retrieval for complex time series analysis.
🚀 NewSQL and Distributed SQL Databases
The final paragraph of the script discusses NewSQL or distributed SQL databases, which can be seen as relational databases with built-in support for sharding and scalability. These databases are particularly suitable for systems where ACID compliance is mandatory and data volume or throughput is too large for a single machine. Examples include financial systems and large-scale billing services. The speaker also revisits the scenario of designing a URL shortening service, discussing the database selection process based on data structure, query load, and scalability requirements. The importance of understanding the system's needs and choosing a database that can accommodate growth and performance is emphasized.
🔑 Summary of Database Selection Criteria
The video concludes with a summary of the key points discussed regarding database selection. It clarifies that a simplified decision tree based on data structure or ACID properties is not sufficient for choosing the right database. Instead, a more complex matrix or table is needed to determine the most suitable database for a given scenario. The speaker advises starting with a relational database and considering other types like key-value, document, column-family, graph, and time series databases based on specific requirements. The importance of understanding the nature of data, whether structured, semi-structured, or unstructured, and the system's need for scalability, performance, and consistency is reiterated.
Mindmap
Keywords
💡Distributed Service
💡Database Trade-offs
💡ACID Properties
💡Relational Database
💡NoSQL Database
💡Scalability
💡Data Model/Schema
💡Consistency
💡CAP Theorem
💡NewSQL Database
💡Graph Database
💡Time Series Database
Highlights
Importance of selecting an appropriate database for distributed services due to the high cost and risk of migration.
Trade-off decisions in choosing a database, as each product is optimized for specific scenarios.
Criteria for database selection includes learning curve, product maturity, technical support, and data model/schema.
Scalability as a key criterion, assessing how well a database can scale to support increased load and throughput.
Understanding read and write access patterns to prioritize database selection.
Cost considerations in database selection, including licensing, maintenance, downtime, and hiring experienced personnel.
The necessity of choosing between ACID compliance and eventual consistency based on system requirements.
Overview of different types of databases: relational, key-value, document, column-family, graph, and time-series databases.
Advantages and limitations of relational databases, including their intuitive relational model and vertical scaling limitations.
Key-value stores' suitability for scenarios with primary key access and flexible schema requirements.
Document databases' flexibility in handling different schemas and the ability to create secondary indexes.
Column-family databases' efficiency in accessing and storing data based on columns, beneficial for aggregation queries.
Graph databases' optimization for storing and querying data with complex relationships and multi-hop queries.
Time-series databases' specialization in handling timestamped data for fast insertion and retrieval.
NewSQL or distributed SQL databases as a blend of relational databases with built-in support for sharding and scalability.
Practical scenario analysis for choosing the right database type in services like URL shortening and social networking.
The video's call-to-action for viewers to engage with the content and subscribe for more informative videos.
Transcripts
hello everyone welcome to my YouTube channel in this video we are going to discuss how to
select a database for your distributed service nowadays you can find a large number of database
Technologies having such a large number of product and technologies means having more trade-off decisions
to make given that each database product is optimized for certain scenarios it rests on the
software Architects and Senior software Engineers to pick appropriate database with these trade-offs
in mind as it relates to the requirements of the service choosing the most appropriate database
based on the requirements of the service is one of the most important architectural and system
design decision because later if you find that the selected database is not appropriate for all your
requirements then migrating within the database is very costly and risky process and it could take
a lot of time choosing an appropriate database is not based on a simple decision tree where you can
just decide simply that if we have structured data then we will use relational database if
we have a semi-structured or unstructured data we will use maybe something like nosql and also it's
not as simple as saying that since we need ACID requirements we will choose relational database
and if we don't need acid requirements we can live with eventual consistency for example then
let's choose a nosql database there are multiple different criteria that we have to actually go
through in parallel and we have to decide after going through all of those criterias and we have
to weigh different choices of databases that we have that how those databases will fulfill
all those criteria or requirements that we have and after that we have to come up with
an appropriate database for our service you also need to understand that sometimes you may be in
a situation where you will have to choose between two databases and none of them is appropriate for
your requirements but in that case you have to see which of the database among these two databases is
more appropriate as compared to the other for your requirements before discussing different
types of databases now let's discuss what are different criterias that we use to select a
database based on the requirements of our service the very first criteria is ease of learning curve
this criteria refers to the ease with which new developers Architects dbas and other users of
the database can learn and adopt the use of a particular database most developers tend
to select the database which they have already worked on in the past and have some experience
of or if they think they can easily learn that database easily choosing a database based on past
experience is okay if it fulfills all the other different criterias that we are going to discussed
now and also it fulfills all the different important requirements of your service the second
criteria that we use to select a database is the database product maturity and technical support
this characteristic refers to how mature the database is what is the size of the database
community and how easy it is to hire someone with that database experience it also sometimes refers
to how many and which programming languages the database support we can take musician database
like Oracle database for example this database has been here for more than five decades and
we can easily find a lot of documentation on the internet about it there's also a lot of
different communities on Internet that we can find on Facebook or other communities where people have
worked on the listening databases and so it's very easy to get some help from the community
if we need in a relational database the third criteria that we use to select a database is
based on our data model or schema of our data the data model or schema is one of the most important
criteria for choosing a database this signifies that our database is student based on how well it
can provide support to the software developers to model and store the data of the service
data can be structured like SQL schema or semi-structured like a Json or XML object
or it could be totally unstructured data light for example some media files some images some videos
audios Etc the data could be relationship data for example in in case of Twitter system design
we have a follower and follow a relationship or it could be a friendship relationship in case of
let's suppose Facebook choosing a right database for the light model is also a very important Key
System design decision the fourth criteria for choosing database is scalability this criteria
refers to your degree and the ease with which a database can be scaled to support increased
load and throughput while minimizing the data access latency it means that while choosing
database we have to see whether the database can be scaled horizontally vertically or both
this also relates to the volume of data that needs to be stored in our service sometimes the volume
of data is small enough to be stored in a single machine but then other times the volume data could
be so huge that it cannot be stored in a single machine another important criteria to select a
database is the QD and data access pattern this criteria refers to whether a system is read heavy
or write heavy and so the database is selected to prioritize either read override or right over read
if needed this is not a binary choice but rather it is more of a scale towards which direction the
database optimizes then another very important criteria for choosing a database is the cost
sometimes you see two different databases with almost similar capabilities in this case it
depends on the overall cost associated with using each database based on which you select
one of the two databases this cost comprises of several expenses for example the cost of
buying a database license maybe it's a one-time license or maybe it's a subscription then there's
a periodic maintenance that needs to be performed on the database using the periodic maintenance is
their downtime that a service need to incur or not also what would be the cost to hire people
who have experience with such a database another important criteria to select database is based on
whether your system need asset requirement or your system is a base system the S8 where a stands for
atomicity C stands for consistency I stand for isolation and d stand for durability is a set
of properties of a database transaction intend to guarantee data validity despite errors power
failures and other mishaps in the context of a database the sequence of operations that satisfies
the asset properties are called transactions for example transfer of funds from One bank account to
another bank account this would involve multiple sequence operations for example debating money
from one account and crediting money to another account and all of these operations need to be
done within a single transaction to make sure that your system State remains valid on the other hand
the base systems are those systems which do not meet the acid properties it does not mean the base
systems lack all the different asset properties there could be base systems that fulfill some
criterias of acid properties for example a database will be considered base if it fulfills
properties like atomicity and isolation similarly there could be a base database system that fulfill
durability and isolation but not consistency however one thing is guaranteed for base systems
that they always prefer availability over strong consistency in a cap theorem the base systems are
always eventually consistent as the name suggests so now the choice of database depends on whether
the system require an asset compliant database or not ideally every system wants asset properties
however the asset property is though very desirable are not always possible to achieve due
to other non-functional requirements imposed by the system which affect the choice of a database
we just discussed what are different criterias to select a database now we will go through different
common types of databases that are available right now and we will see what are their pros and cons
before going through an example in which we will see how we will choose a database in a
particular example or scenario the very first type of databases which are available since more
than five decades are relational databases some examples of relational databases are
Oracle database MySQL Microsoft SQL Server Etc relational databases are based on relational
model which is very intuitive and straightforward way of representing data in the form of tables
in relational databases each row in a table is a record with a unique ID called the primary
key The Columns of the table holds different attributes of the data and each record usually
has a value for each attribute making it easy to establish relationships between different data
points relational database allow flexible data modeling the relational databases even allow a
modeling of key value data and graph data which we will discuss later in our other different
types of databases however it does not mean that a Json database can be used in all the scenarios
it depends on several factors to decide whether a relational database is a good fit for a particular
scenario or not relational databases are generally vertically scaled using large machines this is one
of the limitation of the relational databases because sometimes the data in the system could
be so huge that it cannot be stored in a single machine there are other scenarios as well where
even if the data can be stored in a single machine the number of read and write queries that comes
to a database are so large that they cannot be handled by a single machine in that case the
only option that we have is to Shard a relational database but usually relational databases does not
support implicit sharding the sharding needs to be done manually in that case and also usually when
we Shard relational database then we store the data in the sharded relation database in a shared
nothing architecture which means that all the data for a particular tenant we store in a single shot
we does not distribute the data for a single tenant in all the different shards that we
have created for our relational database another limitation of relation database is
that they have usually a rigid schema changing the schema of oscillational data in a relational
database require changing the table construct in the relation database thus changing this K
minus additional database is usually a painful time consuming and thought provoking process
that is why in such cases we have to look at other different database types such as nosql
databases that we are going to discuss now before we go into discussion about different
types of nosql databases I have a question for you guys let me know in the comment below
whether the evolution databases can be distributed or not and if yes how and if not why not now
another type of a very common data store that is used in different Services is a key value
data store a key value data store is the simplest form of a nosql database it is similar to a hash
table data structure and it is primarily used in scenarios where all access to a database
is through a primary key the key value to test scores are a possible Choice when all the access
your data is through a primary key the relational database constructs like joins or where or order
by are not required as they are not supported by the key value data store when the scalability is
an important requirement of the system and since key will address store allows easier scaling and
higher throughput the key value data stores can easily be shared using consistent hashing we
also use key value data store in scenarios where database schema is flexible and whatever we store
in the blob value is OPEC to the database and only understood by the application it doesn't
mean that if the key value data store is a nosql data store it does not support asset requirements
there are different keyword just so that offer asset requirements but usually most of the time
they offer those asset requirements only for a single key operation that is Operation which only
touches a single key also we need to understand that the consistency in a key value does store is
a bit different than the consistency in ovulation database and the consistency in a key value data
store is associated with the replication of a key value data to all the nodes or replicas in a shard
usually the key value data stores are good fit for following scenarios the first scenario is
it's about storing session information generally every web session is unique and is to assign
a unique session ID applications that store the session information on the disk or in a relational
database will greatly benefit from moving to a key value data store almost every user has a unique
user ID and then some attributes like username and other information all this information can
be stored in a key value store in order to get access to that information we only just need
a unique user ID and that's it another scenario where key value stores are good fit is a shopping
cart scenario e-commerce websites usually have a shopping cart tied to a specific user in that
case in order to clearly see if the shopping cart associated with a user we can use key value data
store to store the shopping cart information and we can access the shopping cart using a unique ID
of the shopping cart that could be associated with the user ID as well at the same time we also need
to understand different scenarios via the key values data stores are not a good fit such as
if the application need to have relationships between different entities and objects in the
data then the key value data store is not a good fit another scenario where the keyword data store
are not a good fit are the scenarios where we have to query or scan a large amount of data
Key weather gestures are usually optimized to queue the data based on a primary key and now if
we need to run a security in which we are scanning a large amount of data keyword resistors on in
that case not a good fit for such scenarios also the key elevator store does not support searching
based on the value that is if you want to search some data based on particular value in a data we
cannot search using a key value data store some examples of keyword data stores include radius
man cache Oracle coherence Amazon Dynamo DB Etc now the second type of a nosql database
is a document database the basic unit of storage in the document database is the document itself
these documents are usually self-describing hierarchical three data structures usually
stored in the form of either XML Json or Json Etc nowadays the most popular format used is Json the
documents is stored in a document database could be similar to each other but they do not need to
have reflected the same format the document database are like key value databases which
is store documents in the value part of the key value data store however we can consider
document databases as a super set of key value data stores where the value is not opaque to the
database and database can examine the value content which is not the case in a key value
data store now the question is how the document database is different from relational database
we need to understand that in a relational database all the records need to have same schema
and they need to have same set of attributes if you are storing them in a single table while in
case of a document database different documents can have different schema also as compared to the
key value test show there is one difference which is basically in case of a document data store we
can create second indexes on different attributes of the document which is not possible in case of
a key value data store the document databases are good fit for the following scenarios since
document databases have no predefined schema and usually understand Json documents they work well
in content Management systems or applications for publishing websites managing user comments
user registration profiles and other web facing documents e-commerce applications often need
to have flexible schema for products and orders as well as ability to resolve their data models
without expensive database refracting or database migration document database are suitable for such
scenarios the examples of document databases include mongodb couchdb and Azure Cosmos DB
by the way I have added detailed chapter in my course the distributed systems I interview Bible
that you can find the link here in which I have discussed all the different types of databases
via pros and cons and different scenarios where we can use them for more information
nowadays another type of databases which we use in some specific scenarios which we call
column family databases these databases are sometimes also known as white column databases
or a big table databases these usually have those with varying number of columns where
each column is a name value appear the column family databases allows access to data based
on columns rather than buy those and even sometimes they store the data by columns
there is a class of databases which we call column now databases or column oriented databases this is
a subset of column family databases in which we store all the data in column wise as compared to
row wise in case of other types of data stores when a column family data store is also column
oriented data store then the benefits include more efficient access to data when we are queuing only
one or few columns that is a subset of columns and there's also more options for data compression in
case of a column oriented databases however the column oriented databases are less efficient for
inserting new records the main difference between a column family database and a document database
is in the query model they can both store similar data structures that is you can actually install a
Json in a column family database as well but the document stores usually give you query by value
capability whereas column family databases usually do not another difference between a
document database and a column family database is that in the document database you usually insert
or update the whole document which is typically a Json whereas in column family databases like
Cassandra you can address individual columns and insert and update them individually
so the columnar databases are good fit for scenarios where you only need to query a subset
of the columns or very few columns also we use columnar databases in scenarios where you need
to run aggregation queries against a large amount of similar data on the other hand the columnar
databases are not suitable for scenarios such as online transaction processing scenarios or in
scenarios where you have to run queries against a small set of those some examples of column family
stores are Apache Edge base Cassandra and Google's big table here I have two questions for you
the first question is that usually column oriented databases are more suitable for analytic scenarios
why is this the case and the second question is the column oriented databases usually provide
better compression why is this the case let me know in the comments below what you guys think
another type of database which is getting popular day by day is graph database the graph databases
represent data as graph which is nothing but a network of nodes and edges between them
the edges between the nodes actually represent relationships between those nodes in a graph
database the storage is optimized for storing the graph and also is optimized for graph diversal we
can store a graph like data structure in a relation database as well and also in some
nosql database like documentdb Etc but those databases are not optimized for learning the
queries that require diversing the graph and so the graph databases are more suitable for
such scenarios where we have qds which require traversing the graph as we will see later now
let's discuss once again let me know in the comments below what do you think whether we
can use graph database in such scenario or not so in my one of my previous video in Twitter
system design which you can find the link here in one of the max surveys we defined a user relation
table since in our design we are only required to query a single hop relationship that is find
all the followers of a particular user and also the relationship is of a single type
such data could be stored in a relational database another option is to store this data in a nosql
key value database which provides a non-unique secondary index on the following user ID
if storing such data in a graph database all the possible will be an Overkill usage of
graph database in this case the only reason for storing such information in the graph database
is that either we plan to introduce more relationship types or we plan to introduce
multi-hop qds like find all the followers of all the followers of a particular user
or find all the followers of our user a who are also the followers of user B in the future so
that's why you need to understand that even though you might be working with some data which require
graft reversing or I would say some sort of graph traversing but using a graph database for such a
may not be that useful if the relationship between all the different entities is just a single type
of relationship and the queries that you are running only require a single hop relationship
the graph databases are usually used in following scenarios the first scenario where the gravity
bases are being used a lot is flawed detection in payment systems such as stripe with a graph
database you can process purchase and financial transactions in almost real time which means
you can prevent fraud for example with a graph database you can easily tag whether a particular
email address and credit card are related to other Fallen charges or not you can also differentiate
accounts via a single email address is associated with multiple people you can also find scenario we
have various people are associated with a single IP address even though they have multiple physical
addresses in different accounts another usage of graph database is in recommendation systems
for example in Facebook the graph database can be used to store information about various users and
their friends and their interest in a social graph this graph can be traversed to suggest friends and
other services to a user the other social network and machine learning and AI scenarios where the
graph databases can be used which I have covered in more detail in the course one thing to remember
is that the glav databases are not suitable for scenarios when the queries don't include a
specific starting point for example if you need to run frequent table scans then grav databases
are not suitable for such scenarios also when you need to look up a single data just by a key
in those cases graph databases are not suitable and you may be well off with using a key evaluator
store in that case also the graph databases are not suitable for scenarios where you have to store
a large chunk of information and such as blobs and media files Etc with every node let me know
in the comments below why is this the case why the graph database is not suitable for such scenarios
then there is another class of databases which we called time series databases these databases are
optimized for storing and serving time series data a Time series data is some data which is
associated with a timestamp the time series data is often continuous flow of data like measurement
form sensors in iot or intraday stock prices a Time series database LED you store a large
amount of timestamped data in a format that allows fast insertion and fast retrieval to
support complex analysis on time series data here we need to understand that relational databases
can also be used to restore time series data but decision database are not optimized for common
time series workloads we have discussed in Greater detail how a Time series database store time sees
data in a format which enables quicker table of timestamp data as compared to other traditional
databases in our course then lastly is a new class of database basis which we call new SQL
or distributed SQL databases you can consider new SQL databases as relational databases with
implicit support for sharding and scalability and these databases are more suitable for financial
systems and other systems like for example billing service in Amazon or Apple App Store where asset
requirement is a must but also the scale of data is so big to be stored in a single machine all the
throughput required is so big to be handled by a single machine now let's consider a scenario
you must have seen my previous video on designing tiny URL service which you can find the link here
in that service in one of the micro service we come up with this schema where we want to store
a URL mapping between the short URL and a long URL now the question is what type of database
do we need to store this information so the very first thing is that we will always start with a
relational database and check whether we can store this information in a relational database or not
and in that case we have to first decide whether the data is structured data is semi-structured or
is totally unstructured data and it's totally possible we could say that this could be a
semi-structured data what we could also do is that we can store all this information in a single blob
field or glob field in a relation database so our relational database could be of the format where
it has only two columns one is the ID column and the other is a blob or glob field where we store
the rest of the information and that could be a unstructured information form the perspective
of a database and we use deletion database for this purpose however now we have to see how much
data are we going to store let's say in next six months or a year or two years which means that
we need to figure out what would be the growth of data that we are expecting in the system and
then the second thing is what would be the query load what be the rate of flights what would be
the rate of reads that would be happening and can we use a solution database in that case or
not and of course since it's a tiny URL service we know that it's a read heavy system the number
of lights will be way less as compared to number of leads in the system we will only perform the
rights when we want to actually insert a new short URL to Big URL mapping in this table for example
we could also maybe think about other things like if we can still use resolution database so in that
case how to actually support scalability we could do some other things for example we can still use
ovulation database in our service but we can actually put a cache in front of our relation
database to support heavy read so now it all depends on the types of requirements what are the
different requirements we have in our system that can be fulfilled by a particular type of database
or not now let's just take an example that we also want to introduce something like a count here
how do you say integer and let's suppose this is the count of z-axis to this database and
suppose this is now a requirement which doesn't even make sense basically but let's suppose you
want to also see how many times a particular short URL for successed so every time now the
read happen and someone tries to get the big URL using a short URL you also incrementing
this count initially the system was a seat heavy system but now since with every read you are also
updating discount so now this system becomes a wide heavy system basically and now you have to
think what would be the best database in that case whether it's a relational database or it is a key
value data store that you would use or you will use a document data store or a column valuator
store if you use a little database here or even if a document database whenever we update this
count what we will be doing we will be updating the whole record itself in that case it would be
way better if we use a column oriented database because in a column oriented databases you see
access and store based on columns so in that case whenever a URL get that you can just go and update
the count of it without touching the rest of the data now let's take another example suppose we
have a graph of different objects and entities and their relationships the objects or the nodes are
different types of nodes we'd have we have people in the nodes we also have like things like banana
and we may also have cities like Berlin similarly that the types of relationships are also different
we have translation shape we have a follow-up relationship we have life relationship we have
visit relationship Etc in this scenario where you know we have we have different type of entities
and they have different types of relationships between them the graph databases are the most
suitable choices of course you can store this information in a relation database as well as
the document data store as well where you can actually install this follow likes visits all
different types of relationships in a relationship column and maybe it might work with some simple
queries like if you want to say give me all the users which Jason likes for example which is a
single hobby it may work in a in a database if you want to run some complex ways like give me all the
followers of Json friends then graph databases are best fit in order to run such type of queries so
let's summarize again what we have discussed in this video so the first thing we discuss is that
we can't just come up with a simplified decision tree based on whether we need a structured data
or semi structure or structured whether we need acid on acid Properties or not and then based on
that we can just say we can just use relational or a particular type of nosql database usually
in that case we have to construct a complex Matrix actually or a table to determine which
database is more suitable for our scenarios people usually think that we can't use relation databases
for unstructured data but that's not the case as long as you can actually access your unrestricted
data using an ID you can define a table with just an ID column and a blob or Club column
and you're going to store all your unstructured data in that clock column in a position database
so this is a long assumption that you can't use relational databases for unstructured data
you can but basically you have to look at other things so Vision databases are I would say are
always the preferable choice you should always start with a digital database because for example
if you are working on a product catalog service in Amazon for example where you keep tag of all the
different products now different Porters will have different attributes or baby infant milk
will definitely have different attributes than a TV or some other electronic devices so now if a
data is semi-structured what is the best place to store it whether it's relation database or
not usually here one other thing is that if the data is unstructured data like a media file a
video file then usually we store them in object storage service and we just store the URI for
those objects in our databases we usually don't install big you know blobs in our database because
every time you update an attribute you will also be writing that Big Blob of data as well and which
is usually banned for the database performance we see the relation databases you can just short them
as well but then you have to share the relation database manually and usually in that case we
use a shared nothing architecture to Shard those solution databases where we put all the tenant
information in a single shot then we have a key value data stores keyword response are like simple
hash table data structures where you have a key and then a value value is a blob in a key value
data store the value is opaque to the database the database does not look into the the values and and
so in that case if you need to also run queries in which you have to look into the OPAC value in
a keyword test tool then the obvious choice is a document data store because a document data store
is is like a superset of a key value data Store where the database also can examine the internal
data of a value itself and so you can create secondary indexes on those attributes in the value
part in the documentdb then we also discussed there are certain scenarios where we only have
to access maybe for read or write only a small subset of columns and for those scenarios column
Orient databases are a good fit basically because you don't have to read or write the whole record
or the whole row and you can only access and only update or write the columns you are interested in
then we see the AV also have a graph database and there are scenarios via
the glove databases are the best choices where these are the cases where you know
you are working with data which have multiple different types of relationships among them
and also you want to run queue this which require multiple hops not a single hop but
multiple Hops and so in that case the graph databases are more suitable choices to be used
if you want to know more about these things please look into my course there's a chapter where you
will find more details about it in which I have discussed in detail how to select a database in
the end again thanks a lot if you are still here with me and I will really appreciate if you find
this video useful then please do like this video and also do subscribe to this channel if you
haven't subscribed yet I will be adding more videos in future soon thank you and take care
関連動画をさらに表示
Which Database Model to Choose?
SQL vs NoSQL in 2024 Make the Right Choice (Difference Explained)
Types of Databases: Relational vs. Columnar vs. Document vs. Graph vs. Vector vs. Key-value & more
Relational vs. Non-Relational Databases
Learn What is Database | Types of Database | DBMS
What is a Database?
5.0 / 5 (0 votes)