Which Database Model to Choose?

High-Performance Programming
6 Mar 2023 · 24:38

Summary

TL;DR: This video script explores the challenges of data modeling for scalable applications, comparing various database types. It highlights the limitations of traditional relational databases and introduces alternatives like key-value, column-family, document, and graph databases, each with their own advantages and use cases. The script discusses performance, scalability, consistency, and the importance of choosing the right database model for specific needs, emphasizing the trade-offs between speed, complexity, and data integrity.

Takeaways

  • 😌 The biggest challenge in app development isn't coding but data modeling at scale to avoid performance issues.
  • 📊 Traditional relational databases still dominate the market with 72% share, using tables and relationships well-suited for most apps but can be a bottleneck with growing data complexity.
  • 🔄 Alternative data models like graph and wide-column stores can handle complexity and scale better, but choosing the right one depends on the project's unique needs.
  • 🔑 Key-value databases are simple, fast for data retrieval, and well-suited for in-memory storage, providing sub-millisecond response times.
  • 💡 While in-memory storage like RAM is fast, it's not practical for all database types due to cost and the need for data persistence on disk.
  • 🚀 Key takeaway: Key-value databases are optimized for high performance and low latency, making them ideal for caching frequently used data.
  • 📚 Wide-column stores organize data into column families and are highly partitionable, allowing horizontal scaling, but they are not optimized for analytical queries.
  • 🔍 Document databases excel at storing related information in a single document, simplifying data handling but risking data duplication and inconsistency if not managed carefully.
  • 🔗 Relational databases are best for transactional processing with strong ACID guarantees, ensuring data integrity and consistency, but can struggle with horizontal scaling.
  • 🌐 Graph databases are powerful for complex, multi-hop relationships, offering fast queries by traversing relationships directly without joins, but they require expertise to manage.
  • 🛠️ Each database type has its use cases and limitations; choosing the right one involves understanding the project's requirements for data structure, scalability, and query complexity.

Q & A

  • What is the biggest challenge in app development according to the script?

    -The biggest challenge in app development is not writing code, but figuring out how to model data in a way that works at scale to prevent issues like slow performance, data inconsistencies, and difficulties in adding new features.

  • Why might traditional relational databases become a bottleneck as data grows in size or complexity?

    -Traditional relational databases can become a bottleneck because they use tables and relationships to model data, which may not scale efficiently as data size or complexity increases.

  • What are some alternative data models mentioned for handling complex data?

    -Alternative data models mentioned include graph databases, which can handle complexity, and wide-column databases, which can scale data at high levels.

  • How does a key-value database model data and what are its advantages?

    -A key-value database models data as a collection of key-value pairs with unique identifiers, allowing for fast access. It uses a hash table to store keys and pointers to data values, making data retrieval very fast and efficient.

  • Why are key-value databases often stored in memory and what is the benefit?

    -Key-value databases are often stored in memory due to their simple model and small data set, which allows for blazing-fast data retrieval, sometimes with sub-millisecond response times.

  • What are the limitations of storing all database types in memory?

    -Limitations include the cost of memory, the need to persist data on disk for mission-critical apps to prevent data loss in case of a crash, and the fact that larger data sets can slow down the system regardless of the speed of the storage medium.

  • How do wide-column databases differ from traditional relational databases?

    -Wide-column databases store data in column families and are not optimized for analytical queries that require filtering across multiple columns, joins, or aggregations. They can only be searched using the primary key, unlike relational databases.

  • What is the significance of the primary key in wide-column databases?

    -The primary key in wide-column databases consists of one or more partition keys and zero or more clustering keys. It is used to distribute data across multiple nodes and sort data within a partition, enabling horizontal scaling.

  • Why are document databases a good match for object-oriented programming?

    -Document databases are a good match for object-oriented programming because they allow data to be stored in a format that can be naturally represented as an object, such as JSON, without the need for translation.

  • What are the main benefits of using a graph database?

    -Graph databases excel at handling complex, multi-hop relationships between entities, allowing for fast and efficient querying of densely connected data without the need for expensive join operations.

  • What are some trade-offs of using a document database for transactional processing?

    -Document databases may not be the best choice for transactional processing due to the lack of enforced referential integrity, which can lead to data inconsistencies if changes to one document are not reflected in related documents.

Outlines

00:00

🤖 Data Modeling Challenges and Database Options

The paragraph discusses the critical challenge of data modeling at scale, emphasizing that it surpasses the complexity of coding an app. It highlights the limitations of traditional relational databases when dealing with large or complex datasets, which can lead to performance issues. The script introduces alternative data models like graph and wide-column databases, which offer scalability and complexity management. It promises an exploration of various database types, their advantages, and disadvantages, to help make an informed decision based on unique project needs. It also touches on the concept of key-value databases, their efficiency in handling unstructured data, and their use of hash tables for fast data retrieval, noting that these databases are often stored in memory for rapid access.
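
As a rough illustration of the hash-table mechanics described above, here is a minimal Python sketch of a key-value store; the class name and example keys are invented for illustration, not taken from the video.

```python
# Minimal sketch of the key-value idea: a hash table maps each unique key
# to its value, so lookups take (amortized) constant time regardless of size.
class KeyValueStore:
    def __init__(self):
        self._table = {}          # Python dicts are hash tables under the hood

    def put(self, key, value):
        self._table[key] = value  # hash(key) decides where the entry lives

    def get(self, key, default=None):
        return self._table.get(key, default)

store = KeyValueStore()
store.put("user:42", {"name": "Ada", "plan": "pro"})
print(store.get("user:42"))      # {'name': 'Ada', 'plan': 'pro'}
```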

05:01

🔑 Key-Value Stores: In-Memory Performance and Limitations

This paragraph delves into the specifics of key-value stores, their suitability for caching due to their ability to quickly access data using unique keys, and their support for various data types. It points out that these databases are optimized for high-performance applications with low latency requirements. However, it also notes the simplicity of the key-value model, which makes it unsuitable for complex data structures and dynamic queries involving multiple tables. The paragraph also mentions Memcached and Redis as examples of key-value stores, highlighting Redis's capabilities for multi-model data storage and its design for high performance and horizontal scalability, but also noting the trade-offs involved in achieving strong transactional consistency.
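
A cache-aside pattern of the kind described here might look like the following sketch using the redis-py client; the key format, TTL, and the stand-in loader function are assumptions for illustration only.

```python
import json
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)

def load_user_from_primary_db(user_id):
    # Stand-in for a slow relational query; hypothetical data.
    return {"id": user_id, "name": "Ada", "plan": "pro"}

def get_user(user_id, ttl_seconds=300):
    """Cache-aside: check the key-value store first, fall back to the slow source."""
    cache_key = f"user:{user_id}"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # fast path: served from memory
    user = load_user_from_primary_db(user_id)
    r.set(cache_key, json.dumps(user), ex=ttl_seconds)  # TTL bounds staleness
    return user
```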

10:01

📚 Wide-Column Stores: Horizontal Scaling and Data Partitioning

The focus shifts to wide-column stores, which organize data into column families and are optimized for horizontal scaling. The paragraph explains the concept of primary keys, consisting of partition and clustering keys, and how they enable data distribution across multiple nodes. It discusses the challenges of querying random attributes and the need for data modeling that anticipates query patterns to avoid full table scans, which can be slow. The paragraph also addresses the issue of data duplication and the trade-offs involved in the denormalized form of data storage in wide-column databases, which can lead to inconsistencies.
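
A minimal sketch of such a primary key, assuming the DataStax cassandra-driver and a made-up products_by_category table: category acts as the partition key that spreads rows across nodes, and price as the clustering key that sorts rows within a partition.

```python
from cassandra.cluster import Cluster  # pip install cassandra-driver

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")  # assumes a keyspace named 'shop' already exists

# Partition key (category) decides which node stores the row;
# clustering key (price) sorts rows inside that partition.
session.execute("""
    CREATE TABLE IF NOT EXISTS products_by_category (
        category   text,
        price      decimal,
        product_id uuid,
        name       text,
        PRIMARY KEY ((category), price, product_id)
    )
""")

# Efficient query: hits a single partition and reads rows already sorted by price.
rows = session.execute(
    "SELECT name, price FROM products_by_category WHERE category = %s LIMIT 10",
    ("laptops",),
)
```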

15:05

📝 Document Databases: Flexibility and Denormalization

The paragraph introduces document databases, which store related data in a single document, as an alternative to the strict rules of relational databases. It discusses the benefits of this model, such as ease of handling data, faster data retrieval, and the elimination of the need for joins. However, it also warns of the potential for data duplication and the resulting inconsistencies if not managed carefully. The paragraph highlights the importance of choosing the right use case for document databases and the need for proper indexing and constraints to maintain data consistency and optimize query performance.
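
A sketch of this embedding style with pymongo, under the assumption of invented database, collection, and field names: the whole order lives in one document, at the cost of duplicating customer details across documents.

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]  # hypothetical database/collection names

# Everything about the order lives in one document, so no joins are needed to
# read it, but the customer name is duplicated across that customer's orders.
orders.insert_one({
    "order_id": 1001,
    "customer": {"id": 42, "name": "Ada Lovelace"},
    "items": [
        {"sku": "kbd-01", "qty": 1, "price": 79.0},
        {"sku": "mon-27", "qty": 2, "price": 249.0},
    ],
})

# A secondary (compound) index keeps the common query pattern fast.
orders.create_index([("customer.id", ASCENDING), ("order_id", ASCENDING)])
print(orders.find_one({"customer.id": 42}))
```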

20:06

🔗 Relational Databases: ACID Transactions and Data Integrity

This paragraph underscores the enduring dominance of relational databases, particularly in industries like finance and e-commerce, due to their ability to model relational data clearly and maintain data integrity through normalization. It explains the process of normalization and its importance in organizing data to prevent duplication and ensure consistency. The paragraph also discusses the challenges of scaling relational databases horizontally and the complexities involved in maintaining data consistency when partitioning data. It concludes by emphasizing the strength of relational databases in transactional processing, thanks to their ACID (Atomicity, Consistency, Isolation, Durability) guarantees, which ensure the reliability and integrity of stored data.
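
A minimal sketch of an ACID-style transfer using Python's built-in sqlite3 module; the accounts schema is hypothetical and only meant to show how a transaction keeps two related updates consistent.

```python
import sqlite3

conn = sqlite3.connect("bank.db")  # illustrative schema: accounts(id, balance)
conn.execute("CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, balance REAL)")

def transfer(src_id, dst_id, amount):
    # The connection used as a context manager wraps both UPDATEs in one
    # transaction: commit if the block succeeds, roll back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src_id))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst_id))
    # Atomicity and durability: money is never created or lost halfway through.
```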

🌐 Graph Databases: Navigating Complex Relationships

The final paragraph explores graph databases, which represent entities as nodes and relationships as edges, allowing for direct storage of connections between data points. It illustrates the efficiency of graph databases in handling queries involving densely connected data, as they eliminate the need for expensive join operations. The paragraph also touches on the challenges of managing and maintaining graph databases, especially at scale, and the need for expertise in dealing with complex graphs. It concludes by discussing the scenarios where graph databases excel, such as in data centers where complex multi-hop relationships need to be traversed quickly and efficiently.
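
A sketch of such a multi-hop traversal with the Neo4j Python driver; the node labels, relationship types, and connection details are assumptions, not taken from the video.

```python
from neo4j import GraphDatabase  # pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Multi-hop traversal: data center -> switches -> interfaces in one query,
# following stored edges instead of joining tables.
CYPHER = """
MATCH (dc:DataCenter {name: $dc})-[:CONTAINS]->(sw:Switch)-[:HAS]->(if:Interface)
RETURN sw.name AS switch, collect(if.name) AS interfaces
"""

with driver.session() as session:
    for record in session.run(CYPHER, dc="eu-west-1"):
        print(record["switch"], record["interfaces"])
driver.close()
```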

Keywords

💡Data Modeling

Data modeling is the process of creating a representation of data structures and their relationships within a database. In the video, it is identified as a critical challenge in app development, as it determines how data is organized and accessed, impacting scalability and performance. The script discusses the importance of choosing the right data model to avoid issues like slow performance and data inconsistencies.

💡Relational Databases

Relational databases are a type of database that stores data in tables with relationships between them. They are mentioned in the script as having a significant market share and being suitable for most apps due to their use of tables and relationships. However, the script also notes that these databases can become a bottleneck when dealing with large or complex data sets.

💡Graph Databases

Graph databases are a type of NoSQL database that stores data in nodes and edges, representing entities and the relationships between them. The script highlights their ability to handle complex data structures and perform queries that involve relationships more efficiently than relational databases, making them ideal for certain use cases like social networks.

💡Column-Family Stores

Column-family stores, also known as wide-column stores, are a type of NoSQL database that organizes data into column families. The script points out that these databases excel in scalability and write throughput, but their eventual consistency model and key-based access make them less suitable for complex transactions and analytical queries.

💡Key-Value Stores

Key-value stores are a type of NoSQL database that stores data as a collection of key-value pairs. The script emphasizes their simplicity and efficiency for fast data retrieval, making them ideal for caching and high-performance applications. Examples from the script include Memcached and Redis.

💡In-Memory Storage

In-memory storage refers to the practice of holding data in RAM for faster access. The script discusses the benefits of in-memory storage for key-value databases, which can provide sub-millisecond response times due to the speed of RAM. However, it also mentions the trade-offs, such as the need for persistence and the limitations of data size that can be held in memory.

💡Data Consistency

Data consistency refers to the accuracy and reliability of data across a system. The script addresses the challenges of maintaining consistency in different types of databases, especially in distributed systems like column-family stores, where eventual consistency is common, and the potential for data duplication and inconsistency arises.

💡Horizontal Scalability

Horizontal scalability is the ability of a system to handle increased load by adding more nodes or machines. The script highlights the strengths of column-family stores and document databases in this area, noting that they can scale out easily to accommodate growing data and traffic, unlike traditional relational databases.

💡Document Databases

Document databases are a type of NoSQL database that stores data in documents, often in JSON or BSON format. The script explains that these databases allow for flexible data modeling and are good for handling semi-structured data. They are also noted for their ability to store related data in a single document, which simplifies queries but can lead to data duplication.

💡ACID Properties

ACID properties refer to a set of properties (Atomicity, Consistency, Isolation, Durability) that guarantee reliable processing of database transactions. The script discusses how relational databases are known for their strong ACID compliance, which ensures data integrity, especially important for financial and e-commerce applications.

💡Data Integrity

Data integrity ensures that data is accurate, consistent, and remains unchanged over time. The script mentions that normalization in relational databases helps achieve data integrity by organizing data into separate tables and avoiding duplication. It contrasts this with the potential for data inconsistency in denormalized databases like document databases.

Highlights

The biggest challenge in app development is data modeling for scalability, not just coding.

Relational databases dominate the market with a 72% share, using tables and relationships for data modeling.

Alternative data models like graph and wide-column databases can handle complexity and scale better than relational databases.

Key-value databases are simple, fast, and efficient for data retrieval using unique identifiers.

In-memory storage provides blazing-fast data retrieval but is limited by RAM size and cost.

Storing entire databases in CPU cache memory is impractical due to high cost and data size limitations.

Key-value databases are well-suited for in-memory storage, offering faster responses.

Memcached and Redis are examples of key-value stores, with Redis offering multi-model database capabilities.

Wide-column stores are optimized for high partitionability and horizontal scaling.

Document databases like MongoDB are ideal for object-oriented programming and denormalized data storage.

Document databases can struggle with maintaining data consistency across related entities due to lack of referential integrity.

Relational databases excel in transactional processing with strong ACID guarantees.

Graph databases are optimized for querying complex relationships and can perform faster than relational databases for such tasks.

Graph databases require expertise to manage and can be challenging to distribute across multiple nodes.

The benefits of graph databases become more evident with complex, multi-hop relationships between entities.

Scaling a relational database horizontally can be difficult due to the reliance on relationships between tables.

Data modeling is crucial for maintaining data integrity and preventing issues like data inconsistency.

Transcripts

00:00
The biggest challenge when writing an app isn't writing code, but rather figuring out how to model data in a way that works at scale. If you don't put enough thought into it, your app could suffer from slow performance, data inconsistencies, and difficulties in adding new features. Not good. We have the old-school relational databases, which are still leading the space with a market share of 72 percent. They use tables and relationships to model data, which is great for most apps. However, when your data starts growing in size or complexity, it can become a bottleneck. That's when you might want to consider alternative data models like graph databases, which can handle complexity, or wide-column databases, which can scale data to astonishing levels. Of course, with so many options available it can be tough to know which one is the best fit for your project. But don't worry, we'll break down the different types of databases with their pros and cons so you can make a decision that works for you. So let's start the journey to find the perfect database for your unique needs.

01:05
Imagine we have large amounts of semi-structured data and we assign it a set of unique identifiers. We just created a collection of key-value pairs for fast access, so this model is flexible enough for unstructured data. This type of database implements a hash table to store unique keys along with pointers to the corresponding data values. Since the data structure is basically an index, it's very fast and efficient for data retrieval. It uses a hash function to quickly calculate the location for storage based on the key, then it uses the same key to quickly locate the corresponding value in memory, in constant time.

01:54
Since the model is so simple and the data set is rather small, these databases are often stored in memory. This makes data retrieval blazing fast, sometimes with sub-millisecond response times. Other data models, such as relational and document-based, are not as suited for in-memory storage. This is because they tend to have more complex data structures, with fields, columns, and relationships, and that can require more memory and processing power to handle. But how much data can we store in memory? Since RAM is so fast, why don't we load all database types into memory? Some people may argue that today we can store huge amounts of data in memory, and there are database clusters with zillions of nodes that keep data in memory. Let's assume that cost is not a problem, although we should take a glance at it. First, we should consider that for mission-critical apps we would need to persist data on disk as well, because in case of a crash we would lose some or all of the data. There are two main ways to synchronize the RAM with the disk, but they both significantly affect the response time, from nanoseconds to milliseconds. Second, no matter how fast the storage medium is, in the end the size of the data will make the system slower. Why don't we store the entire database in the CPU cache memory, which is considered the fastest? First, because the cost would be very high; second, because the size of the data determines how fast the data is retrieved. That's why even the CPU cache has three layers. As a rule of thumb, if we want blazing-fast responses for a set of data, its size should be relatively small. So key takeaway number three is that key-value databases are well suited to be stored in memory, which in turn provides faster responses.

play03:48

finally in this category we can mention

play03:50

memcache and radius although nowadays

play03:53

redis offers the possibility of

play03:56

multi-modal database

play04:00

simply put key value stores are not

play04:03

designed for complex data structures so

play04:06

if you need to execute Dynamic queries

play04:08

or perform complex aggregations based on

play04:12

multiple tables then you should look at

play04:14

document or relational databases

play04:20

foreign

play04:23

databases like redis are designed for

play04:26

high performance and horizontal

play04:28

scalability rather than strong

play04:30

transactional consistency although

play04:32

Reddit supports executing multiple

play04:34

commands as a single Atomic transaction

play04:37

using the feature of multi-command

play04:39

transactions or using lower scripting it

play04:42

doesn't support the full acid by default

play04:44

it requires some tricks and

play04:47

configurations to reach the acid

play04:49

properties and they usually come with

play04:51

trade-offs
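
For illustration, a multi-command transaction of the kind mentioned here might look like this with the redis-py client; the key names are made up.

```python
import redis  # pip install redis

r = redis.Redis()

# MULTI/EXEC via a pipeline: both commands are queued and applied together,
# but a failing command is not rolled back, so this is weaker than full ACID.
pipe = r.pipeline(transaction=True)
pipe.decrby("stock:kbd-01", 1)
pipe.incrby("sold:kbd-01", 1)
pipe.execute()  # sends MULTI ... EXEC to the server
```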

04:55
And finally, key-value stores are not well suited for data warehousing. This is because they are not designed to store large amounts of historical data, and they don't provide features such as data compression and indexing.

05:15
Traditional SQL databases were designed for functionality rather than speed at scale, so a cache is often used to store the results of costly queries from the relational database, to reduce latency and significantly increase throughput. Caching is all about quickly accessing frequently used data, and key-value stores are perfectly designed to do just that. Key-value stores are perfect for caching because they can quickly retrieve data using a unique key rather than searching through a large data set. Also, key-value stores allow many data types as values, including linked lists and hash tables. Furthermore, they are stored in memory, which further increases access speed. So key-value databases are optimized for high-performance, low-latency applications. However, this data model might be too simple for other use cases, so we'll move on to the next data model in terms of complexity.

06:17
Key-value stores are fun and simple; next, with wide-column stores, things start to get interesting. These databases store data in column families. Although they look similar to the tables in a relational database, they are not actually tables. We'll realize this when we try to make a query on a random attribute and we won't be able to do it. This is because we can search only by using the primary key, similar to key-value stores. So this model is not optimized for analytical queries that require filtering across multiple columns, table joins, or aggregations.

06:58
Speaking of the primary key, this is one of the most important concepts of wide-column databases. A primary key consists of one or more partition keys and zero or more clustering keys, sometimes called sort keys. For instance, in Cassandra each data set is partitioned by a partition key, which is a combination of one or more columns. Basically, we have a tool integrated into our data model to split the data set and distribute it across multiple nodes. We see that wide-column databases are highly partitionable and allow for horizontal scaling at a magnitude that other types of databases cannot achieve. So here the partition key is used to distribute data across multiple partitions or nodes, and the clustering key is used to sort data within a partition. A key takeaway here is that wide-column databases are highly partitionable.

07:54
Wide-column databases store data in denormalized form. This means that all data related to a particular item is stored together in a single row, rather than being spread out across multiple tables. This allows for faster data retrieval and easier querying: you don't have to flip back and forth between multiple tables and do joins to get all the information you need, because all the information is in one place. However, this comes at the cost of potentially having some duplicates, and duplicating data is the root of all data inconsistencies, among other problems we'll see next. So the key takeaway here is that wide-column databases store data in denormalized form.

08:41
Trying to find a row by a random attribute is like trying to find a needle in a haystack, but instead of a needle you're looking for a specific piece of data, and instead of a haystack you're looking in an entire cluster that can have hundreds of nodes. You probably know that scanning a full table can be a really slow process; now imagine that you have to scan hundreds of tables to find a piece of data. Here, to avoid this problem, we'll make use of the category attribute as a partition key. This means that if you know you're going to need to search by a specific attribute, you'll have to model the data in a way that makes that attribute a partition key; basically, you will partition all data based on that attribute. But what if you need to filter data by multiple individual attributes? Then you'll have to create a new table for each query pattern. This can create a lot of duplicated data, in addition to the denormalization duplication, but that's okay, because wide-column databases are really fast for writes. Jokes aside, if you need to do a lot of filtering or analytical queries, wide-column stores are not the best option.

09:53
For transaction processing, consistency is key. However, by default, wide-column databases are eventually consistent. This means that the data will eventually be consistent across all the nodes in the cluster, but it doesn't guarantee that all nodes will be consistent at the same time. It's normally much more expensive, in terms of latency and availability, to work with transactions in such an environment. That's why wide-column databases such as Cassandra offer the option of lightweight transactions; however, these are still quite expensive in multi-node environments, where multiple round trips are necessary between the coordinator and the other nodes. So wide-column databases are not the best option for ACID transactions.

10:40
Adding new nodes to a Cassandra cluster is as simple as adding new blocks of Legos, and it's the same for removal. Data partitioning is embedded in the data model, which means data can be easily distributed across multiple nodes in the cluster. This makes horizontal scaling a breeze. But what happens with the existing data when a new node is added? Is all the data redistributed in order to maintain an even distribution? Not really, because that would be way too costly. Cassandra uses the concepts of consistent hashing and virtual nodes to minimize the amount of data that needs to be moved around the cluster. This algorithm also ensures that the data is evenly distributed across all nodes. We have a separate video on consistent hashing and virtual nodes, so please check it out if you want to find out more. So, if adding a node is so simple, we can scale horizontally as much as we want. In fact, it has been reported that Apple is using 1,000 Cassandra clusters with 300,000 nodes, storing 100 petabytes of data for multiple use cases such as iCloud and Siri. So the wide-column superpower is horizontal scalability.
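
As a rough sketch of the consistent-hashing idea mentioned above, assuming MD5 as the ring hash and a handful of virtual nodes per physical node (both assumptions for illustration, not Cassandra's actual implementation):

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

# Toy consistent-hashing ring: each physical node gets several virtual nodes,
# so adding or removing a node only moves the keys in its slice of the ring.
class HashRing:
    def __init__(self, nodes, vnodes=8):
        self.ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))  # the partition key decides the owning node
```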

11:53
Wide-column databases are considered to be good for writes, for two main reasons. First, they use a write-optimized storage architecture, which allows them to handle a huge number of writes very quickly. For instance, Cassandra uses a technique called log-structured storage, which allows it to write data on disk in large sequential blocks. Because of this principle, it doesn't have to spend time looking up where the data is stored in real time; it deals with that later, in batches. The second reason for fast writes is the partitioned architecture, which allows writes to be executed in parallel on multiple nodes at the same time.

12:34
Instead of spreading data across multiple tables and then joining them back together like in a scavenger hunt, a document database puts all the information related to an entity in a single document. A document database is the classic example of denormalization. Using something like MongoDB is like giving your data a break from all the strict rules and regulations of a traditional relational database. Instead of splitting data into multiple tables and establishing relationships between them, you just store all related information within a single document. This is truly a more convenient way to handle data, but sometimes it may lead to some duplication of data, and if data duplication gets out of hand, you'll enter the hell of data inconsistencies, where if one copy of the data is updated, it may not be updated in other copies, leading to conflicts and inconsistent information. Data duplication can lead to all sorts of problems in a chain, so you just need to be careful to choose the right use case for the document database. If you have a lot of relations between different entities, then a document database might not be the best choice.

13:47
The ability to store data in any format allows for fast prototyping, and it eliminates the need to spend time defining the schema and creating tables, so it speeds up development. However, without proper constraints it can be difficult to maintain consistency across different documents, and this can limit the types of queries we can perform on the data. Therefore, for more complex use cases you would still need to think carefully about how you want to model your data and ensure you have the appropriate indexes and constraints in place.

14:24
Document databases often have more advanced indexing capabilities. They support secondary indexes of the following types: simple, compound, geospatial, unique, or full-text. So make sure to index your data correctly and understand the performance implications of the different types of indexes. Without proper indexes, MongoDB can have poor performance, especially when working with large data sets. With indexes, it's easier to optimize queries and improve performance; this allows you to perform complex queries on huge amounts of data like no other data model.
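
For illustration, creating a compound index and a full-text index with pymongo might look like the sketch below; the collection and field names are hypothetical.

```python
from pymongo import MongoClient, ASCENDING, TEXT

articles = MongoClient()["blog"]["articles"]  # hypothetical collection

# A compound index for the common "by author, newest first" query pattern...
articles.create_index([("author_id", ASCENDING), ("published_at", ASCENDING)])
# ...and a full-text index for keyword search over title and body.
articles.create_index([("title", TEXT), ("body", TEXT)])

# explain() shows whether a query uses an index (IXSCAN) or scans the collection.
plan = articles.find({"author_id": 42}).explain()
print(plan["queryPlanner"]["winningPlan"])
```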

15:04
If you need to handle a lot of complex relationships, a document database may not be the best choice. In fact, document databases such as MongoDB recommend embedding documents instead of using one-to-many or one-to-one relationships; this is the general rule unless there is a compelling reason not to do so. You can actually model some relations in a document database, but you will not have the same level of features and integrity. Second, joining data from multiple tables can be a resource-intensive operation. This can slow down query performance, and this is where relational databases sometimes struggle: as the size of the data grows, join operations become more and more expensive. In a document database, which is highly scalable, this could mean a significant performance impact on the entire database system.

16:03
Furthermore, maintaining data consistency between related entities can be a difficult task in a document database. This is because there is no enforced referential integrity, and changes to one document may not be reflected in other related documents.

16:21
A document-oriented database is a perfect match for object-oriented programming: one side can express the model in its natural language, usually an object that can be represented as JSON, and the other side can understand it without any translation. This is not the case with object-oriented programming and relational databases; for decades there have been attempts to close the gap with different frameworks and tricks, but they just don't mix so well. So document databases are easy to scale, they provide indexing, powerful ad hoc queries and analytics, and they also have some features for transactional support.

17:01
Relational databases have been the dominant choice for data storage for decades, and their popularity only continues to grow. Despite the rise of alternative databases such as NoSQL, relational databases remain a staple in many industries, especially in finance and e-commerce. There are several reasons for their continued dominance. First, all data in most applications is relational: customers make orders, orders contain products, products are found in stores, and so on. Furthermore, the relational model, with its tables, rows, and columns, provides a clear and straightforward way to model the data, making it easy for developers to work with.

17:49
Before making use of relational databases, you need to model your data according to the strict rules of normalization, or you can just rely on your intuition and learn the hard way why some things need to be done in a certain way. Normalization is the process of organizing data into separate tables. It's like organizing your closet: just as you might separate your shirts from your pants, normalization involves breaking up data into smaller, more manageable pieces. These rules help to prevent clutter and duplication and improve data integrity. But what does data integrity mean? Just like a tidy room gives you peace of mind that everything is in its place, data integrity gives you peace of mind that your data is consistent, accurate, and not damaged or lost. This sounds easy to achieve, until you have hundreds of concurrent transactions with a lot of cash involved.
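
A small sketch of what this normalization looks like in practice, using Python's built-in sqlite3 module and an invented customers/orders/products schema: each fact lives in exactly one table, and rows reference each other by key instead of duplicating details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

# Normalized layout: no customer or product detail is ever stored twice.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
);
CREATE TABLE order_items (
    order_id   INTEGER NOT NULL REFERENCES orders(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    qty        INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")
```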

18:45
Scaling a relational database horizontally can be a difficult task. Although there are solutions for scaling a relational database, such as replication and sharding, they usually add significant complexity, both in terms of infrastructure and administration. To be able to scale a database, you need to partition it; however, relational databases rely on relationships between tables, and partitioning the data can break these relations, making it difficult to ensure data consistency and integrity. So if you need to store large amounts of data, especially less structured data, then a NoSQL database might be more suitable.

19:28
When it comes to transactional processing, relational databases are the best in town. A big part of their success can be attributed to their well-established ACID guarantees: atomicity, consistency, isolation, and durability. These four ensure the reliability and integrity of stored data. While other database models also support the ACID properties for transactions, relational databases are still considered the best option. This is because the structure of tables and relationships makes it easier to enforce consistency and maintain data integrity, which is critical for transactions. In particular cases, other database models may struggle to comply with all the ACID properties. For instance, ensuring consistency and isolation can be difficult because multiple transactions may be executed concurrently. Now, if we consider a distributed system like a wide-column database with many nodes, it can be even more challenging to ensure that each transaction has a consistent view of the data, and things get more complex when network connections fail, or when one node successfully completes its part of the transaction and is then required to roll back its changes because a failure occurred on another node. However, the trade-off for strong consistency is not being able to scale as much or as easily.

20:50
In a graph database, data is stored as a connected graph. The nodes in the graph represent entities, such as tweets, users, and tags, and the edges represent the relationships between these entities, such as follows or mentions. Let's say we want to get the top 10 tags used in all messages by a certain user. In a relational database, we would have to do a join between the tags and the tweet tables, which basically results in a separate table. However, in graph stores, relationships between nodes are stored directly on the nodes, rather than in separate tables. Because of this principle, graph databases don't need to compute the relationships between data at query time; the connections are already there, stored on the nodes. Because of this, queries on densely connected data are orders of magnitude faster. Graph databases eliminate the need for expensive join operations, making data maintenance a breeze.
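
For illustration, the "top 10 tags for a user" query described above might be expressed like this with the Neo4j Python driver and Cypher; the labels, relationship types, and connection details are assumptions.

```python
from neo4j import GraphDatabase  # pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Follow edges already stored on the nodes (POSTED, TAGGED); no join tables.
QUERY = """
MATCH (u:User {handle: $handle})-[:POSTED]->(:Tweet)-[:TAGGED]->(t:Tag)
RETURN t.name AS tag, count(*) AS uses
ORDER BY uses DESC
LIMIT 10
"""

with driver.session() as session:
    for row in session.run(QUERY, handle="@ada"):
        print(row["tag"], row["uses"])
driver.close()
```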

21:51
This model is powerful enough to cover the most complex data structures; for instance, Neo4j was used to build a knowledge graph at NASA. However, properly managing and maintaining a graph database requires a certain level of expertise. Unlike other types of databases, graph DBs can be challenging to learn and manage, especially when dealing with large, intricate graphs, so be prepared to invest some time and effort in getting up to speed.

22:21
Now, graph databases are pretty difficult to model on a single node, but what happens when you need to distribute the graph across multiple nodes? Well, you'll just need to consider a lot of things, such as how to distribute the edges across the nodes or how to balance the graph data evenly. And if those aren't hard enough problems, what if some node fails, or what about dynamic node addition or removal? And the list goes on.

22:53
While graph databases are optimized for traversing and querying relationships, they may not be the best choice for write-heavy workloads. In order to support a high volume of writes, you need to write to multiple nodes in parallel; however, the overhead of maintaining the graph structure connected across nodes will quickly slow down the scaling, and therefore the write throughput, and there is also a high risk of data inconsistencies and conflicts. Other models, such as key-value or wide-column, are much more suitable for write-heavy loads. Graph databases can also become quite large and unmanageable, especially when dealing with complex relationships, so be prepared to invest in some serious hardware resources if you want to use a graph database.

23:46
The benefits of a graph database become more pronounced when dealing with complex, multi-hop relationships between entities. For example, in a data center scenario, it may be necessary to traverse several relationships to find all the switches of a particular data center, and then another hop to find all the interfaces of that data center. In a graph database this can be achieved in a single traversal, making the query much faster and more efficient. In contrast, relational databases typically store relationships between entities as foreign keys in separate tables, requiring expensive join operations to traverse the relationships between entities. This can result in slow and complex queries, particularly when dealing with densely connected data.
