What is Azure Cosmos DB? | Azure Cosmos DB Essentials Season 1

Microsoft Azure
15 Sept 2021 (06:18)

Summary

TLDR: Welcome to the Azure Cosmos DB essentials series with Mustafa, the technical product marketing manager. This series covers Cosmos DB concepts for developers of all levels. Season one explores the basics, starting with an introduction to Azure Cosmos DB, a NoSQL database designed for horizontal scaling and high availability. The episode discusses the importance of partitioning and replication for performance and fault tolerance. Cosmos DB's schema-less nature allows for flexible data models, and it supports the SQL API, a graph data model via the Gremlin API, and APIs for MongoDB and Cassandra. The series promises to showcase real-world use cases in upcoming episodes.

Takeaways

  • 🌐 **Introduction to Azure Cosmos DB**: Mustafa introduces Azure Cosmos DB as a NoSQL database service.
  • 📈 **Big Data Challenges**: Discusses the evolution from gigabytes to petabytes and the need for quick data access.
  • 🔄 **Horizontal Scaling**: Explains how Cosmos DB scales horizontally to handle large data volumes and high throughput.
  • 🔑 **Partitioning**: Highlights the importance of partitioning in improving database latency and throughput.
  • 🌐 **Replication**: Describes how replication within and across regions enhances fault tolerance and availability.
  • 📚 **Schema-less**: Emphasizes Cosmos DB's schema-less nature allowing for flexible data structures.
  • 💾 **Data Models**: Cosmos DB supports multiple data models including document, key-value, graph, and wide-column store.
  • 🔍 **SQL API**: Cosmos DB offers a SQL API for document and key-value data models.
  • 🌐 **Global Distribution**: Cosmos DB can be geo-replicated across multiple Azure regions for higher availability.
  • 🔄 **Consistency**: The series will cover the concept of consistency in a distributed database in later episodes.

Q & A

  • What is the main focus of the Azure Cosmos DB essentials series?

    -The series focuses on discussing Cosmos DB concepts, tips, and tricks for both beginners and expert level Azure Cosmos DB developers.

  • Who is the presenter of the Azure Cosmos DB essentials series?

    -Mustafa, the technical product marketing manager for Azure Cosmos DB.

  • What will be covered in season one of the series?

    -Season one will cover the basics, starting with an introduction to Azure Cosmos DB, diving into use cases, discussing consistency in a distributed database, and ending with hybrid transactional and analytical processing and Synapse Link.

  • What is Azure Cosmos DB?

    -Azure Cosmos DB is a NoSQL database service that offers horizontal scaling, replication, partitioning, and a flexible database schema.

  • How does Azure Cosmos DB handle the demands of big data?

    -Azure Cosmos DB handles big data by scaling horizontally, which allows it to uncap the throughput and volume of data in the database, serving requests concurrently across multiple nodes.

  • What are the two kinds of partitions in Azure Cosmos DB?

    -There are two kinds of partitions in Azure Cosmos DB: logical partitions, which are logical groups across items referenced through a partition key, and physical partitions, which are actual units of storage physically located in the Azure region.

  • Why is the partition key important in Azure Cosmos DB?

    -The partition key is important because it tells the database engine where to look for data, which is a crucial part of optimizing the database for performance gains.

  • What are the two types of replication in Cosmos DB?

    -There are two types of replication in Cosmos DB: replication within a region, which improves fault tolerance, and geo-replication outside of a region, which increases availability.

  • How does Cosmos DB's schema-less nature benefit developers?

    -Cosmos DB's schema-less nature allows developers to work with structurally variable data without issues, making it easier to adapt and evolve applications as use cases change.

  • What data models does Cosmos DB support?

    -Cosmos DB supports multiple data models, including document, key-value, graph, and wide-column, and it provides the SQL, Gremlin, MongoDB, and Cassandra APIs to cater to different developer preferences and use cases.

  • How does the graph data model work in Cosmos DB?

    -The graph data model in Cosmos DB uses the Gremlin API, which models complex systems with many-to-many relationships using edges and vertices, and traversal to navigate these relationships.

  • What are the benefits of using Cosmos DB for modern applications?

    -Cosmos DB offers benefits such as high performance, availability, flexible schema, and support for multiple data models, making it suitable for modern applications that require high volumes of variable data without compromising on performance or availability.

Outlines

00:00

🌐 Introduction to Azure Cosmos DB

The video script introduces the Azure Cosmos DB essentials series with Mustafa, the technical product marketing manager for Azure Cosmos DB. The series aims to cover concepts, tips, and tricks for developers of all levels. Season one will focus on the basics, starting with an introduction to Azure Cosmos DB, its capabilities, and use cases. The script discusses the evolution of data demands and how Azure Cosmos DB, a NoSQL database, addresses these with horizontal scaling, replication, partitioning, and flexible database schema. It highlights the importance of choosing the right partition key for optimizing database performance and the two types of replication within and outside of Azure regions for fault tolerance and availability. The schema-less nature of Cosmos DB is also emphasized, allowing for flexible data structures and easy adaptation to changing data needs.

05:02

🔍 Exploring Data Models in Cosmos DB

The second paragraph of the script delves into the various data models supported by Cosmos DB, which are designed to cater to different types of applications and data structures. It mentions the SQL API that allows for fast and cost-effective point reads when the document ID and partition key are known, making it an optimization tool for applications. The Gremlin API for the graph data model is introduced, which uses edges and vertices to model complex systems with many-to-many relationships. The script also discusses the MongoDB API and Cassandra API, which are designed for developers familiar with these open-source databases, allowing them to use their existing tools, SDKs, and drivers. The MongoDB API leverages document storage, while the Cassandra API is a wide-column store optimized for analytics. The paragraph concludes by summarizing the benefits of Cosmos DB, including its horizontal scaling, replication for availability, and flexible schema for ease of use.

Keywords

💡Azure Cosmos DB

Azure Cosmos DB is a globally distributed, multi-model database service provided by Microsoft Azure. It is designed to allow developers to work with various data types, including document, key-value, wide-column, and graph, through a series of APIs. The service is highlighted in the video as a solution for big data demands, emphasizing its ability to scale horizontally and handle large data volumes with high availability and performance.

💡Horizontal Scaling

Horizontal scaling, also known as scaling out, refers to the process of adding more machines or nodes to a system to increase its capacity or performance. In the context of the video, horizontal scaling is a key feature of Azure Cosmos DB that allows it to handle more data and traffic by distributing the workload across multiple nodes, as opposed to vertical scaling, which involves increasing the capacity of a single machine.

💡Partitioning

Partitioning in databases is the process of dividing a database into distinct, smaller parts called partitions. The video explains that Azure Cosmos DB uses both logical and physical partitions. Logical partitions are groups of items within a dataset referenced by a partition key, while physical partitions are actual units of storage in the cloud. Partitioning is crucial for improving database latency and throughput by distributing data across nodes.
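The relationship between the two kinds of partitions can be sketched in a few lines of Python. This is only an illustration, not Cosmos DB's actual routing algorithm: the hash function, the fixed partition count, and the names `NUM_PHYSICAL_PARTITIONS` and `physical_partition_for` are all assumptions made for the example, as is the choice of `deviceId` as the partition key.

```python
import hashlib
from collections import defaultdict

NUM_PHYSICAL_PARTITIONS = 4  # hypothetical count; the real service manages this itself


def physical_partition_for(partition_key: str) -> int:
    """Map a partition key value to a physical partition by hashing (illustrative only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PHYSICAL_PARTITIONS


items = [
    {"id": "1", "deviceId": "sensor-a", "temp": 21.5},
    {"id": "2", "deviceId": "sensor-b", "temp": 19.0},
    {"id": "3", "deviceId": "sensor-a", "temp": 22.1},
]

# Logical partitions: items grouped by their partition key value.
logical = defaultdict(list)
for item in items:
    logical[item["deviceId"]].append(item)

# In this sketch, every item in a logical partition lands on the same physical partition,
# so a request that supplies the partition key only has to visit one node.
placement = {key: physical_partition_for(key) for key in logical}
```

Because the placement is derived from the partition key alone, a key that spreads items evenly tends to spread work evenly too, which is why the video stresses choosing it carefully.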

💡Replication

Replication in the context of databases means creating copies of data to ensure redundancy and improve fault tolerance. The video describes two types of replication in Azure Cosmos DB: within a region, where data is replicated four times, and geo-replication, which extends data replication across different Azure regions. This feature enhances the database's availability and durability.
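A toy model in Python can show why the two kinds of replication help. It assumes synchronous writes to every copy and ignores consistency and replication lag entirely; the `Region` class, the helper functions, and the region names are hypothetical and are not part of any Cosmos DB SDK.

```python
REPLICAS_PER_REGION = 4  # the in-region replica count stated in the video


class Region:
    """A toy Azure region holding several identical copies of the data."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.replicas = [{} for _ in range(REPLICAS_PER_REGION)]

    def write(self, key, value):
        # Replication within a region: every replica receives the write.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        if not self.available:
            raise ConnectionError(f"region {self.name} is unavailable")
        return self.replicas[0][key]


def replicated_write(regions, key, value):
    # Geo-replication: the write is copied to every selected region.
    for region in regions:
        region.write(key, value)


def read_with_failover(regions, key):
    # Try each region in order and skip any that is down.
    for region in regions:
        try:
            return region.read(key)
        except ConnectionError:
            continue
    raise ConnectionError("no region available")


regions = [Region("West US"), Region("North Europe")]
replicated_write(regions, "order-1", {"id": "order-1", "total": 42})
regions[0].available = False  # simulate a regional outage
result = read_with_failover(regions, "order-1")
```

The read still succeeds after the first region goes down, which is the availability gain the video attributes to geo-replication.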

💡Consistency

Consistency in distributed databases refers to the ability to ensure that data is the same across all nodes at any given time. Although not explicitly detailed in the provided script, consistency is a concept that is important when discussing distributed databases like Azure Cosmos DB, where ensuring that all users see the same data at the same time can be challenging due to replication and partitioning.

💡NoSQL

NoSQL is an abbreviation for 'Not only SQL' and refers to a category of database management systems that do not adhere to the traditional relational database structure. The video emphasizes NoSQL databases like Azure Cosmos DB for their ability to scale horizontally and handle non-relational data structures, which is particularly useful for modern applications dealing with large and varied data sets.

💡Schema-less

Schema-less databases do not enforce a strict structure on the data that is stored. In the video, it is mentioned that Cosmos DB is schema-less, allowing different documents in the same collection to have different structures. This flexibility is beneficial for developers as it accommodates varying data types and structures without the need for predefined schemas.
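As a small illustration, the Python sketch below stores two differently shaped documents side by side. The list named `collection` is only a stand-in for a real container, and the field names are invented for the example.

```python
import json

collection = []  # stand-in for a container; nothing here enforces a schema

# Two documents with unrelated structures in the same collection.
collection.append(
    {"id": "1", "type": "appointment", "date": "2021-09-15", "attendees": ["Ana", "Raj"]}
)
collection.append(
    {"id": "2", "type": "device", "model": "X100", "specs": {"ramGb": 8}}
)


def shape(doc):
    """Return the sorted top-level field names of a document."""
    return sorted(doc.keys())


shapes = [shape(doc) for doc in collection]

# Both documents serialize to JSON without any shared schema definition.
serialized = [json.dumps(doc) for doc in collection]
```

The only field the two documents share by design is `id`; everything else can change as the use for the data evolves.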

💡Document Model

The document model is one of the data models supported by Azure Cosmos DB, which treats data as a set of documents. Documents are typically represented in JSON or BSON format and can vary in structure. The video mentions that Cosmos DB's SQL API supports document storage, allowing for flexible data modeling and efficient storage of semi-structured data.

💡Key-Value Store

A key-value store is a type of database that pairs every piece of data with a unique key, making it easy to retrieve the data quickly. The video script highlights that Cosmos DB supports key-value pairs and can perform fast point reads when the document ID and partition key are known, which is beneficial for applications requiring quick data access.
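The difference between a point read and a general query can be sketched with an in-memory stand-in. The `Container` class below is hypothetical and does not reproduce any SDK; it only shows why knowing both the document ID and the partition key lets the lookup skip scanning.

```python
class Container:
    """In-memory stand-in for a container, keyed by partition key and then by ID."""

    def __init__(self):
        self._partitions = {}

    def upsert(self, item, partition_key):
        self._partitions.setdefault(partition_key, {})[item["id"]] = item

    def point_read(self, item_id, partition_key):
        # Two dictionary lookups: go straight to the partition, then to the item.
        return self._partitions[partition_key][item_id]

    def query(self, predicate):
        # Without the key pair, every item in every partition has to be examined.
        return [
            item
            for partition in self._partitions.values()
            for item in partition.values()
            if predicate(item)
        ]


container = Container()
container.upsert({"id": "42", "city": "Oslo", "name": "Kari"}, partition_key="Oslo")
container.upsert({"id": "43", "city": "Lima", "name": "Luis"}, partition_key="Lima")

by_key = container.point_read("42", partition_key="Oslo")
by_scan = container.query(lambda item: item["name"] == "Kari")
```

Both calls find the same document, but the point read touches one entry while the query walks the whole data set.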

💡Graph Data Model

The graph data model is another data model supported by Azure Cosmos DB, accessible via the Gremlin API. It is used to represent complex systems with many-to-many relationships, using nodes (vertices) and edges to model data. The video script uses the analogy of a web or network to describe graphs, emphasizing their utility in modeling interconnected data.
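To make vertices, edges, and traversal concrete, here is a minimal Python sketch. It is plain Python rather than Gremlin syntax, and the sample data and the helper names `out` and `reachable` are assumptions chosen for the example.

```python
from collections import deque

# Vertices carry properties; edges are (source, label, destination) triples.
vertices = {
    "alice": {"label": "person"},
    "bob": {"label": "person"},
    "carol": {"label": "person"},
    "dave": {"label": "person"},
}
edges = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
]


def out(vertex, label):
    """Follow outgoing edges with the given label for one step."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]


def reachable(start, label):
    """Traverse the graph breadth-first and collect every vertex reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in out(current, label):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


friends = out("alice", "knows")
network = reachable("alice", "knows")
```

Here `friends` holds the one-step neighbors, while `network` follows the same relationship repeatedly, which is the kind of many-to-many navigation the graph model is meant for.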

💡Hybrid Transactional and Analytical Processing (HTAP)

HTAP refers to the ability of a database to perform both transactional processing (OLTP) and analytical processing (OLAP) simultaneously on the same data set. The script mentions HTAP only in passing, as the topic of the last episode of season one, where it will be covered together with Synapse Link. It is a capability that allows for real-time analytics on operational data.

Highlights

Introduction to Azure Cosmos DB essentials series by Mustafa, the technical product marketing manager.

Season one focuses on the basics of Cosmos DB, including use cases, consistency, and hybrid transactional and analytical processing.

Cosmos DB is a NoSQL database designed for horizontal scaling, replication, partitioning, and flexible database schema.

The evolution from simple applications to big data demands faster and more frequent data retrieval.

Azure Cosmos DB addresses big data demands with horizontal scaling capabilities.

Horizontal scaling is achieved through partitioning and replication, differentiating Cosmos DB from relational databases.

There are two types of partitions in Cosmos DB: logical and physical.

The partition key is crucial for optimizing database performance by directing the database engine to the correct data location.

Partitioning data across nodes improves database latency and throughput.

Replication within a region enhances fault tolerance, while geo-replication across regions increases availability.

Cosmos DB is schema-less, allowing for flexible and variable data structures.

Flexible schema and horizontal scaling make NoSQL databases like Cosmos DB attractive for modern operations.

Cosmos DB supports multiple data models, including SQL API for document and key-value support.

The graph data model in Cosmos DB uses Gremlin API to model complex systems with many-to-many relationships.

Cosmos DB provides APIs for MongoDB and Cassandra, allowing developers to use familiar tools and drivers.

The MongoDB API in Cosmos DB leverages documents, while the Cassandra API is a wide-column store optimized for analytics.

The first episode concludes by emphasizing Cosmos DB's horizontal scaling, NoSQL features, and ease of working with variable data.

Transcripts

play00:00

(smooth music)

play00:14

- Welcome to the Azure Cosmos DB essentials series.

play00:17

My name is Mustafa and I'm the technical product

play00:19

marketing manager for Azure Cosmos DB.

play00:22

Now, throughout this series,

play00:23

we'll discuss Cosmos DB concepts,

play00:25

tips, tricks for both beginners

play00:28

and expert level Azure Cosmos DB developers.

play00:31

Season one will focus on the basics.

play00:33

We'll start with this episode, What is Azure Cosmos DB,

play00:36

then we'll dive into some use cases,

play00:38

talk about the concept of consistency

play00:40

in a distributed database,

play00:41

and then the last episode of the season we'll cover

play00:43

hybrid transactional and analytical processing

play00:46

and Synapse Link.

play00:47

So in this episode, let's start from the beginning.

play00:50

I'll introduce you to Cosmos DB, our NoSQL database.

play00:53

We'll discuss at a high level horizontal scaling,

play00:56

replication, partitioning, and database schema.

play00:59

And then we'll dive into some of the data models

play01:00

that Cosmos DB supports.

play01:02

So, let's get started.

play01:04

Applications used to be relatively simple.

play01:07

We would have an app, an API layer

play01:09

and a slow drip of data dropping into our database.

play01:12

Now occasionally, we may have needed to surface that data

play01:15

and look up device information in a catalog,

play01:18

or look up an appointment on a calendar.

play01:20

But often the data sat in the database

play01:22

until it was called on however many days, months,

play01:25

or even years later.

play01:26

In the age of big data, where data sets have now grown

play01:29

from a few gigabytes to tens of gigabytes and even petabytes

play01:33

for some of our most demanding applications.

play01:35

Data is pouring into our databases

play01:37

and applications are demanding that data be surfaced quickly

play01:40

and frequently to fulfill user expectations

play01:43

of a personalized and fast application experience.

play01:46

Now, Azure Cosmos DB has boomed as the answer

play01:49

to the big data demands of cloud native applications.

play01:52

Because non-relational or NoSQL databases

play01:55

like Azure Cosmos DB scale horizontally

play01:58

rather than vertically, we can essentially uncap

play02:01

the throughput and volume of data in our database.

play02:04

Rather than upgrading hardware on a single node

play02:07

to serve requests faster, Cosmos DB distributes that work

play02:11

across multiple nodes so requests can be served concurrently.

play02:15

Scaling out versus scaling up is the biggest difference

play02:18

between relational databases and non-relational databases

play02:21

like Cosmos DB.

play02:23

Horizontal scaling or scaling out

play02:25

is the consequence of two techniques:

play02:27

partitioning and replication.

play02:30

Now there are two kinds of partitions in Azure Cosmos DB:

play02:33

logical and physical.

play02:35

A physical partition is an actual unit of storage

play02:38

that physically sits in the Azure region that we dictate.

play02:41

This is a piece of hardware holding our data

play02:43

somewhere in the cloud.

play02:44

Logical partitions are logical groups across items

play02:48

within our data set, and we reference these groups

play02:50

through something called a partition key.

play02:52

The partition key is important

play02:54

because it tells our database engine

play02:56

where to look for our data.

play02:57

Choosing the right partition key

play02:59

is a simple but important part of optimizing our database,

play03:02

and when done correctly,

play03:04

we can expect to see performance gains proportional

play03:06

to the additional number of nodes serving our requests.

play03:10

Put simply, partitioning our data across nodes

play03:13

improves database latency and throughput.

play03:16

Now like with partitions,

play03:17

there are also two kinds of replication in Cosmos DB:

play03:20

replication within a region

play03:22

and geo-replication outside of a region.

play03:25

Now within a region, our data is replicated four times

play03:28

as a redundancy measure, improving fault tolerance.

play03:31

Outside of a region, our data is geo-replicated

play03:34

into any additional Azure regions we select

play03:37

resulting in higher availability.

play03:39

Replication within a single region

play03:41

increases fault tolerance.

play03:42

Geo-replication into additional Azure regions

play03:45

increases availability.

play03:46

Now these two techniques, replication and partitioning,

play03:50

are in large part what makes horizontal scaling possible.

play03:52

Now what makes NoSQL attractive to modern app devs,

play03:55

performance and availability gains aside,

play03:57

are the relaxed constraints

play03:59

around keeping our data relational.

play04:01

Cosmos DB is schema-less meaning structure is not enforced

play04:04

on any data entering the database.

play04:06

Two documents in the same collection

play04:08

can have completely different structure without any issue.

play04:12

And as a result, developers can land

play04:14

structurally variable data into the database

play04:17

and later build, change or add functionality

play04:20

as the use for that data evolves.

play04:22

This combination of flexible schema and horizontal scaling

play04:26

are what makes NoSQL different.

play04:28

Cosmos DB makes it easy to ingest

play04:30

and resurface high volumes of variable data

play04:33

without compromising on performance or availability.

play04:36

In addition to flexible schema,

play04:38

Cosmos also supports multiple data models.

play04:41

Our team has built Cosmos to be the hub

play04:43

for all NoSQL data types.

play04:45

Cosmos has a SQL API with document

play04:47

and key value support built in.

play04:49

Documents and key value pairs go hand in hand

play04:51

in most NoSQL databases because documents

play04:54

are just key value pairs in which the key is a document ID

play04:58

and the value is a JSON object that we just call a document.

play05:01

Now to reap the performance benefits

play05:04

of a true key value store, with the SQL API,

play05:07

we can do fast and cost-effective point reads

play05:09

as long as we know the document ID and partition key

play05:12

for the item we want to read.

play05:14

This is a great optimization tool for applications.

play05:17

The next data model, the graph data model,

play05:20

is accessible via the Gremlin API,

play05:21

which uses edges and vertices as well as traversal

play05:25

to model complex systems with many to many relationships.

play05:28

Think of a graph as a web or a network of nodes.

play05:31

The APIs for MongoDB and Cassandra, respectively,

play05:35

are created so that developers

play05:36

familiar with both open-source databases

play05:38

can work with the tools, SDKs and drivers

play05:42

that they're already used to.

play05:43

API for MongoDB also leverages documents,

play05:46

and Cassandra API is a wide-column store,

play05:48

which orients data efficiently in columns instead of rows

play05:51

to optimize for analytics.

play05:53

That's it for our first episode.

play05:54

Cosmos DB is our horizontally scaling, NoSQL database.

play05:58

Partitioning makes it fast, replication makes it available

play06:02

and flexible schema makes it simple to work with.

play06:04

Thanks for joining us, and tune into our next episode

play06:06

on use cases to see some examples of how Cosmos DB

play06:10

is being used in real applications.

play06:12

(smooth music)

Related Tags
Azure Cosmos DB, NoSQL Database, Horizontal Scaling, Data Modeling, Cloud Native, Big Data, Distributed Database, Schema-less, Geo-Replication, Developer Tips