Databricks Unity Catalog: A Technical Overview

Pathfinder Analytics
29 Jan 2024 · 17:29

Summary

TL;DR: This video offers an in-depth introduction to Databricks Unity Catalog, emphasizing its centralized approach to access control, auditing, lineage, and data discovery across workspaces. It contrasts Unity Catalog's governance capabilities with the more limited, workspace-level features of traditional Hive metastore setups. The presenter demonstrates Unity Catalog's functionality, including managing permissions, tracking data lineage, and federating queries across data sources.

Takeaways

  • 😀 Unity Catalog is a centralized system for access control, auditing, lineage, and data discovery across Databricks workspaces.
  • 🔒 Prior to Unity Catalog, access control and user management were decentralized, with each workspace having its own Hive metastore and limited governance capabilities.
  • 🌐 Unity Catalog introduces a shared metastore across multiple workspaces, enhancing data governance and privacy controls compared to the traditional Hive metastore.
  • 📚 The Unity Catalog metastore supports functionalities like data discovery and lineage tracking, and is designed to work with cloud object storage solutions like Amazon S3 and Azure Data Lake Storage.
  • 🛠️ Unity Catalog offers a single interface to administer data access policies, using standard SQL to grant and revoke permissions on catalog objects.
  • 👥 It automatically captures user-level audit logs, providing transparency on who accesses the data and how it is used.
  • 🔍 Data lineage is provided by tracking and visualizing the flow of data across different datasets and processes within the platform.
  • 🏷️ Users can tag and document data assets and search for them based on those tags, enhancing data discovery capabilities.
  • 📊 The main administrative roles in Unity Catalog include Account Admin, Metastore Admin, and Workspace Admin, each with specific responsibilities and permissions.
  • 🗄️ Unity Catalog organizes data objects hierarchically, starting with catalogs, followed by schemas (or databases), and then tables, views, functions, and volumes for non-tabular data.
  • 🔗 It enables Lakehouse query federation, allowing users to query data from external sources like Snowflake and even other Databricks workspaces without migrating data to a unified system.

Q & A

  • What is Unity Catalog in Databricks?

    -Unity Catalog is a feature in Databricks that provides centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces.

  • How was data governance managed before Unity Catalog in Databricks?

    -Prior to Unity Catalog, data governance and management in Databricks were controlled at a workspace level, with each workspace having its own Hive metastore, leading to a decentralized and fragmented approach.

  • What are the limitations of the traditional Hive metastore in Databricks?

    -The traditional Hive metastore had limitations such as basic security features and a reliance on access control lists for managing permissions, without robust data governance and privacy controls.

  • How does Unity Catalog's approach to Access Control and user management differ from the traditional approach?

    -Unity Catalog uses a centralized approach for Access Control and user management, with a shared metastore across multiple workspaces, as opposed to the decentralized approach of the traditional Hive metastore.

  • What functionalities does Unity Catalog's metastore support that the Hive metastore does not?

    -Unity Catalog's metastore supports functionalities such as data discovery and lineage tracking, and is designed to work with cloud object storage like Amazon S3 and Azure Data Lake Storage, unlike the Hive metastore which works with the Hadoop Distributed File System.

  • What are the main administrative roles in Unity Catalog?

    -The main administrative roles in Unity Catalog are Account Admin, Metastore Admin, and Workspace Admin, each with different levels of permissions and responsibilities.

  • How does Unity Catalog enable data access policies administration across workspaces?

    -Unity Catalog offers a single place to administer data access policies across all workspaces using standard SQL commands to grant and revoke permissions on Unity Catalog objects.

  • What is the significance of the three-level namespace in Unity Catalog?

    -The three-level namespace in Unity Catalog (catalog, schema, table) allows for a more structured and organized way to reference data, as opposed to the two-level namespace (database, table) in the traditional Hive metastore.

  • How does Unity Catalog support data lineage and what benefits does it provide?

    -Unity Catalog supports data lineage by tracking and visualizing the flow of data across different datasets and processes within the platform, providing transparency and understanding of data relationships and transformations.

  • What is the purpose of the 'Lakehouse query Federation' feature in Unity Catalog?

    -Lakehouse query Federation allows Unity Catalog to access and query external databases, enabling federated queries across different data sources without the need to migrate data to a unified system.

  • What are the differences between a managed table and an external table in Unity Catalog?

    -In Unity Catalog, a managed table must use the Delta format, while an external table can use one of several formats (Delta, Parquet, ORC, Avro, CSV, JSON, or text) and references data at an external storage location.

  • How does Unity Catalog enable sharing of data assets?

    -Unity Catalog allows for data sharing through features like Delta sharing, which enables sharing of data assets outside the organization, and external data access using storage credentials and connections to query against multiple data sources.

  • What is the role of the Account Admin in the context of Unity Catalog?

    -The Account Admin in Unity Catalog can create and link metastores to workspaces, assign Metastore Admins, configure storage credentials, and manage user, group, and service principal permissions.
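The managed versus external table distinction described in the Q&A above can be sketched as a small statement builder. This is an illustrative sketch rather than the presenter's code: the helper `create_table_sql`, the column list, and the storage path are hypothetical, and the rule that managed tables must be Delta follows the video. The generated text would typically be passed to `spark.sql(...)` in a notebook.

```python
from typing import Optional

# Formats the video lists for external tables; managed tables are Delta only.
EXTERNAL_FORMATS = {"DELTA", "PARQUET", "ORC", "AVRO", "CSV", "JSON", "TEXT"}


def create_table_sql(full_name: str, columns: str,
                     fmt: str = "DELTA", location: Optional[str] = None) -> str:
    """Build a CREATE TABLE statement; passing a location makes it external."""
    fmt = fmt.upper()
    if location is None:
        # Managed table: no LOCATION clause, Unity Catalog chooses the path.
        if fmt != "DELTA":
            raise ValueError("managed tables must use the Delta format")
        return f"CREATE TABLE {full_name} ({columns})"
    if fmt not in EXTERNAL_FORMATS:
        raise ValueError(f"unsupported external format: {fmt}")
    # External table: the data stays at the given cloud storage path.
    return f"CREATE TABLE {full_name} ({columns}) USING {fmt} LOCATION '{location}'"


# Hypothetical names and path, for illustration only.
print(create_table_sql("hr.bronze.countries", "id INT, name STRING"))
print(create_table_sql("hr.bronze.countries_raw", "id INT, name STRING",
                       fmt="csv", location="abfss://landing@example.dfs.core.windows.net/countries"))
```

Keeping the Delta check in the managed branch mirrors the constraint stated in the video; the external branch only validates the format name and does not check that the path is covered by an external location.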

Outlines

00:00

🚀 Introduction to Databricks Unity Catalog

This paragraph introduces the concept of Databricks Unity Catalog, which is a centralized solution for access control, auditing, lineage, and data discovery across Databricks workspaces. Prior to Unity Catalog, governance and data management were decentralized, with each workspace having its own Hive metastore. The limitations of this approach led to a fragmented governance model relying on access control lists. Unity Catalog changes this by offering a centralized approach to access control and user management, with a shared metastore across multiple workspaces. It supports advanced functionalities like data discovery and lineage tracking and is designed to work with cloud object storage solutions.

05:00

🔑 Unity Catalog Features and Administrative Roles

The second paragraph delves into the specific features of Unity Catalog, such as the ability to administer data access policies across all workspaces using standard SQL, automatic capture of user-level audit logs, and data lineage tracking. It also discusses the tagging and documentation of data assets for searchable access. The paragraph outlines the main administrative roles within Unity Catalog: account admins, metastore admins, and workspace admins, each with specific responsibilities and privileges. The structure of Unity Catalog's data objects is also explained, including the metastore, catalog, schema, and various other objects like tables, views, functions, and machine learning models.

10:03

🗺️ Navigating Unity Catalog in a Databricks Workspace

This paragraph provides a practical demonstration of navigating Unity Catalog within a Databricks workspace. It explains how to access the account console, view workspaces and their assigned metastores, and manage user permissions. The paragraph also describes the different types of catalogs in Unity Catalog, including the traditional Hive metastore, the system catalog, and user-created catalogs. It shows how to use a foreign catalog to query data from external sources like Snowflake, illustrating the concept of Lakehouse query federation. The process of creating schemas and managing tables within those schemas is also covered, along with the ability to execute queries using both Python and SQL interfaces.

15:05

🔍 Comparing Unity Catalog with Non-Unity Catalog Workspaces

The final paragraph contrasts a Unity Catalog-enabled workspace with a non-Unity Catalog workspace. It highlights the limitations of the legacy Hive metastore, such as the lack of lineage information, fewer features for data sharing, and a two-level namespace instead of the three-level namespace provided by Unity Catalog. The paragraph demonstrates how tables are managed and referenced in a non-Unity Catalog workspace, showing the differences in functionality and the propagation of permissions across workspaces. The comparison serves to emphasize the enhanced capabilities and centralized management offered by Unity Catalog.

Keywords

💡Unity Catalog

Unity Catalog is a centralized system for managing access control, auditing, lineage, and data discovery across Databricks workspaces. It is the central topic of the video, as it represents a shift from a decentralized to a centralized approach for data governance and management. The script mentions that prior to Unity Catalog, each workspace had its own Hive metastore, leading to fragmented governance capabilities. With Unity Catalog, a metastore is shared across multiple workspaces, which strengthens data governance and privacy controls.

💡Data Governance

Data governance refers to the overall management of the availability, usability, integrity, and security of the data used in an organization. In the context of the video, data governance was previously managed at the workspace level with limitations due to its decentralized nature. Unity Catalog introduces a more comprehensive and centralized approach to data governance, allowing for better control and auditing of data access and usage.

💡Hive Metastore

The Hive Metastore is a component of the traditional Hadoop ecosystem that stores table metadata, such as schemas and storage locations, for data typically kept in the Hadoop Distributed File System (HDFS). In the script, it is mentioned that each workspace previously had its own Hive metastore, which was decentralized and had limitations in terms of data governance capabilities. The introduction of Unity Catalog has led to a shift towards a shared metastore that supports broader functionalities.

💡Data Lineage

Data lineage is the ability to track and visualize the flow of data across different datasets and processes within a platform. This is a significant feature of Unity Catalog, as it provides transparency and understanding of how data is transformed and used. The script illustrates this by showing how Unity Catalog can track the lineage of a table, from its creation to its use in various datasets and processes.
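The idea of following a table back through the layers that produced it can be sketched in a few lines of plain Python. This is only a toy model of what a lineage graph records, not how Unity Catalog stores or computes lineage; the table names and the `UPSTREAM` mapping are hypothetical, loosely based on the bronze, silver, and gold example in the demo.

```python
from typing import Dict, List, Set

# Hypothetical lineage edges: each table maps to the tables it was built from.
UPSTREAM: Dict[str, List[str]] = {
    "hr.silver.employees": ["hr.bronze.employees"],
    "hr.silver.departments": ["hr.bronze.departments"],
    "hr.gold.employee_details": ["hr.silver.employees", "hr.silver.departments"],
}


def all_upstream(table: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Return every table that feeds `table`, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(edges.get(table, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Keep walking towards the source tables.
            stack.extend(edges.get(current, []))
    return seen


for name in sorted(all_upstream("hr.gold.employee_details", UPSTREAM)):
    print(name)
```

Expanding the lineage graph in the UI corresponds to this kind of traversal: starting from the gold table, each step reveals the silver and then bronze tables behind it.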

💡Cloud Object Storage

Cloud object storage is a type of storage that is designed to store unstructured data in a scalable and durable manner. In the video, it is mentioned that Unity Catalog's metastore is designed to work with cloud object storage solutions such as Amazon S3 and Azure Data Lake Storage, which contrasts with the Hive metastore's design for the Hadoop Distributed File System.

💡Access Control

Access control is the selective restriction of access to a computer, system, or network resource based on the identity of the user or group. The video discusses how Unity Catalog centralizes access control, allowing for the administration of data access policies across all workspaces. This is a significant improvement over the previous decentralized approach where access control lists were used at the workspace level.
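As a rough sketch of what administering access with standard SQL looks like, the helper below builds the three grants that the demo shows a user holding in order to query one table: `USE CATALOG`, `USE SCHEMA`, and `SELECT`. The function names and the principal `user1@example.com` are hypothetical; in a notebook each generated statement would be run with `spark.sql(...)`.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Backtick-quote one identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def read_grants(catalog: str, schema: str, table: str, principal: str) -> List[str]:
    """Return the GRANT statements a principal needs to query one table."""
    who = quote_ident(principal)
    cat = quote_ident(catalog)
    sch = f"{cat}.{quote_ident(schema)}"
    tbl = f"{sch}.{quote_ident(table)}"
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {who}",
        f"GRANT SELECT ON TABLE {tbl} TO {who}",
    ]


def revoke_select(catalog: str, schema: str, table: str, principal: str) -> str:
    """Return the statement that removes SELECT on one table."""
    tbl = ".".join(quote_ident(part) for part in (catalog, schema, table))
    return f"REVOKE SELECT ON TABLE {tbl} FROM {quote_ident(principal)}"


for statement in read_grants("hr", "bronze", "countries", "user1@example.com"):
    print(statement)
```

Because the metastore is shared, grants like these apply in every workspace that can see the catalog, which is the main contrast with workspace-level permissions in the legacy setup.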

💡Metastore Admin

A metastore admin is a role within Unity Catalog that has extensive privileges over the metastore itself. The script explains that metastore admins can perform various administrative tasks such as managing the metastore, assigning permissions, and configuring storage credentials. This role is crucial for the centralized management of the shared metastore in Unity Catalog.

💡Data Assets

Data assets refer to the valuable data resources within an organization, such as tables, views, and models. The video script mentions that Unity Catalog allows for tagging, documenting, and searching data assets based on tags, which enhances the discoverability and management of these assets.
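Tags and comments can also be applied with SQL rather than through the UI. The sketch below builds an `ALTER TABLE ... SET TAGS` statement and a `COMMENT ON TABLE` statement as text; the helper names, the tag key `sensitivity`, and its value are hypothetical, and the sketch deliberately rejects values that would need escaping instead of guessing at quoting rules.

```python
from typing import Dict


def _literal(value: str) -> str:
    """Single-quote a value, refusing characters that would need escaping."""
    if "'" in value or "\\" in value:
        raise ValueError("quotes and backslashes are not handled in this sketch")
    return f"'{value}'"


def set_tags_sql(table: str, tags: Dict[str, str]) -> str:
    """Build a statement that attaches key/value tags to a table."""
    pairs = ", ".join(f"{_literal(k)} = {_literal(v)}" for k, v in tags.items())
    return f"ALTER TABLE {table} SET TAGS ({pairs})"


def comment_sql(table: str, text: str) -> str:
    """Build a statement that documents a table with a comment."""
    return f"COMMENT ON TABLE {table} IS {_literal(text)}"


print(set_tags_sql("hr.bronze.countries", {"sensitivity": "pii"}))
print(comment_sql("hr.bronze.countries", "Contains sensitive information"))
```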

💡Lakehouse Query Federation

Lakehouse query federation is the ability to query data stored in different platforms or locations as if they were in a single system. The script provides an example of how Unity Catalog can create a foreign catalog connected to a Snowflake data warehouse, allowing for federated queries across different data sources without the need to migrate data to a unified system.
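A minimal sketch of the two steps behind a federated query, assuming a connection to Snowflake has already been created: register a foreign catalog on top of that connection, then query it with the usual three-level name. The catalog, connection, database, schema, and table names below are all hypothetical, and the helpers only build SQL text for `spark.sql(...)` or a SQL cell.

```python
def foreign_catalog_sql(catalog: str, connection: str, database: str) -> str:
    """Build a statement that exposes an external database as a catalog."""
    return (f"CREATE FOREIGN CATALOG IF NOT EXISTS {catalog} "
            f"USING CONNECTION {connection} OPTIONS (database '{database}')")


def federated_query(catalog: str, schema: str, table: str, limit: int = 10) -> str:
    """Build a query against a table that lives in the external system."""
    return f"SELECT * FROM {catalog}.{schema}.{table} LIMIT {limit}"


# Hypothetical names, for illustration only.
print(foreign_catalog_sql("snowflake_sales", "snowflake_conn", "SALES_DB"))
print(federated_query("snowflake_sales", "public", "orders"))
```

Once the foreign catalog exists, the query looks no different from one against a native catalog, which is what lets data stay in the external system.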

💡Delta Sharing

Delta Sharing is a feature that allows data assets to be shared outside of an organization. The video presents it alongside two related Unity Catalog features: external data access, where storage credentials and external locations give access to data stored in cloud object storage, and connections, which allow queries against other data sources.
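Sharing a table with an outside party involves a share (the bundle of assets) and a recipient (who receives it). The sketch below lists the SQL statements for that flow as text; the share, recipient, and table names are hypothetical, and details such as recipient authentication are left out.

```python
from typing import List


def share_table_sql(share: str, table: str, recipient: str) -> List[str]:
    """Return the statements that share one table with one recipient."""
    return [
        f"CREATE SHARE IF NOT EXISTS {share}",
        f"ALTER SHARE {share} ADD TABLE {table}",
        f"CREATE RECIPIENT IF NOT EXISTS {recipient}",
        f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}",
    ]


# Hypothetical names, for illustration only.
for statement in share_table_sql("hr_share", "hr.gold.employee_details", "partner_org"):
    print(statement)
```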

💡Three-Level Namespace

A three-level namespace is a hierarchical structure used to organize and reference data objects, consisting of catalog, schema, and table. The video script explains that Unity Catalog uses a three-level namespace, which is different from the traditional two-level namespace (database and table) used in the legacy Hive metastore. This allows for a more granular and organized approach to data management.
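To make the difference between two-level and three-level references concrete, here is a simplified resolver that expands a table reference into its catalog, schema, and table parts. It is a sketch of the naming idea only: the function `resolve` is hypothetical, it ignores backtick-quoted identifiers that contain dots, and the defaults passed in stand in for whatever catalog and schema are current in a session.

```python
from typing import Tuple


def resolve(name: str, current_catalog: str, current_schema: str) -> Tuple[str, str, str]:
    """Expand a one-, two-, or three-part name to (catalog, schema, table)."""
    parts = name.split(".")
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    if len(parts) == 2:
        # Legacy style: database.table, placed under the current catalog.
        return (current_catalog, parts[0], parts[1])
    if len(parts) == 1:
        return (current_catalog, current_schema, parts[0])
    raise ValueError(f"too many name parts: {name}")


# Unity Catalog style, as in the demo: spark.read.table("hr.bronze.countries")
print(resolve("hr.bronze.countries", "main", "default"))
# Legacy two-level style (hypothetical table name) under hive_metastore
print(resolve("hr.bronze_countries", "hive_metastore", "default"))
```

The extra level is what allows separate bronze, silver, and gold schemas inside one `hr` catalog, instead of encoding the layer in a table-name prefix.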

Highlights

Introduction to Unity Catalog in Databricks and its key features.

Unity Catalog provides centralized access control, auditing, lineage, and data discovery across Databricks workspaces.

Prior to Unity Catalog, governance and data management were decentralized and managed at the workspace level.

Unity Catalog introduces a centralized approach to Access Control and user management with a shared metastore.

The Unity Catalog metastore supports broader functionalities like data discovery and lineage tracking compared to the Hive metastore.

Unity Catalog offers a single place to administer data access policies using standard SQL.

Automatic capture of user-level audit logs for data access recording.

Data lineage is provided by tracking and visualizing data flow across datasets and processes.

Tagging and documentation of data assets with a search interface for asset retrieval based on tags.

Administrative roles in Unity Catalog include Account Admin, Metastore Admin, and Workspace Admin with distinct responsibilities.

Metastore is the top-level container for metadata in Unity Catalog with a three-level namespace.

Catalogs, schemas, and tables form the object hierarchy in Unity Catalog for referencing data.

Models for machine learning and other data objects like storage credentials and external locations are part of Unity Catalog.

Demonstration of Unity Catalog enabled workspaces and comparison with non-Unity Catalog enabled workspaces.

Permissions in Unity Catalog are propagated to all workspaces assigned to the same metastore.

Lineage graph in Unity Catalog provides transparency of data flow and relationships between datasets.

Delta sharing and external data access features in Unity Catalog for sharing and querying data across platforms.

Upcoming tutorial on enabling Unity Catalog in Databricks workspaces in the next video of the series.

Transcripts

play00:00

hey everyone and welcome to this

play00:01

overview video on Databricks Unity

play00:04

catalog this video is the first video as

play00:07

part of a wider Unity catalog series on

play00:09

my channel in this initial video I'll

play00:11

introduce you to Unity catalog and

play00:14

discuss some of its most important

play00:16

features so what is Unity catalog Unity

play00:20

catalog provides centralized Access

play00:22

Control auditing lineage and data

play00:26

Discovery capabilities across

play00:28

Databricks

play00:28

workspaces prior to Unity catalog on

play00:31

Databricks everything related to

play00:33

governance and data management was

play00:35

controlled at a workspace level so

play00:38

Access Control user management and the

play00:40

hive metastore were completely

play00:42

decentralized and they were managed

play00:44

individually at a workspace level each

play00:47

workspace had its own Hive metastore

play00:50

this approach had its limitations so

play00:53

governance capabilities were somewhat

play00:55

fragmented it relied primarily on access

play00:57

control lists for managing permissions

play00:59

at a basic

play01:01

level data management was also

play01:03

decentralized with individual teams

play01:06

managing access to their own data assets

play01:08

at a workspace level you also had basic

play01:11

security features such as network

play01:13

security encryption and authentication

play01:16

mechanisms however comprehensive data

play01:18

governance and privacy controls were not

play01:21

as

play01:21

robust with unity catalog it's

play01:24

completely different as you can see from

play01:26

the image on the screen there is a

play01:28

centralized approach in Access Control

play01:31

and user management and there's a

play01:33

metastore that's shared across multiple

play01:36

workspaces the unity catalog metastore

play01:38

is different to The Hive

play01:41

metastore it supports a broader range of

play01:43

functionalities such as data Discovery

play01:46

and lineage tracking while the hive

play01:48

metastore is designed to work with the

play01:50

Hadoop distributed file system the unity

play01:53

catalog metastore has been designed to

play01:55

work with Cloud object storage such as

play01:57

Amazon S3 and Azure Data Lake Storage

play02:00

there are numerous features in unity

play02:03

catalog Unity catalog offers a single

play02:05

place to administer data access policies

play02:08

across all of your workspaces you can

play02:10

use standard ANSI SQL to grant and revoke

play02:13

permissions on Unity catalog objects

play02:16

Unity catalog automatically captures

play02:19

user level audit logs that record access

play02:22

to your data it provides data lineage by

play02:24

tracking and visualizing the flow of

play02:26

data across different data sets and

play02:28

processes Within the platform you can

play02:31

also tag and document data assets and

play02:33

then use a search interface to search

play02:35

for these data assets based on those

play02:38

tags the main administrative roles in

play02:40

unity catalog are account admin

play02:44

metastore admin and workspace admins account

play02:47

admins can create and link metastores

play02:50

to workspaces they can assign

play02:52

metastore admins and configure storage

play02:54

credentials among other things metastore

play02:57

admins have extensive privileges over

play02:59

the metastore itself and then

play03:01

workspace admins can add users

play03:04

and groups to a workspace they can

play03:07

delegate workspace admin roles and they

play03:09

can manage job ownership and the

play03:11

handling of workspace

play03:13

objects so the image on the screen

play03:16

depicts the unity catalog data

play03:19

objects the metas store is the top level

play03:21

container for metadata each meta store

play03:25

exposes a thre level name space so you

play03:28

have catalog schema and table that's how

play03:31

you reference your data catalog is the

play03:34

first layer of the object hierarchy

play03:36

schemas also known as databases are the

play03:39

second layer of the object hierarchy so

play03:42

they store objects such as tables and

play03:44

Views you then have within each schema

play03:48

tables views functions and volumes so

play03:52

volumes will store your non-tabular data

play03:55

you also have models models refer to

play03:57

machine learning models

play04:00

you can also see other objects in the

play04:01

diagram too storage credentials and

play04:03

external locations work together for

play04:06

managing data access shares and

play04:08

recipients are used to distribute data

play04:10

assets within Unity catalog and then

play04:13

connections enable Unity catalog to

play04:15

access and query external databases

play04:18

providing a seamless way to Federate

play04:20

queries across different data sources

play04:23

okay so that's enough of the theory let

play04:25

me now show you one of my Unity catalog

play04:28

enabled workspaces and then I'll compare

play04:30

that with one of my non-Unity catalog

play04:32

enabled workspaces so you can see the

play04:35

differences for yourself okay so I'm in

play04:38

one of my Unity catalog enabled Databricks

play04:40

workspaces the account console is

play04:43

where the administration of your

play04:44

metastores across all workspaces in your

play04:47

organization occurs to access the

play04:49

account console under your username

play04:52

click on manage account so under my user

play04:55

under my username I can click on manage

play04:58

account

play05:00

only account admins can access this so

play05:04

on the workspaces tab here you can see

play05:07

all workspaces in your organization so I

play05:10

have three workspaces and you can see

play05:12

the relevant information such as the

play05:14

meta store that they're assigned to so

play05:16

these two workspaces have been assigned

play05:18

to meta stores so that implies that

play05:21

these two are unity catalog enabled and

play05:23

this one is not Unity catalog enabled

play05:25

because it's not assigned to a meta

play05:28

store on the data

play05:30

tab you can see all of your meta stores

play05:33

so I have one meta store if I click into

play05:36

that you can see the details for The

play05:37

Meta store so it belongs to the UK South

play05:40

Region and here is the meta store admin

play05:43

so I can also as the account

play05:46

admin edit this meta store admin and

play05:50

then on the workspaces tab here you can

play05:52

see the workspaces assigned to this meta

play05:54

store so there are

play05:56

two on the user management tab you can

play06:00

change user group and service principal

play06:02

permissions you can inherit these from

play06:05

your cloud provider you can also assign

play06:08

account admins here so if I click on a

play06:10

user go on their roles you can assign

play06:12

them as account admin only account

play06:15

admins can assign account admins so keep

play06:17

that in

play06:18

mind okay so back in my workspace on

play06:21

catalog Explorer you can see all of the

play06:24

catalogs present in my

play06:27

workspace so I'll start with the

play06:29

hive metastore this isn't really a

play06:31

proper catalog it reflects the

play06:33

traditional Hive metastore allowing

play06:35

users to access and manage the metadata

play06:37

of tables and databases that were

play06:39

previously managed by The Hive metastore

play06:42

using the old approach this is

play06:44

particularly useful for workspaces that

play06:46

need to be migrated to Unity catalog

play06:49

from the traditional Hive metastore so

play06:52

this Hive metastore catalog is workspace

play06:55

specific unlike the other normal

play06:57

catalogs which can be shared across

play06:59

multiple

play07:00

workspaces this main catalog here is

play07:03

automatically

play07:06

created you also have this system

play07:08

catalog here this contains metadata

play07:11

about the unity catalog itself such as

play07:13

information about tables views schemas

play07:16

and other data assets the schemas in

play07:18

this catalog are used for administrative

play07:20

and monitoring

play07:22

purposes you also have this samples

play07:24

catalog this is created by default and

play07:27

just contains sample data for you to

play07:28

play around

play07:30

with you'll also notice this catalog

play07:33

called snowflake Forum if I click into

play07:35

it you can see that's created using a

play07:37

connection to snowflake so this data is

play07:41

actually stored outside of Databricks

play07:43

and I am creating a foreign catalog so I

play07:46

can actually query data from

play07:49

Snowflake this is known as Lakehouse

play07:51

query Federation and you can do this on

play07:54

multiple platforms including

play07:56

snowflake you can even use this to

play07:58

connect to Databricks workspaces

play08:01

outside of your organization so moving

play08:03

on this HR catalog is one that I've

play08:06

created when I click into

play08:08

it there are five schemas or databases

play08:12

you have the bronze silver and gold

play08:15

schemas that I've created and then there

play08:17

are two by default one is called default

play08:19

and one is called information schema

play08:21

these are created by default for each

play08:24

catalog that you create so let me let me

play08:26

click into the bronze schema here are

play08:29

the tables in this

play08:31

schema so previously when referencing

play08:34

tables in the traditional Hive

play08:36

metastore you specify the database and the

play08:38

table Unity catalog has a three-level

play08:41

namespace so if I want to query this

play08:44

countries table I would specify it by

play08:46

doing HR which is the catalog dot the

play08:50

schema which is bronze dot the table

play08:53

name so let me quickly Show You by

play08:55

opening a notebook so I'll just add a

play08:58

notebook

play09:01

make sure it's connected to a cluster it

play09:03

is so to read that table I can just do

play09:06

spark.read.table and I'm using Python

play09:11

right now so I can just reference

play09:14

hr.bronze.countries

play09:16

and then run

play09:20

this and then I can actually display

play09:23

that like

play09:28

so

play09:32

so here's the data using an SQL cell I

play09:34

can just simply type SELECT * FROM

play09:38

hr.

play09:40

bronze.

play09:41

countries and then run

play09:44

this and that's worked as well as you

play09:47

can

play09:48

see so back in

play09:52

catalog and if I go back to this table

play09:56

you can see the details for this table

play09:58

so this table is a managed table you can

play10:02

have managed and external tables managed

play10:05

tables can only be of the Delta format

play10:09

external tables can have multiple

play10:11

formats such as Delta Parquet ORC Avro CSV

play10:15

JSON or

play10:17

text so since this is a managed table it

play10:20

can only be Delta and here under the

play10:22

details you can see the path to the

play10:24

storage location you can see the

play10:26

metastore ID that it's associated to and the

play10:29

table ID as well you can also see

play10:32

information such as who it's created by

play10:34

and other useful bits of information as

play10:36

well you can also manage permissions to

play10:39

this table as well by going on this

play10:40

permissions

play10:41

tab so you can do that using this UI

play10:45

where you can grant and revoke certain

play10:47

permissions and you can also do that

play10:49

using ANSI SQL in notebooks as well so

play10:52

this user has select privileges on this

play10:55

table so this is user one the user also

play10:58

has

play10:59

USE SCHEMA privileges and then they

play11:03

also

play11:04

have use catalog privileges as well so

play11:08

to be able to query this table the user

play11:10

also needs permissions on the catalogs

play11:12

and the schemas for that table as well

play11:15

furthermore at the catalog level notice

play11:17

workspaces so I can assign this catalog

play11:20

and all of the contents of that catalog

play11:22

to multiple workspaces so right now all

play11:25

workspaces that are a part of this

play11:28

metastore can access this catalog and all of

play11:31

the permissions that I apply to this

play11:32

catalog will be propagated to all of the

play11:35

workspaces that have access so if I

play11:37

uncheck this I can now specify which

play11:40

workspaces have access and you'll notice

play11:43

right now I'm no longer able to access

play11:45

this because it's been crossed out so if

play11:47

I assign this to workspaces these

play11:49

workspaces will now have access so I can

play11:52

assign and then when I refresh this this

play11:55

should change from no access to show me

play11:59

the contents of that catalog and as you

play12:01

can see that's the

play12:03

case

play12:05

great so just to reiterate that the

play12:07

permissions that you give to your

play12:09

catalogs schemas and tables and other

play12:12

data assets will be applied to all

play12:15

workspaces that the data is shared

play12:17

to so now if I go to this countries

play12:20

table again notice columns you can

play12:24

actually add comments and

play12:26

tags these comments and tags

play12:29

can then be used in the global search

play12:32

icon to search for the data so if I have

play12:34

sensitive information I can add a

play12:36

comment saying this is sensitive

play12:37

information and I can search for that

play12:38

using that specific tag further along

play12:41

you can see lineage this shows the flow

play12:43

of the data you can see under

play12:46

tables this is the downstream table that

play12:49

this table has been referenced in and

play12:52

that is in the silver

play12:55

layer you can also see the notebooks

play12:58

that have been referencing this table

play13:01

and then you can see workflows pipelines

play13:04

paths and queries that have referenced

play13:06

the table as

play13:07

well you can also see the lineage graph

play13:11

so you can see here is the table that

play13:13

I'm currently selecting and here is the

play13:15

downstream table that it's linked to and

play13:17

to give you a better example of this let

play13:19

me go on one of my gold tables so on

play13:21

employee details if I go to

play13:23

lineage see lineage graph and then I

play13:26

expand this I can now see the full

play13:30

[Music]

play13:32

lineage of this specific data model so

play13:36

the bronze layer to the silver layer to

play13:38

the gold layer so this silver employees

play13:41

table has been joined with this

play13:44

departments table to create this

play13:46

employee details table you can see all

play13:48

of the columns the data types and other

play13:50

useful metrics as well so it gives you

play13:53

full transparency of the data and the

play13:55

flow of that

play13:57

data so there's a lot of useful and

play14:00

Powerful features on Unity catalog that

play14:02

you can see there are also features that

play14:05

allow you to share your

play14:07

data so Delta sharing allows you to

play14:10

share data assets outside of your

play14:14

organization under external data you can

play14:16

access data stored in Cloud object

play14:18

storage using storage credentials and

play14:20

external

play14:21

locations and then using connections you

play14:24

can run queries against multiple data

play14:26

sources such as Snowflake MySQL Postgres

play14:29

and even other Databricks workspaces

play14:31

outside of your metastore without

play14:33

needing to migrate all of the data to a

play14:35

unified system so you can see I've

play14:38

already created a connection to a

play14:40

snowflake data warehouse and that is how

play14:43

I have connected to this foreign catalog

play14:46

on snowflake using that connection so

play14:49

now you know what Unity catalog can do

play14:53

let me show you one of my non-unity

play14:55

catalog enabled workspaces so you can

play14:57

compare the two

play14:59

so let me go back to the account console

play15:02

I'll go to my workspaces and this is the

play15:04

workspace that has not been assigned to

play15:06

a metastore as you can see so let's open

play15:12

that so I'll go to

play15:14

[Music]

play15:16

catalog so to be able to access the

play15:18

information let me just start the

play15:20

serverless warehouse and that will just

play15:21

take a moment to spin

play15:27

up

play15:29

and now

play15:30

notice when I select a specific

play15:35

table I have the details I have

play15:38

permissions but I don't have lineage

play15:40

information you also don't have

play15:42

additional features such as external

play15:44

data and Delta

play15:46

sharing and this legacy Hive

play15:48

metastore also uses a two-level namespace so

play15:51

you simply have the database and then

play15:53

the tables so it does not use the

play15:55

three-level namespace so when you reference

play15:57

tables you just type hr dot table

play16:01

name so that will be the database dot the

play16:04

table so where I was able to have a

play16:07

separate database for my bronze silver

play16:09

and gold tables in unity catalog here I

play16:13

have all of the tables stored in the

play16:15

same database of course I can have a

play16:17

separate database for each layer as well

play16:19

but this is how I've done it in this

play16:20

instance I've prefixed the table with

play16:23

the type of table that it is so you can

play16:24

see bronze silver and gold

play16:27

tables however you still can't see the

play16:30

link between these tables or how they

play16:32

flow between each other in general

play16:34

there's much less functionality compared

play16:36

to Unity catalog you can manage

play16:38

permissions but the permissions apply

play16:40

only to each workspace unlike Unity

play16:42

catalog where you can set permissions

play16:44

that are propagated to each workspace

play16:46

assigned to the same

play16:48

metastore okay so in this video I've

play16:51

summarized the main features of unity

play16:53

catalog which are centralized Access

play16:56

Control auditing lineage and data

play16:59

Discovery capabilities to name a

play17:01

few I've also showed you a Unity catalog

play17:05

enabled workspace so you can get a

play17:06

better understanding of these features

play17:08

in practice and I've also shown you a

play17:10

non-unity catalog enabled workspace so

play17:13

you can get a comparison of the two in

play17:15

the next video of this Unity catalog

play17:17

series I'll show you how to enable Unity

play17:19

catalog on your Azure Databricks

play17:21

workspace so if you found this video

play17:24

useful then please give it a like And

play17:25

subscribe to my channel for more content

play17:27

like this
