11 Catalog, External Location & Storage Credentials in Unity Catalog | Catalog with External Location
Summary
TLDR: In this video, you learn how to create catalogs in Databricks Unity Catalog and manage data storage for managed tables. The tutorial covers defining external locations, setting up storage credentials, and understanding the hierarchical fallback mechanism from schema to catalog to metastore. Viewers see step-by-step demonstrations using both the UI and SQL commands to create catalogs with default and external locations, enable clusters for Unity Catalog, and drop catalogs using CASCADE. By the end, you understand how to organize and control data storage in Unity Catalog, laying the foundation for creating schemas and loading tables in subsequent lessons.
Takeaways
- Unity Catalog uses a hierarchical object model: Metastore → Catalog → Schema → Tables.
- Managed tables inherit their storage location through a fallback mechanism: Schema → Catalog → Metastore.
- When creating a Metastore, specifying a location is optional, but without it, every catalog must define its own location.
- You can create a catalog with or without an external location; without one, managed tables default to the Metastore location.
- External locations are defined in Databricks so that a catalog's managed table data can be stored separately in cloud storage.
- Storage credentials are required to connect Databricks to external storage accounts, enabling use of external locations.
- Azure Blob containers can be structured with directories (e.g., `ADB/catalog`) to serve as root paths for external catalog storage.
- Catalogs can be created using either the Databricks UI or SQL commands (`CREATE CATALOG`).
- To drop a catalog together with all the schemas and tables it contains, use the `CASCADE` option, which removes everything recursively.
- Clusters must be enabled for Unity Catalog before executing catalog or table commands; legacy clusters need to be edited and restarted.
- `DESCRIBE CATALOG EXTENDED <catalog_name>` provides detailed information about a catalog, such as its owner, comment, and storage location.
- Using external locations allows separation of managed table storage from the Metastore, providing better data organization and flexibility.
Q & A
What is the hierarchy of objects in Unity Catalog?
-The hierarchy in Unity Catalog is: Metastore → Catalog → Schema → Table. Managed table data is stored according to the nearest defined location in this hierarchy.
What determines where the data for a managed table is stored?
-Data storage for a managed table is determined by the location defined at the most specific level: schema location first, then catalog location, and finally metastore location if no other location is specified.
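The lookup order described above can be sketched with two schemas, one with its own location and one without. This is a minimal sketch, not a definitive setup: it assumes a catalog named `demo_catalog` already exists with a managed location, that an external location already covers the storage path, and that the schema names, table names, and the `ADB/schema` directory are all hypothetical.

```sql
-- Hypothetical names throughout; replace the <...> placeholders with real values.
-- Schema WITH its own managed location: its managed tables are stored under this path.
CREATE SCHEMA IF NOT EXISTS demo_catalog.sales
  MANAGED LOCATION 'abfss://<container>@<storage_account>.dfs.core.windows.net/ADB/schema';

-- Schema WITHOUT a location: its managed tables fall back to the catalog's location
-- (or to the metastore location if the catalog has none).
CREATE SCHEMA IF NOT EXISTS demo_catalog.hr;

-- Two managed tables, one in each schema.
CREATE TABLE IF NOT EXISTS demo_catalog.sales.orders (id INT, amount DOUBLE);
CREATE TABLE IF NOT EXISTS demo_catalog.hr.employees (id INT, name STRING);

-- Compare the storage location reported for each table.
DESCRIBE TABLE EXTENDED demo_catalog.sales.orders;
DESCRIBE TABLE EXTENDED demo_catalog.hr.employees;
```

If the fallback works as described, the location reported for `orders` should sit under the schema path, while `employees` should sit under the catalog's path.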
Is specifying a location mandatory when creating a metastore?
-No, specifying a location for a metastore is optional. However, if no metastore location is provided, specifying a location at the catalog level becomes mandatory.
What is an external location in Unity Catalog?
-An external location is a Unity Catalog object that pairs a cloud storage path with a storage credential. It lets a catalog or schema store its managed table data outside the default metastore location.
How do you create a new catalog without specifying an external location?
-You can create a catalog without an external location using the UI by navigating to the Catalog tab and clicking '+ Add Catalog', or via SQL using: `CREATE CATALOG <catalog_name>;`. By default, managed table data will be stored in the metastore location.
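As a minimal sketch of the SQL route, assuming a Unity Catalog-enabled cluster and a metastore that has a root location (the catalog name `dev_catalog` and the comment text are hypothetical):

```sql
-- Hypothetical catalog name; no MANAGED LOCATION clause, so managed tables
-- fall back to the metastore location.
CREATE CATALOG IF NOT EXISTS dev_catalog
  COMMENT 'Catalog that relies on the metastore storage location';

-- List catalogs to confirm that the new one is visible.
SHOW CATALOGS;
```

If the metastore was created without a location, this statement is expected to fail, and a catalog-level location has to be supplied instead.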
What is the purpose of a storage credential in Databricks?
-A storage credential acts as a bridge between Databricks and external storage, allowing Databricks to access and use the storage location for managed table data.
How do you create an external location using SQL?
-You create an external location using SQL with the syntax: `CREATE EXTERNAL LOCATION <location_name> URL '<storage_url>' WITH (STORAGE CREDENTIAL <credential_name>);`.
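A filled-in version of that statement might look like the following sketch. It assumes an Azure storage account addressed through an `abfss://` URL and a storage credential named `demo_credential` that already exists; the location name, credential name, and comment are hypothetical, and the `<...>` placeholders must be replaced with real values.

```sql
-- Hypothetical names; the storage credential must already exist.
CREATE EXTERNAL LOCATION IF NOT EXISTS ext_catalog_loc
  URL 'abfss://<container>@<storage_account>.dfs.core.windows.net/ADB/catalog'
  WITH (STORAGE CREDENTIAL demo_credential)
  COMMENT 'Root path for catalog-level managed tables';

-- Inspect the external location that was just defined.
DESCRIBE EXTERNAL LOCATION ext_catalog_loc;
```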
How do you create a catalog with an external location for its managed tables?
-After defining a storage credential and creating an external location, use SQL: `CREATE CATALOG <catalog_name> MANAGED LOCATION '<external_location_url>' COMMENT '<comment>';`. Managed tables under the catalog then use the specified location, unless a schema defines its own.
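For illustration, here is a sketch of that statement with example values, assuming an external location already covers the `ADB/catalog` path. The catalog name `ext_catalog` and the comment are hypothetical.

```sql
-- Hypothetical catalog name; the path must fall under an existing external location.
CREATE CATALOG IF NOT EXISTS ext_catalog
  MANAGED LOCATION 'abfss://<container>@<storage_account>.dfs.core.windows.net/ADB/catalog'
  COMMENT 'Catalog whose managed tables live in external storage';

-- Check the storage location recorded for the catalog.
DESCRIBE CATALOG EXTENDED ext_catalog;
```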
What does the CASCADE option do when dropping a catalog?
-Using CASCADE when dropping a catalog recursively deletes all tables, schemas, and other objects under the catalog, along with the catalog itself.
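A short sketch of the drop, using a hypothetical catalog name. Because `CASCADE` removes every schema and table under the catalog, it is worth double-checking the name before running it.

```sql
-- Hypothetical catalog name. Without CASCADE, dropping a non-empty catalog is refused.
DROP CATALOG IF EXISTS dev_catalog CASCADE;
```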
Can a catalog have multiple schemas by default?
-Yes, when a catalog is created, Databricks automatically provides two default schemas: `default` and `information_schema`.
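To see these schemas for yourself, a query along these lines can be run against a freshly created catalog (the name `dev_catalog` is hypothetical):

```sql
-- Lists the schemas in the catalog; a new catalog is expected to show
-- `default` and `information_schema`.
SHOW SCHEMAS IN dev_catalog;
```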
What happens if you try to run a SQL command on a cluster not enabled for Unity Catalog?
-The command will fail with an error stating that Unity Catalog is not enabled. You must enable Unity Catalog on the cluster before executing such commands.
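One quick way to check whether the attached cluster can reach Unity Catalog is to ask for the current metastore and catalog. This is only a sketch of a sanity check, not something shown in the video's summary:

```sql
-- Returns the metastore ID and the current catalog on a Unity Catalog-enabled cluster.
SELECT current_metastore() AS metastore_id,
       current_catalog()   AS catalog_name;
```

On a cluster that is not enabled for Unity Catalog, this query is expected to fail instead of returning a metastore ID.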
What is the fallback mechanism for managed table storage locations?
-The fallback mechanism ensures that if a location is not defined at the schema level, the catalog location is used, and if the catalog location is not defined, the metastore location is used.