13 Managed & External Tables in Unity Catalog vs Legacy Hive Metastore | UNDROP Tables in Databricks
Summary
TLDR: In this video, the differences between managed and external tables in Databricks Unity Catalog are demonstrated. The tutorial covers setting up external locations in ADLS, creating both managed and external tables, and inspecting their properties. It highlights how Unity Catalog handles table drops differently from the legacy Hive metastore, with managed table data retained for 7–30 days and external table data always preserved. The video also introduces the powerful 'UNDROP' feature, allowing recovery of dropped tables within 7 days. Viewers learn practical steps for table creation, metadata management, and leveraging Unity Catalog’s enhanced recovery features to safeguard data efficiently.
Takeaways
- 😀 Unity Catalog in Databricks differentiates between managed tables and external tables in terms of data storage and deletion behavior.
- 😀 Managed tables store data in the managed storage location associated with the Unity Catalog metastore (or with the catalog or schema), while external tables store data in external locations such as ADLS.
- 😀 Before creating an external table, an external location must be set up in ADLS and linked to Databricks using a storage credential.
- 😀 Managed tables, when dropped, have their metadata removed immediately but the data files are retained temporarily (7–30 days) before automatic deletion.
- 😀 External tables, when dropped, retain their data permanently in the external location; only metadata is removed.
- 😀 The 'UNDROP TABLE' feature in Unity Catalog allows recovery of dropped tables within a 7-day retention period.
- 😀 Both managed and external tables support the UNDROP feature, but external table data always remains intact in the external storage.
- 😀 The Databricks UI provides functionality to create catalogs, schemas, and external locations, as well as to verify permissions through 'Test connection'.
- 😀 Using `DESCRIBE EXTENDED <table_name>` helps verify table type (MANAGED or EXTERNAL) and storage location.
- 😀 Unity Catalog improves over legacy Hive Metastore by delaying deletion of managed table data, offering a safety net for accidental deletions.
- 😀 When creating external tables, specifying the `LOCATION` in the CREATE TABLE statement is mandatory to point to the external storage path.
- 😀 Recovery of dropped tables can also be done using the table ID, providing flexibility in case of name changes or duplicates.
- 😀 The video demonstrates practical steps for creating, dropping, and recovering tables in Unity Catalog using a notebook attached to a Databricks cluster.
Q & A
What is the main difference between managed and external tables in Unity Catalog?
-For managed tables, Unity Catalog manages both the metadata and the data files, which live in the managed storage location tied to the metastore. For external tables, the metadata is kept in the metastore but the data stays in an external location such as ADLS.
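The contrast can be sketched in Databricks SQL. The catalog, schema, table names, and the `abfss://` path below are hypothetical placeholders, not values taken from the video; the only structural difference between the two statements is the `LOCATION` clause.

```sql
-- Hypothetical names (dev, bronze, emp_managed, emp_ext) and path, for illustration only.

-- Managed table: no LOCATION clause, so Unity Catalog chooses the storage path.
CREATE TABLE dev.bronze.emp_managed (
  id   INT,
  name STRING
);

-- External table: LOCATION points at a path covered by an existing external location.
CREATE TABLE dev.bronze.emp_ext (
  id   INT,
  name STRING
)
LOCATION 'abfss://data@examplestorage.dfs.core.windows.net/ext/emp';
```

Both statements assume the caller already has the privileges needed on the schema and, for the second one, on the external location.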
What is an external location in Databricks and why is it needed?
-An external location is a path in storage (like ADLS) configured in Databricks to store external table data. It is needed to properly link the table metadata in Unity Catalog with the actual data stored externally.
How do you create an external location in Databricks?
-Go to Catalogs → External Locations → Create External Location. Provide a name, select a storage credential, and specify the storage URL of the ADLS folder.
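The same setup can likely be done in SQL instead of the UI. This is a minimal sketch, assuming a storage credential already exists; the names `ext_emp_location` and `my_storage_credential` and the storage URL are made-up placeholders.

```sql
-- Hypothetical location name, credential name, and URL; replace with real values.
CREATE EXTERNAL LOCATION IF NOT EXISTS ext_emp_location
URL 'abfss://data@examplestorage.dfs.core.windows.net/ext'
WITH (STORAGE CREDENTIAL my_storage_credential)
COMMENT 'Folder for external tables';

-- Inspect what was registered.
DESCRIBE EXTERNAL LOCATION ext_emp_location;
SHOW EXTERNAL LOCATIONS;
```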
What is the purpose of the 'UNDROP' feature in Unity Catalog?
-UNDROP allows you to restore a dropped table within 7 days of deletion, helping recover accidentally dropped tables before data is permanently removed.
How does Unity Catalog handle data removal for managed tables after dropping?
-When a managed table is dropped, the data is not immediately removed. Unity Catalog retains it for 7–30 days, allowing restoration via UNDROP if needed.
Does dropping an external table remove its data from the storage location?
-No, dropping an external table only removes the metadata in Unity Catalog. The data remains in the external storage location permanently.
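One way to see this behavior is to drop an external table and then register a new table over the same path. The sketch below assumes the table is a Delta table and uses hypothetical names and a hypothetical path.

```sql
-- Hypothetical table name and path, for illustration only.
DROP TABLE dev.bronze.emp_ext;

-- The files under the external path were not deleted, so a table can be
-- registered over them again; for Delta, the schema is read from the existing data.
CREATE TABLE dev.bronze.emp_ext
USING DELTA
LOCATION 'abfss://data@examplestorage.dfs.core.windows.net/ext/emp';

-- The previously inserted rows should still be visible.
SELECT COUNT(*) FROM dev.bronze.emp_ext;
```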
How can you check the properties and storage location of a table in Unity Catalog?
-Use the command `DESCRIBE EXTENDED <table_name>`. This shows the table type, storage location, and other metadata details.
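As a short illustration with hypothetical table names, the two rows to look for in the output are `Type` and `Location`:

```sql
-- Hypothetical table names, for illustration only.
-- For a managed table, Type should read MANAGED and Location is a path chosen by Unity Catalog.
DESCRIBE EXTENDED dev.bronze.emp_managed;

-- For an external table, Type should read EXTERNAL and Location is the path given at creation.
DESCRIBE EXTENDED dev.bronze.emp_ext;
```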
Why is the data for managed tables not immediately removed in Unity Catalog?
-This delayed deletion allows a recovery window of up to 7 days, giving users the ability to restore accidentally dropped tables using the UNDROP feature.
What command is used to restore a dropped table in Unity Catalog?
-You can restore a dropped table by using `UNDROP TABLE <table_name>` or `UNDROP TABLE WITH ID '<table_id>'` within the 7-day retention period.
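A minimal sketch of the recovery flow, assuming a hypothetical table `dev.bronze.emp_managed` that was dropped within the retention window. The table ID is left as a placeholder and would come from the `tableId` column of `SHOW TABLES DROPPED`.

```sql
-- Hypothetical names, for illustration only.
DROP TABLE dev.bronze.emp_managed;

-- List dropped tables in the schema that are still recoverable, including their IDs.
SHOW TABLES DROPPED IN dev.bronze;

-- Option 1: restore by name (restores the most recently dropped table with that name).
UNDROP TABLE dev.bronze.emp_managed;

-- Option 2: restore by ID, useful when several dropped tables share a name.
-- Replace the placeholder with a tableId value from SHOW TABLES DROPPED.
-- UNDROP TABLE WITH ID '<table_id>';
```

Restoring by ID is commented out because only one of the two options should run for a given table.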
What are the key steps to create and test a managed and external table in Unity Catalog?
-1) Attach a cluster and create a notebook. 2) Use a specific catalog and schema. 3) Create a managed table with `CREATE TABLE` and insert records. 4) Create an external table with `CREATE TABLE ... LOCATION`. 5) Verify table properties using `DESCRIBE EXTENDED`. 6) Optionally drop and test UNDROP restoration.
What is the benefit of using external tables in Unity Catalog?
-External tables allow data to be stored outside the Metastore, providing flexibility in storage management, persistence, and shared access without affecting the original data location.
Where can you find official documentation for the UNDROP feature?
-The official documentation is available in the Azure Databricks documentation; searching for 'UNDROP in Databricks' leads to the page that explains how to recover dropped tables within the 7-day retention period.