Structured and unstructured data storage

Qwiklabs-Courses

22 Nov 2024 03:08

Summary

TLDR: This video explains the differences between unstructured and structured data in the cloud. Unstructured data, such as documents, images, and audio, is difficult to process with traditional methods and is typically kept in Cloud Storage. Structured data, by contrast, is organized in tables and can be easily analyzed with SQL queries. The video also covers storage options for both data types: Cloud Storage for unstructured data, and Cloud SQL, Spanner, Firestore, BigQuery, and Bigtable for structured data, each suited to specific transactional or analytical workloads. Organizations increasingly focus on leveraging unstructured data for competitive insights.

Takeaways

😀 Unstructured data refers to information stored in non-tabular formats, such as documents, images, and audio files.
😀 Around 80% of all data is estimated to be unstructured, making it a crucial area for analysis and storage.
😀 Cloud Storage is typically best suited for unstructured data due to its flexibility and scalability.
😀 Unstructured data includes a variety of content types, such as emails, documents, photos, videos, presentations, and web pages.
😀 Organizations are focusing on mining unstructured data to gain competitive advantages and valuable insights.
😀 Structured data is organized into tables, rows, and columns, and is easy to access, process, and analyze.
😀 Common examples of structured data include names, addresses, contact numbers, dates, and billing information.
😀 Structured data works well with programming and query languages such as SQL, so it can be manipulated efficiently.
😀 Transactional workloads come from Online Transaction Processing (OLTP) systems and involve fast data inserts and updates.
😀 Analytical workloads come from Online Analytical Processing (OLAP) systems and require reading entire datasets, often for complex queries such as aggregations.
😀 Cloud solutions like Cloud SQL, Spanner, Firestore, BigQuery, and Bigtable offer tailored services for handling structured data in transactional or analytical contexts.
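
The contrast between transactional and analytical workloads can be sketched in a few lines. This is a minimal illustration that uses Python's built-in sqlite3 module as a local stand-in, not Cloud SQL, Spanner, or BigQuery; the `orders` table, its columns, and the `run_demo` function are hypothetical names chosen for the example.

```python
import sqlite3


def run_demo():
    """Contrast a transactional write pattern with an analytical read."""
    # In-memory database: a local stand-in, not a cloud service.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )

    # Transactional (OLTP-style): small, fast inserts and updates
    # that touch individual rows, committed together.
    with conn:
        conn.executemany(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            [("alice", 20.0), ("bob", 35.0), ("alice", 15.0)],
        )
        conn.execute("UPDATE orders SET amount = 40.0 WHERE id = 2")

    # Analytical (OLAP-style): read the whole table and aggregate it.
    totals = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return totals


if __name__ == "__main__":
    print(run_demo())
```

The writes each touch a single row, while the final query scans every row to compute per-customer totals, which is the access pattern that analytical systems are built around.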

Q & A

What is the difference between unstructured and structured data?
-Unstructured data is stored in a non-tabular form, such as documents, images, and audio files, while structured data is organized into tables, rows, and columns, like in spreadsheets or relational databases.
What percentage of all data is considered unstructured?
-It is estimated that around 80 percent of all data is unstructured.
Why is unstructured data difficult to process with traditional methods?
-Unstructured data lacks internal identifiers, which makes it challenging to search, process, or analyze using traditional methods.
What are some examples of unstructured data?
-Examples of unstructured data include email messages, documents, photos, videos, presentations, and web pages.
What are the key characteristics of structured data?
-Structured data is organized, easy to capture, access, and analyze, typically fitting into rows and columns in spreadsheets or relational databases.
What are the two types of structured data workloads?
-Structured data workloads fall into two types: transactional workloads and analytical workloads.
What are transactional workloads and when are they used?
-Transactional workloads originate from Online Transaction Processing systems and are used when fast data inserts and updates are required to build row-based records, typically for maintaining a system snapshot.
What is the role of SQL in handling transactional data?
-SQL is used to query transactional data when it is accessed through services such as Cloud SQL or Spanner, both of which support SQL for efficient data manipulation.
What is the difference between Cloud SQL and Spanner?
-Cloud SQL works best for local to regional scalability, while Spanner is designed to scale a database globally.
What is BigQuery used for, and what makes it suitable for analytical workloads?
-BigQuery is used for analyzing petabyte-scale datasets and is suitable for analytical workloads because it can handle complex queries and large-scale data aggregations efficiently.
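
The storage choices discussed above can be summarized as a small decision helper. This is a rough sketch, not an official selection guide: the function `suggest_storage` and its parameters are hypothetical, and the SQL versus non-SQL split that places Firestore and Bigtable is an assumption based on how these products are commonly positioned, since the summary itself only states the Cloud SQL and Spanner distinction.

```python
def suggest_storage(structured, workload=None, needs_sql=True,
                    global_scale=False):
    """Return a product name for a described workload (illustrative only)."""
    # Unstructured data (documents, images, audio) goes to object storage.
    if not structured:
        return "Cloud Storage"

    if workload == "transactional":
        if not needs_sql:
            # Assumption: non-SQL transactional access maps to Firestore.
            return "Firestore"
        # From the summary: Cloud SQL for local to regional scale,
        # Spanner for a globally scaled database.
        return "Spanner" if global_scale else "Cloud SQL"

    if workload == "analytical":
        # Assumption: non-SQL analytical access maps to Bigtable.
        return "BigQuery" if needs_sql else "Bigtable"

    raise ValueError("workload must be 'transactional' or 'analytical'")


if __name__ == "__main__":
    print(suggest_storage(structured=False))
    print(suggest_storage(True, "transactional", global_scale=True))
    print(suggest_storage(True, "analytical"))
```

Real selection also depends on factors the video does not cover here, such as cost, latency, and consistency requirements, so the helper should be read as a memory aid rather than a recommendation engine.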