GCP - BigQuery

Ashutosh Mishra
23 Aug 2021 21:50

Summary

TLDR: This script offers an in-depth look at Google's BigQuery, a fully managed, serverless data warehouse solution designed for analytical use cases. It highlights BigQuery's ability to handle large-scale data analysis with its separation of storage and compute, allowing for scalability and cost-effectiveness. The script also covers various data ingestion methods, unique features like machine learning integration, and compares BigQuery with other cloud data warehouse solutions, emphasizing its serverless advantage and ease of use.

Takeaways

  • 😀 Relational databases are characterized by ACID properties, supporting atomicity, consistency, isolation, and durability.
  • 🔗 Cloud SQL is a managed SQL variant that offers vertical scaling, while Cloud Spanner provides horizontal scaling along with Cloud SQL's features.
  • 📚 NoSQL databases offer flexible schemas and come in various types such as wide column, key-value pair, document, and graph-based databases.
  • 🌐 Bigtable is a wide column database with an HBase interface, making it suitable for big data projects.
  • 📊 For analytical and business intelligence use cases, data warehouses like BigQuery are essential for ingesting and analyzing data from various sources.
  • 🛠️ BigQuery is Google Cloud's serverless, petabyte-scale, and cost-effective analytics data warehouse designed for OLAP use cases.
  • 🌐 BigQuery's architecture decouples storage and compute, allowing independent scaling and providing flexibility and cost control.
  • 💾 Storage in BigQuery is managed by Colossus, Google's global storage system optimized for reading large amounts of structured data.
  • 🔍 BigQuery's compute is powered by Dremel, which executes SQL queries and manages the execution tree, mixers, and slots for processing power.
  • 📈 BigQuery offers unique features like multi-cloud capabilities with BigQuery Omni, built-in machine learning with BigQuery ML, and integration with BI tools through BI Engine.
  • 🌍 BigQuery provides public datasets, allowing users to query up to one terabyte of data per month at no cost from a repository of over 200 high-demand datasets.
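Two points from the video interact here: storage is columnar, so a query reads only the columns it references, and the first terabyte queried each month is free. The following is a minimal sketch of that interaction, not BigQuery's actual accounting. The column names and sizes are hypothetical, and "one terabyte" is taken as a decimal terabyte (an assumption).

```python
# Toy model of columnar scanning: only the referenced columns are read.
# All column names and sizes are hypothetical illustration values.
FREE_BYTES_PER_MONTH = 10**12  # "one terabyte", assumed here to mean 10^12 bytes


def bytes_scanned(column_sizes, selected=None):
    """Return the bytes read by a query; selected=None models SELECT *."""
    if selected is None:
        return sum(column_sizes.values())
    unknown = [c for c in selected if c not in column_sizes]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")
    return sum(column_sizes[c] for c in selected)


def runs_within_free_tier(scan_bytes, free_bytes=FREE_BYTES_PER_MONTH):
    """How many times a query of this size fits in the monthly allowance."""
    if scan_bytes <= 0:
        raise ValueError("scan_bytes must be positive")
    return free_bytes // scan_bytes


# Hypothetical table: two small columns and one large payload column.
table = {"user_id": 8 * 10**9, "event": 12 * 10**9, "payload": 180 * 10**9}
narrow = bytes_scanned(table, ["user_id", "event"])  # reads two columns only
wide = bytes_scanned(table)                          # SELECT * reads everything
```

In this toy table the narrow query reads a tenth of what the `SELECT *` variant reads, which is why the speaker advises against `SELECT *`.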

Q & A

  • What are the key features of relational databases?

    -Relational databases support ACID properties which include atomicity, consistency, isolation, and durability. They also support relational hierarchy.

  • What are the differences between Cloud SQL and Cloud Spanner in Google Cloud?

    -Cloud SQL is a managed SQL variant that provides vertical scaling, while Cloud Spanner offers everything Cloud SQL provides plus horizontal scaling.

  • What types of NoSQL databases are mentioned in the script?

    -The script mentions wide column, key-value pair, and document-based NoSQL databases.

  • Which Google Cloud services are used for NoSQL databases?

    -Bigtable, which is a wide column database, and Memorystore, which is a managed in-memory data store, are mentioned as Google Cloud services for NoSQL databases.

  • What is the purpose of a data warehouse in the context of the script?

    -A data warehouse is used for business intelligence use cases where all data is ingested at one place for reporting tools to provide actionable insights. It supports analysis on both batch and real-time data.

  • What is BigQuery and how does it differ from traditional data warehouses?

    -BigQuery is Google Cloud's fully managed, serverless, and petabyte-scale analytics data warehouse designed for OLAP use cases. It differs from traditional data warehouses by decoupling storage and compute, allowing independent scaling and offering a serverless architecture.

  • How does BigQuery's architecture support its serverless nature?

    -BigQuery's architecture decouples storage and compute, connected via a petabit network, allowing it to scale independently on demand without the need for managing any infrastructure.

  • What are the components of BigQuery's architecture?

    -BigQuery's architecture includes Dremel for compute, Colossus for storage, Jupiter for the petabit network, and Borg for orchestration.

  • How does BigQuery handle data ingestion?

    -BigQuery allows data ingestion through streaming, batch loading, or bulk data uploads. Data can be accessed via SQL compliant clients, REST API, web UI, CLI, and client libraries in multiple languages.

  • What are some unique features of BigQuery?

    -BigQuery offers multi-cloud capabilities with BigQuery Omni, built-in machine learning with BigQuery ML, integration with Vertex AI, BI Engine for accelerating BI workloads, connected sheets for analyzing data in Google Sheets, geospatial data types, federation to process external data sources, and access to public datasets.

  • How does BigQuery compare to other cloud data warehouse solutions like AWS Redshift and Snowflake?

    -BigQuery is a true serverless solution with no need to manage nodes or infrastructure, offering on-demand or flat-rate pricing based on slots, and native AI/ML support with Google Cloud services.

Outlines

00:00

🗂️ Introduction to Databases and BigQuery

This paragraph introduces the concept of databases, focusing on relational databases and their ACID properties. It differentiates between Cloud SQL and Cloud Spanner for relational data, and Bigtable and Memorystore for NoSQL data. The paragraph then transitions to analytical use cases, emphasizing the need for a data warehouse and business intelligence solutions like BigQuery. BigQuery is described as a fully managed, serverless data warehouse that scales to petabyte levels and is cost-effective, supporting OLAP operations and various analytical features.

05:02

🛠️ BigQuery Architecture and Usage

The speaker delves into BigQuery's architecture, highlighting its serverless nature where storage and compute are decoupled. BigQuery's ability to ingest data through streaming or batch loads and access it via various interfaces like SQL clients, REST API, and CLI is discussed. The architecture's flexibility and cost control are emphasized, contrasting traditional data warehouse solutions. The paragraph also covers BigQuery's underlying technology stack, including Dremel for compute, Colossus for storage, Jupiter for networking, and Borg for orchestration, and the importance of efficient query writing to manage costs.
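The execution tree described in the video, where leaf slots read data and mixers aggregate partial results, can be pictured with a toy model. This is only a sketch of the idea, not Dremel's implementation. The function names and the `fan_in` parameter are invented for illustration.

```python
def slot_partial(shard):
    """Leaf ("slot"): read one shard and return a partial (sum, count)."""
    return sum(shard), len(shard)


def mixer_combine(partials):
    """Branch ("mixer"): merge the partial results of its children."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


def tree_average(shards, fan_in=2):
    """Compute AVG over all shards by aggregating up a tree of mixers."""
    if not shards:
        raise ValueError("need at least one shard")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = [slot_partial(s) for s in shards]  # leaves do the reading
    while len(level) > 1:                      # mixers reduce level by level
        level = [
            mixer_combine(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    total, count = level[0]
    if count == 0:
        raise ValueError("no rows to average")
    return total / count
```

For example, `tree_average([[1, 2, 3], [4, 5], [6]])` sends three shards to three leaves, merges them in two mixer rounds, and returns the same value as averaging all six numbers directly.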

10:02

📊 BigQuery Data Ingestion and Query Execution

This section explains how data is loaded into BigQuery, either by creating a new table or modifying an existing one. It outlines the process of using gsutil to upload data to Cloud Storage and then transferring it to BigQuery. The paragraph also mentions the user-friendly interface of BigQuery, where SQL queries can be executed to analyze large datasets quickly. The efficiency of BigQuery in reading only the necessary columns for a query is highlighted, along with the ease of use and the lack of operational overhead for users.
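The two-step load path described above (upload with gsutil, then load with bq) can be sketched as a small helper that only builds the command lines and does not run them. The bucket, dataset, table, and file names are hypothetical, and CSV with schema auto-detection is assumed.

```python
import shlex
from pathlib import PurePath


def build_load_commands(local_file, bucket, dataset, table, replace=False):
    """Return (upload, load) argument lists for the gsutil and bq CLIs.

    Nothing is executed here; all resource names are illustrative.
    """
    gcs_uri = f"gs://{bucket}/{PurePath(local_file).name}"
    # Step 1: copy the local file into Cloud Storage.
    upload = ["gsutil", "cp", local_file, gcs_uri]
    # Step 2: load from Cloud Storage into a BigQuery table.
    load = ["bq", "load", "--source_format=CSV", "--autodetect"]
    if replace:
        load.append("--replace")  # overwrite instead of appending
    load += [f"{dataset}.{table}", gcs_uri]
    return upload, load


upload_cmd, load_cmd = build_load_commands(
    "data/sales.csv", "example-bucket", "demo_dataset", "sales"
)
printable = [shlex.join(upload_cmd), shlex.join(load_cmd)]
```

Building argument lists rather than shell strings keeps quoting problems out of the picture; the lists could be passed to `subprocess.run` in an environment where both tools are installed and authenticated.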

15:03

🌐 BigQuery Features and Integrations

The paragraph discusses the unique features of BigQuery, including multi-cloud capabilities with BigQuery Omni, built-in machine learning with BigQuery ML and integration with Vertex AI, and the BI Engine for accelerating BI workloads. It also covers the ability to analyze large datasets in Google Sheets through Connected Sheets, support for geospatial data types, and federation capabilities to process external data sources. The availability of public datasets in BigQuery is mentioned, allowing users to query up to one terabyte of data per month for free.
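To make the "machine learning with just SQL" point concrete, here is a hedged sketch that assembles BigQuery ML statements as strings. The dataset, model, table, and column names are all hypothetical, and linear regression is only one possible model type. The helpers do not run anything against BigQuery.

```python
def create_model_sql(model, label, features, source_table, model_type="linear_reg"):
    """Build a CREATE MODEL statement; identifiers are illustrative only.

    Plain string formatting is used for readability, so inputs must be
    trusted; this is not safe against SQL injection.
    """
    cols = ", ".join(list(features) + [label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT {cols}\n"
        f"FROM `{source_table}`"
    )


def predict_sql(model, features, source_table):
    """Build an ML.PREDICT query over the same hypothetical table."""
    cols = ", ".join(features)
    return (
        f"SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT {cols} FROM `{source_table}`))"
    )


train = create_model_sql(
    "demo_dataset.trip_model", "fare",
    ["distance_km", "duration_min"], "demo_dataset.trips",
)
score = predict_sql(
    "demo_dataset.trip_model", ["distance_km", "duration_min"], "demo_dataset.trips"
)
```

Both statements could be pasted into the query editor or submitted through a client library; training and prediction stay inside the warehouse, which is the point the speaker makes.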

20:06

🏅 BigQuery as a Cloud Data Warehouse Solution

The final paragraph compares BigQuery with other cloud data warehouse solutions such as AWS Redshift, Azure SQL Data Warehouse (Synapse Analytics), and Snowflake. It emphasizes BigQuery's serverless design, which eliminates the need for upfront investment and operational management. The video cites BigQuery's on-demand or slot-based flat-rate pricing, its reduction in operational expenses, and its native AI/ML support as the reasons it considers BigQuery the clear winner among cloud data warehouse solutions.
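The choice between paying per query and paying a fixed amount for slots comes down to simple arithmetic. The sketch below assumes a linear on-demand price per terabyte scanned and a fixed monthly flat-rate fee. Both prices are placeholder numbers, not Google's actual rates.

```python
def on_demand_cost(tb_scanned, price_per_tb):
    """Monthly on-demand cost: pay for the data the queries process."""
    return tb_scanned * price_per_tb


def break_even_tb(flat_rate_monthly, price_per_tb):
    """Monthly scan volume at which both pricing models cost the same."""
    if price_per_tb <= 0:
        raise ValueError("price_per_tb must be positive")
    return flat_rate_monthly / price_per_tb


def cheaper_option(tb_scanned, price_per_tb, flat_rate_monthly):
    """Name the cheaper model for a given monthly scan volume."""
    if on_demand_cost(tb_scanned, price_per_tb) < flat_rate_monthly:
        return "on-demand"
    return "flat-rate"


# Placeholder prices chosen only to make the arithmetic easy to follow.
PRICE_PER_TB = 4.0
FLAT_RATE_MONTHLY = 1000.0
threshold = break_even_tb(FLAT_RATE_MONTHLY, PRICE_PER_TB)
```

With these placeholder numbers, workloads scanning less than 250 TB a month come out cheaper on demand; above that, a fixed slot commitment wins.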

Keywords

💡Relational Databases

Relational databases are a type of database that stores and retrieves data based on the relational model. They use tables, rows, and columns to organize data and relationships between tables are defined through keys. In the video, relational databases are discussed in the context of their ACID properties, which stand for Atomicity, Consistency, Isolation, and Durability, ensuring reliable data handling.

💡ACID

ACID is a set of properties that guarantee that database transactions are processed reliably. The acronym stands for Atomicity, Consistency, Isolation, and Durability. The video emphasizes that relational databases support ACID, which is crucial for maintaining data integrity and ensuring that transactions are processed accurately and completely.
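Atomicity, the first of the ACID properties, is easy to see in a small runnable example. The sketch below uses Python's built-in sqlite3 module purely as a stand-in relational database; the accounts table and the transfer rule are invented for illustration and are unrelated to Cloud SQL or Cloud Spanner specifics.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts atomically.

    Either both updates are committed or neither is.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def demo():
    """Run one valid and one invalid transfer on an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    ok = transfer(conn, "alice", "bob", 30)       # succeeds
    failed = transfer(conn, "alice", "bob", 500)  # rolled back as a whole
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    conn.close()
    return ok, failed, balances
```

The second transfer debits and credits inside the transaction, then fails the balance check, so both updates are undone together and the balances remain as the first transfer left them.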

💡Cloud SQL

Cloud SQL is a fully managed database service provided by Google Cloud. It supports MySQL, PostgreSQL, and SQL Server. The video mentions Cloud SQL as a managed SQL variant that offers vertical scaling, meaning it can increase the capacity of a single server to handle more workload.

💡Cloud Spanner

Cloud Spanner is a fully managed, horizontally scalable, relational database service by Google Cloud. Unlike Cloud SQL, which offers vertical scaling, Cloud Spanner provides horizontal scaling, allowing it to distribute the database across multiple servers, thus scaling out to handle more significant loads and larger datasets.
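Horizontal scaling means spreading rows over several servers rather than growing one server. A generic way to picture this is hash-based routing of keys to nodes, sketched below. This is a teaching illustration only and is not a description of how Cloud Spanner itself partitions data. The node names are hypothetical.

```python
import zlib
from collections import Counter


def node_for(key, nodes):
    """Pick a node for a row key with a stable hash (illustrative scheme)."""
    if not nodes:
        raise ValueError("need at least one node")
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def distribute(keys, nodes):
    """Count how many keys land on each node."""
    return Counter(node_for(k, nodes) for k in keys)


keys = [f"customer-{i}" for i in range(1000)]
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])
four_nodes = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])
```

Adding a fourth node spreads the same 1000 keys over more machines, which is the scale-out behaviour the keyword describes; vertical scaling would instead keep a single node and make it larger.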

💡NoSQL

NoSQL databases are non-relational databases that are more flexible with their data storage and retrieval. They can handle a variety of data models, including key-value, document, wide column, and graph. The video discusses NoSQL in the context of its flexible schema and suitability for different types of data storage needs.

💡Bigtable

Bigtable is a fully managed, scalable NoSQL database service by Google Cloud that is optimized for large analytical and operational workloads. It is mentioned in the video as a wide-column database with an HBase interface, making it suitable for big data projects.

💡Data Warehouse

A data warehouse is a system used for reporting and data analysis. It centralizes data from one or more disparate sources, providing a single point of access for business intelligence tools. The video discusses the need for data warehouses in the context of ingesting all data for reporting and analysis.

💡BigQuery

BigQuery is Google Cloud's fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. The video describes BigQuery as a solution for OLAP (Online Analytical Processing) use cases, providing built-in machine learning, geospatial analysis, and business intelligence capabilities.

💡Serverless Architecture

Serverless architecture refers to a computing model where the cloud provider dynamically manages the allocation and provisioning of servers. In the context of BigQuery, as mentioned in the video, this means that there is no need for users to manage any infrastructure, allowing them to focus on querying and analyzing data without worrying about operational overhead.

💡Data Transfer Services

Data Transfer Services in BigQuery allow for the automated transfer of data from various SaaS applications and partners into BigQuery. The video explains how these services can be used to ingest data from sources like Google Maps, YouTube, and other SaaS products into BigQuery for analysis.

💡BI Engine

BI Engine is an in-memory analysis service in BigQuery that accelerates BI workloads. It enables sub-second query response times and high concurrency for popular BI tools. The video discusses how BI Engine can be used to integrate with BI tools like Tableau for faster data analysis.

Highlights

Relational databases support ACID properties: atomicity, consistency, isolation, and durability.

Cloud SQL is a managed SQL variant with vertical scaling, while Cloud Spanner offers horizontal scaling.

NoSQL databases offer flexible schemas and include wide column, key-value pair, and document-based databases.

Bigtable is a wide column database with an HBase interface, suitable for big data projects.

Memorystore is a managed in-memory data store, and Firestore is a document NoSQL database used for mobile and web clients.

Data warehouses and business intelligence use cases require the ability to ingest and analyze data from various sources.

BigQuery is Google Cloud's fully managed, serverless data warehouse for OLAP use cases.

BigQuery's architecture decouples storage and compute, allowing independent scaling.

Dremel, part of BigQuery's architecture, is a multi-tenant service for executing SQL queries.

Colossus is Google's global storage system, optimized for reading large amounts of structured data.

BigQuery's flat-rate pricing is based on the number of slots purchased, which determines processing power; on-demand pricing is based on the amount of data each query processes.

BigQuery can be accessed through various methods including GCP console, command line, REST APIs, and client libraries.

Data can be loaded into BigQuery from Cloud Storage using gsutil or bq command, or directly through the web UI.

BigQuery's SQL interface is intuitive, providing query results and statistics on data processed and time taken.

BigQuery ML allows running machine learning models using SQL dialect, simplifying AI integration.

BI Engine accelerates BI workloads by providing sub-second query response times for popular BI tools.

Connected Sheets enables analysis of BigQuery data in Google Sheets without SQL knowledge.

BigQuery GIS supports geospatial analysis, combining BigQuery's serverless architecture with location intelligence.

BigQuery can federate external data sources, processing data in various formats without moving it into BigQuery.

Public datasets in BigQuery offer over 200 high-demand datasets from different industries for free querying up to 1 TB per month.

BigQuery is a clear winner in cloud data warehouse solutions due to its serverless design and native AI/ML support.

Transcripts

play00:00

hi

play00:01

let's revisit what we have learned so

play00:03

far

play00:04

when it comes to databases we have

play00:06

relational databases

play00:08

they have acid support it's atomic in

play00:10

nature consistency isolation and it

play00:13

provides you durability

play00:15

and it also supports relational

play00:17

hierarchy

play00:18

when it comes to relational we have two

play00:20

options cloud sql and cloud spanner

play00:23

cloud sql which is a managed sql

play00:26

variant

play00:28

it provides vertical scaling

play00:30

whereas cloud spanner provides

play00:32

everything

play00:34

cloud sql provides plus it provides

play00:37

horizontal scaling

play00:39

on the nosql side

play00:42

nosql flexible schemas wide column key

play00:45

value pair document databases graph-based

play00:48

databases different options

play00:50

we looked at bigtable which is a wide

play00:53

column db

play00:54

uh it also has hbase interface so it's a

play00:57

good adoption for a lot of big data

play01:00

projects

play01:01

memorystore which is a managed redis

play01:03

in-memory data store firestore which is

play01:06

a document nosql and then it's also used

play01:09

for mobile and web client

play01:12

now that leads us to

play01:16

analytical

play01:17

kind of use cases so there is a

play01:20

requirement for building a data

play01:22

warehouse

play01:23

and business intelligence use cases

play01:25

where you want to ingest all of the data

play01:28

at one place so that all of the

play01:31

reporting tools can connect to that data

play01:34

and

play01:35

give you actionable insights

play01:37

also

play01:39

you want to do analysis on batch as well

play01:42

as real time data

play01:45

you want to create a store

play01:47

where

play01:48

you can sync everything source from

play01:51

everywhere so it's got nothing to do

play01:53

with the specific structure of data

play01:56

nothing to do with

play01:58

again specific way to define that data

play02:01

but

play02:02

any data in any format

play02:04

you should be able to sync it in

play02:07

and make it available for others to use

play02:09

it

play02:11

you need a place where you can run bi

play02:13

reports machine learning uh machine

play02:16

learning models so typically um in the

play02:18

on-prem world we used to have a lot of

play02:21

solutions for data warehousing

play02:25

but uh on cloud

play02:28

there are some specific offerings from

play02:31

different cloud

play02:34

platforms for their cloud data

play02:36

warehouses

play02:37

in google or within google the option is

play02:41

bigquery so let's let's try to

play02:43

understand this space now

play02:46

so what is bigquery bigquery is google

play02:49

cloud's fully managed by fully managed

play02:53

what it means that it's serverless is

play02:56

completely serverless you don't have to

play02:58

manage any infrastructure behind it it's

play03:01

a petabyte-scale

play03:03

and cost effective analytics data

play03:05

warehouse

play03:08

if you understand the oltp or

play03:10

olap

play03:12

space

play03:13

it's meant for olap kind of use cases

play03:16

analytical use cases

play03:18

that helps you manage and analyze data

play03:21

with built-in features like machine

play03:23

learning geospatial analysis and

play03:26

business intelligence

play03:29

now let's look at

play03:31

the architecture of bigquery

play03:34

so bigquery's serverless architecture

play03:38

decouples

play03:39

storage and compute

play03:42

so you can see this is the storage bit

play03:45

of it this is the compute bit of it

play03:48

and they are connected via a petabit

play03:52

network

play03:55

you can ingest data

play03:58

as in stream you can

play04:00

load batch or bulk data

play04:03

and these are the different ways you can

play04:05

access this

play04:06

uh through sql compliant clients rest

play04:09

api web ui or cli and then you have

play04:13

client libraries in in about seven seven

play04:16

languages

play04:19

now

play04:20

the decoupling of storage and compute

play04:23

actually allows bigquery to scale

play04:26

independently on demand

play04:28

this

play04:29

structure offers both immense

play04:32

flexibility

play04:34

and cost control for customers

play04:37

because they don't need to keep their

play04:40

expensive compute resources up and

play04:43

running all the time

play04:44

and this is very different

play04:47

from a traditional node based cloud data

play04:50

warehouse solutions or even

play04:52

on premises

play04:53

mpp based systems this approach allows

play04:58

customers

play04:59

to bring in any size of their data into

play05:02

data warehouse and start analyzing their

play05:05

data

play05:06

without worrying about

play05:08

database operations and system

play05:10

engineering

play05:12

now let's

play05:13

dig

play05:14

a bit deeper into

play05:16

uh

play05:16

bigquery architecture so under the hood

play05:20

uh bigquery employs a

play05:22

vast set of multi-tenant services driven

play05:25

by low-level google infrastructure

play05:28

technology like dremel colossus jupiter

play05:31

and borg so if you look at this

play05:33

architecture

play05:37

compute is dremel so dremel is compute

play05:42

it's a large multi-tenant cluster that

play05:46

executes sql queries so all of the sql

play05:50

queries they get executed at this level

play05:54

dremel turns sql into sort of an

play05:58

execution tree

play06:02

the

play06:03

leaves of the tree are called slots

play06:06

and to do the heavy lifting of reading

play06:09

data from storage and any necessary

play06:11

computation

play06:13

the branches

play06:15

of this tree are called mixers

play06:20

which perform the aggregation

play06:23

now any time

play06:25

um when when you're looking to

play06:28

purchase bigquery or use it for your

play06:30

organization it's basically these slots

play06:34

is how

play06:35

you know your pricing comes into picture

play06:38

so based on the different

play06:40

number of slots that you buy

play06:43

and you use those slots

play06:47

is uh basically this whole execution uh

play06:52

tree and

play06:53

and the leaves

play06:55

on that tree so

play06:57

which in turns is about the processing

play07:00

power of it then storage is colossus

play07:06

colossus which is google's global

play07:08

storage system

play07:11

it leverages the columnar storage format

play07:14

and compression algorithm

play07:16

to store data

play07:18

and it's optimized for reading large

play07:21

amount of structured data

play07:23

then you have

play07:27

jupiter in between which is

play07:31

the petabit network and that connects

play07:34

dremel and colossus which in the sense

play07:36

is compute and storage network

play07:40

and then bigquery is orchestrated by

play07:42

borg

play07:45

which is google's precursor to

play07:48

kubernetes so before google built

play07:50

kubernetes

play07:52

it was using borg

play07:54

and the mixers and

play07:56

the slots they are all run by borg which

play08:00

allocates hardware resources now the

play08:02

most important control that you can have

play08:05

on bigquery is basically how you write a

play08:08

query because the costing is determined

play08:11

by the amount of data it's processing

play08:15

so

play08:16

you need to be very diligent in terms of

play08:19

writing the specific queries

play08:21

so

play08:22

you can limit the amount of data

play08:24

processed so select * from is

play08:26

definitely

play08:27

not a good option for

play08:29

bigquery

play08:31

with that let's uh

play08:34

look at how you can use bigquery

play08:37

so

play08:38

bigquery can be accessed in multiple

play08:40

ways using the gcp console

play08:43

command line tool bq

play08:46

by by using rest apis and then you can

play08:49

use the client libraries such as java, .net

play08:51

or python

play08:53

now while

play08:54

loading data into

play08:58

um

play08:58

big

play08:59

query you can either create a new table

play09:02

or append to or overwrite an existing

play09:05

table so this is a typical structure

play09:10

how you will look at loading data so

play09:13

here is your data you will use the

play09:14

gsutil command to put that data into

play09:17

cloud storage

play09:19

and then

play09:20

from there you can use bq tool to pull

play09:22

that data into bigquery

play09:25

similarly you can do that from web ui

play09:28

console: from the web ui console you can

play09:31

load this data directly

play09:33

and then you can use bigquery

play09:35

apis to actually query that data

play09:37

and that queried result can be used

play09:40

outside

play09:42

if you look at the interface so it's a

play09:46

very intuitive interface so you have a

play09:48

query editor like in fact you can think

play09:51

of any sql client you write a sql query

play09:56

uh it will give you result

play09:58

and at the same time it will give you

play10:01

that

play10:02

what was the time elapsed and what was

play10:04

the amount of data that it

play10:07

processed

play10:09

so

play10:10

as evident in in this particular example

play10:13

it takes less than two seconds to

play10:16

analyze uh in this particular case 28

play10:19

gigabytes of data and and return the

play10:22

results

play10:24

bigquery engine is actually very smart

play10:26

to read only the columns required to

play10:28

execute the query

play10:32

now

play10:34

the best thing about bigquery is this is

play10:37

what you use you bring your data and

play10:40

then you write queries

play10:42

and you know

play10:44

you you get your results you don't have

play10:46

to worry about where this query is

play10:48

running how it is running

play10:51

what sort of compute required to run

play10:53

this query or any of the operational

play10:57

overhead you just bring in your data and

play11:00

you execute your query

play11:03

with that

play11:04

let's uh look at what are the different

play11:09

ways you can actually bring data into

play11:12

bigquery so

play11:15

there are many different ways uh

play11:19

for file csv or json or every kind of

play11:22

data the

play11:24

process we looked at you pull that data

play11:27

into cloud

play11:28

storage using gsutil command

play11:31

or any of the client libraries and then

play11:33

from cloud storage

play11:35

you will push it to bigquery using bq

play11:38

command

play11:42

then

play11:44

you can use

play11:46

the data transfer services so bigquery

play11:48

has a data transfer services to transfer

play11:51

data from saas applications so you can

play11:54

use uh saas

play11:57

dts connectors to pull data from google

play12:00

maps youtube and uh

play12:03

there's a long list of saas products

play12:05

marketing products a lot of products

play12:08

from where

play12:09

using the connector you can ingest data

play12:11

into bigquery or you can use the partner

play12:14

dts connectors as well

play12:17

then apart from that

play12:18

data fusion is google's etl tool

play12:21

uh

play12:22

any database that it supports a plug-in

play12:25

or a connector you can use those

play12:27

connectors uh from a lot of different

play12:30

kind of databases to pull direct data

play12:32

directly into bigquery

play12:36

then from sap point of view you have sap

play12:39

data services that you can use to ingest

play12:41

data directly into bigquery and then

play12:44

apart from that you have the partner

play12:45

integration with lot of marketplace etl

play12:48

tools like informatica fivetran or

play12:51

confluent which you can use to push data

play12:54

into

play12:56

bigquery

play12:59

now what are the other

play13:02

some of the very unique features of

play13:04

bigquery

play13:05

bigquery provides

play13:07

multi-cloud capabilities in the sense of

play13:10

bigquery omni which is in preview it

play13:13

allows you to analyze data across clouds

play13:16

using standard sql and without leaving

play13:19

bigquery's familiar interface

play13:24

then

play13:24

it has built-in machine learning and ai

play13:27

integration so besides bringing machine

play13:30

learning to data with bigquery ml

play13:33

integration with vertex ai

play13:36

which is again the managed platform for

play13:39

entire machine learning life cycle and

play13:42

tensorflow enables you to train and

play13:45

execute powerful models on structured

play13:48

data in minutes just with just sql so

play13:51

this is very important feature that

play13:54

you can run machine learning model

play13:56

from

play13:57

sql dialect

play14:00

bi engine so

play14:02

to accelerate bi workloads so anytime

play14:06

you have data warehouse its uh

play14:08

bi tools will be integrating with that

play14:11

data

play14:13

like uh

play14:14

tableau so

play14:17

you can turn on bi engine it's an

play14:19

in-memory analysis service to achieve

play14:22

sub-second query response time and high

play14:25

concurrency for popular bi tools so any

play14:28

bi tool which uses odbc or jdbc

play14:30

connection you can hook that into

play14:32

bigquery

play14:34

uh through bi engine

play14:37

connected sheets it allows users to

play14:40

analyze billions of rows of live

play14:43

bigquery data in google sheets

play14:46

without

play14:48

knowing sql so it's a very handy tool

play14:50

for business users to play around with

play14:52

data

play14:54

geospatial data types so bigquery gis it

play14:58

combines the serverless architecture of

play15:00

bigquery with native support for

play15:03

geospatial analysis so you can

play15:06

augment your analytical workflows with

play15:09

location intelligence

play15:13

federation federation is very important

play15:15

bigquery can process external data

play15:18

sources in object storage like cloud

play15:20

storage

play15:22

for different file formats like parquet

play15:25

orc

play15:27

transactional databases like bigtable

play15:29

cloud sql or spreadsheets in in your

play15:32

google drive

play15:34

all this can be done without moving the

play15:36

data to bigquery

play15:39

the last one is public data sets

play15:42

and this is very this is a very useful

play15:45

feature google cloud's uh

play15:48

you know public data set repository

play15:50

offers a powerful

play15:53

data repository of more than 200 high

play15:56

demand public data sets from different

play15:58

industries

play16:00

and these data sets are available

play16:03

for you

play16:04

to import into or

play16:06

attach into your

play16:08

bigquery projects and you can

play16:10

straightaway start querying that

play16:12

and you can query up to one terabyte of

play16:15

data per month at no cost

play16:20

now with this let's let's quickly take a

play16:22

look at uh

play16:25

bigquery uh

play16:27

console

play16:29

this is a very familiar interface of

play16:31

bigquery

play16:33

you have sql workspace and the data

play16:36

transfer methods and you can also

play16:38

schedule queries sql workspace is

play16:41

is pretty much

play16:43

the project that you create

play16:45

and

play16:46

the

play16:47

200 plus

play16:49

uh public data sets that that's

play16:52

available to you it's very easy to load

play16:54

data into

play16:56

into your project you can just click on

play16:58

add data and add and you can follow

play17:00

through that in the lab section

play17:03

now when we come on to this site the sql

play17:06

query is like any sql compliant standard

play17:09

query you run this query and it will

play17:12

give you the results the same time it

play17:14

will give you all the stats

play17:16

that this query processed

play17:18

these many megabytes uh

play17:21

how long did it take for the query to

play17:23

execute and then you can save the

play17:25

results and explore the data out of the

play17:28

box itself

play17:30

so with that it's very easy to bring

play17:33

data into bigquery

play17:36

and that's it

play17:37

you don't have to worry about any

play17:39

infrastructure management or

play17:42

worrying about how the queries are run

play17:44

you can start writing your query

play17:47

and

play17:47

start getting the results and apart from

play17:50

that as we talked about it has some

play17:52

amazing and unique features

play17:55

to make your cloud data warehouse

play17:58

much more than

play18:00

what a typical analytical uh data

play18:03

warehouse solution will do

play18:05

let's look at uh perspective on all of

play18:09

the cloud data warehouse solutions

play18:10

that's available in the market

play18:14

so

play18:15

when

play18:16

we look at aws redshift

play18:20

redshift is based on concept of nodes

play18:24

which are again virtual nodes

play18:27

but you need to deploy configure and

play18:30

manage them

play18:31

so there is leader node and then you

play18:34

have compute nodes

play18:37

when you come to the azure side, azure sql

play18:40

data warehouse or the

play18:43

synapse analytics

play18:44

is again a cloud-based but you have

play18:47

control node

play18:48

and then you have compute nodes

play18:55

what it does is that it leverages a

play18:58

mpp architecture

play19:00

to process polybase t-sql queries

play19:05

then you have snowflake

play19:08

which is a managed data warehouse as a

play19:10

service

play19:11

that can be deployed on aws or azure

play19:15

infrastructure

play19:16

snowflake also separates

play19:19

compute and storage resources and makes

play19:23

use of an mpp architecture behind the

play19:26

scenes

play19:27

but

play19:28

at the time of deployment

play19:31

you have to select a pre-configured

play19:34

virtual data warehouses in in various

play19:37

sizes

play19:38

like

play19:39

small

play19:40

to

play19:41

medium

play19:43

to large and extra large so it's kind

play19:45

of a t-shirt sizing

play19:48

and it also provides you a separate

play19:50

virtual warehouse for ingesting the data

play19:54

now when we look at the

play19:56

google

play19:57

bigquery

play20:00

nothing you have to manage

play20:02

as long as you can bring in your data

play20:06

you bring in your data and then you just

play20:08

start writing queries so you don't have

play20:10

to worry about

play20:12

any of the underlying infrastructure

play20:15

or any operation

play20:17

or anything to do with node or node

play20:19

configuration so google bigquery is

play20:22

truly a serverless

play20:24

cloud data warehouse solution which

play20:26

gives you

play20:28

analytical capability

play20:30

and more

play20:33

so to put it into perspective why

play20:36

bigquery is a clear winner when it comes

play20:39

to cloud data warehouse solutions is

play20:41

elimination of upfront investment and

play20:44

planning

play20:45

bigquery's serverless design is billed

play20:48

monthly with flexible on demand or flat

play20:52

rate pricing which is based on slots

play20:56

it gives you reduction in operational

play20:58

expenses

play20:59

it eliminates the need to manage virtual

play21:04

enterprise data warehouse nodes as well

play21:06

as the need to monitor troubleshoot

play21:09

updates

play21:12

tune

play21:12

or any

play21:14

plan for growth

play21:18

it scales up or down as needed to meet

play21:21

the changing

play21:23

needs of your data

play21:26

it

play21:27

also reduces the time spent on

play21:30

like etl management or new schema

play21:32

modifications

play21:35

and it provides you native ai/ml

play21:39

support

play21:40

with native integration with most of its

play21:44

google's cloud services

play21:46

so i hope this was useful thank you
