Amazon Redshift Tutorial | Amazon Redshift Architecture | AWS Tutorial For Beginners | Simplilearn

Simplilearn
25 Feb 2020 · 22:37

Summary

TL;DR: This video introduces Amazon Redshift, a cloud-based data warehouse service on AWS, emphasizing its high performance and cost-effectiveness. It covers the basics of AWS, the need for Redshift, its architecture, advantages, and use cases. The speaker guides viewers through creating an IAM role for Redshift, launching a cluster, and demonstrating data migration from S3 to Redshift using SQL Workbench/J. The tutorial aims to simplify the understanding of Redshift's setup and operations for data management.

Takeaways

  • 🌐 Amazon Redshift is a data warehouse service provided by Amazon Web Services (AWS), designed for collecting and storing large amounts of data.
  • 📈 AWS is a leading cloud service platform that offers secure cloud services and allows for pay-as-you-go pricing models.
  • 🛠 Traditional data warehouses were often challenging to maintain due to issues with network connectivity, security, and high maintenance costs.
  • 🚀 Amazon Redshift addresses these issues by offering a cloud-based solution that is scalable, cost-effective, and simplifies data management.
  • 🏢 DNA, a telecommunications company, saw a 52% increase in application performance after moving its data management to Amazon Redshift.
  • 💰 Amazon Redshift is considered cost-effective compared to other cloud data warehouse services and offers high performance.
  • 🔑 The service provides advantages such as high performance, low cost, scalability, availability, security, flexibility, and ease of database migration.
  • 🏭 The architecture of Amazon Redshift consists of a leader node that manages client applications and compute nodes that process data.
  • 📚 Redshift utilizes column storage and compression techniques to optimize query performance and reduce storage requirements.
  • 🏷️ Large enterprises such as Pfizer, McDonald's, and Philips rely on Amazon Redshift for their data warehousing needs.
  • 🔍 The script includes a demo that guides viewers through creating an IAM role, launching a Redshift cluster, and using the COPY command to move data from S3 to Redshift.

Q & A

  • What is Amazon Redshift?

    -Amazon Redshift is a cloud-based data warehouse service provided by Amazon Web Services (AWS) that is primarily used for collecting, storing, and analyzing large amounts of data using business intelligence tools.

  • Why was Amazon Redshift introduced?

    -Amazon Redshift was introduced to solve the traditional data warehouse problems that developers faced, such as time-consuming data retrieval, high maintenance costs, and potential loss of information during data transfer.

  • What are some advantages of using Amazon Redshift?

    -Some advantages of Amazon Redshift include high performance, low cost, scalability, availability across multiple zones, security features, flexibility in managing clusters, and ease of database migration.

  • How is Amazon Redshift different from traditional data warehouses?

    -Amazon Redshift differs from traditional data warehouses by being a cloud-based service that offers faster performance, lower operational costs, and the ability to scale resources on-demand without the need for hardware procurement.

  • What is the significance of column storage in Amazon Redshift?

    -Column storage in Amazon Redshift is significant because it optimizes query performance by making it easier and quicker to pull out data from specific columns when running queries.

  • What is compression in the context of Amazon Redshift?

    -Compression in Amazon Redshift is a column-level operation that decreases storage requirements and improves query performance by reducing the amount of data that needs to be read from disk.
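    As a sketch of how this looks in practice, Redshift lets you specify a compression encoding per column with the `ENCODE` keyword (the table, column names, and encoding choices below are illustrative, not from the video):

    ```sql
    -- Hypothetical table: each column carries its own compression encoding.
    CREATE TABLE product (
        product_id   INTEGER      ENCODE az64,     -- compact encoding for numeric data
        product_name VARCHAR(100) ENCODE lzo,      -- general-purpose compression
        category     VARCHAR(50)  ENCODE bytedict  -- suited to low-cardinality columns
    );
    ```

    If no encoding is specified, Redshift can choose one automatically at load time.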

  • Can you name some companies that use Amazon Redshift?

    -Some companies that use Amazon Redshift include LYA, Equinox, Pfizer, McDonald's, and Philips.

  • What is the purpose of creating an IAM role for Amazon Redshift?

    -Creating an IAM (Identity and Access Management) role for Amazon Redshift allows the service to access other AWS services, such as S3, by granting the necessary permissions in a secure manner.
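    For reference, a role that Redshift can assume typically carries a trust policy in the standard AWS format like the one below (with a managed policy such as AmazonS3ReadOnlyAccess attached separately, as in the demo):

    ```json
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": { "Service": "redshift.amazonaws.com" },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    ```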

  • How can data be transferred from an S3 bucket to an Amazon Redshift cluster?

    -Data can be transferred from an S3 bucket to an Amazon Redshift cluster using the COPY command, which allows for direct data loading into Redshift tables from S3.
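    A minimal sketch of the COPY command, assuming a hypothetical `sales` table, bucket path, and role ARN (replace each with your own resources):

    ```sql
    -- Load tab-delimited data from S3 into an existing Redshift table.
    COPY sales
    FROM 's3://my-demo-bucket/sales_data.txt'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
    DELIMITER '\t';
    ```

    The IAM_ROLE clause is what ties the demo's S3 read-only role to the load operation.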

  • What is the importance of the leader node in Amazon Redshift architecture?

    -The leader node in Amazon Redshift architecture is important as it manages the interaction between client applications and compute nodes, sends out instructions for database operations, and aggregates results before delivering them to the client application.

  • How can users connect to an Amazon Redshift cluster to run queries?

    -Users can connect to an Amazon Redshift cluster to run queries using SQL client applications like SQL Workbench/J, or directly through the AWS Management Console's query editor.

Outlines

00:00

🚀 Introduction to Amazon Redshift

The video introduces Amazon Redshift, a data warehouse service on AWS. The speaker, Akil, encourages viewers to subscribe and begins by explaining the basics of AWS as a cloud service platform. He then discusses the traditional challenges of data warehousing, such as geographical barriers, connectivity issues, and high maintenance costs. The introduction of Amazon Redshift is presented as a solution to these problems, offering a cloud-based, scalable, and cost-effective service for managing large datasets. The video promises to cover the advantages of Redshift, its architecture, associated concepts, company use cases, and a practical demo.

05:02

🔑 Benefits and Architecture of Amazon Redshift

This paragraph delves into the advantages of Amazon Redshift, highlighting its high performance, low cost, scalability, availability, security, and flexibility. The speaker explains that Redshift's architecture consists of a leader node and compute nodes, which together form a data warehouse cluster. The leader node manages client applications and sends instructions to the compute nodes for data processing. The compute nodes, scalable in number, are responsible for executing these instructions and returning results. The paragraph also touches on additional concepts like column storage and data compression, which contribute to Redshift's efficiency.

10:04

🏢 Companies Utilizing Amazon Redshift and Upcoming Demo

The speaker lists well-known enterprises that utilize Amazon Redshift, such as LYA, Equinox, Pfizer, McDonald's, and Philips, emphasizing the service's reliability and widespread adoption. The paragraph concludes with a teaser for an upcoming demo, which will guide viewers through creating an IAM role for Redshift, launching a sample Redshift cluster, assigning security groups, and using the AWS Management Console's query editor to run queries on the cluster.

15:05

🛠️ Step-by-Step Redshift Cluster Creation and Data Upload

The paragraph outlines the process of creating an Amazon Redshift cluster, starting with the creation of an IAM role to grant Redshift access to S3 services. It details the steps to launch a cluster, configure VPC security groups, and use the query editor to run SQL commands. The speaker also explains how to copy data from an S3 bucket to a Redshift table using the COPY command, which requires specifying the table name, S3 path, and IAM role ARN. The paragraph provides a practical approach to using Redshift for data storage and querying.

20:06

🔄 Data Migration and Query Execution in Redshift

This final paragraph focuses on the migration of data to Redshift and the execution of queries on the uploaded data. The speaker demonstrates the creation of a 'sales' table in Redshift and the use of the COPY command to transfer data from an S3 bucket to this table. After the data migration, the speaker executes a query to retrieve results from the 'sales' table, confirming the data has been correctly uploaded. The paragraph concludes with a reminder for viewers to subscribe to the channel for more AWS-related content and a call to action for certification through Simplilearn's YouTube channel.
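    The demo flow described above can be sketched in SQL as follows; the schema, bucket path, and role ARN are illustrative placeholders, not the exact values used in the video:

    ```sql
    -- 1. Create the target table.
    CREATE TABLE sales (
        sale_id  INTEGER,
        item     VARCHAR(50),
        quantity INTEGER,
        price    DECIMAL(8,2)
    );

    -- 2. Load the data from S3 using the role created earlier.
    COPY sales
    FROM 's3://my-demo-bucket/sales.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
    CSV;

    -- 3. Verify the load.
    SELECT * FROM sales LIMIT 10;
    ```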

Mindmap

Keywords

💡Amazon Redshift

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud provided by Amazon Web Services (AWS). It is designed to make it simple and cost-effective to efficiently analyze all data using fast and flexible analytic capabilities. In the video, Amazon Redshift is the central topic, with discussions on its advantages, architecture, and use cases, illustrating its role in modern data warehousing and analytics.

💡AWS

AWS stands for Amazon Web Services, which is one of the largest cloud service providers in the market. It offers a secure cloud service platform where users can create and deploy applications using various AWS services. In the context of the video, AWS is the provider of Amazon Redshift, emphasizing the integration of Redshift with other AWS services.

💡Data Warehouse

A data warehouse is a system used for reporting and data analysis, and is considered a core component of business intelligence. It stores data from one or more sources and serves the purpose of data analysis. In the script, the traditional challenges of data warehouses are discussed, such as the difficulty of fetching data and high maintenance costs, which Amazon Redshift aims to solve.

💡Scalability

Scalability refers to the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth. In the video, scalability is highlighted as one of the key advantages of Amazon Redshift, allowing users to scale up or down resources on the fly based on their requirements.

💡Column Storage

Column storage is a database storage approach where data is stored column-wise instead of row-wise. This method is beneficial for analytic workloads as it allows for faster query performance by reducing the amount of data that needs to be read. The script explains column storage as an essential factor in optimizing query performance in Amazon Redshift.

💡Compression

Compression in the context of databases refers to the process of reducing the size of data to save storage space and improve query performance. The script mentions column-level compression as a feature of Amazon Redshift, which decreases storage requirements and enhances the speed of data retrieval.

💡Leader Node

In Amazon Redshift, the leader node is responsible for coordinating and distributing the workload among the compute nodes. It manages client connections and directs the execution of SQL commands. The script describes the leader node as a critical component of the Redshift architecture, which sends instructions to compute nodes and aggregates results.

💡Compute Nodes

Compute nodes in Amazon Redshift are the individual database servers that perform the actual data processing. They work in parallel to handle queries and perform data operations. The script explains that these nodes are scalable and can be increased based on demand, which contributes to the high performance of Redshift.

💡JDBC

JDBC stands for Java Database Connectivity, which is a Java API that allows applications to connect to a database and execute SQL statements. In the video, JDBC is mentioned as a method for client applications to interact with the leader node in Amazon Redshift, emphasizing its role in enabling database connectivity.
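    For orientation, a Redshift JDBC connection URL generally follows this shape (cluster endpoint, port, and database name below are placeholders):

    ```
    jdbc:redshift://examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com:5439/dev
    ```

    This is the URL a SQL client such as SQL Workbench/J would use, together with the Redshift JDBC driver, to reach the leader node.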

💡ODBC

ODBC stands for Open Database Connectivity, which is a standard API for accessing database management systems. The script mentions ODBC as a means for the leader node to have direct interaction with Amazon Redshift, allowing for live data access and manipulation.

💡Database Migration

Database migration refers to the process of transferring data from one database system to another. In the context of the video, database migration is discussed as a straightforward process when moving from a traditional data center to Amazon Redshift, facilitated by inbuilt tools provided by AWS.

Highlights

Introduction to Amazon Redshift as a data warehouse service on AWS.

Request for subscription to the channel for more informative content.

Explanation of AWS as a secure cloud service platform provided by Amazon.

Traditional data warehouse challenges such as geographical limitations and high maintenance costs.

How Amazon Redshift overcomes the issues faced by traditional data warehouses.

Definition of Amazon Redshift as a cloud-based data warehouse service.

Use case of DNA, a telecommunication company, benefiting from Amazon Redshift with a 52% increase in application performance.

Cost-effectiveness and high performance of Amazon Redshift compared to other cloud data warehouse services.

Amazon Redshift's scalability allowing on-demand adjustment of database nodes.

High availability of Amazon Redshift across multiple availability zones.

Security features of Amazon Redshift including virtual private clouds and security groups.

Flexibility in managing Amazon Redshift clusters with snapshot and region transfer capabilities.

Simplicity of database migration to Amazon Redshift from traditional data centers.

Architecture of Amazon Redshift including leader nodes and compute nodes.

Column storage and compression techniques in Amazon Redshift for optimized query performance.

List of notable companies utilizing Amazon Redshift for their data warehousing needs.

Step-by-step demo on creating an Amazon Redshift cluster and setting up necessary permissions.

Demonstration of data migration from Amazon S3 to Redshift using SQL Workbench/J.

Final remarks encouraging subscription and highlighting the value of the presented information.

Transcripts

play00:07

hi guys uh this is akil and today we are

play00:10

going to discuss about amazon redshift

play00:12

which is one of the data warehouse

play00:13

service on the aws but uh before

play00:16

starting up with the amazon redshift i

play00:18

would request you guys to subscribe our

play00:21

channel you can find the link just below

play00:23

this video at the right side so let's

play00:25

begin with amazon redshift and let's see

play00:27

what we have for today's session so

play00:30

what's in it for you today we'll see

play00:32

what is aws

play00:33

why we require amazon redshift what do

play00:36

we mean by amazon redshift the

play00:38

advantages of amazon redshift the

play00:40

architecture of amazon redshift some of

play00:42

the additional concepts associated with

play00:44

the redshift and the companies that are

play00:46

using the amazon redshift and finally

play00:49

we'll cover up one demo which will show

play00:51

you the practical example that how you

play00:53

can actually use the redshift service

play00:55

now what is aws as we know that aws

play00:58

stands for amazon web service it's one

play01:00

of the largest cloud providers in the

play01:02

market and it's basically a secure cloud

play01:04

service platform provided from the

play01:06

amazon also on the aws you can create

play01:10

and deploy the applications

play01:13

using the aws service along with that

play01:16

you can access the services provided by

play01:18

the aws over the public network that is

play01:21

over the internet they are accessible

play01:24

plus you pay only whatever the service

play01:27

you use for now let's understand why we

play01:29

require amazon redshift so earlier

play01:32

before amazon redchef what used to

play01:34

happen that the people used to or the

play01:36

developers used to fetch the data from

play01:38

the data warehouse so data warehouse is

play01:40

basically a terminology which is

play01:42

basically represents the collection of

play01:45

the data so a repository where the data

play01:47

is stored is generally called as a data

play01:49

warehouse now fetching data from the

play01:51

data warehouse was a complicated task

play01:54

because might be a possibility that the

play01:56

developer is located at a different

play01:57

geography and the data data warehouse is

play02:01

at a different location and probably

play02:02

there is not that much network

play02:04

connectivity or

play02:06

some networking challenges internet

play02:08

connectivity challenges security

play02:10

challenges might be and a lot of

play02:12

maintenance was required to manage the

play02:14

data warehouses so what were the cons of

play02:17

the traditional data warehouse services

play02:19

it was time consuming to download or get

play02:22

the data from the data warehouse

play02:24

maintenance cost was high and there was

play02:27

the possibility of loss of information

play02:29

in between the downloading of the data

play02:32

and the data rigidity was an issue now

play02:35

how these problems could overcome and

play02:37

this was uh basically solved with the

play02:40

introduction of amazon redshift over the

play02:42

cloud platform now we say that amazon

play02:45

redshift has solved traditional data

play02:46

warehouse problems that the developers

play02:49

were facing but how what is amazon

play02:51

redshift actually is so what is amazon

play02:54

redshift it is one of the services over

play02:56

the aws amazon web services which is

play02:59

called as a data warehouse service so

play03:01

amazon redshift is a cloud-based service

play03:03

or a data warehouse service that is

play03:05

primarily used for collecting and

play03:06

storing the large chunk of data so it

play03:09

also helps you to get or extract the

play03:12

data analyze the data using some of the

play03:14

bi tools so business intelligence tools

play03:16

you can use and get the data from the

play03:18

redshift and process that and hence it

play03:20

simplifies the process of handling the

play03:22

large scale data sets so this is the

play03:25

symbol for the amazon redshift over the

play03:27

aws now let's discuss about one of the

play03:29

use case so dna is basically a

play03:32

telecommunication company and they were

play03:34

facing an issue with managing their

play03:36

website and also the amazon s3 data

play03:39

which led down to slow process of their

play03:41

applications now how could they overcome

play03:43

this problem let's say that so they

play03:45

overcome this issue by using the amazon

play03:48

redshift and all the company noticed

play03:50

that there was a 52 increase in the

play03:53

application performance now did you know

play03:56

that amazon redshift is

play03:58

basically cost less to operate than any

play04:01

other cloud data warehouse service

play04:03

available on the cloud computing

play04:05

platforms and also the performance of an

play04:07

amazon redshift is the fastest data

play04:10

warehouse we can say that that is

play04:11

available as of now so in both cases one

play04:14

is that it saves the cost as compared to

play04:16

the traditional data warehouses and also

play04:18

the performance of this red shift

play04:20

service or a data warehouse service the

play04:22

fastest available on the cloud platforms

play04:24

and more than 15 000 customers primarily

play04:27

presently they are using the amazon

play04:29

redshift service now let's understand

play04:31

some of the advantages of amazon

play04:33

redshift first of all as we saw that it

play04:36

is one of the fastest available data

play04:37

warehouse service so it has the high

play04:39

performance second is it is a low cost

play04:42

service so you can have a large scale of

play04:44

data warehouse or a databases combined

play04:46

in a data warehouse at a very low cost

play04:49

so whatever you use you pay for that

play04:51

only scalability now in case if you

play04:54

wanted to increase the nodes of the

play04:55

databases in your redshift you can

play04:57

actually increase that based on your

play04:58

requirement and that is on the fly so

play05:01

you don't have to wait for the

play05:03

procurement of any kind of hardware or

play05:04

the infrastructure it is whenever you

play05:06

require you can scale up or scale down

play05:09

the resources so this scalability is

play05:11

again one of the advantage of the amazon

play05:13

redshift availability since it's

play05:15

available across multiple availability

play05:18

zones so it makes this service as a

play05:20

highly available service security so

play05:22

whenever you create whenever you access

play05:25

redshift you create a clusters in the

play05:26

redshift and the clusters are created in

play05:29

the you can define a specific virtual

play05:31

private cloud for your cluster and you

play05:33

can create your own security groups and

play05:36

attach it to your cluster so you can

play05:38

design the security parameters based on

play05:40

your requirement and you can get your

play05:42

data warehouse or the data items in a

play05:44

secure place flexibility and you can

play05:47

remove the clusters you can create under

play05:49

clusters if you are deleting a cluster

play05:52

you can take a snapshot of it and you

play05:54

can move those snapshots to different

play05:55

regions so that much flexibility is

play05:57

available on the aws for the service and

play06:00

the other advantage is that it is

play06:03

basically a very simple way to do a

play06:05

database migration so if you're planning

play06:07

that you wanted to migrate your

play06:08

databases from the traditional data

play06:10

center over the cloud on the redshift it

play06:13

is basically a very simple to do a

play06:16

database migration you can have some of

play06:19

the inbuilt tools available on the aws

play06:21

access you can connect them with your

play06:23

traditional data center and get that

play06:25

data migrated directly to the redshift

play06:27

now let's understand the architecture of

play06:29

the amazon redshift so architecture of

play06:31

an amazon redshift is basically it

play06:33

combines of a cluster and that we call

play06:35

it as a data warehouse cluster in this

play06:37

picture you can see that this is a data

play06:39

warehouse cluster and this is a

play06:41

representation of a amazon redshift so

play06:44

it has some of the compute nodes which

play06:46

does the data processing and a leader

play06:48

node which gives the instructions to

play06:51

these compute nodes and also the leader

play06:53

node basically manages the client

play06:56

applications that require the data from

play06:59

the redshift so let's understand about

play07:01

the components of the redshift the

play07:03

client application of amazon redshift

play07:06

basically interact with the leader node

play07:08

using jdbc or the odbc now what is jdbc

play07:11

it's a java database connectivity and

play07:13

the odbc stands for open database

play07:16

connectivity the amazon redshift service

play07:18

using a jdbc connector can monitor the

play07:21

connections from the other client

play07:23

applications so the leader node can

play07:25

actually have a check on the client

play07:26

applications using the jdbc connections

play07:29

whereas the odbc allows a leader node to

play07:32

have a direct interaction or to have a

play07:34

live interaction with the amazon

play07:36

redshift so odbc allows a user to

play07:39

interact with live data of amazon

play07:41

redshift so it has a direct connectivity

play07:43

direct access of the applications as

play07:45

well as the leader node can get the

play07:47

information from the compute nodes now

play07:50

what are these compute nodes these are

play07:52

basically kind of our databases which

play07:54

does the processing so amazon redshift

play07:56

has a set of computing resources which

play07:58

we call it as a nodes and the nodes when

play08:00

they are combined together they are

play08:01

called it as a clusters now a cluster a

play08:04

set of computing resources which are

play08:06

called as nodes and this gathers into a

play08:09

group which we call it as a data

play08:11

warehouse cluster so you can have a

play08:13

compute node starting from 1 to n number

play08:15

of nodes and that's why we call that the

play08:17

redshift is a scalable service because

play08:19

we can scale up the compute nodes

play08:21

whenever we require now the data

play08:23

warehouse cluster or the each cluster

play08:26

has one or more databases in the form of

play08:29

a nodes now what is a leader node this

play08:31

node basically manages the interaction

play08:33

between the client application and the

play08:35

compute node so it acts as a bridge

play08:37

between the client application and the

play08:38

compute nodes also

play08:40

it analyzes and develop designs in order

play08:42

to carry out any kind of a database

play08:45

operations so leader node basically

play08:47

sends out the instructions to the

play08:49

compute nodes basically perform or

play08:51

execute that instructions and give that

play08:53

output to the leader node so that is

play08:55

what we are going to see in the next

play08:57

slide that the leader node runs the

play08:59

program and assign the code to

play09:01

individual compute nodes and the compute

play09:04

nodes execute the program and share the

play09:07

result back to the leader node for the

play09:09

final aggregation and then it is

play09:11

delivered to the client application for

play09:13

analytics or whatever the client

play09:15

application is created for so compute

play09:17

nodes are basically categorized into

play09:19

slices and each node slice is alerted

play09:22

with specific memory space or you can

play09:24

say a storage space where the data is

play09:26

processed these node slices works in

play09:28

parallel in order to finish their work

play09:30

and hence when we talk about a redshift

play09:32

as a fast faster processing capability

play09:36

as compared to other data warehouses or

play09:38

traditional data warehouses this is

play09:40

because that these node slices work in a

play09:42

parallel operation that makes it more

play09:44

faster now the additional concept

play09:46

associated with amazon redshift is there

play09:48

are two additional concepts associated

play09:50

with the redshift one is called as the

play09:51

column storage and the other one is

play09:53

called as the compression let's see what

play09:55

is the column storage as the name

play09:57

suggests column storage is basically

play09:59

kind of a data storage in the form of a

play10:01

column so that whenever we run a query

play10:04

it becomes easier to pull out the data

play10:06

from the columns so column storage is an

play10:08

essential factor in optimizing query

play10:10

performance and resulting in quicker

play10:12

output so one of the examples are

play10:14

mentioned here so below example show how

play10:16

database tables store record into disk

play10:19

block by row so here you can see that if

play10:21

we wanted to pull out some kind of an

play10:23

information based on the city address

play10:25

age we can basically create a filter and

play10:28

from there we can put out the details

play10:30

that we require and that is going to

play10:32

fetch out the details based on the

play10:33

column storage so that makes data more

play10:36

structured more streamlined and it

play10:37

becomes very easier to run a query and

play10:39

get that output now the compression is

play10:42

basically to save the column storage we

play10:45

can use a compression as an attribute so

play10:47

compression is a column level operation

play10:49

which decreases the storage requirement

play10:51

and hence it improves the query

play10:53

performance and this is one of the

play10:54

syntax for the column compression now

play10:57

the companies that are using amazon

play10:58

redshift one is lya the other one is

play11:02

equinox the third one is the pfizer

play11:04

which is one of the famous

play11:05

pharmaceuticals company mcdonald's one

play11:08

of the burger chains across the globe

play11:09

and philips it's an electronic company

play11:12

so these are one of the biggest

play11:14

companies that are basically relying and

play11:16

they are putting their data on the

play11:18

redshift data warehouse service now in

play11:20

another video we'll see the demo for

play11:22

using the amazon redshift let's look

play11:24

into the amazon redshift demo so these

play11:27

are the steps that we need to follow for

play11:29

creating the amazon redshift cluster and

play11:31

in this demo what we will be doing is

play11:33

that we will be creating an im role for

play11:35

the redshift so that the redshift can

play11:37

call the services and specifically we

play11:39

will be using the s3 service so the role

play11:41

that we will be creating will be giving

play11:43

the permission to redshift to have an

play11:45

access of an s3 in the read-only format

play11:48

so in the step one what we require we'll

play11:50

check the prerequisites and what you

play11:52

need to have is the aws credentials uh

play11:56

if you don't have that you need to

play11:57

create your own credentials and you can

play12:00

use your credit and the debit card and

play12:02

then in the step two we'll proceed with

play12:03

the im roll for the amazon redshift once

play12:06

the role is created we'll launch a

play12:08

sample amazon redshift cluster mentioned

play12:10

in the step 3 and then we'll assign a

play12:13

vpc security groups to our cluster now

play12:15

you can create it in the default vpc

play12:16

also you can create a default security

play12:18

groups also otherwise you can customize

play12:20

the security groups based on your

play12:22

requirement now to connect to the sample

play12:25

cluster you need to run the queries and

play12:27

you can connect to your cluster and run

play12:29

queries on the aws management console

play12:31

query editor which you will find it in

play12:33

the redshift only or if you use the

play12:35

query editor you don't have to download

play12:37

and set up a sequel client application

play12:39

separately and in the step 6 what you

play12:41

can do is you can copy the data from the

play12:44

s3 and upload that in the redshift

play12:46

because the redshift would have an

play12:48

access

play12:49

read-only access

play12:51

for the s3 as that will be created in

play12:53

the im role so let's see how we can

play12:55

actually use the redshift uh on the aws

play12:58

I am already logged in to my account, in the North Virginia region. I'll search for the Redshift service, and here I find Amazon Redshift, so just click on it. Let's wait for the Redshift console to load. This is the Redshift dashboard, and from here you launch the cluster: to launch a cluster, you just have to click on Launch Cluster. Once the cluster is created, if you want to run queries, you can open the query editor and access the data in Redshift. As mentioned in the steps, you don't require a separate SQL client application to run queries on the data warehouse.

Now, before creating a cluster, we need to create the role. We'll click on Services and move to the IAM section; IAM can be found under Security, Identity, and Compliance. Just click on Identity and Access Management, and then click on Roles. Let's wait for the IAM page to open. Here in the IAM dashboard, you just have to click on Roles. I already have the role created, so what I'll do is delete this role and create it again from scratch. Click on Create Role, and under AWS services select Redshift, because Redshift will be calling the other services; that's why we are creating the role. Which other service will Redshift need access to? S3, because we'll be putting the data on S3, and that data needs to be loaded into Redshift. So search for the Redshift service, click on it, and then choose Redshift - Customizable as the use case. Click Next: Permissions, and here assign permissions to this role in the form of S3 read-only access. You can search for S3 here; let's wait for the policies to load. Type "s3", and we can find AmazonS3ReadOnlyAccess, so just select it to attach the permission to this role. Tags you can leave blank. Click Next: Review, give your role a name, say myRedshiftRole, and click Create Role. Now you can see that your role has been created.
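Under the hood, choosing Redshift as the trusted service simply sets the role's trust policy so that the Redshift service can assume the role. A minimal sketch of what the console generates for you (the structure below is the standard trust-policy shape, and the attached ARN is the AWS-managed AmazonS3ReadOnlyAccess policy selected in the permissions step):

```python
import json

# Trust policy allowing the Redshift service to assume this role.
# This mirrors what the console creates when you choose
# "Redshift - Customizable" as the use case.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "redshift.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# The AWS-managed policy attached in the permissions step.
S3_READ_ONLY_ARN = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"

print(json.dumps(trust_policy, indent=2))
```

The same role could be scripted with the AWS CLI by passing this document to `aws iam create-role` and then attaching the policy with `aws iam attach-role-policy`.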

The next step is to move to the Redshift service and create a cluster. Click on Services, then Amazon Redshift; you can find it in the History section since we browsed it just now, and from here we are going to create a sample cluster. To launch a cluster, you just have to click on Launch This Cluster. You can select the uncompressed data size you want in gigabytes, terabytes, or petabytes; say you select gigabytes, you can define how much capacity you want right here. This screen also gives you information about the cost: on-demand is basically pay-as-you-use, so they are going to charge you $0.50 per hour for using the two nodes. Let's click on Launch This Cluster. This will be a dc2.large type of instance, backed by solid state drives (SSDs), one of the fastest ways of storing information. Two nodes are specified by default, which means two nodes will be created in the cluster. You can increase that too: if I put three nodes, it is going to give us 3 x 0.16 TB of storage per node.
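The sizing arithmetic shown on that screen is straightforward. A small sketch using the figures from the demo (dc2.large with 0.16 TB of SSD storage per node, and $0.50/hour for two nodes, i.e. roughly $0.25 per node-hour; verify against current AWS pricing before relying on these numbers):

```python
# Figures read off the demo screen; verify against current AWS pricing.
STORAGE_TB_PER_NODE = 0.16   # dc2.large SSD storage per node
PRICE_PER_NODE_HOUR = 0.25   # $0.50/hour for two nodes => $0.25 per node

def cluster_storage_tb(nodes: int) -> float:
    """Total cluster SSD storage in terabytes."""
    return nodes * STORAGE_TB_PER_NODE

def hourly_cost_usd(nodes: int) -> float:
    """On-demand cost per hour for the whole cluster."""
    return nodes * PRICE_PER_NODE_HOUR

print(round(cluster_storage_tb(3), 2), hourly_cost_usd(2))
```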

Now, here you have to define the master username and password for your Redshift cluster, and you have to follow the password instructions. I'll put a password for this cluster; if it is accepted, it won't give you any warning, otherwise it will tell you to use the allowed character set. Here you have to assign this cluster the role that we created recently: in the available IAM roles, just select myRedshiftRole, and then launch the cluster. If you want to change something with respect to the default settings, say, change the default VPC to your custom VPC, or the default security groups to your own security group, you can switch to advanced settings and make that modification. Now let's launch the cluster.
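As a rough sketch of the password rules the console enforces (as documented for Redshift master user passwords: 8 to 64 characters, at least one uppercase letter, one lowercase letter, and one digit, and no quote, double quote, slash, at-sign, or space; treat this as an approximation of the console's full validation):

```python
import re

def valid_master_password(pw: str) -> bool:
    """Approximate check of Redshift master user password rules:
    8-64 characters, at least one uppercase letter, one lowercase
    letter, and one digit, excluding ' " / @ and space."""
    if not 8 <= len(pw) <= 64:
        return False
    if re.search(r"[ '\"/@]", pw):
        return False
    return (any(c.isupper() for c in pw)
            and any(c.islower() for c in pw)
            and any(c.isdigit() for c in pw))

print(valid_master_password("Passw0rd"))   # True
print(valid_master_password("password"))   # False: no uppercase or digit
```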

Here you can see the Redshift cluster being created. If you want to run queries on this cluster, you don't require a separate SQL client; you just have to follow the simple steps to open the query editor, which you will find on the dashboard. Let's click on the cluster, and here you can see that the Redshift cluster has been created with three nodes in the us-east-1b Availability Zone. So we have created the Redshift cluster, and now we'll see how we can create tables inside Redshift, how we can use the COPY command to move data uploaded to an S3 bucket directly into the Redshift database tables, and then how we can query the results from a table as well.

First of all, after creating the Redshift cluster, we have to install SQL Workbench/J (note this is not MySQL Workbench, which is managed by Oracle); you can find it on Google and download it from there. Then you have to connect this client to the Redshift database. Click on File, then Connect Window, and in the connection window you have to paste the JDBC URL. You can find this URL in the AWS console: if you open the Redshift cluster, you will find the JDBC URL there. So this is our cluster; let's open it, and here you can find the JDBC URL. Also make sure that in the security group of your Redshift cluster, port 5439 is open for incoming traffic. You also need the Amazon Redshift JDBC driver; there is a link where you can download the driver, and then you specify its path in SQL Workbench/J. Once you are done with that, provide the username and password that you created while creating the Redshift cluster and click OK; this connects to the database.
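The JDBC URL copied from the console encodes the cluster endpoint, the port, and the database name. A small sketch that pulls those pieces apart (the cluster hostname below is a made-up placeholder; substitute the endpoint from your own console):

```python
from urllib.parse import urlparse

# Placeholder endpoint; copy the real JDBC URL from your cluster's detail page.
jdbc_url = ("jdbc:redshift://examplecluster.abc123xyz789"
            ".us-east-1.redshift.amazonaws.com:5439/dev")

# urlparse understands the URL once the "jdbc:" prefix is stripped.
parsed = urlparse(jdbc_url.removeprefix("jdbc:"))

host = parsed.hostname              # cluster endpoint
port = parsed.port                  # 5439, Redshift's default port
database = parsed.path.lstrip("/")  # database name ("dev" here)

print(host, port, database)
```

This is the same port (5439) that must be open in the cluster's security group for the client connection to succeed.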

Now the database connection is complete. What we will be doing in SQL Workbench is: first create the sales table, then load into the sales table the entries copied from the S3 bucket, and after that query the results in the sales table. Whatever columns you are creating in the table, the same values need to be present in the data file. I have taken this sample data file from the sample database creation page on docs.aws.amazon.com for Redshift, where you can find a downloadable tickitdb.zip file. This archive contains multiple sample data files which you can use to practice uploading data to the Redshift cluster. I have extracted one of the files from this archive and uploaded it to the S3 bucket.

Now we'll move to the S3 bucket and look for the file that has been uploaded. So this is the sample bucket, and sales_tab.txt is the file that I have uploaded; it has the data entries that will be loaded into the Redshift cluster using a COPY command. After putting in the command for creating the table, we'll use a COPY command. In the COPY command we have to define the table name, which is sales, and the path from where the data will be copied over to the sales table in Redshift. The path is our S3 bucket, the redshift bucket sample, and it has to look for the data inside the sales_tab.txt file. We also have to supply the ARN of the role that was created previously. Once that is done, the third step is to query the results inside the sales table, to check whether our data has been uploaded to the table correctly.
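Put together, the three statements look roughly like the sketch below. The DDL follows the TICKIT "sales" table from the AWS sample database; the bucket name and role ARN are placeholders you would swap for your own values:

```python
# Placeholders: substitute your own bucket name and the ARN of the
# IAM role created earlier.
BUCKET = "your-bucket-name"
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/myRedshiftRole"

# 1) Table definition, following the TICKIT "sales" table from the AWS docs.
create_sales = """
create table sales (
    salesid    integer not null,
    listid     integer not null,
    sellerid   integer not null,
    buyerid    integer not null,
    eventid    integer not null,
    dateid     smallint not null,
    qtysold    smallint not null,
    pricepaid  decimal(8,2),
    commission decimal(8,2),
    saletime   timestamp);
"""

# 2) Bulk-load the tab-delimited file from S3 using the role's credentials.
copy_sales = f"""
copy sales from 's3://{BUCKET}/sales_tab.txt'
iam_role '{IAM_ROLE_ARN}'
delimiter '\\t'
timeformat 'MM/DD/YYYY HH:MI:SS';
"""

# 3) Sanity-check the load.
query_sales = "select count(*) from sales;"

print(copy_sales)
```

You would paste the three SQL statements into SQL Workbench/J and execute them in order; the `iam_role` clause is what lets Redshift read the file using the S3 read-only role instead of embedded credentials.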

Now we'll execute all three statements. It gives us an error, because we have to connect to the database again; let's wait for it and execute again. It gives us an error once more, so let's look at the name of the bucket: it's redshiftbucketsample, with two t's mentioned here. Let's connect to the database again and execute: the sales table is created, but we get the error "the specified bucket does not exist". Let's view the actual bucket name, copy it, put it into the COPY command, connect back to the database, and execute again. Now the sales table is created, the data has been copied from sales_tab.txt in the S3 bucket into Redshift, and the results from the table have been queried.

So that's it from the Redshift perspective, and I hope you liked our video. Don't forget to subscribe and like our channel, and watch out for our upcoming videos on AWS. Bye for now, thank you. Hi there, if you like this video, subscribe to the Simplilearn YouTube channel and click here to watch similar videos. Turn it up and get certified: click here.


Related Tags
Amazon Redshift, Data Warehousing, AWS Services, Cloud Computing, Business Intelligence, Data Analysis, Telecommunication Case, Cost Efficiency, Performance Speed, Database Migration