Building a Serverless Data Lake (SDLF) with AWS from scratch
Summary
TL;DR: This video from the 'Knowledge Amplifier' channel dives into the AWS Serverless Data Lake Framework (SDLF), an open-source project that streamlines the setup of data lake systems. It outlines the core AWS services integral to SDLF, such as S3, Lambda, Glue, and Step Functions, and discusses their roles in creating a reusable, serverless architecture. The video covers the framework's architecture, detailing the data flow from raw ingestion to processed analytics, and highlights the differences between near real-time processing in Stage A and batch processing in Stage B. It also touches on CI/CD practices for data pipeline development and provides references to related tutorial videos for practical implementation guidance.
Takeaways
- 📚 The video introduces the AWS Serverless Data Lake Framework (SDLF), a framework designed to handle large volumes of structured, semi-structured, and unstructured data.
- 🛠️ The framework is built using core AWS serverless services including AWS S3 for storage, DynamoDB for cataloging data, AWS Lambda for light data transformations, AWS Glue for heavy data transformations, and AWS Step Functions for orchestration.
- 🏢 Companies like Formula 1 Motorsports, Amazon Retail Ireland, and Naranja Finance use the SDLF to implement data lakes within their organizations, highlighting its industry adoption.
- 🌐 The framework supports both near real-time data processing in Stage A and batch processing in Stage B, catering to different data processing needs.
- 🔄 Stage A focuses on light transformations and is triggered by events landing in S3, making it suitable for immediate data processing tasks.
- 📈 Stage B is designed for heavy transformations using AWS Glue and is optimized for processing large volumes of data in batches, making it efficient for periodic data processing tasks.
- 🔧 The video script explains the architecture of SDLF, detailing the flow from raw data ingestion to processed data ready for analytics.
- 🔒 Data quality checks are emphasized as crucial for ensuring the reliability of data used in business decisions, with a dedicated Lambda function suggested for this purpose.
- 🔄 The script outlines the use of AWS services for data transformation, including the use of AWS Step Functions to manage workflows and AWS Lambda for executing tasks.
- 🔧 The importance of reusability in a framework is highlighted, with the SDLF being an open-source project that can be adapted and reused by different organizations.
- 🔄 CI/CD pipelines are discussed for managing project-specific code changes, emphasizing the need to implement continuous integration and delivery for variable components of the framework.
Q & A
What is the AWS Serverless Data Lake Framework (SDLF)?
-The AWS Serverless Data Lake Framework (SDLF) is an open-source project that provides a data platform to accelerate the delivery of enterprise data lakes. It utilizes various AWS serverless services to create a reusable framework for data storage, processing, and security.
What are the core AWS services used in the SDLF?
-The core AWS services used in the SDLF include AWS S3 for storage, DynamoDB for cataloging data, AWS Lambda and AWS Glue for compute, and AWS Step Functions for orchestration.
How does the SDLF handle data ingestion from various sources?
-Data from various sources is ingested into the raw layer of the SDLF, which is an S3 location. The data can come in various formats, including structured, semi-structured, and unstructured data.
What is the purpose of the Lambda function in the data ingestion process?
-The Lambda function acts as a router, receiving event notifications from S3 and forwarding the event to a team-specific SQS queue based on the file's landing location or filename.
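To make this routing step concrete, here is a minimal sketch of such a router Lambda in Python with boto3. The queue URLs and the key-prefix convention in TEAM_QUEUES are illustrative assumptions, not part of the actual SDLF code.

```python
import json
import boto3

sqs = boto3.client("sqs")

# Assumed mapping from S3 key prefix to team-specific queue URL (hypothetical).
TEAM_QUEUES = {
    "team-a/": "https://sqs.us-east-1.amazonaws.com/123456789012/team-a-queue",
    "team-b/": "https://sqs.us-east-1.amazonaws.com/123456789012/team-b-queue",
}

def lambda_handler(event, context):
    # The router is subscribed to the generic SQS queue, so each record body
    # is an S3 event notification describing a file landed in the raw layer.
    for record in event["Records"]:
        s3_event = json.loads(record["body"])
        for s3_record in s3_event.get("Records", []):
            key = s3_record["s3"]["object"]["key"]
            for prefix, queue_url in TEAM_QUEUES.items():
                if key.startswith(prefix):
                    # Forward the original event to the matching team queue.
                    sqs.send_message(QueueUrl=queue_url,
                                     MessageBody=json.dumps(s3_record))
                    break
```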
Can you explain the difference between the raw, staging, and processed layers in the SDLF architecture?
-The raw layer contains the ingested data in its original format. The staging layer stores data after light transformations, such as data type checks or duplicate removal. The processed layer holds the data after heavy transformations, such as joins, filters, and aggregations, making it ready for analytics.
How does the SDLF ensure data quality in the ETL pipeline?
-The SDLF uses a Lambda function to perform data quality validation. This function can implement data quality frameworks to ensure the data generated by the ETL pipeline is of good quality before it is used for analytics.
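As a rough illustration only (the video points to a dedicated framework such as Deequ rather than hand-rolled checks), a minimal validation Lambda for a CSV file might look like this, assuming the event carries the bucket and key of the file to check:

```python
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Assumed event shape: {"bucket": ..., "key": ...} for the processed file.
    obj = s3.get_object(Bucket=event["bucket"], Key=event["key"])
    rows = obj["Body"].read().decode("utf-8").splitlines()

    header, data = rows[0].split(","), rows[1:]
    problems = []
    if not data:
        problems.append("file is empty")
    for i, line in enumerate(data, start=2):
        # Simple structural check: every row matches the header's column count.
        if len(line.split(",")) != len(header):
            problems.append(f"line {i}: column count mismatch")

    if problems:
        raise ValueError("data quality check failed: " + "; ".join(problems))
    return {"status": "passed", "rows_checked": len(data)}
```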
What is the role of AWS Step Functions in the SDLF?
-AWS Step Functions are used for orchestration in the SDLF. They manage the workflow of data processing, starting from light transformations in the staging layer to heavy transformations in the processed layer.
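In practice the workflow is kicked off with a start_execution call. Here is a minimal sketch, assuming a hypothetical state machine ARN and a Lambda subscribed to the team queue:

```python
import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical ARN; in SDLF this would be the Stage A state machine that runs
# the pre-update, light transform, and post-update steps.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:stage-a"

def lambda_handler(event, context):
    for record in event["Records"]:
        # Pass the S3 event (already JSON) through as the workflow input.
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=record["body"],
        )
```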
How does the SDLF differentiate between Stage A and Stage B in terms of data processing?
-Stage A is near real-time, processing data as soon as it lands in S3 and triggers a Lambda function. Stage B, on the other hand, is for batch processing, where data is accumulated over a period and then processed together using AWS Glue.
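The five-minute trigger for Stage B can be expressed as an EventBridge (CloudWatch Events) rule. A minimal sketch with boto3, using hypothetical resource names:

```python
import boto3

events = boto3.client("events")
lam = boto3.client("lambda")

# Hypothetical names; the five-minute schedule matches the Stage B trigger
# described in the video.
RULE_NAME = "stage-b-batch-trigger"
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:stage-b-poller"

# Fire the rule every five minutes and point it at the poller Lambda.
events.put_rule(Name=RULE_NAME, ScheduleExpression="rate(5 minutes)")
events.put_targets(Rule=RULE_NAME,
                   Targets=[{"Id": "stage-b-poller", "Arn": FUNCTION_ARN}])

# EventBridge also needs permission to invoke the Lambda.
lam.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId="allow-eventbridge",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
)
```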
What is the significance of using CI/CD pipelines in the SDLF?
-CI/CD pipelines are used to manage the deployment of project-specific code for light transformations and AWS Glue scripts. They ensure that only the variable parts of the project are updated, streamlining the development and deployment process.
How can one implement the SDLF using their own AWS services?
-To implement the SDLF, one can refer to the provided reference videos that cover creating event-based projects using S3, Lambda, and SQS, triggering AWS Step Functions from Lambda, and interacting between Step Functions and AWS Glue, among other topics.
Outlines
🚀 Introduction to AWS Serverless Data Lake Framework (SDLF)
This paragraph introduces the AWS Serverless Data Lake Framework (SDLF), emphasizing its importance and the core AWS services involved. The framework is designed to address complex business problems by leveraging a centralized repository for storing, processing, and securing various data types. It mentions the use of AWS Transfer Family, SFTP, and different ingestion frameworks like Sqoop, Talend, or Spark. The paragraph also touches on the reusability aspect of a framework and introduces the open-source nature of SDLF, highlighting its adoption by large organizations like Formula 1 Motorsports, Amazon Retail Ireland, and Naranja Finance.
🌟 Core AWS Services in SDLF and Data System Layers
The second paragraph delves into the core AWS services utilized in the SDLF, such as AWS S3 for storage, DynamoDB for cataloging data, AWS Lambda and Glue for compute tasks, and AWS Step Functions for orchestration. It explains the three major layers of a data system: the raw or landing layer, the staging layer, and the processed or analytical layer. The paragraph outlines the process flow from data ingestion to transformation and storage across these layers, culminating in a detailed explanation of the architecture and the use of AWS services within SDLF.
🔍 Data Flow and Processing in SDLF Architecture
This paragraph explores the data flow within the SDLF architecture, starting from the raw data landing in S3 to the triggering of Lambda functions and the use of Amazon SQS for event handling. It describes how data is processed through light transformations in the staging layer and then moved to the processed or analytical layer for heavy transformations using AWS Glue. The paragraph also explains the role of AWS Step Functions in initiating the processing workflow, the use of Lambda for routing events to team-specific queues, and the importance of metadata updates in DynamoDB for audit and logging purposes.
🛠️ Batch Processing and CI/CD Integration in SDLF
The fourth paragraph discusses the distinction between Stage A and Stage B in the SDLF architecture, highlighting Stage A as a near real-time system and Stage B as a batch processing system. It explains the use of a CloudWatch rule to trigger Lambda functions for batch processing every five minutes. The paragraph also addresses the CI/CD aspect, emphasizing the importance of implementing pipelines for project-specific code changes in Lambda and AWS Glue, using AWS tools such as CodeCommit for source control and CodePipeline for delivery.
🔗 Implementing SDLF and Reference to Related Videos
The final paragraph provides guidance on implementing the SDLF flow using AWS services, referencing previous videos that cover various components of the framework. It suggests videos on creating event-based projects with S3, Lambda, and SQS, triggering Step Functions from Lambda, and using CloudWatch rules for periodic Lambda invocations. The paragraph also mentions the interaction between Step Functions and AWS Glue, as well as the importance of data quality checks and the use of DynamoDB for audit and logging. It concludes with an invitation to like, share, comment, and subscribe to the channel for more informative content.
Keywords
💡AWS Serverless Data Lake Framework (SDLF)
💡AWS Services
💡Data Lake
💡Structured Data
💡Semi-structured Data
💡Unstructured Data
💡ETL (Extract, Transform, Load)
💡Orchestration
💡Lambda Functions
💡S3 Raw Layer
💡CloudWatch Rules
💡Data Quality
💡CI/CD Pipeline
Highlights
Introduction to AWS Serverless Data Lake Framework (SDLF).
Core AWS Services used in SDLF: S3, DynamoDB, Lambda, Glue, and Step Functions.
Advantages of using SDLF for complex business problems.
Data Lake as a centralized repository for various data formats.
Reusability as a key feature of a framework and its importance in SDLF.
Three options for implementing Data Lake: building from scratch, purchasing, or using open-source projects like SDLF.
Popularity of SDLF among large organizations like Formula 1 Motorsports and Amazon Retail Ireland.
Explanation of the architecture of SDLF with Stage A and Stage B.
Role of AWS S3 in storing raw data and its serverless nature.
Use of AWS Lambda for light data transformation within the 15-minute execution limit.
AWS Glue's role in heavy data transformation for batch processing.
Orchestration of data workflows using AWS Step Functions in a serverless manner.
Data flow from raw to staging to processed layers in SDLF.
Event-driven architecture using AWS S3 notifications and SQS for data ingestion.
Lambda functions routing events to team-specific SQS queues for organized data processing.
Step Functions workflow for light transformation, auditing, and logging with DynamoDB.
Difference between Stage A for near real-time processing and Stage B for batch processing.
Implementation of CI/CD pipelines for project-specific code in SDLF.
Data quality checks as an essential part of the ETL pipeline to ensure data reliability.
Reference to additional videos for implementing specific parts of the SDLF workflow.
Transcripts
Hello friends, welcome to our channel Knowledge Amplifier. In this video we are going to explore a very important framework: the AWS Serverless Data Lake Framework, in short called SDLF. First we will explore what it is, what core AWS services are used in this framework, what its advantages are, and which companies are using it to solve complex business problems; all of this we will explore in detail in this discussion.

Before going ahead with the actual framework, let us recall what a data lake is. As we know, it is nothing but a centralized repository designed to store, process, and secure large amounts of structured, semi-structured, and unstructured data. Maybe from your vendor company, using AWS Transfer Family (that is, SFTP), some files are coming; that can be structured data in CSV or TSV format, or it can be images or JSON, so semi-structured or unstructured data is also possible. Or maybe from some on-premise system, using Sqoop, Talend, or PySpark, the data is coming through different ingestion frameworks into our centralized repository. That particular location where all these various formats and kinds of data get accumulated, that centralized repository, is nothing but a data lake.

And whenever we use the word framework, one particular property gets attached to that system, and that is reusability: the system should be reusable, and only then can we call it a framework. So now let us explore how, using different AWS serverless services, we can create a reusable framework for our data lake system.
Whenever any organization needs to implement a data lake for solving a big data or OLAP-related use case, they generally have two options: either they can build that data lake system from scratch within their organization, or they can buy a data lake framework that follows industry standards from some third-party vendor company. These are the two conventional approaches that most organizations follow. But we have a third option, and that is a popular open-source project: some developer or organization might have created a very well-architected system and published it for free so that other organizations or individual users can use it, and that falls under open-source projects.

This Serverless Data Lake Framework is one such open-source project, which provides a data platform that accelerates the delivery of enterprise data lakes. That is, using this open-source project we can quickly build a data lake system within our organization. And this project is used by large organizations like Formula 1 Motorsports (very popular for racing, as we all know), Amazon Retail Ireland, and Naranja Finance from Argentina. These are some popular big organizations that are following the SDLF framework to implement data lakes in their organizations.
Now let us understand which core AWS services are used to implement this data lake framework. Because this framework is serverless, whatever AWS services we use to implement it are obviously also serverless. For example, if you consider the storage system, we have Amazon S3, which offers virtually unlimited storage capacity; we do not need to think about servers and back ends, AWS has taken care of all those things on our behalf. Apart from that, if we need to catalog the data, DynamoDB can also help us. That covers storage. Next, we need a compute system for our OLAP workloads, and there AWS Lambda and Glue can help: with Lambda we can process our data whenever there is a light transformation that can be completed within 15 minutes, and if there are heavy ETL workloads we can use Glue, which is again a serverless service. We also need an orchestration tool; very popular orchestration tools are Autosys and Airflow, but because this is a serverless system we are going to build, for orchestration we can use AWS Step Functions, which is a serverless orchestration service provided by AWS. So these are the major services we are using in this Serverless Data Lake Framework, or SDLF. Now we will jump directly into the architecture.
Let me zoom into this part a little bit. In the SDLF architecture we have two stages: Stage A, and in the lower part Stage B. First we will explore what Stage A is doing. From some vendor organization or from an on-premise system we receive raw data; that raw file gets ingested into the S3 raw location. As we know, in a data lake system we have multiple layers: the first layer is called the raw or landing layer, then we have the staging layer, and after that we have the processed or analytical layer. Let me give a quick recap on these layers, and then we will go back to the architecture.
Here I have written some important differences between the layers we generally encounter in a data lake system. The first is the raw or landing layer: this layer contains the ingested data that has not been transformed; exactly as it is in the source system, it lands in a particular S3 location, which is called the raw layer. From that raw layer we consume the data and apply very light transformations, for example data type checks or duplicate removal, and then we store it in the staging layer. Once the data is available in the staging layer, we read it and apply heavy transformations: for example joining data from various sources, or performing different filters or aggregations based on the business requirements. After doing all this heavy processing, we write the data to the processed zone. So these are the three major layers we generally encounter in a data lake system, and we will observe the same layers in this framework; I just thought to give you a quick recap. Now we will go back to the architecture.
From the on-premise system or from some vendor company, using some ingestion framework, the data lands in the raw layer, that is, it is available in raw format. Now, as soon as the data lands in S3, an event notification is sent to Amazon SQS, and from that SQS queue one Lambda function is triggered. That Lambda function sends the event to another SQS queue, and these queues are separated team-wise: for team A we have one SQS queue, for team B some other queue, for team C another one. Based on where the S3 file landed, or maybe based on the file name, this Lambda forwards the SQS event to one of these team-wise queues: if the message is for team A, it is sent to team A's queue; if it is for team B, to team B's queue; and so on. I hope you got this point.
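For reference, wiring the raw bucket to the generic queue is a one-time configuration. A minimal sketch with boto3 might look like this, assuming a hypothetical bucket name and queue ARN (the queue's access policy must separately allow S3 to send messages):

```python
import boto3

s3 = boto3.client("s3")

# Send an event to SQS whenever any object is created in the raw bucket.
s3.put_bucket_notification_configuration(
    Bucket="my-datalake-raw",  # hypothetical bucket name
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:raw-events",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```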
So the Lambda has sent the event to a team-specific SQS queue. Now let us understand, within a team, what kind of data flow happens with those events. Until now, this SQS queue only contains the information about which file landed in our raw layer; it has not yet been processed. The router Lambda did not process the event, it just acted as a router across the different queues. Now this team-specific queue holds the information that a particular file landed in the S3 raw layer, and based on this queue another Lambda gets triggered: it continuously polls at a certain interval to check whether any new message has been published to the queue, and once the Lambda gets a message, it starts an AWS Step Functions workflow. And here our first layer of processing starts.
Here our Step Functions workflow gets started. The first Lambda function updates the metadata information about our job in DynamoDB, maybe for audit or reconciliation purposes; that is the pre-update of the comprehensive catalog: this file has started processing, this is the timestamp it started, and some other metadata for future reference, audit, or logging purposes is captured in DynamoDB. Then it executes the light transformation: as I told you, from the raw layer we consume the data and apply light transformations, for example data type checks or duplicate removal, and then store it in the staging layer. This Lambda performs that light transformation, and because it is a light transformation we can expect it to complete within 15 minutes; for that reason Lambda is a good choice at this stage. It writes the data to another S3 location, and this S3 location is nothing but the staging layer: from the raw layer we consume the data and write it to the staging layer after applying simple or light transformations. Once this Lambda has applied the light transformation and executed successfully, another Lambda takes the responsibility of updating the DynamoDB table: this job is completed, this is the end time of the job, and all the other metadata. So this is a typical AWS Step Functions workflow.
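To make the pre-update and post-update steps concrete, here is a minimal sketch, assuming a hypothetical DynamoDB table named sdlf-object-metadata with object_key as its partition key; the real SDLF catalog schema is richer than this:

```python
import time
import boto3

dynamodb = boto3.resource("dynamodb")
audit_table = dynamodb.Table("sdlf-object-metadata")  # hypothetical table

def pre_update(file_key):
    # First state of the workflow: record that processing has started.
    audit_table.put_item(Item={
        "object_key": file_key,
        "stage": "A",
        "status": "STARTED",
        "start_ts": int(time.time()),
    })

def post_update(file_key):
    # Last state: mark the same item completed, with an end timestamp.
    audit_table.update_item(
        Key={"object_key": file_key},
        UpdateExpression="SET #s = :done, end_ts = :ts",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={":done": "COMPLETED",
                                   ":ts": int(time.time())},
    )
```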
Now let's see what happens next. Once the data is available in the staging layer, the event goes to another SQS queue: whenever a file gets written to the staging layer by the Lambda in the Step Functions workflow, one event gets emitted, and that writes the event to another SQS queue. We have encountered multiple SQS queues in this architecture so far: initially we had a generic queue, then the team-specific queue, and now, when the staging data is written, that event is also recorded in this queue. Here we have another Lambda, and this Lambda gets triggered by a CloudWatch rule: as we know, using Amazon EventBridge we can schedule a Lambda or Glue job at a certain time interval. This CloudWatch rule fires every 5 minutes, and every 5 minutes the Lambda checks whether any event has arrived in this SQS queue. An event will arrive in the queue whenever some data is written to the staging location. If the Lambda finds messages in the queue, meaning data has been written to the staging location, it will start another Step Functions workflow.
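A minimal sketch of that scheduled batch Lambda could look like the following, assuming hypothetical queue and state machine ARNs; it drains whatever messages accumulated during the window and hands them to the Stage B workflow as one batch:

```python
import json
import boto3

sqs = boto3.client("sqs")
sfn = boto3.client("stepfunctions")

# Hypothetical resources for the Stage B trigger.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/staging-events"
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:stage-b"

def lambda_handler(event, context):
    # Invoked every five minutes by the CloudWatch/EventBridge rule.
    files = []
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
        messages = resp.get("Messages", [])
        if not messages:
            break
        for msg in messages:
            files.append(json.loads(msg["Body"]))
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])

    if files:
        # Process everything that accumulated in the window as one batch.
        sfn.start_execution(stateMachineArn=STATE_MACHINE_ARN,
                            input=json.dumps({"files": files}))
```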
Now we are going from staging to the processed or analytical zone, which means this time we need to apply heavy transformations, not light ones: we might need to apply joins across various data sources, filters, or aggregations, depending on the business requirements. These are heavy transformations on big data, so there is a high chance they will take more than 15 minutes. This time we cannot take the risk of running the job in Lambda; rather, we should run it in Glue. So this step does the heavy transformation: a Lambda triggers a Glue job with all the SQS events describing the files that landed in the staging layer, and the Glue job reads them using PySpark or Spark with Scala and processes them. Once the Glue job processes the data, it writes it to the processed zone (also called the processed or analytical area), the final layer of our data lake. And once the Lambda triggers the Glue job, the job may take 30 minutes, 1 hour, 2 hours, and so on; the Lambda will not wait. It just invokes the Glue job and passes the job run ID to another Lambda, and here we have a wait block: that Lambda polls at a certain interval, maybe every 15 seconds, every 20 seconds, or every 2 minutes, checking whether the Glue job has completed based on the job run ID. If the job is finished, the workflow moves to the next block; if not, it waits for a few more minutes based on the configured value and polls the Glue job status again. That's how it works; we have already discussed this pattern in earlier videos.
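For illustration, here is a rough sketch of the start-and-poll pattern with boto3; the job name and the --staged_files argument are assumptions, and in the real architecture the polling is expressed as a Step Functions wait/choice cycle rather than a blocking loop:

```python
import time
import boto3

glue = boto3.client("glue")

def run_glue_job(job_name, staged_files):
    # Kick off the heavy transformation; --staged_files is an assumed
    # job argument, not an SDLF-defined one.
    run = glue.start_job_run(
        JobName=job_name,
        Arguments={"--staged_files": ",".join(staged_files)},
    )
    return run["JobRunId"]

def wait_for_glue_job(job_name, run_id, poll_seconds=30):
    # Poll the job run until it reaches a terminal state.
    while True:
        status = glue.get_job_run(JobName=job_name, RunId=run_id)
        state = status["JobRun"]["JobRunState"]
        if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
            return state
        time.sleep(poll_seconds)
```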
Now, once our Glue job finishes, the curated data is written in the analytical or processed layer. The analytics team, the data analysts, need to query that data, and how can they query data from S3? Obviously one of the best options is Athena. For that, they will first run a Glue crawler; the crawler will update the catalog table, and using that catalog the data analyst or data scientist team can easily query in Athena. And while this whole job is running, that is, while the Lambda is triggering the Glue processing, this complete information is again tracked in DynamoDB for audit or logging purposes, following the same pattern: whenever the Lambda submits the Glue job, it makes an entry in DynamoDB that this Glue job started with this job run ID, and once the whole step is finished, you can see the 'post-update comprehensive catalog' Lambda, which records in DynamoDB that this job is completed, along with all the related information. Once the crawler is ready and the audit or reconciliation is also recorded in DynamoDB, the last step is a data quality check, because with bad data the business might take wrong decisions, and that might lead to losses for the company. So we always need to make sure that the data generated by our ETL pipeline is of good quality, and for that we can implement another Lambda function that performs data quality validation using Deequ or any similar data quality framework.
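A minimal sketch of the crawler step, assuming a hypothetical crawler name, could look like this:

```python
import time
import boto3

glue = boto3.client("glue")

def refresh_catalog(crawler_name, poll_seconds=30):
    # Run the crawler so the processed-zone files become a queryable
    # Glue Data Catalog table for Athena.
    glue.start_crawler(Name=crawler_name)
    while True:
        state = glue.get_crawler(Name=crawler_name)["Crawler"]["State"]
        if state == "READY":  # crawler finished and returned to idle
            break
        time.sleep(poll_seconds)
```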
So this is our complete flow. Let me repeat it from scratch: first, from the raw layer, the event gets published to an SQS queue; that queue triggers a Lambda; the Lambda routes the event to the team-specific SQS queue; and we have another Lambda for each team that continuously polls the team-specific queue. If it gets a message, it immediately starts a Step Functions workflow; that workflow performs the light transformation, the complete audit and logging is captured in Amazon DynamoDB, and after processing it writes the data to the staging layer. From the staging layer, an event is sent to an SQS queue indicating that a new file has been written, and we have another Lambda that gets triggered every five minutes by the CloudWatch rule; if that Lambda finds any message in the queue, it triggers the heavy transformation using AWS Glue. So this way, initially we have raw data, in the middle we get staging data, and finally we get processed data which is ready for analytics workloads. This is our Serverless Data Lake Framework, which is open source; I will be providing the complete project link in the description box. They have the code base, and you can go through it.
Now I would like to draw your attention to a major difference between Stage A and Stage B, and that is that Stage A is a kind of near real-time system: as soon as a file lands in S3, the event goes from the generic SQS queue, via the Lambda, to the team-specific queue, and as soon as a message is available, the Lambda consumes it in near real time and triggers the Step Functions workflow to apply the light transformation. It does not do much batching, because these are very light transformations; we can apply this process to individual small files as well. But if you consider Stage B, there we are using AWS Glue for the transformation, and although Glue is serverless, it should mostly be used for a good volume of data processing; it is not meant to be triggered for every single file. So here, instead of near real time, we should do batching in Stage B. And how do we implement that? Using the CloudWatch rule: if you observe, when a file lands in the staging layer, the event is published to the SQS queue, but the Lambda is not polling that queue directly to trigger the Step Functions workflow; rather, the CloudWatch rule triggers the Lambda, and the Lambda checks whether any messages are available in the queue. That means that during the 5-minute window it is basically accumulating all the files that landed in the staging location, and then the Lambda triggers the Step Functions workflow to process all those files together. If you look at the diagram, this part is clearly labeled as batch, and this is a very vital point: Stage B is for batch processing, while Stage A is almost a near real-time system.
Another part of this architecture is the CI/CD component. Data engineers need to write the code for the light transformations as well as the code applying the business logic in AWS Glue. So how do we push this kind of project-specific code? For that we use a CI/CD pipeline built with CodeCommit and CodePipeline, in Stage B as well as in Stage A. Mostly we need to work with the parts where we are applying transformations: for example, the upper part, where the Lambda records in DynamoDB that event processing has started, is generic, and the last Lambda, which updates in DynamoDB that the event is processed, is also generic; it is the middle part, where we apply the light transformation using Lambda, that may vary from project to project. So if you look at the diagram, CodeCommit and CodePipeline only change that particular Lambda. Instead of these tools you can also use GitHub Actions, but our main focus should be on changing only the parts that vary from project to project: in Stage A this light-transformation Lambda is the variable part, and similarly in Stage B the Glue part is variable, so CodeCommit and CodePipeline feed into the Lambda that triggers different Glue jobs for different purposes. So implement the CI/CD pipeline only for those variable places, not for the whole pipeline; that's what I'm trying to say. These are two very important points: one is the CI/CD pipeline, and the other is the nature of Stage A and Stage B, where Stage A is near real time but Stage B is batch processing. I hope you understood this.
right I hope you understood this and now
if you think that I want to implement
this whole flow in my own AWS services
from scratch that time you can refer
some of my videos which will surely help
you to implement this complete flow let
me explain you so here if you see the
initial part let me just highlight with
different
color so if you consider this particular
part here from is3 to sqs to Lambda this
particular flow I have already uploaded
in this video that is create event based
projects using S3 Lambda and sqs that
same pattern exactly is covered in this
video the link I'll be providing in the
description box then here you can see
this Lambda is publishing that event in
sqs and from sqs this Lambda is getting
triggered so so here we should know how
to publish a message using python code
to an sqsq right so that part also I
have covered in this video sending
message in sqsq from python B 3 okay now
the next step Here If You observe this
Lambda is triggering our AWS St function
so we should know how to trigger AWS St
function from Lambda so that particular
thing I have already covered in this
video and exactly same format that
messages are coming to sqs from their
Lambda and from there it is going to AWS
St okay you can check this video to
understand the complete flow of this
particular part okay right and then step
function on this I have already covered
many videos in my AWS step function
playlist and now here if you see from S3
the message is going to sqs anyway it is
very simple and here we are having a
cloudwatch rule which is triggering
Lambda every 5 minutes okay so that
particular concept also I have covered
if you see here this particular video is
there how to configure a cloudwatch
event rule that calls AWS Lambda
function periodically this particular
video you can refer and Here If You
observe this particular Lambda what it
is doing it is triggering our Blue Job
using State function so here either you
can use Lambda function within our step
function to trigger glue job or AWS step
function directly have interaction
support with AWS glue that also you can
use which I covered in this video build
DL pipeline using AWS glue and State
function where I have covered in detail
how to start a crawler how to wait for
the crawler to be complete State and
then here how to start a glue job and
how to wait for it to be completed so
this particular video will give you the
idea on interaction between State
function and this glue and lastly Here
If You observe everywhere for audit or
logging purpose we are using Dynamo in
the Stage B also and in the stage a also
that Lambda is writing the data in
Dynamo so how to insert the data in
Dynamo from Lambda that also I have
covered in this video so you can check
this video for that particular
implementation so like this way all
these reference videos can surely help
you to implement this particular sdlf or
serverless dat framework I hope you
understood this this is all for my this
video If you find this particular video
interesting then please like share and
comment subscribe our Channel if you
have not subscribed till now and don't
forget to press the Bell icon to get the
notific ific of our latest videos thank
you for watching