Azure Data Factory Part 5 - Types of Data Pipeline Activities
Summary
TLDR: This video from the Azure Data Factory series delves into the concept of pipelines and activities, explaining their roles in data processing. It clarifies the distinction between a pipeline, a logical grouping of activities, and an activity, a processing step within a pipeline. The video outlines three main types of activities: data movement, data transformation, and control flow, providing examples and emphasizing their importance in data factory operations. It also directs viewers to Microsoft's detailed documentation for further understanding.
Takeaways
- The video is part of a series on Azure Data Factory, focusing on pipelines and activities, and their types.
- Pipelines are logical groupings of activities that perform a unit of work in Azure Data Factory.
- Activities represent individual processing steps within a pipeline, such as copying data or performing transformations.
- Understanding different types of activities is crucial for determining which to use based on specific requirements.
- The script revisits the concept of pipelines and activities, emphasizing their roles in data movement and transformation.
- The video mentions ADF's integration runtimes, which are essential for data movement activities.
- Data movement activities in Azure Data Factory primarily involve the Copy Activity, which supports various data stores.
- Data transformation activities include Data Flows, Azure Functions, Hive, Pig, and MapReduce for big data processing.
- Control flow activities are used for managing the flow of execution in a pipeline, such as conditionals and iterations.
- The video references Microsoft's detailed documentation on pipelines and activities in Azure Data Factory and Azure Synapse Analytics.
- The presenter encourages viewers to subscribe to the channel for more educational content, emphasizing continuous learning and sharing.
Q & A
What is the main focus of the fifth part of the Azure Data Factory video series?
-The main focus of the fifth part is to explore the concept of pipelines and activities, including the different types of activities available in Azure Data Factory.
What are the top-level concepts discussed in section one of the video series?
-In section one, the top-level concepts discussed include pipelines, activities, datasets, linked services, integration runtimes, and triggers.
What did the audience learn about in section three of the video series?
-In section three, the audience learned about creating their first pipeline and got an introduction to different types of activities.
What is a pipeline in Azure Data Factory?
-A pipeline in Azure Data Factory is a logical grouping of activities that perform a unit of work, such as copying data from one location to another or performing data transformations.
What is an activity in the context of Azure Data Factory?
-An activity in Azure Data Factory represents a processing step within a pipeline, such as a copy activity that moves data from one data store to another.
What is the difference between a dataset and a linked service in Azure Data Factory?
-A dataset in Azure Data Factory refers to a table or file, whereas a linked service defines the connection to a data source or a cloud service.
What are the three main types of activities in Azure Data Factory?
-The three main types of activities in Azure Data Factory are data movement activities, data transformation activities, and control flow activities.
What is a data movement activity in Azure Data Factory?
-A data movement activity, such as the copy activity, is used for moving data from various sources to various destinations within Azure Data Factory.
What is a data transformation activity in Azure Data Factory?
-Data transformation activities in Azure Data Factory, such as data flows, Azure Functions, Hive, Pig, and MapReduce, are used to transform data based on specific requirements.
What are control flow activities in Azure Data Factory?
-Control flow activities in Azure Data Factory include ForEach, If Condition, Execute Pipeline, Lookup, Append Variable, Switch, Until, and Validation activities, which are used to control the flow of execution within a pipeline.
Where can one find detailed documentation on pipelines and activities in Azure Data Factory?
-One can find detailed documentation on pipelines and activities in Azure Data Factory on the Microsoft documentation website, specifically in the section about Azure Data Factory and Azure Synapse Analytics.
Outlines
Exploring Azure Data Factory Pipelines and Activities
In this video, the focus is on understanding pipelines and activities in Azure Data Factory (ADF). The introduction briefly revisits concepts from previous sections, emphasizing the importance of pipelines, which are logical groupings of activities that perform various data operations such as copying, transforming, and cleaning data. The video aims to delve deeper into the different types of activities and their specific uses, building on the foundational knowledge established in earlier parts of the series.
Detailed Explanation of Pipeline and Activities
The video reiterates the definition of pipelines in ADF, describing them as logical groupings of activities that perform units of work. Activities are the individual steps within a pipeline, such as copying data between locations or transforming data. The explanation emphasizes understanding the distinction between pipelines and activities, and how they interact with datasets to perform tasks. The discussion includes examples of how activities communicate with datasets to produce or consume data for various operations.
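The relationship described above can be sketched in a few lines of Python. This is only an illustration: the dictionary loosely follows the JSON shape that ADF uses for pipeline definitions, and every name in it (`CopyBlobToSqlPipeline`, `BlobInputDataset`, `SqlOutputDataset`, the `describe` helper) is hypothetical and not taken from the video.

```python
# Hypothetical pipeline definition, loosely modeled on ADF's pipeline JSON.
# All names below are made up for illustration.
pipeline = {
    "name": "CopyBlobToSqlPipeline",
    "properties": {
        # A pipeline is a logical grouping of activities.
        "activities": [
            {
                "name": "CopyFromBlobToSql",
                "type": "Copy",
                # Dataset the activity consumes (the source).
                "inputs": [
                    {"referenceName": "BlobInputDataset", "type": "DatasetReference"}
                ],
                # Dataset the activity produces (the sink).
                "outputs": [
                    {"referenceName": "SqlOutputDataset", "type": "DatasetReference"}
                ],
            }
        ]
    },
}


def describe(pipeline: dict) -> list:
    """Return one line per activity: what it consumes and what it produces."""
    lines = []
    for activity in pipeline["properties"]["activities"]:
        consumed = [d["referenceName"] for d in activity.get("inputs", [])]
        produced = [d["referenceName"] for d in activity.get("outputs", [])]
        lines.append(f'{activity["name"]} ({activity["type"]}): {consumed} -> {produced}')
    return lines


for line in describe(pipeline):
    print(line)
```

The pipeline itself does no work here; the single activity inside it is what reads one dataset and writes another, which is the distinction the video stresses.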
Types of Activities in Azure Data Factory
The video categorizes activities in ADF into three main types: data movement activities, data transformation activities, and control flow activities. Data movement activities, like the copy activity, are used for transferring data between sources and sinks. Data transformation activities involve manipulating data using tools like data flows, Azure Functions, and other big data processing techniques. Control flow activities include operations like loops, conditions, and variables that control the execution flow within a pipeline. The video highlights the importance of selecting the appropriate type of activity based on the specific requirements of the task at hand.
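To make the three categories concrete, here is a small Python sketch that maps activity type names to the category they fall under. The type strings are assumptions based on how ADF's pipeline JSON commonly names these activities and should be checked against the Microsoft documentation; the list is partial, and `categorize` is a hypothetical helper, not an ADF API.

```python
# Partial, illustrative mapping from activity type to category.
# The type strings are assumptions; verify them against the ADF documentation.
CATEGORY_BY_TYPE = {
    # Data movement
    "Copy": "data movement",
    # Data transformation
    "ExecuteDataFlow": "data transformation",
    "AzureFunctionActivity": "data transformation",
    "HDInsightHive": "data transformation",
    "HDInsightPig": "data transformation",
    "HDInsightMapReduce": "data transformation",
    "DatabricksNotebook": "data transformation",
    "Custom": "data transformation",
    # Control flow
    "ForEach": "control flow",
    "IfCondition": "control flow",
    "Switch": "control flow",
    "Until": "control flow",
    "Filter": "control flow",
    "ExecutePipeline": "control flow",
    "Lookup": "control flow",
    "SetVariable": "control flow",
    "AppendVariable": "control flow",
    "Validation": "control flow",
    "Wait": "control flow",
    "WebHook": "control flow",
}


def categorize(activity_type: str) -> str:
    """Return the category for an activity type, or 'unknown' if it is not listed."""
    return CATEGORY_BY_TYPE.get(activity_type, "unknown")


for activity_type in ("Copy", "ExecuteDataFlow", "ForEach", "SomethingElse"):
    print(activity_type, "->", categorize(activity_type))
```

A lookup table like this mirrors how the video suggests thinking about a requirement: decide first whether the step moves data, transforms it, or steers execution, then pick the activity.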
Resources and Documentation for ADF Activities
The video references detailed Microsoft documentation that provides comprehensive information about pipelines and activities in ADF and Azure Synapse Analytics. It clarifies that Synapse Analytics is an integrated service combining data transformation and storage capabilities. The documentation includes extensive lists of supported data stores and the types of activities that can be performed on them, categorized by various criteria such as source, sink, and integration runtime support. The importance of utilizing these resources to understand the full capabilities and configurations of activities in ADF is emphasized.
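The support table in the documentation can be read as a lookup: for each data store, is it usable as a source, as a sink, and on which integration runtime. The Python sketch below models that idea with two rows only. The rows reflect my understanding of the documentation (Azure Blob Storage as source and sink, Amazon S3 as source only) and should be verified there; `StoreSupport` and `can_copy` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreSupport:
    """One row of a Copy activity support table (illustrative only)."""
    source: bool
    sink: bool
    azure_ir: bool
    self_hosted_ir: bool


# Two sample rows, entered from memory; verify against the official table.
SUPPORT = {
    "Azure Blob Storage": StoreSupport(source=True, sink=True, azure_ir=True, self_hosted_ir=True),
    "Amazon S3": StoreSupport(source=True, sink=False, azure_ir=True, self_hosted_ir=True),
}


def can_copy(source_store: str, sink_store: str) -> bool:
    """True if the first store is listed as a source and the second as a sink."""
    src = SUPPORT.get(source_store)
    dst = SUPPORT.get(sink_store)
    return bool(src and dst and src.source and dst.sink)


print(can_copy("Amazon S3", "Azure Blob Storage"))
print(can_copy("Azure Blob Storage", "Amazon S3"))
```

Stores missing from the table are treated as unsupported, which is the cautious default when the documentation has not been consulted for them.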
Practical Application: Creating Pipelines and Activities
In a practical demonstration, the video shows how to create a new pipeline in ADF and explore the available activities. It categorizes activities under 'Move and Transform' for data operations and 'General' for control flow operations. The demonstration highlights how to navigate the ADF interface to find and utilize different activities for specific tasks. The video concludes by encouraging viewers to practice creating and using different types of activities, promising more detailed tutorials on each type of activity in future videos.
Encouragement and Call to Action
The video ends with a call to action, encouraging viewers to subscribe to the channel for more tutorials and updates. The speaker expresses the hope that viewers found the content informative and helpful, and reiterates the channel's motto of 'keep learning and sharing.' The video aims to motivate viewers to engage with the content and stay tuned for future videos that will explore ADF activities in greater detail.
Keywords
Azure Data Factory
Pipeline
Activity
Data Movement Activities
Data Transformation Activities
Control Flow Activities
Integration Runtimes
Data Flow
Dataset
Linked Service
ADF Instance
Highlights
Introduction to Azure Data Factory video series part five focusing on pipelines and activities.
Exploration of the concept of a pipeline as a logical grouping of activities in Azure Data Factory.
Activities defined as individual processing steps within a pipeline.
Explanation of the difference between a pipeline and an activity in the context of data processing.
Discussion on the creation of the first pipeline and the inclusion of various types of activities.
Overview of the three main types of activities: data movement, data transformation, and control flow.
Detailed look at data movement activities, specifically the Copy Activity in Azure Data Factory.
List of supported data stores for the Copy Activity as both source and destination.
Introduction to data transformation activities, including Data Flow and big data technologies like Hive, Pig, and MapReduce.
Highlight of control flow activities for managing the workflow within a pipeline.
Description of the ForEach, If Condition, and other control flow constructs available in Azure Data Factory.
Mention of Azure Functions and Data Lake as part of the data transformation capabilities.
The importance of understanding different types of activities to select the appropriate one for specific requirements.
Link to Microsoft's detailed documentation on pipelines and activities in Azure Data Factory.
Clarification of Azure Synapse Analytics as a service within Azure, combining transformation and storage.
Encouragement for viewers to subscribe to the channel for more learning and sharing of knowledge.
Final thoughts on the importance of continuous learning in the field of Azure Data Factory.
Transcripts
hi everyone uh welcome back to azure
data factory video series part five so
in this section we will be exploring
more on what is a pipeline and what is
activity what are the types of
activities so the main stress would be
on the different types of
the pipeline activities we have i know
that in section one we have discussed in
a very high level about pipelines
activities data sets linked services
integration runtimes and also triggers
because they are like the top level
concepts and top level components i
think it's in section two yeah and in
section three we've also created our
first pipeline and we've seen uh
pipeline and also like different types
of different activities of course but we
haven't discussed about the different
types of activity so when it comes to
azure adf you have activities and you
also have types of activities it's
similar to you have integration runtimes
and you also have three different
types of integration runtimes so mainly
this is
the theory which you need to understand
because
if you if you know what the different
types of activities you have then you
understand okay which type of activity i
need to use for my requirements right
okay so without any further ado let's
get into the concept so basically
pipeline and activities so uh i know
that you've already seen this definition
but again just for the folks who haven't
gone through previous videos i'm just
repeating the same thing here so data
factory might have one or more pipeline
so basically a data factory is nothing
but you create data pipeline so
basically this is what we are talking
here a pipeline is a data factory could
have one or more pipelines right so
basically you create the data pipeline
and the pipeline is a logical grouping
of activities
so then what is pipeline a pipeline is
basically the group of activities it's a
logical group of activities that
performs the unit of work
unit of work could be anything it could
be copy data from one location to other
location or do some data transformation
do do some data cleaning apply some
expressions apply some formulas
add some business rules
unit of work could be anything which you
perform on the data basically you use
pipelines you create pipelines pipeline
is a logical
group of activities basically they
perform the work
so together the activities in a pipeline
perform a task basically
together the activities in a pipeline
perform a task okay so there is a basic
example uh we will not see that example
it's from the documentation
and let's try to understand activity so
activities represent a processing step
in a pipeline so this is important so
basically the processing step in a
pipeline is represented by the
activities we will look into that for
example you might use a copy activity to
copy data from one data store to another
data store so basically the copy
activity is doing the copy job
right
the copy functionality so you're copying
something from one location to another
location using a copy activity
so you create a pipeline maybe copy data
pipeline and within that pipeline use
activity called copy activity basically
it does the job so you need to
understand the main difference between a
pipeline and an activity okay
then so basically this
this flow should help you to understand
that okay so you create a pipeline which
is a logical group of an activity and
basically this activity actually
communicates with the data set so either
you produce a data set right to the sink
or to the destination or you consume the
data set from the source to make some
transformations okay so we've seen how
to create a data set like basically data
set is
is referring either to a table or to the
file and you also know what is the
difference between data set and linked
service if you do not know that i would
highly recommend you to go and look into
video series part two where i have
explained the top components and there
you have data set and linked service so if
you try to understand the flow so
basically what you do is you create a
pipeline and then within the pipeline
you bring activity and basically that
activity does the actual action here
that is very important here to
understand
okay
so then
what are the different types of
activities we have so basically this is
what i wanted to explain in this section
so there are mainly three types of
activities data movement activities
data transformation activities and
control flow activities okay so we will
try to see
where we have them in the azure adf
instance
before going there i have a very nice
documentation um
not i have a documentation it's
basically provided by the microsoft so
they have
very nice documentation when i say nice
all the information is in detail
few of the information are also in
simple terms for everyone to understand
right so if you come to this section
here
so this is the url
okay and within this url you have
pipelines and activities in azure data
factory and azure synapse analytics so
don't get confused with what is the
synapse analytics basically this is one
of
the service within azure uh basically
it's a combination of both
transformation and the storage so it's
it's like a cloud data warehouse i could
say but it has more features and
functionalities we will try to explore
that in coming sessions but let's try to
only focus on azure data factory okay so
pipelines and activities in azure data
factory if i scroll up basically there
is this diagram which we've seen and
then you have data movement activities
so what is data movement activities or
what are data movement activities so
here
if you see copy activity in azure data
factory copies data from various sources
and also to various sinks so basically
here you have a big list of data stores
within azure where the copy activity is
supported okay so basically the copy
activity within the adf is mainly used
for data movement activities
and here you can see
whether it is supported as source or not
you have a very big list of data stores
and then whether it is supported as sink
or not and it is also supported by azure
integration runtime we have seen what is
integration runtime and the different
types of integration runtimes in the
previous section which is part
four yeah and also you have
whether it is supported by self-hosted
integration runtime or not okay so you
have a very big list here you can just
go through them you can also see these
are categorized by azure and then by
databases
and also by nosql file and everything uh
important key point here is data
movement activity mainly the copy
activity is used for the data movement
activities then you have data
transformation activities so data
transformation activities so basically
you transform the data so to do that you
have a different list here so you have a
data flow we will see what is data flow
and then you have
here you can see what is a compute
environment basically the data flow will
get executed in the apache spark
clusters we will get into that so i have
very big playlist on a data flow and
mapping data flow where we will try to
understand them and use them for data
transformation related works then you
have azure functions hive pig you know
map reduce all these are related to big
data
right and you can also see the compute
environment underlying there and then
you have some custom activity databricks
so basically these are all used
based on the requirement based on the
use case
for data transformation and then you
have control flows okay so control flow
activities are
something like for each
filter and if else condition pending or
executing a pipeline and look up adding
a variable and like a switch until
activity validation activity wait or
webhook so basically these are control
flows okay
three types data movement data
transformation and control flow and we
have seen those three different types
there and if i go back to our adf
instance just to see here so when you
try to click on create a pipeline new
pipeline okay for example and then if i
close here then you have a big list of
activities here so the move and
transform here you see data copy is
under the move
okay and transform you also have a data
flow and then you also have data
explorer azure function basically few of
them are used for data transformation
and then when you come to general
then you have a few options here few
activities here basically which are used
for
control flow activities right we have
seen them control flow activities and
then you also have iteration and
conditionals these are also used for
control flow each for each if condition
switch until and filter and then
basically you have hdinsight azure
function databricks data lake these
are used for data transformations and
move and transform you have copy and
data flow right so this is all about
different types of
different types of activities and what
are they
and when to use what
again moving forward we will look into
them in detail how to use them when to
use them but i hope you like this
section
so uh if you think if you've gained any
knowledge uh out of this video i would
kindly request you to subscribe to our
channel which gives a lot of motivation
and encouragement to make more of these kinds
of videos
motto is very simple
keep learning and sharing and i hope you
have a good day thank you so much