Azure Data Factory Part 6 - Triggers and Types of Triggers

databag
5 Mar 2022, 11:36

Summary

TL;DR: This video from the Azure Data Factory series introduces triggers, which automate pipeline execution. It explains three types: scheduled, tumbling window, and event-based triggers. The presenter demonstrates how to create and configure these triggers in Azure Data Factory, emphasizing their role in batch and real-time processing. The tutorial also covers manual triggering and monitoring triggers, providing practical insights for automating data workflows.

Takeaways

  • πŸ”§ Triggers in Azure Data Factory are mechanisms to execute a pipeline run automatically.
  • πŸ“… There are three main types of triggers: scheduled, tumbling window, and event-based triggers.
  • ⏰ Scheduled triggers run pipelines based on a fixed time schedule, like every day at 9 a.m.
  • πŸ”„ Tumbling window triggers operate on a periodic interval and differ from scheduled triggers in their granularity, focusing on hours and minutes.
  • πŸ“‚ Event-based triggers initiate pipeline execution in response to specific events, such as the creation of a file in a storage account.
  • πŸ› οΈ Creating a trigger involves setting a name, description, start time, time zone, and recurrence pattern.
  • πŸ”„ Recurrence settings for triggers allow for customization of execution frequency, from minutes to months.
  • πŸ”— Tumbling window triggers can have dependencies on other triggers, indicating a sequence of operations.
  • 🚫 Advanced options for triggers include the ability to set delays, manage concurrency, and define retry policies for failed pipeline executions.
  • πŸ“ Annotations can be added to triggers for additional information or notes.
  • πŸ“Š Monitoring is built into the workflow: the Monitor hub provides a dashboard showing statistics and status for each trigger type.
  • πŸ”„ The video script demonstrates the creation and management of triggers within Azure Data Factory, emphasizing automation and customization.

Q & A

  • What is the primary purpose of triggers in Azure Data Factory?

    -Triggers in Azure Data Factory are used to automate the execution of a pipeline run. They determine when a pipeline execution should be initiated, allowing for the automation of batch processing and real-time processing tasks.

  • How many types of triggers are discussed in the video?

    -The video discusses three main types of triggers: scheduled trigger, tumbling window trigger, and event-based trigger.

  • What is a scheduled trigger and how does it work?

    -A scheduled trigger is a type of trigger that invokes a pipeline on a wall clock schedule. It can be set to run a pipeline at specific times, such as every day at 9 a.m., according to a predefined schedule.

  • Can you explain the concept of a tumbling window trigger?

    -A tumbling window trigger operates on a periodic interval. It differs from a scheduled trigger in its granularity, which is limited to hours and minutes; it does not offer months, weeks, or days as recurrence units.

  • What is an event-based trigger and how does it differ from other triggers?

    -An event-based trigger responds to a specific event, such as the creation or deletion of a file in a storage account. It differs from other triggers as it does not rely on a time-based schedule but rather on the occurrence of an event.

  • What is the difference between 'trigger now' and creating a new trigger in Azure Data Factory?

    -'Trigger now' is a manual trigger that initiates a pipeline run immediately, whereas creating a new trigger sets up an automated process that will run the pipeline at specified intervals or in response to an event.

  • How can you set up a pipeline to run every 15 minutes in Azure Data Factory?

    -You can set up a pipeline to run every 15 minutes by creating a scheduled trigger, setting the recurrence to every 15 minutes, and specifying the appropriate time zone.

  • What is the role of dependencies in tumbling window triggers?

    -Dependencies in tumbling window triggers allow one trigger to be dependent on the completion or start of another trigger. This ensures that the pipeline execution is coordinated based on the status of other triggers.

  • What options are available for configuring an event-based trigger in Azure Data Factory?

    -For an event-based trigger, you can configure the Azure subscription, storage account, container name, and blob path filters (blob path begins with / ends with), and specify the event type, such as blob creation or deletion, that will trigger the pipeline execution.

  • How can you monitor the performance and status of triggers in Azure Data Factory?

    -You can monitor the performance and status of triggers in Azure Data Factory using the monitoring dashboard, which provides statistics for different types of triggers, including scheduled, tumbling window, storage events, and custom events.

  • What are some advanced options available for configuring triggers in Azure Data Factory?

    -Some advanced options for configuring triggers include setting a delay, managing concurrency to control how many triggers run simultaneously, defining retry policies for failed pipeline executions, and specifying a retry interval time.
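
These advanced settings, along with the trigger dependencies mentioned earlier, correspond to fields in the trigger's JSON definition. As a rough illustration only (the trigger and pipeline names are placeholders, and field names should be verified against the current Azure Data Factory documentation), a tumbling window trigger with a delay, concurrency limit, retry policy, and a dependency on another trigger looks approximately like the Python dictionary below, which mirrors the JSON shape:

```python
# Approximate shape of a tumbling window trigger definition with advanced
# options, written as a Python dict that mirrors the ADF JSON. Names such as
# "Tumbling_Trigger", "Upstream_Trigger", and "demo_pipeline_1" are placeholders.
tumbling_trigger = {
    "name": "Tumbling_Trigger",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",            # tumbling windows use Hour or Minute
            "interval": 1,                  # one window per hour
            "startTime": "2022-03-05T09:00:00Z",
            "delay": "00:05:00",            # wait 5 minutes after the window closes
            "maxConcurrency": 1,            # how many windows may run at once
            "retryPolicy": {
                "count": 3,                 # retry a failed run up to 3 times
                "intervalInSeconds": 30     # wait 30 seconds between retries
            },
            "dependsOn": [                  # run only after another trigger's window
                {
                    "type": "TumblingWindowTriggerDependencyReference",
                    "referenceTrigger": {
                        "referenceName": "Upstream_Trigger",
                        "type": "TriggerReference"
                    }
                }
            ]
        },
        "pipeline": {                       # a tumbling window trigger targets one pipeline
            "pipelineReference": {
                "referenceName": "demo_pipeline_1",
                "type": "PipelineReference"
            }
        }
    }
}
```

The retry policy is what the video refers to when it mentions transient failures (for example, a VNet outage): the window run is retried rather than marked failed immediately.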

Outlines

00:00

πŸ”„ Introduction to Triggers in Azure Data Factory

This paragraph introduces the concept of triggers in Azure Data Factory, which are mechanisms to execute pipeline runs automatically. It explains the need for automation in scenarios like batch and real-time processing and outlines the three types of triggers: scheduled, tumbling window, and event-based. The speaker illustrates how triggers can be used to define when a pipeline should run, either on a set schedule or in response to specific events, and provides an example of setting up a manual trigger versus creating a scheduled or event-based trigger within the Azure Data Factory interface.
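
The same "Trigger now" action shown in the UI can also be performed programmatically. The sketch below is a minimal, hedged example assuming the azure-identity and azure-mgmt-datafactory Python packages; the subscription, resource group, factory, and pipeline names are placeholders, and method names can vary between SDK versions:

```python
# Minimal sketch: manually start a pipeline run (the "Trigger now" equivalent).
# Assumes the azure-identity and azure-mgmt-datafactory packages; the names
# below (subscription, resource group, factory, pipeline) are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"
client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Kick off a single run of the published pipeline.
run = client.pipelines.create_run(
    resource_group_name="my-resource-group",
    factory_name="my-data-factory",
    pipeline_name="demo_pipeline_1",
)
print("Started pipeline run:", run.run_id)

# Check the run status instead of watching the UI.
status = client.pipeline_runs.get("my-resource-group", "my-data-factory", run.run_id)
print("Current status:", status.status)
```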

05:01

πŸ“… Creating and Configuring Schedule-Based Triggers

The second paragraph delves into the specifics of creating a schedule-based trigger, which invokes a pipeline on a predetermined time schedule. The speaker walks through naming the trigger and setting the start time, time zone, and recurrence pattern. It also previews the advanced options that appear for tumbling window triggers, such as dependencies on other triggers, delay settings, concurrency limits, and retry policies, before emphasizing the granularity of scheduling options available, from minutes to months, and the ability to define an end date for the trigger's operation.
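
To make the configuration concrete, here is a rough sketch of what the every-15-minutes schedule trigger described in the walkthrough looks like as a JSON definition, written as a Python dictionary. The trigger and pipeline names are placeholders, and the exact field names should be checked against the Azure Data Factory documentation:

```python
# Approximate JSON shape of a schedule trigger that runs a pipeline every
# 15 minutes, expressed as a Python dict. "Schedule_Trigger" and
# "demo_pipeline_1" are placeholder names.
schedule_trigger = {
    "name": "Schedule_Trigger",
    "properties": {
        "description": "Run demo_pipeline_1 every 15 minutes",
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Minute",              # Minute, Hour, Day, Week, Month
                "interval": 15,                      # every 15 minutes
                "startTime": "2022-03-05T09:00:00Z", # when the trigger becomes active
                "endTime": "2022-12-31T00:00:00Z",   # optional end date
                "timeZone": "UTC"                    # time zone the trigger follows
            }
        },
        "pipelines": [                               # pipelines this trigger starts
            {
                "pipelineReference": {
                    "referenceName": "demo_pipeline_1",
                    "type": "PipelineReference"
                },
                "parameters": {}
            }
        ]
    }
}
```

Choosing "Minute" with an interval of 15 matches the recurrence shown in the video; the same definition can be created through the UI or deployed through templates or the SDK.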

10:04

πŸ”„ Understanding Tumbling Window and Event-Based Triggers

This paragraph contrasts the tumbling window trigger with the schedule-based trigger, highlighting the differences in their granularity and use cases. It explains that tumbling window triggers operate on a periodic interval, focusing on hours and minutes, and can be set up with dependencies and advanced options similar to schedule-based triggers. The speaker then transitions to event-based triggers, which are activated by specific events such as the creation or deletion of a blob in Azure storage. The paragraph outlines the configuration steps for event-based triggers, including selecting the Azure subscription, storage account, container, and defining the events that will trigger the pipeline execution.
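
As with the other trigger types, the storage-event configuration ends up as a JSON definition. A rough sketch follows as a Python dictionary; the subscription, resource group, storage account, container, and pipeline names are placeholders, and field names should be checked against the Azure Data Factory documentation:

```python
# Approximate JSON shape of a storage-event (blob event) trigger, expressed as
# a Python dict. The subscription ID, resource group, storage account,
# container ("source"), and pipeline name are placeholders.
blob_event_trigger = {
    "name": "Event_Trigger",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            # Fully qualified resource ID of the storage account to watch.
            "scope": (
                "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
                "/providers/Microsoft.Storage/storageAccounts/<storage-account>"
            ),
            "blobPathBeginsWith": "/source/blobs/",      # container plus optional folder
            "blobPathEndsWith": ".csv",                  # e.g. only react to CSV files
            "ignoreEmptyBlobs": True,
            "events": ["Microsoft.Storage.BlobCreated"]  # or Microsoft.Storage.BlobDeleted
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "demo_pipeline_1",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}
```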

Keywords

πŸ’‘Triggers

Triggers in the context of Azure Data Factory are mechanisms that initiate the execution of a pipeline automatically or based on specific conditions. They are pivotal for automating workflows and are central to the video's theme of explaining different methods to execute pipeline runs. For example, the script mentions setting up triggers to run a pipeline 'every one hour' or 'every night around eight o'clock'.

πŸ’‘Pipeline

A pipeline in Azure Data Factory is a sequence of data movements and transformations that process data. The script discusses how triggers are used to run these pipelines without manual intervention, emphasizing the importance of pipelines in data orchestration and automation.

πŸ’‘Scheduled Trigger

A scheduled trigger is a type of trigger that activates a pipeline based on a fixed schedule, such as daily or hourly. The script explains this concept by describing 'a trigger that invokes a pipeline on a wall clock schedule', illustrating how it can be used to run a pipeline at 9 a.m. every day.

πŸ’‘Tumbling Window Trigger

This trigger operates on a periodic interval and is different from a scheduled trigger in terms of its operational window. The script clarifies the distinction by pointing out that a tumbling window trigger deals with 'hours and minutes' as opposed to the broader 'months, days, and weeks' scope of a scheduled trigger.

πŸ’‘Event-based Trigger

An event-based trigger initiates a pipeline in response to a specific event, such as the creation of a file in Azure storage. The script provides an example of an event-based trigger where 'whenever the blob is created', it triggers the pipeline, highlighting its use in reactive data processing scenarios.

πŸ’‘ADF (Azure Data Factory)

Azure Data Factory is a cloud-based data integration service that allows for the creation of data-driven workflows for data movement and transformation. The script refers to ADF as the platform where triggers and pipelines are configured and managed, with the video focusing on how to set up and use triggers within this service.

πŸ’‘Data Sets

Datasets in the script refer to the collections of data that are used as inputs and outputs for the copy activity within an Azure Data Factory pipeline. The script mentions creating source and sink datasets, which are essential for defining what data the pipeline will process.

πŸ’‘Blob Storage

Blob storage is a service provided by Azure for storing unstructured data, such as text or binary data. In the script, blob storage is used as an example when setting up an event-based trigger, where the creation of a blob in a storage container acts as the triggering event.

πŸ’‘Automation

Automation is the process of making a system or process run without manual intervention. The script emphasizes the role of triggers in automating the execution of pipelines in Azure Data Factory, eliminating the need for manual initiation of processes.

πŸ’‘Recurrence

Recurrence in the context of triggers refers to the frequency at which a trigger will activate a pipeline. The script explains that one can configure triggers to recur 'every 15 minutes' or at other specified intervals, demonstrating how recurrence is configured for both scheduled and tumbling window triggers.
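
For frequencies coarser than minutes and hours, the recurrence block of a schedule trigger can also carry an explicit schedule (specific hours, minutes, or weekdays). The snippet below is a hedged sketch of that part of the definition only, with illustrative placeholder values; exact field names should be verified against the ADF documentation:

```python
# Approximate recurrence block for a schedule trigger that fires every Monday
# at 09:00 in a chosen time zone; all values are illustrative placeholders.
recurrence = {
    "frequency": "Week",
    "interval": 1,
    "startTime": "2022-03-07T09:00:00",
    "timeZone": "India Standard Time",
    "schedule": {
        "weekDays": ["Monday"],   # which days of the week to fire
        "hours": [9],             # hour(s) of the day
        "minutes": [0]            # minute(s) within the hour
    }
}
```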

πŸ’‘Dependencies

Dependencies in the script refer to the conditional activation of a trigger based on the completion or initiation of another trigger. This concept is introduced when the script discusses advanced options for configuring triggers, allowing for complex orchestration of pipeline executions.

Highlights

Introduction to triggers in Azure Data Factory and their role in executing pipeline runs.

Explanation of how triggers can automate the running of pipelines without manual intervention.

Types of triggers in Azure Data Factory: Scheduled, Tumbling Window, and Event-based triggers.

Scheduled triggers for running pipelines at specific times based on a wall clock schedule.

Tumbling window triggers for operating on a periodic interval, differing from scheduled triggers.

Event-based triggers that respond to specific events, such as file copy completion.

Demonstration of creating a demo pipeline in Azure Data Factory with copy activity.

Creating and managing triggers directly from the Azure Data Factory interface.

Options to trigger a pipeline manually or to create a new scheduled, event-based, or tumbling window trigger.

The process of creating a new trigger with a specific start time and recurrence.

Configuring the granularity of trigger recurrence, such as minutes, hours, days, or weeks.

Setting an end date for when a trigger should no longer execute the pipeline.

Adding annotations and configuring dependencies between different triggers.

Advanced options for triggers, including delay, concurrency, and retry settings.

Monitoring dashboard in Azure Data Factory for viewing trigger statistics and types (a programmatic equivalent is sketched after this list).

Practical example of creating an event-based trigger for blob storage events.

Custom events configuration for more specific or complex event-based trigger scenarios.

Encouragement to subscribe to the channel for more informative sessions on Azure Data Factory.
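
Following up on the monitoring highlight above: trigger runs can also be queried programmatically rather than only through the dashboard. A minimal sketch, assuming the azure-identity and azure-mgmt-datafactory Python packages (resource names and the time window are placeholders, and model or method names can vary between SDK versions):

```python
# Minimal sketch: query recent trigger runs for a factory, roughly the data
# the Monitor dashboard shows. Assumes azure-identity and azure-mgmt-datafactory;
# the subscription, resource group, and factory names are placeholders.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

now = datetime.now(timezone.utc)
filters = RunFilterParameters(
    last_updated_after=now - timedelta(days=1),   # trigger runs from the last day
    last_updated_before=now,
)

runs = client.trigger_runs.query_by_factory(
    "my-resource-group", "my-data-factory", filters
)
for run in runs.value:
    print(run.trigger_name, run.trigger_run_timestamp, run.status)
```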

Transcripts

00:00

Hi everyone, welcome back to the Azure Data Factory video series, part six. In this section we'll be discussing triggers and the different types of triggers, so without any further ado let's get into the concept. Triggers are basically a way to execute a pipeline run. Say you have created your pipeline and it is ready to be deployed to production, and now you want to run it. It's not a job where you go in and run it manually every day. You have batch processing and real-time processing, and you instruct ADF to run this pipeline every hour, or maybe every night around eight o'clock for end-of-day reports, the end-of-day batch. For that you need an automatic process in place, so triggers are used to automate the pipeline run: you define that a specific pipeline should run at a specific time, and you can also schedule it. Triggers represent a unit of processing that determines when a pipeline execution needs to be kicked off. In other words, using triggers you are telling ADF when you want to execute a pipeline automatically. What are the different types of triggers? There are basically three: the schedule trigger, the tumbling window trigger, and the event-based trigger. Let's look at them one by one. What is a schedule trigger? The name is self-explanatory: you are scheduling a trigger, a trigger that invokes a pipeline on a wall-clock schedule. For example, I say "every day at 9 a.m., trigger my pipeline" — that comes under the schedule trigger. Then you have another type called the tumbling window trigger, a trigger that operates on a periodic interval. There is a slight difference between the schedule trigger and the tumbling window trigger, which we will look into. The third type is the event-based trigger, which is a trigger that responds to an event. Say I do not want my pipeline to run every day or every hour; I want it to run based on a specific event. The event could be a file being copied from one location to another: once that file lands in the specific location, my pipeline is triggered, picks up the file, and sends it on to another location. That could be one use case. So those are the three types of triggers we have.

02:42

Now let's look at Azure Data Factory. If you remember, we created this demo pipeline in section three, "creating your first Azure Data Factory pipeline." There we used the copy activity to copy from a source (the test data input), we created two datasets, source and sink, and we used the copy data activity to build a demo pipeline that copies a file from source to sink. Now that the pipeline is created (let me close this so we have more space), I want to run it. What options do I have? You have an option called "Add trigger" here. If I click on it, I have two options: either I can trigger it now, which is a manual trigger, or I can create a trigger of any type — schedule, event-based, or tumbling window — and define that it should run automatically, every day for example. If I click "Trigger now," it says it takes the last published configuration; that's fine, I click OK, and now it's running the pipeline, which copies a file from source to sink. That is a manual trigger, but I do not want to trigger this manually every morning — that's not my job. I want to automate the process. For that you can create a new trigger here and choose the trigger. I do not have any triggers created yet (and the manual run succeeded), so either I can click "New," which lets me create a new trigger, or I can go to the Manage hub and create a trigger from that section — there are multiple places where you can create triggers. And if I go to Monitor, just to show you something, there is a monitoring dashboard for the different types of triggers. If you have any triggers, you get their statistics here: schedule-based triggers, tumbling window, storage events, and custom events (those last two come under event-based triggers). Currently we do not have any triggers, so let's create one and see how it's done.

05:15

Under Triggers, click "New," and it asks for the name of the trigger. I put "schedule-" ... okay, it doesn't like the hyphen, so there is a specific naming convention; I'll say "schedule_trigger" and add a description. I wanted to show you the different options you have. For the type, you have three — well, four — types: schedule, tumbling window, and event, and event has two sub-types. I want schedule, so I select that. Then it asks for the start time: when do you want this trigger to start? You fill in the start time, and you can also set the time zone the trigger should follow. Then you set the recurrence: whether you want to run it every day, every two days, every 15 minutes, and so on. If you click on the unit, you have a couple of options — you can configure minutes, hours, days, and weeks, and you also have months; this is the granularity. Let's set it to minutes: I'm saying run this pipeline every 15 minutes. You can also specify an end date after which you do not want the trigger to run any more, and you can add annotations. There is also a checkbox that, if you enable it and click OK, starts the trigger immediately after creation — you don't have to enable it. So that's a start date and "run this trigger every 15 minutes": this is the schedule-based trigger.

07:04

Then I have the tumbling window trigger. If I select it, let's see the different options, because you need to understand the slight difference between the tumbling window trigger and the schedule-based trigger. The start date is the same for both, and there is also a recurrence, but if I click on the units, the granularity for the tumbling window is hours and minutes, whereas for schedule it also had months, days, and weeks. That is a major difference between these two trigger types. Then, under the advanced options, you also have the option to add dependencies. If I click on that, I can choose another trigger and say "make this trigger depend on that one": once the other trigger has finished or started, then run this trigger. So I'm adding dependencies on other triggers; you have the option to do that. Under "Advanced" you can configure a few more options: you can add a delay, set the concurrency (how many trigger runs you want to run simultaneously), and, if the pipeline fails, you can say retry three or four times — because of network issues, for example if you're running under a VNet and the network is down, the run could fail even though the pipeline itself is fine, so you can retry — and you can define the retry interval in seconds. You can add annotations, and there is the same "start trigger on creation" checkbox. That is the tumbling window trigger.

08:39

Let's also look at the storage event, that is, the event-based trigger. For event-based triggers, the options you configure are: you select the Azure subscription, then fill in the storage account and related information. Let me fill this in — I have different storage accounts — and you also need to provide the container name. Let me check the container names I have within my Azure storage account: if I go to portal.azure.com, open the storage account, and check the containers, I have "source" and "sink," so I can give "source," and now it is in the correct format it expects. Next you give the blob path: "blob path begins with" and "blob path ends with" — you can provide those patterns here. Then, what is the event? You choose which events are associated with this trigger, that is, when you want it to run: whenever a blob is created, trigger this pipeline, or whenever a blob is deleted, trigger this pipeline. That is the event I'm defining here. One use case: if a new blob is created, then fire this trigger — it is based on the blob creation or deletion event. Then you have another sub-type, custom events. You have more or less the same options, but it is more customizable: the storage-events option is specific to storage accounts, whereas custom events can be more general. There are a couple of options you can configure, and you can also add advanced filters and event types. We will look into custom events with an example when we do more practical sessions with mapping data flows and the different control flow activities.

11:09

For now, you need to understand that there are these types of triggers and how to configure them. Going back to my session: these are the three different types of triggers we have in Azure Data Factory. I hope you liked this session. If you think you've gained any knowledge out of this video, I kindly request you to subscribe to our channel and also share it with your…

Related Tags
Azure Data Factory Video, Triggers, Automate Pipeline, Batch Processing, Real-Time, Scheduled Trigger, Tumbling Window, Event-Based, ADF Tutorial, Data Processing