Learn Apache Airflow in 10 Minutes | High-Paying Skills for Data Engineers

Darshil Parmar
7 Oct 2023 · 12:37

Summary

TL;DR: This video introduces Apache Airflow, a popular open-source tool for managing complex data pipelines. It explains how Airflow, created at Airbnb and incubated by Apache, allows workflows to be created, scheduled, and executed as code, using a Directed Acyclic Graph (DAG) structure. The video also covers the simplicity of using Python scripts for data tasks and the scalability issues that arise with numerous pipelines, which Airflow addresses. It highlights Airflow's user-friendly interface, customizable nature, and community support, encouraging viewers to explore an end-to-end project for hands-on experience.

Takeaways

  • 😀 Data Engineers often build data pipelines to transform and load data from multiple sources.
  • 🔧 Initially, simple Python scripts can be used for data pipeline tasks, but managing multiple pipelines can be challenging.
  • ⏰ Cron jobs can schedule scripts to run at specific intervals, but they are not scalable for hundreds of data pipelines.
  • 🌐 The vast amount of data generated in recent years drives the need for efficient data processing and pipelines in businesses.
  • 🌟 Apache Airflow is a popular open-source tool for managing data workflows, created by Airbnb and now widely adopted.
  • 📈 Airflow's popularity stems from its 'pipeline as code' philosophy, allowing customization and scalability.
  • 📚 Apache Airflow is a workflow management tool that uses Directed Acyclic Graphs (DAGs) to define tasks and their dependencies.
  • 🛠️ Operators in Airflow are functions used to create tasks, with various types available for different operations like running Bash commands or sending emails.
  • 💡 Executors in Airflow determine how tasks run, with options for sequential, local, or distributed execution across machines.
  • 📊 The Airflow UI provides a visual representation of DAGs, tasks, and their statuses, making it easy to manage and monitor data pipelines.
  • 🚀 For practical learning, building a Twitter data pipeline using Airflow is recommended as a project to understand real-world applications of the tool.

Q & A

  • What is a data pipeline in the context of data engineering?

    - A data pipeline in data engineering is a process that extracts data from multiple sources, transforms it as needed, and then loads it into a target location. It's a way to automate the movement and transformation of data from one place to another.

  • Why might a simple Python script be insufficient for managing data pipelines?

    - A simple Python script can be insufficient for managing data pipelines, especially as the number of pipelines grows, because the setup becomes complex and difficult to manage. Tasks might need to be executed in a specific order, and handling failures and scheduling can be challenging.

  • What is a Cron job and how is it used in data pipelines?

    - A Cron job is a time-based job scheduler in Unix-like operating systems, used to run scripts at specific intervals. In the context of data pipelines, Cron jobs can automate the execution of scripts at regular times, but they become cumbersome when managing many pipelines. A typical crontab entry is sketched below.
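
    For context, a single scheduled script might be wired up with a crontab entry like the following sketch (the script path and schedule are made up for illustration):

      # m h dom mon dow  command
      # Run a hypothetical pipeline script every day at 2:00 AM and append its output to a log.
      0 2 * * * /usr/bin/python3 /home/user/pipelines/daily_etl.py >> /var/log/daily_etl.log 2>&1

    Multiplying entries like this across hundreds of pipelines, each with its own ordering and failure handling, is exactly the pain point Airflow was built to remove.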

  • What is Apache Airflow and why is it popular?

    - Apache Airflow is an open-source workflow management tool designed to schedule and monitor data pipelines. It became popular due to its 'pipeline as code' philosophy, which allows data pipelines to be defined in Python scripts. It is widely adopted because it is open source, customizable, and supports complex workflows.

  • What does the term 'DAG' stand for in Apache Airflow?

    - In Apache Airflow, 'DAG' stands for Directed Acyclic Graph. It is a collection of tasks defined so that they execute in a specific order, with no loops, making it a blueprint for the workflow.

  • How does Apache Airflow handle the execution of tasks?

    - Apache Airflow uses executors to determine how tasks are run. Different types of executors are available, such as the Sequential Executor for sequential task execution, the Local Executor for parallel task execution on a single machine, and the Celery Executor for distributing tasks across multiple machines, as illustrated in the configuration sketch below.
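
    The executor is normally chosen in Airflow's configuration rather than inside a DAG file. A minimal, illustrative airflow.cfg snippet (values shown only as an example) looks like this:

      [core]
      # SequentialExecutor - one task at a time (the default with SQLite)
      # LocalExecutor      - parallel tasks on a single machine
      # CeleryExecutor     - tasks distributed across multiple worker machines
      executor = LocalExecutor

    The same setting can usually be supplied as the environment variable AIRFLOW__CORE__EXECUTOR.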

  • What is an operator in Apache Airflow and what role does it play?

    - An operator in Apache Airflow is a function provided by Airflow to create tasks and perform specific actions. Operators can be used to execute tasks like running Bash commands, calling Python functions, or sending emails, making it easier to manage different types of tasks in a pipeline (see the sketch below).
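
    As a small illustration, two of these operators might be instantiated as follows (import paths follow the Airflow 2.x layout and can differ between releases; the DAG, command, and email address are hypothetical):

      from datetime import datetime
      from airflow import DAG
      from airflow.operators.bash import BashOperator
      from airflow.operators.email import EmailOperator

      # A throwaway DAG so the tasks have somewhere to live.
      with DAG(dag_id="operator_demo",
               start_date=datetime(2023, 1, 1),
               schedule_interval=None) as dag:

          # Runs a shell command.
          backup = BashOperator(
              task_id="backup_files",
              bash_command="echo 'pretend this backs something up'",
          )

          # Sends an email (requires SMTP settings in the Airflow configuration).
          notify = EmailOperator(
              task_id="notify_team",
              to="data-team@example.com",
              subject="Backup finished",
              html_content="The backup task completed.",
          )

          backup >> notify  # send the email only after the Bash task succeeds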

  • How can one define a DAG in Apache Airflow?

    - In Apache Airflow, a DAG is defined using the DAG constructor from the Airflow library. You provide parameters such as the name, start date, and schedule to configure the DAG. Tasks are then added to the DAG using operators like the PythonOperator or BashOperator, as in the sketch below.
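
    A minimal sketch of that pattern (parameter names follow Airflow 2.x; newer releases prefer a parameter named schedule over schedule_interval):

      from datetime import datetime
      from airflow import DAG
      from airflow.operators.python import PythonOperator

      def extract():
          print("extract data from a source")

      def load():
          print("load data into a target")

      with DAG(
          dag_id="example_etl",              # name shown in the Airflow UI
          start_date=datetime(2023, 1, 1),   # first date the DAG is eligible to run
          schedule_interval="@daily",        # run once per day
          catchup=False,                     # don't backfill past runs
      ) as dag:
          extract_task = PythonOperator(task_id="extract", python_callable=extract)
          load_task = PythonOperator(task_id="load", python_callable=load)

          extract_task >> load_task          # run extract before load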

  • What is the significance of the 'pipeline as code' concept in Apache Airflow?

    - The 'pipeline as code' concept in Apache Airflow allows data pipelines to be defined in code, typically Python scripts. This makes it easier to version control, test, and modify pipelines, as well as collaborate on them, similar to how software development works.

  • How can one visualize the workflow in Apache Airflow?

    - The workflow in Apache Airflow can be visualized through the Airflow UI, which provides a graphical representation of DAGs. This visual representation helps in understanding the sequence of tasks, their dependencies, and the overall structure of the data pipeline (see the note below on reaching the UI locally).
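
    For a quick local look at the UI, one common approach (Airflow 2.x; exact commands vary by version and install method) is:

      pip install apache-airflow      # the docs recommend installing with constraint files
      airflow standalone              # initializes the metadata DB, starts the webserver and
                                      # scheduler, and prints a generated admin login
      # Then open http://localhost:8080 in a browser to see the DAGs page.

    This is only meant for experimentation; production deployments typically run the scheduler, webserver, and workers as separate services.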

  • What is an example project that can be done using Apache Airflow?

    - An example project that can be done using Apache Airflow is building a Twitter data pipeline. This involves extracting data from the Twitter API, performing transformations, and then loading the data into a storage system like Amazon S3. Although the Twitter API mentioned in the video is no longer freely available, a similar project can be built with other APIs.

Outlines

00:00

🔧 Introduction to Data Pipelines and Apache Airflow

This paragraph introduces the concept of building data pipelines as a Data Engineer, which involves extracting data from various sources, transforming it, and loading it into a target location. It discusses the use of Python scripts for this purpose and the limitations of using Cron jobs for scheduling tasks, especially when dealing with a large number of data pipelines. The paragraph also highlights the importance of data in modern businesses and the role of data pipelines in personalized recommendations and advertisements. It concludes with an introduction to Apache Airflow, a data pipeline tool developed by Airbnb, which became popular due to its 'pipeline as code' philosophy and open-source nature, allowing for customization and scalability.

05:03

🛠 Understanding Apache Airflow's Core Components

The second paragraph delves into the specifics of Apache Airflow, explaining its components and how it simplifies the management of complex data pipelines. It starts by discussing the Cron job's inadequacy for managing numerous pipelines and introduces the Directed Acyclic Graph (DAG) concept, which is the core of Airflow's workflow management. The paragraph explains that a DAG is a blueprint defining tasks and their dependencies. It also introduces operators as functions provided by Airflow to create tasks for different operations, such as running Bash commands or Python functions. The paragraph further explains the role of executors in determining how tasks are run, with options for sequential, local, or distributed execution across machines.

10:03

📊 Practical Overview of Airflow's UI and DAG Execution

This paragraph provides a practical overview of Apache Airflow's user interface and the execution of DAGs. It describes how to declare a DAG in Python, including setting parameters like name, start date, and schedule. The paragraph illustrates the use of the Dummy Operator and the creation of task dependencies to ensure tasks execute in a specific sequence. It also explains how to view and manage DAGs through the Airflow console, including monitoring their status such as queued, running, successful, or failed. The paragraph concludes with an example of enabling and manually running a DAG, and observing its progression and outcome within the Airflow UI.

🐦 Building a Twitter Data Pipeline with Apache Airflow

The final paragraph discusses a project involving the creation of a Twitter data pipeline using Apache Airflow. Although the Twitter API mentioned is no longer valid, the paragraph suggests using alternative free APIs for a similar project. It provides a brief explanation of the code involved in the project, which includes defining a function to extract data from the Twitter API, perform transformations, and store the data on Amazon S3. The paragraph also outlines the structure of the 'twitter_dag.py' file, detailing how to define a DAG, tasks, and dependencies within Airflow. It concludes by recommending a project for beginners to gain hands-on experience with Airflow and to solidify their understanding of its practical applications.

Keywords

💡Data Pipeline

A data pipeline is a series of processes through which data is ingested, transformed, and delivered to a target location. In the context of the video, building a data pipeline involves reading data from multiple sources, applying transformations, and storing the processed data. This is crucial for data engineering as it enables the flow of data necessary for analytics and decision-making.

💡Cron Job

A Cron job is a time-based job scheduler in Unix-like operating systems. It is used to schedule scripts to run at specific intervals. In the video, Cron jobs are initially mentioned as a way to automate the execution of simple Python scripts for data processing. However, the speaker points out limitations when scaling to hundreds of data pipelines, highlighting the need for more sophisticated tools like Apache Airflow.

💡Apache Airflow

Apache Airflow is an open-source tool for orchestrating complex data workflows. It allows users to define, schedule, and monitor workflows as code. The video emphasizes its popularity due to its 'pipeline as code' philosophy, making it a preferred choice for managing data pipelines at scale. It was initially developed by Airbnb and later became an Apache project.

💡Directed Acyclic Graph (DAG)

In Apache Airflow, a DAG is the fundamental concept that represents a collection of tasks with dependencies. The term 'Directed Acyclic Graph' implies that tasks are directed (have a specific order) and acyclic (do not form loops). The video explains that a DAG serves as a blueprint for workflows, defining the sequence in which tasks must be executed.

💡Operators

Operators in Apache Airflow are functions that create tasks to perform specific actions. They are used to define the logic of individual tasks within a DAG. The video mentions various types of operators, such as Bash Operator for running Bash commands, Python Operator for executing Python functions, and Email Operator for sending emails, illustrating how they facilitate the execution of different types of tasks in a data pipeline.

💡Executors

Executors in Apache Airflow determine how tasks are run. They manage the execution environment for tasks within a DAG. The video discusses different types of executors, such as Sequential Executor for running tasks one at a time, Local Executor for parallel execution on a single machine, and Celery Executor for distributing tasks across multiple machines.

💡Workflow Management Tool

A workflow management tool is a software application that helps in defining, managing, and executing workflows. Apache Airflow is described as such a tool in the video, emphasizing its ability to manage complex data workflows. It allows users to visualize and control the execution of tasks in a sequence, ensuring that data processing happens in the correct order.

💡Tasks

In the context of Apache Airflow, tasks are the individual components of a DAG that perform specific operations. They are executed in a sequence defined by the DAG. The video uses the example of a data pipeline where tasks might include extracting data from sources, transforming data, and loading it into a target location.

💡Data Transformation

Data transformation is the process of converting data from one format or structure into another, often as part of a data pipeline. In the video, data transformation is mentioned as a step in the data pipeline where raw data is processed and prepared for analysis or storage. This is a critical step in ensuring that data is usable for its intended purpose.

💡Twitter Data Pipeline

The video mentions a project involving a Twitter data pipeline, which is an example of a specific use case for Apache Airflow. This project would involve extracting data from the Twitter API, performing transformations, and storing the data in a location like Amazon S3. It serves as an illustration of how Apache Airflow can be used in real-world data engineering scenarios.

Highlights

Building a data pipeline involves taking data from multiple sources, transforming it, and loading it onto a target location using Python scripts.

Cron jobs can schedule scripts to run at specific intervals but are not efficient for managing hundreds of data pipelines.

90% of the world's data was generated in the last 2 years, highlighting the importance of data processing in business.

Apache Airflow is a highly used data pipeline tool introduced by Airbnb engineers in 2014 and open-sourced in 2016.

Airflow's popularity stems from its 'pipeline as code' philosophy, allowing for customization and open-source accessibility.

Airflow is a workflow management tool that uses Directed Acyclic Graphs (DAGs) to define tasks and their dependencies.

DAGs in Airflow represent tasks as a directed, acyclic graph, so execution moves in one direction with no looping.

Operators in Airflow are functions used to create tasks, with different types available for various operations like Bash commands or Python functions.

Executors in Airflow determine how tasks run, with options for sequential, local, or distributed task execution.

Airflow's UI provides a centralized place to manage, monitor, and visualize data pipelines.

The Airflow UI displays the status of DAGs, including queued, running, successful, failed, and more.

Airflow allows for the creation of complex data pipelines with multiple dependencies and tasks.

The video provides an example of building a Twitter data pipeline using Airflow, demonstrating practical application.

The presenter offers a project for beginners to build a Twitter data pipeline using Airflow to understand its real-world application.

Airflow's simplicity and the presenter's aim to demystify technical concepts make it accessible for learners.

The video concludes with a call to action for viewers to subscribe and like for more simplified technical content.

Transcripts

00:00

One of the tasks you will do as a Data Engineer is to build a data pipeline. Basically, you take data from multiple sources, do some transformation in between, and then load your data onto some target location. Now, you can perform this entire operation using a simple Python script. All you have to do is read data from some APIs, write your logic in between, and then store your data onto some target location. There is something called a Cron job. So, if you want to run your script at a specific interval, you can schedule it using a Cron job. It looks something like this.

00:30

But here's the thing: you can use a Cron job for, let's say, two to three scripts, but what if you have hundreds of data pipelines? We know that 90% of the world's data was generated in just the last 2 years, and businesses around the world are using this data to improve their products and services. The reason you see the correct recommendation on your YouTube page or the correct ads on your Instagram profile is because of all of this data processing. There are thousands of data pipelines running in these organizations to make all of these things happen.

01:00

So today, we will understand how all of these things happen behind the scenes, and we will understand one of the highly used data pipeline tools in the market, called Apache Airflow. So, are you ready? Let's get started.

01:12

At the start of this video, we talked about the Cron job. As the data grows, we will have to create more and more data pipelines to process all of this data. What if something fails? What if you want to run all of these operations in a specific order? So, in a data pipeline, we have multiple different operations coming in. One task might be to extract data from an RDBMS, APIs, or some other sources. Then the second script will aggregate all of this data, and the third script will basically store this data onto some location. Now, all of these operations should happen in a specific sequence only, so we will have to make sure we schedule our Cron jobs in such a way that all of these operations happen in the proper sequence.

01:50

Now, doing all of these operations using simple Python scripts and managing them is a headache. You might need to put a lot of engineers on each individual task to make sure everything runs smoothly. And this is where, ladies and gentlemen, Apache Airflow comes into the picture.

02:04

In 2014, engineers at Airbnb started working on a project, Airflow. It was brought into the Apache Software Incubator program in 2016 and became open source. That basically means anyone in the world can use it. It became one of the most viral and widely adopted open-source projects, with over 10 million pip installs a month, 200,000 GitHub stars, and a Slack community of over 30,000 users. Airflow became a part of big organizations around the world.

02:32

The reason Airflow got so much popularity was not because it was funded, or it had a good user interface, or it was easy to install. The reason behind the popularity of Airflow was "pipeline as code." Before this, we talked about how you can easily write your data pipeline in a simple Python script, but it becomes very difficult to manage. Now, there are other options, such as enterprise-level tools like Alteryx and Informatica, but this software is very expensive. And also, if you want to customize it based on your use case, you won't be able to do that. This is where Airflow shines. It was open source, so anyone can use it, and on top of this, it gave a lot of different features. So, if you want to build, schedule, and run your data pipelines at scale, you can easily do that using Apache Airflow.

03:15

So now that we have understood why Apache Airflow exists and why we really need it in the first place, let's understand what Apache Airflow is. Apache Airflow is a workflow management tool. A workflow is a series of tasks that need to be executed in a specific order. So, talking about the previous example, we have data coming from multiple sources, we do some transformation in between, and then load that data onto some target location. This entire job of extracting, transforming, and loading is called a workflow. The same terminology is used in Apache Airflow, but it is called a DAG (Directed Acyclic Graph). It looks something like this.

03:49

At the heart of the workflow is a DAG that basically defines the collection of different tasks and their dependencies. This is a core computer science fundamental. Think of it as a blueprint for your workflow. The DAG defines the different tasks that should run in a specific order. "Directed" means tasks move in one direction; "acyclic" means there are no loops - tasks do not run in a circle, they can only move in one direction; and "graph" is a visual representation of the different tasks. Now, this entire flow is called a DAG, and the individual boxes that you see are called tasks. So, the DAG defines the blueprint, and the tasks are your actual logic that needs to be executed.

04:26

So, in this example, we are reading the data from external sources and an API, then we aggregate the data and do some transformation, and load this data onto some target location. All of these tasks are executed in a specific order. Only once the first part is completed will the second part execute, and like this, all of these tasks will execute in a specific order.

04:44

Now, to create tasks, we have something called an operator. Think of an operator as a function provided by Airflow. You can use all of these different functions to create the tasks and do the actual work. There are many different types of operators available in Apache Airflow. If you want to run a Bash command, there is an operator for that, called the Bash Operator. If you want to call a Python function, you can use a Python Operator. And if you want to send an email, you can use the Email Operator. Like this, there are many different operators available for different types of jobs. So, if you want to read data from PostgreSQL, or if you want to store your data to Amazon S3, there are different types of operators that can make your life much easier.

05:21

So, operators are basically the functions that you can use to create tasks, and the collection of different tasks is called a DAG. Now, to run this entire DAG, we have something called executors. Executors basically determine how your tasks will run. There are different types of executors that you can use. If you want to run your tasks sequentially, you can use the Sequential Executor. If you want to run your tasks in parallel on a single machine, you can use the Local Executor. And if you want to distribute your tasks across multiple machines, you can use the Celery Executor.

05:49

That was a good overview of Apache Airflow. We understood why we need Apache Airflow in the first place, how it became popular, and what the different components in Apache Airflow are that make all of these things happen. I will recommend an end-to-end project that you can do using Apache Airflow at the end of this video. But for now, let's do a quick exercise with Apache Airflow to understand the different components in practice.

06:10

So, we understood the basics of Airflow and the different components that are attached to Airflow. Now, let's look at a quick overview of what the Airflow UI really looks like and how these different components come together to build a complete data pipeline.

06:25

Okay, so we already talked about DAGs, right? The Directed Acyclic Graph is a core concept in Airflow. Basically, a DAG is the collection of tasks that we already understood. It looks something like this: A is a task, B is a task, D is a task, and they execute sequentially to make the complete DAG. So, let's understand how to declare a DAG.

06:43

Now, it is pretty simple. You have to import a few packages. From Airflow, you import the DAG, and then there is the Dummy Operator, which basically does nothing. Then, "with DAG" - this is the syntax. So, if you know the basics of Python, you can start with that. Now, if you don't have the Python understanding, then I already have courses on Python, so you can check those out if you want to learn Python from scratch.

07:01

So, this is how you define the DAG. With DAG, you give the name, you give the start date - so when you want this particular DAG to start running - and then you can provide the schedule. So, if you want to run it on a daily, weekly, or monthly basis, you can do that. And there are many other parameters that this DAG function takes. Based on your requirements, you can provide those parameters, and the DAG will run according to all of the parameters that you have provided.

07:23

So, this is how you define the DAG. And if you go over here, you can use the Dummy Operator, where you give the task name, or the ID, and you provide the DAG that you want to attach this particular task to. So, as you can see over here, we define the DAG, and then we provide this particular DAG name to the particular task. If you are using the Python Operator or Bash Operator, all you have to do is use the function and provide the DAG name.

07:45

Now, just like this, you can also create the dependencies. So, the thing that we talked about, right? I want to run all of these tasks in the proper sequence. As you can see, I provide the first task, and then you can use something like this. So, what will happen is: the first task will run, and then it will execute the second and third tasks together. After the third task completes, the fourth task will be executed. This is how you create the basic dependencies.
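
In code, the dependency pattern described here looks roughly like this (task names are hypothetical; the DummyOperator used in the video still exists in Airflow 2.x but has been superseded by EmptyOperator in newer releases):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy import DummyOperator  # EmptyOperator in newer Airflow versions

    with DAG(dag_id="dependency_demo",
             start_date=datetime(2023, 1, 1),
             schedule_interval="@daily",
             catchup=False) as dag:
        task_1 = DummyOperator(task_id="task_1")
        task_2 = DummyOperator(task_id="task_2")
        task_3 = DummyOperator(task_id="task_3")
        task_4 = DummyOperator(task_id="task_4")

        # task_1 runs first, then task_2 and task_3 run in parallel,
        # and task_4 runs only after task_3 completes.
        task_1 >> [task_2, task_3]
        task_3 >> task_4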

08:08

Now, this was just the documentation, and you can always read it if you want to learn more. So, let's go to our Airflow console and try to understand this better.

08:16

Okay, once you install Apache Airflow, it will look something like this. You will be redirected to this page, and over here, you will see a lot of things. First are your DAGs. These are the example DAGs that are provided by Apache Airflow. So, if I click over here, and if I go over here, you will see this is the DAG, which basically contains one operator, which is the Bash Operator. Just like this, if you click on DAGs, you will see a lot of different examples, if you want to understand how all of these DAGs are created on the backend. Over here, you will get the information about the different runs. If your DAG is currently queued, successful, running, or failed, this will give you all of the different information about the recent tasks.

08:51

So, I can go over here and just enable this particular DAG. Okay, I can go inside it, and I can manually run it from the top. Okay, so I will trigger the DAG, and it will start running. Currently, it is queued. Now it starts running, and if I go to my graph, you will see it is currently running. If you keep refreshing it - as you can see, this is successful. So, our DAG ran successfully.

09:11

Now, there are other statuses, such as failed, queued, removed, restarting, and all of the different statuses that you can track if you want to do that. This is what makes Apache Airflow a very popular tool, because you can do everything in one place. You don't have to worry about managing these things in different places. In one single browser, you will be able to do everything.

09:30

All of the examples that you see over here are just basic templates. So, if I go over here and check example_complex, you will see a graph which is this complicated, right? You will see a lot of different things. We have an entry group, and then the entry group is dependent on all of these different things. The graph is pretty complex. So, you can create all of these complex pipelines using Airflow.

09:53

Now, one of the projects that you can do after this is to build a Twitter data pipeline. The Twitter API is not valid anymore, but you can always use other APIs available in the market for free and then create the same project. So, I'll just explain this code to you so that you have a better understanding.

10:09

I have defined a function called run_twitter_etl, and the name of the file is twitter_etl. This is a basic Python function. What we are really doing is extracting some data from the Twitter API, doing some basic transformation, and then storing our data on Amazon S3.

10:24

Now, this is my twitter_dag.py. This is where I define the DAG for my Airflow. Okay, so as you can see over here, we are using the same thing: from Airflow, import DAG. Then I'm using the PythonOperator, because I want to run this particular Python function, which is run_twitter_etl, using my Airflow DAG. So I first define the parameters, such as the owner, start time, emails, and all of the other things. Then, this is where I define my actual DAG. This is my DAG name, these are my arguments, and this is my description. So, you can write whatever you want.

10:57

Now, I define one task. In this example, I only have one task. In the PythonOperator, I provide the task ID, and for the Python callable, I provide the function name. Now, this function is imported from twitter_etl, which is the second file, this one. So, from twitter_etl, I import the run_twitter_etl function, and I call it inside my PythonOperator. I call that function using my PythonOperator, and then I attach it to the DAG. And then, at the end, I just provide the run_etl.

11:26

Now, in this case, if I had different operators, such as run_etl1 and run_etl2, I could do something like this: run_etl1, run_etl2. And then I can create the dependencies as well: first etl1, then etl2. So, this will execute in a sequential manner. Once this executes, then this will execute, and then this.
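
Since the two files are only shown on screen, here is a rough, hypothetical reconstruction of what they might contain; the function and file names match what is said in the video, but the data, bucket, and arguments are placeholders:

    # twitter_etl.py (sketch): a plain Python function that extracts data,
    # transforms it, and writes it to Amazon S3.
    import pandas as pd

    def run_twitter_etl():
        # Placeholder for the API call; the original project used the now-retired free Twitter API.
        records = [{"user": "someone", "text": "hello world", "likes": 42}]
        df = pd.DataFrame(records)  # basic transformation step
        # Writing straight to S3 like this needs s3fs installed and AWS credentials configured.
        df.to_csv("s3://my-example-bucket/twitter_data.csv", index=False)

    # twitter_dag.py (sketch): wires that function into an Airflow DAG.
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from twitter_etl import run_twitter_etl

    default_args = {
        "owner": "airflow",
        "start_date": datetime(2023, 1, 1),
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }

    dag = DAG(
        dag_id="twitter_dag",
        default_args=default_args,
        description="ETL DAG that pulls API data and lands it in S3",
        schedule_interval="@daily",
        catchup=False,
    )

    run_etl = PythonOperator(
        task_id="complete_twitter_etl",
        python_callable=run_twitter_etl,
        dag=dag,
    )

With two tasks such as run_etl1 and run_etl2, as mentioned above, the chaining would be written run_etl1 >> run_etl2 so that they execute one after the other.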

11:46

So, I just wanted to give you a good overview of Airflow. Now, if you really want to learn Airflow from scratch - how to install it and everything else - I already have one project available, and the project name is the Twitter data pipeline using Airflow for beginners. This is a data engineering project that I've created. I highly recommend you do this project so that you get a complete understanding of Airflow and how it really works in the real world.

12:09

I hope this video was helpful. The goal of this video was not to make you a master of Airflow but to give you a clear understanding of the basics of Airflow. After this, you can always do any of the courses available in the market, and then you can easily master them, because most people make technical things really complicated. And the reason I started this YouTube channel is to simplify all of these things.

12:30

So, if you like this type of content, then definitely hit the subscribe button, and don't forget to hit the like button. Thank you for watching. I'll see you in the next video.


Related Tags
Data Engineering, Apache Airflow, Workflow Management, Data Pipeline, Cron Jobs, Python Script, ETL Process, Airbnb Project, Open Source, Data Transformation