Apache Airflow Basics

Apache Airflow is a workflow orchestration tool used to automate and schedule workflows such as ETL jobs and data pipelines. At its core, it consists of DAGs, Operators, and Tasks, which define how workflows are executed.

To follow along, you can install Airflow in a local Python environment by following this tutorial.

Directed Acyclic Graph (DAG)

A DAG (Directed Acyclic Graph) is a collection of tasks that Airflow schedules and runs in a defined order. It ensures that tasks are executed as part of a single DAG run and follow dependencies correctly.

The DAG itself does not manage what happens inside each task; it only defines how tasks are executed, including their order, retry attempts, timeouts, and other scheduling parameters.
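
As a minimal sketch of this (assuming Airflow 2.x; the dag_id and settings here are illustrative), retry and timeout behavior can be set at the DAG level and applied to every task it contains:

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from datetime import datetime, timedelta

# default_args are applied to every task in the DAG
default_args = {
    "retries": 2,                                # retry each failed task twice
    "retry_delay": timedelta(minutes=5),         # wait 5 minutes between attempts
    "execution_timeout": timedelta(minutes=30),  # fail a task that runs longer than 30 minutes
}

with DAG(
    "dag_with_retries",                 # illustrative dag_id
    schedule="@daily",
    start_date=datetime(2025, 3, 1),
    default_args=default_args,
    dagrun_timeout=timedelta(hours=1),  # mark the whole run failed if it exceeds 1 hour
) as dag:
    EmptyOperator(task_id="only_task")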

Key Features of a DAG

  • Directed: Tasks execute in a specific order.
  • Acyclic: No circular dependencies (Task A → Task B → Task A is NOT allowed).
  • Graph: Represents a workflow where tasks are nodes, and dependencies are edges (see the sketch below).
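
A minimal sketch of all three properties (the dag_id and task names are illustrative):

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from datetime import datetime

with DAG("dag_shape_demo", schedule="@daily", start_date=datetime(2025, 3, 1)) as dag:
    a = EmptyOperator(task_id="a")
    b = EmptyOperator(task_id="b")
    c = EmptyOperator(task_id="c")

    # Directed: a runs before b, and b before c (tasks are nodes, >> defines edges)
    a >> b >> c

    # Acyclic: adding "c >> a" would create the cycle a -> b -> c -> a,
    # and Airflow would refuse to load the DAG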

Creating a DAG in Airflow

There are three ways to declare a DAG:

  • Using the DAG constructor

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from datetime import datetime

# Define DAG
dag = DAG(
    "dag_using_constructor",
    schedule="@daily",  # Run daily
    start_date=datetime(2025, 3, 1),
    catchup=False       # Do not backfill runs between start_date and today
)

# Define tasks; with the constructor, each task must be attached explicitly via dag=dag
start_task = EmptyOperator(task_id="start", dag=dag)
end_task = EmptyOperator(task_id="end", dag=dag)

# Define dependencies: start_task runs before end_task
start_task >> end_task

  • Using a Python context manager (the with statement)

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from datetime import datetime

# Define DAG using a context manager
with DAG(
    "dag_using_context_manager",
    schedule="@daily",
    start_date=datetime(2025, 3, 1)
) as dag:

    # Tasks created inside the with block are attached to the DAG
    # automatically, so dag=dag is not needed
    start_task = EmptyOperator(task_id="start")
    end_task = EmptyOperator(task_id="end")

    # Define dependencies
    start_task >> end_task

  • Using the @dag decorator

from airflow.decorators import dag
from airflow.operators.empty import EmptyOperator
from datetime import datetime


# Define a DAG using the TaskFlow API
@dag(start_date=datetime(2025, 3, 1), schedule="@daily")
def generate_dag():
    EmptyOperator(task_id="task")


generate_dag()
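
The decorator style pairs naturally with the @task decorator, which turns plain Python functions into tasks. Here is a minimal sketch (function and task names are illustrative); passing the return value of one task into another both moves the data and defines the dependency:

from airflow.decorators import dag, task
from datetime import datetime


@dag(start_date=datetime(2025, 3, 1), schedule="@daily")
def taskflow_example():
    @task
    def extract():
        return [1, 2, 3]

    @task
    def summarize(values):
        print(sum(values))

    # Calling one task with the output of another defines the edge extract -> summarize
    summarize(extract())


taskflow_example()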

When defining a DAG, three essential parameters are typically provided:

  • dag_id: The unique identifier for the DAG within an Airflow instance.
  • schedule: Defines the execution frequency.
  • start_date: The date from which the scheduler starts creating DAG runs. With a time-based schedule, the first run is triggered after start_date plus one schedule interval.

ℹ️ When using the @dag decorator, if dag_id is not provided, it defaults to the name of the decorated function. In the example above, since we did not specify a dag_id, the DAG's id will be generate_dag.
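
To override that default, the dag_id can be passed to the decorator explicitly (the name below is illustrative):

from airflow.decorators import dag
from airflow.operators.empty import EmptyOperator
from datetime import datetime


@dag(dag_id="my_explicit_dag_id", start_date=datetime(2025, 3, 1), schedule="@daily")
def generate_dag():
    EmptyOperator(task_id="task")


generate_dag()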