Course Content
1. Get Started with dbt
0/2
2. Setup dbt
0/1
4. Jinja Templates and Macros
0/2
7. Snapshots
8. Analyses
9. Hook and Exposures
10. Tests and Data Quality
11. Documentation
12. Best Practices
0/1
13. Deploy dbt in Production
0/2
dbt From Basics to Mastery by Examples

In this lesson, you will learn how to define and use sources in dbt. Sources allow you to reference raw data in your data warehouse in a structured and maintainable way, improving data lineage, documentation, and quality control.

This tutorial will guide you through step-by-step examples, making it easy to integrate sources into your dbt workflow. Whether you are new to dbt or looking to enhance your existing projects, this tutorial will provide the knowledge you need to organize and manage your raw data effectively.

What are sources in dbt ?

Sources allow you to define and document the data that your Extract and Load (EL) tools bring into your data warehouse. By declaring tables as sources in dbt, you can:

  • Reference source tables in your models using the {{ source() }} function, making data lineage clear.
  • Validate assumptions about your source data through tests.
  • Monitor data freshness to ensure timely and accurate transformations.

Create a source

To create a source, simply add a filename.yaml file inside the models directory where your dbt models are defined.

In the example below, we define two sources: dev and prod.

  • The dev source includes two tables from the dbt-demo database within the demo schema.
  • The prod source contains one table.

version: 2

sources:
  - name: dev
    database: dbt-demo
    schema: demo
    tables:
      - name: orders
      - name: customers

  - name: prod
    tables:
      - name: payments

ℹ️ By default, the schema matches the source name. Include the schema only if the source name differs from the actual schema in your database.

Using a source

A source in dbt serves three main purposes:

  1. Selecting data within models from a source.
  2. Testing data quality of the source.
  3. Checking the freshness of the source data.

In this tutorial, we’ll focus only on using sources for selecting data. Dedicated chapters will cover the concepts of data quality testing and source checking freshness.

In the previous tutorial about dbt model, we created the completed_orders model by selecting data directly from demo.orders.

SELECT
    order_id,
    customer_id,
    order_date,
    total_amount
FROM demo.orders
WHERE status = 'Completed'

Now, let’s utilize the dev source we defined earlier. In the code below, the FROM statement has been updated to reference the source instead of directly selecting from the table.

SELECT
    order_id,
    customer_id,
    order_date,
    total_amount
FROM {{ source('dev', 'orders') }} 
WHERE status = 'Completed'