[05] – Seeds in dbt

In this course, you’ll learn about seeds in dbt.

What Are Seeds? #

Seeds in dbt are CSV files (typically stored in the seeds directory) that can be loaded into a data warehouse using the dbt seed command. They can be referenced in models using the ref function, just like other dbt models. Since they reside in the dbt repository, they are version-controlled and subject to code review.

When to Use Seeds #

Best suited for:

Static data that rarely changes, such as country code mappings, test email lists, or employee account IDs.

Not suitable for:

Large raw data exports.
Sensitive production data (e.g., PII or passwords).

Steps to Create a Seed #

The typical steps to create a seed are as follows:

Create a seeds/ folder in your dbt project.
Place CSV files inside (e.g., seeds/product_categories.csv).
Run the dbt seed command to load the seed into the data warehouse.

Seed Example #

Now let’s put into practice what we learned above about seeds. The orders table might contain only product_id, but we want to enrich it with product categories. Instead of joining with an external database, we can load a small CSV file as a seed to map each product_id to its category.

product_id,category
101,Electronics
102,Clothing
103,Home & Kitchen
104,Beauty

Once you have added the file seeds/product_categories.csv, run the command below to load the CSV into the data warehouse as a table.

dbt seed

Once the command completes, a table named product_categories should exist in the data warehouse. Now, with the help of the ref function, we can use the product_categories table to enrich our stg_orders model.
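
One way stg_orders could join against the seed is sketched below. This is a minimal sketch, not necessarily the model used in this course: the source named raw with a table named orders (which would have to be declared in a sources YAML file) and the order_id column are assumptions made for illustration. Only product_id, category, and the product_categories seed come from the example above.

```sql
-- models/stg_orders.sql (sketch)
-- Assumption: raw orders are exposed through a hypothetical source ('raw', 'orders')
-- and carry an order_id column; adjust both to match your project.

with orders as (

    select
        order_id,       -- assumed column
        product_id
    from {{ source('raw', 'orders') }}

),

categories as (

    -- The seed is referenced by its file name without the .csv extension
    select
        product_id,
        category
    from {{ ref('product_categories') }}

)

select
    orders.order_id,
    orders.product_id,
    categories.category
from orders
-- Left join: orders whose product_id is missing from the seed are kept,
-- with category left as NULL
left join categories
    on orders.product_id = categories.product_id
```

A left join is chosen here so that orders without a matching row in the seed are not dropped. Because the model depends on the seed table, dbt seed has to be run before dbt run builds the model.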

When to Use Seeds vs. Source Tables? #

Feature	Use a Seed (CSV)	Use a Source Table
Data Changes Frequently?	❌ No	✅ Yes
Data Comes from an External System?	❌ No	✅ Yes
Static Reference Data?	✅ Yes	❌ No
Needs to Be Manually Updated?	✅ Yes	❌ No

Updated on March 2, 2025
