In this course, you’ll learn about seeds in dbt.
What Are Seeds? #
Seeds in dbt are CSV files (typically stored in the seeds directory) that can be loaded into a data warehouse using the dbt seed command. They can be referenced in models using the ref function, just like other dbt models. Since they reside in the dbt repository, they are version-controlled and subject to code review.
When to Use Seeds #
Best suited for:
- Static data that rarely changes, such as country code mappings, test email lists, or employee account IDs.
Not suitable for:
- Large raw data exports.
- Sensitive production data (e.g., PII or passwords).
Steps to Create a Seed #
The typical steps to create a seed are as follows:
- Create a seeds/ folder in your dbt project.
- Place CSV files inside (e.g., seeds/product_categories.csv).
- Run the command to load the seed into the database.
Seed Example #
Now, let’s put into practice what we learned above about seeds. The Orders table might contain only product_id, but we want to enrich it with product categories. Instead of joining with an external database, we can load a small CSV file as a seed to map product_id to its category.
product_id,category
101,Electronics
102,Clothing
103,Home & Kitchen
104,Beauty
Once you add the file seeds/product_categories.csv, run the command below to load the CSV into the data warehouse as a table.
dbt seed
Once the command completes, a table named product_categories should be created. Now, with the help of the ref function, we’re going to use the product_categories table to improve our stg_orders model.
When to Use Seeds vs. Source Tables? #
Feature | Use a Seed (CSV) | Use a Source Table |
---|---|---|
Data Changes Frequently? | ❌ No | ✅ Yes |
Data Comes from an External System? | ❌ No | ✅ Yes |
Static Reference Data? | ✅ Yes | ❌ No |
Needs to Be Manually Updated? | ✅ Yes | ❌ No |