Using dbt in Snowflake

Set up a dbt project, Group, Warehouse and Service Account in Snowflake.

Using dbt in Snowflake

Dbt (data build tool) is a command-line tool that enables data analysts and engineers to transform and model data in the data warehouse. It's like the "T" in "ETL" but for the modern data stack. Dbt does the "transform" part by enabling you to write, document, and test SQL-based data transformation workflows. It does not extract or load data, but it’s extremely good at transforming the data that’s already loaded into your warehouse.

Snowflake is a data warehouse built on top of the Amazon Web Services (AWS) cloud infrastructure. It's designed to be easy-to-use, scalable, and flexible.

To install and configure dbt in Snowflake, follow these steps:

Step-by-Step Guide to Use dbt In Snowflake

Step 1: Install dbt

You can install dbt by using pip, which is the package installer for Python. You can install dbt by running the following command in your terminal:

to get started: "pip install dbt"

Step 2: Create a dbt Project

After installing dbt, you can create a new dbt project by running the following command in your terminal:

use the "dbt init" command

This will create a new directory called my_project with a basic dbt project structure.

Step 3: Configure your dbt Profile

Dbt uses a file called profiles.yml for connection configurations. This file is usually located in the ~/.dbt directory.

Here is an example profiles.yml file for snowflake:

Replace the placeholder values with your own Snowflake credentials.


Step 4: Edit your dbt_project.yml File

Edit the dbt_project.yml file in your dbt project directory. This file is used to configure your dbt project.

Here is an example dbt_project.yml file:

Example of a dbt_project.yml file in your dbt project directory to use dbt in Snowflake.


Step 5: Run dbt Commands

Now you can run dbt commands to interact with your Snowflake database. Here are some common dbt commands:

  • dbt run: This command will run all models in your project.
  • dbt test: This command will run tests on your models.
  • dbt seed: This command will load CSV files into your database.
  • dbt docs generate: This command will generate documentation for your project.

Remember to always run dbt debug to check your connection before running any other dbt commands.

That's it! You have successfully installed and configured dbt Python in Snowflake.

Snowflake Best Practices for dbt

Before diving into dbt, it is crucial to ensure that we are adhering to the best practices in Snowflake. There are a few preliminary configurations to set up before starting with dbt.

First, it is highly advised to configure your credentials to use a key-pair authentication mechanism (we will discuss other configurations later). This is more secure than using a username and password. Second, it is recommended to create a dedicated group and warehouse for dbt. This ensures that the dbt group has the necessary permissions to create and read various objects. Additionally, having a dedicated warehouse ensures that dbt transformations are isolated in terms of compute and memory resources. This isolation helps in monitoring and managing performance expectations.

Lastly, if you plan to automate dbt tasks, it is wise to create a dedicated dbt service account. This is important to decouple the automation from an employee's account, which may become inactive if the employee leaves the organization. Using a service account ensures that the automation continues to run smoothly even if an employee departs.

Setting Up Key-Pair Authentication

Snowflake supports key-pair authentication, which is more secure than using a username and password. Here are the steps to set up key-pair authentication:

  1. Generate a key pair.
  2. Upload the public key to Snowflake.
  3. Configure the dbt profile to use the private key.

Generate a Key Pair

You can generate a key pair using the openssl command-line tool. Run the following commands in your terminal:

Generate a key-pair to use dbt with Snowflake

Upload the Public Key to Snowflake

Log in to the Snowflake web interface and navigate to the 'Users' tab. Select the user you want to configure and add the public key to the 'RSA Public Key' field.

Configure the dbt Profile

Edit your profiles.yml file to configure the dbt profile to use the private key. Here is an example profiles.yml file that uses key-pair authentication:

Here is an example profiles.yml file that uses key-pair authentication. Replace the placeholder values with your own Snowflake configuration.

Creating a dbt Group and Warehouse

It is a good practice to create a dedicated group and warehouse for dbt in Snowflake. This ensures that the dbt group has the necessary permissions to create and read various objects, and that dbt transformations are isolated in terms of compute and memory resources.

Create a dbt Group

Log in to the Snowflake web interface and navigate to the 'Roles' tab. Create a new role called 'DBT_ROLE' and grant the necessary permissions to this role.

Create a dbt Group in Snowflake

Create a dbt Warehouse

Navigate to the 'Warehouses' tab and create a new warehouse called 'DBT_WAREHOUSE'. Configure the warehouse size and other settings as needed.

Create a dbt Warehouse in Snowflake

Creating a dbt Service Account

If you plan to automate dbt tasks, it is recommended to create a dedicated dbt service account. This ensures that the automation is not tied to an employee's account and continues to run smoothly even if an employee leaves the organization.

Create a dbt Service Account

Log in to the Snowflake web interface and navigate to the 'Users' tab. Create a new user called 'DBT_SERVICE_ACCOUNT' and configure the user with the necessary settings.

Create a dbt Service Account in Snowflake

Grant Permissions to the dbt Service Account

Grant the necessary permissions to the dbt service account.

GRANT ROLE DBT_ROLE TO USER DBT_SERVICE_ACCOUNT

Conclusion

By following these best practices, you can ensure that your dbt setup in Snowflake is secure, scalable, and ready for automation. Remember to always use key-pair authentication, create a dedicated group and warehouse for dbt, and use a dedicated service account for automation.

New Release
Share

Get in Touch to Learn More

See Why Users Love CastorDoc
Fantastic tool for data discovery and documentation

“[I like] The easy to use interface and the speed of finding the relevant assets that you're looking for in your database. I also really enjoy the score given to each table, [which] lets you prioritize the results of your queries by how often certain data is used.” - Michal P., Head of Data