SQL is a powerful language 🤖 fueling analytics, product, and operations. It directly impacts decision-making and, ultimately, revenue.
Maintaining an analytics or feature-store pipeline involves a lot of SQL and parameters. Here is a useful tip on how to serve those parameters smoothly and cut down on headaches 🤕 and errors.
While designing and reviewing BI queries at Qonto (a French fintech), I constantly had to integrate business constants into SQL.
Let's pick the following example, where the marketing team 👩🏽‍💻 wants reporting based on a medium-bracket customer segment. We agreed with stakeholders on the following definition:
The medium bracket holds clients with between 10 and 30 valid transactions over the last 20 days.
We want a base table that stores the IDs of the clients belonging to that bracket, along with their number of transactions and the total amount of those transactions 💳
The base query would group transactions by customer_id over that time window and then filter on the aforementioned thresholds.
We can use a CTE (common table expression) for the first two steps and produce the following:
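A minimal sketch of that first version, with the constants hardcoded inline (the transactions table and its columns are illustrative assumptions):

```sql
WITH customer_transactions AS (
    SELECT
        customer_id,
        COUNT(*)    AS nb_transactions,
        SUM(amount) AS total_amount
    FROM transactions
    -- only valid transactions over the last 20 days
    WHERE status = 'valid'
      AND transaction_date >= DATEADD(day, -20, CURRENT_DATE)
    GROUP BY customer_id
)
SELECT
    customer_id,
    nb_transactions,
    total_amount
FROM customer_transactions
-- medium bracket: between 10 and 30 transactions
WHERE nb_transactions BETWEEN 10 AND 30
```

Notice how the constants 10, 30, and 20 are buried in the body of the query; in a longer file they would be easy to miss and painful to update.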
One nice improvement over scattering constants all over lengthy SQL files is to group them all in a dedicated parameters CTE 🏗
Another is to start the WHERE clause with WHERE TRUE and add each filter with AND, which makes individual filters easy to add, remove, or comment out.
Here is what the query could look like in Snowflake dialect:
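A sketch of the same query with both tricks applied (same illustrative table and columns as above):

```sql
WITH params AS (
    SELECT
        10 AS min_transactions,  -- medium bracket lower bound
        30 AS max_transactions,  -- medium bracket upper bound
        20 AS nb_days            -- lookback window in days
),

customer_transactions AS (
    SELECT
        t.customer_id,
        COUNT(*)      AS nb_transactions,
        SUM(t.amount) AS total_amount
    FROM transactions AS t
    CROSS JOIN params AS p  -- params has a single row, so no fan-out
    WHERE TRUE
        AND t.status = 'valid'
        AND t.transaction_date >= DATEADD(day, -p.nb_days, CURRENT_DATE)
    GROUP BY t.customer_id
)

SELECT
    ct.customer_id,
    ct.nb_transactions,
    ct.total_amount
FROM customer_transactions AS ct
CROSS JOIN params AS p
WHERE TRUE
    AND ct.nb_transactions >= p.min_transactions
    AND ct.nb_transactions <= p.max_transactions
```

All the business constants now sit in the params CTE at the top of the file, and any filter line can be commented out without touching the rest of the WHERE clause.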
This example has a minimal footprint. In real-life base tables, you could face 10–20 parameters and 50+ output columns.
The upsides are manifold: parameters live in a single place instead of being scattered across hundreds of lines, the business definition is self-documenting at the top of the file, and individual filters can be toggled without restructuring the WHERE clause.
Let's explore how to use this trick in the most common use cases.
We use Python 3.9 syntax throughout these use cases (including the new dictionary union operator).
Most database connection engines allow binding parameters:
The exact syntax is defined in PEP 249 and is driver dependent: psycopg2, for instance, uses the %(parameter)s style.
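A minimal sketch with psycopg2 (the connection string, table, and column names are placeholders):

```python
import psycopg2

# Single source of truth for the bracket definition
PARAMETERS = {
    "min_transactions": 10,
    "max_transactions": 30,
    "nb_days": 20,
}

QUERY = """
    SELECT
        customer_id,
        COUNT(*)    AS nb_transactions,
        SUM(amount) AS total_amount
    FROM transactions
    WHERE TRUE
        AND status = 'valid'
        AND transaction_date >= CURRENT_DATE - %(nb_days)s
    GROUP BY customer_id
    HAVING COUNT(*) BETWEEN %(min_transactions)s AND %(max_transactions)s
"""

with psycopg2.connect("dbname=analytics user=analyst") as conn:
    with conn.cursor() as cur:
        # the driver binds the values safely, no string formatting involved
        cur.execute(QUERY, PARAMETERS)
        rows = cur.fetchall()
```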
With dbt, you can mix parameters with configuration, references, and variables in a model such as customer_transaction.sql:
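A sketch of what the model could look like, using dbt's ref() and var() macros (the staging model name is an assumption):

```sql
-- models/customer_transaction.sql
SELECT
    customer_id,
    COUNT(*)    AS nb_transactions,
    SUM(amount) AS total_amount
FROM {{ ref('stg_transactions') }}
WHERE TRUE
    AND status = 'valid'
    AND transaction_date >= DATEADD(day, -{{ var('nb_days') }}, CURRENT_DATE)
GROUP BY customer_id
HAVING COUNT(*) BETWEEN {{ var('min_transactions') }} AND {{ var('max_transactions') }}
```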
Then set the variables in the dbt_project.yml configuration file like so:
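For instance, with the values from our bracket definition:

```yaml
# dbt_project.yml
vars:
  min_transactions: 10
  max_transactions: 30
  nb_days: 20
```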
In Airflow, you can use a SQL operator such as PostgresOperator in conjunction with the following files:
Define the source.sql file with parameter placeholders:
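A sketch of source.sql, with psycopg2-style placeholders since PostgresOperator hands the query to that driver (table and column names are illustrative):

```sql
-- source.sql
SELECT
    customer_id,
    COUNT(*)    AS nb_transactions,
    SUM(amount) AS total_amount
FROM transactions
WHERE TRUE
    AND status = 'valid'
    AND transaction_date >= CURRENT_DATE - %(nb_days)s
GROUP BY customer_id
HAVING COUNT(*) BETWEEN %(min_transactions)s AND %(max_transactions)s
```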
Define the parameters.py file with constant values:
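For instance:

```python
# parameters.py
# Single source of truth for the bracket definition
PARAMETERS = {
    "min_transactions": 10,
    "max_transactions": 30,
    "nb_days": 20,
}
```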
Add the file loader load_file.py:
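A minimal version could be:

```python
# load_file.py
from pathlib import Path


def load_file(path: str) -> str:
    """Return the content of a SQL file as a string."""
    return Path(path).read_text()
```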
You can then feed the parameters at task invocation in the DAG, customers.py:
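A sketch of the DAG, assuming an Airflow connection named analytics_db; this is also where Python 3.9's dictionary union comes in handy to override a default:

```python
# customers.py
from datetime import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

from load_file import load_file
from parameters import PARAMETERS

with DAG(
    dag_id="customers",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    build_medium_bracket = PostgresOperator(
        task_id="build_medium_bracket",
        postgres_conn_id="analytics_db",
        sql=load_file("source.sql"),
        # Python 3.9 dict union: tweak a parameter at invocation
        # without touching the shared defaults
        parameters=PARAMETERS | {"nb_days": 30},
    )
```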
See the Airflow documentation for more details.
Metabase lets you use variables in queries with the {{ }} syntax, with values supplied through filter widgets in the query editor.
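A sketch of the same query as a Metabase native question; each {{ }} variable becomes a widget whose value (e.g. min_transactions = 10) you set in the sidebar:

```sql
SELECT
    customer_id,
    COUNT(*)    AS nb_transactions,
    SUM(amount) AS total_amount
FROM transactions
WHERE TRUE
    AND status = 'valid'
    AND transaction_date >= CURRENT_DATE - {{nb_days}}
GROUP BY customer_id
HAVING COUNT(*) BETWEEN {{min_transactions}} AND {{max_transactions}}
```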
Writing and maintaining 🔧 complex and efficient SQL queries requires experience and skill. It involves wrapping your mind around ever-growing data assets, and you want to make the most of your data knowledge 🕵🏽‍♀️
I am a co-founder at Castor, a data discovery platform designed to help anyone find, understand and use data assets across the whole company.
We developed a SQL query history feature to help data people find and share queries. Check it out.