What if ... you had to use parameters in SQL queries

TLDR: SQL parameters can be a mess, quickly. Introducing tricks for SQL, DBT, pandas, Airflow, Metabase

5 min read

May 9, 2023

By Amaury Dumoulin

SQL is a powerful language 🤖⠀fueling analytics, product, and operations. It directly impacts decision-making and eventually revenue.

Maintaining an analytics or feature store pipeline involves a lot of SQL and parameters. We give a useful tip on how to serve those parameters in a smooth manner to cut down on headaches 🤕⠀and errors.

‍

Hell game of literals in BI queries

As I was designing and reviewing BI queries at Qonto (a french fintech) I had to integrate constants:

revenue contractual commission on certain transactions 💰
compliance thresholds for certain batch reporting
marketing segmentation parameters (e.g. number of employees)
top management dashboard temporal differences to account to different moving averages

‍

Use Case

Need

Let’s pick the following example where the marketing team 👩🏽‍💻⠀wants reporting based upon a medium bracket customer segment. We agreed with stakeholders on the following definition:

The medium bracket should hold clients bearing between 10 and 30 valid transactions through the last 20 days

We want a base table that stores the client id belonging to that bracket and summarizes the number of transactions as well as the total amount of those transactions 💳

‍

First approach

The base query would group transactions by customer_id over the period of time and then filter upon the threshold aforementioned.

Select all valid transactions within the timeframe ⏰
Group them by customer_id and aggregate count of transactions
Filter lines where the count is within the rang 🚦

We can use a CTE (common table expression) for the first two steps and produce the following

‍

Introducing tricks

One nice improvement over scattering constants all over lengthy SQL files is to group all constants and parameters in a dedicated CTE (common table expression) 🏗

Another one is to use WHERE TRUE with AND for each filter which allows easy editing

Here is what the query could look like in a Snowflake dialect

This example has a minimal footprint. In real-life base tables, you could face 10–20 parameters and 50+ output columns.

The upsides are manyfold:

DRY (don’t repeat yourself): declare once use multiple times if needed 🙉
Easy maintenance: fewer errors and easier update for anyone
Easy to understand: parameters should come ahead

‍

Real-world usage

Let’s explore how to use this trick with the most common use cases

science pandas 👨🏻‍🔬
engineering airflow and dbt
visualization Metabase 📈

We use python 3.9 syntax throughout those use cases (including new dictionary update)

‍

pandas

Most database connection engine allows binding parameters:

Object Relationship Mapping tools — ORM (think sqlalchemy)
SQL tools (think DBeaver)

The syntax are defined in PEP249 and are driver dependent: for psycopg2 use the %(parameter)s syntax

‍

DBT

You can mix it with configuration, references, and variables in customer_transaction.sql

Set variables in dbt_project.yml configuration file as such:

‍

wAirflow

You can use a sql operator such as PostgresOperator in conjunction with:

a SQL file dags/bi/customer_transaction/transform.sql
a parameters file dags/bi/customer_transaction/parameters.py
a file loader helper dags/load_file.py
a DAG definition file dags/customers.py

Define the source.sql file with parameters placeholders

‍

Define the parameters.py file with constant values:

Add the file loader load_file.py

You can feed parameters at task invoke the DAG customers.py

See documentation

‍

Metabase

Metabase allows you to use variables in queries using {{ }}

‍

With the following parameters:

status as Text
count_low, count_high and period_days as Number

‍

Finding Data

Writing and maintaining 🔧⠀complex and efficient SQL queries requires experience and skills. It involves wrapping your mind around growing data assets. You might want to leverage🕵🏽‍♀️⠀the most of your data knowledge.

I am a co-founder at Castor, a data discovery platform designed to help anyone find, understand and use data assets across the whole company.

We developed a SQL query history feature to help data people find and share queries. Check it out.

‍

Subscribe to the Castor Blog

New Release

Table of Contents

Why Look for Atlan Alternative?

Resources

Louise de Leyritz

August 31, 2021

SQL Editor Benchmark and Market Analysis

Discover a comprehensive SQL editor benchmark and market analysis to help you choose the best solution for your data management needs.

Learn more

Xavier de Boisredon

October 7, 2021

What is Data Governance and Privacy?

Explore the world of data governance and privacy, and learn how CastorDoc helps organizations maintain data integrity and comply with regulations.

Learn more

Louise de Leyritz

January 27, 2023

Data Sharing Challenges: Privacy & Security Concerns

Overcome data sharing challenges with CastorDoc's innovative solutions, fostering collaboration and data-driven decision-making in your organization.

Learn more

Get in Touch to Learn More

See Why Users Love Coalesce Catalog

Fantastic tool for data discovery and documentation

“[I like] The easy to use interface and the speed of finding the relevant assets that you're looking for in your database. I also really enjoy the score given to each table, [which] lets you prioritize the results of your queries by how often certain data is used.” - Michal P., Head of Data

Hell game of literals in BI queries

Use Case

Need

First approach

Introducing tricks

Real-world usage

pandas

DBT

wAirflow

Metabase

Finding Data

Subscribe to the Castor Blog

You might also like

Get in Touch to Learn More