Reverse ETL tools are the new key components of the modern data stack. They are said to be the missing piece to make any company truly data driven. Today, we take on the task of explaining what reverse ETL is, and which tools are available for your organization.
Most companies are sat on a gold mine. The gold is the data. The mine is the data warehouse. The miners exploiting the warehouse are data engineers, data analysts and data scientists. The reality is, organizations with the best miners are often under-exploiting data. Why? Because it could be exploited by the marketing team, the sales team, and other non-data departments. And this can be achieved using reverse ETL tools.
Before talking about reverse ETL, let's go back to the notion of plain, regular ETL.
ETL: Establishing a single source of truth
The standard approach to data integration (ETL) dates back to the 1970's. Simply put, ETL is the process of copying data from a system of record into the data warehouse.
Why for? Well, Companies collect data from disparate sources. Without a unified storage structure, it quickly becomes chaos when it comes to finding the right data asset. For this reason, it is now commonplace for organizations to store data in a data warehouse. This is achieved through the ETL process. Data is first Extracted from first-party databases and third-party sources (tools for sales, marketing & support), Transformed so that data can be used by data analysts and data scientists. It is finally Loaded into a data warehouse, ready to be used by the analytics team.
The idea behind transforming before loading is to reduce the volume of data before it's loaded it in the warehouse. This preserves storage, computation and bandwidth which were scarce and expensive when ETL was first created.
As a result, transformation and extraction are deeply intertwined. ETL thus involves to key challenges for organizations:
- Labor intensiveness and complexity - With ETL, pipelines don't just extract data but also perform complex transformations customised to the specific needs of end users. This involves a considerable amount of custom code.
- Constant maintenance - Because pipelines extract and transform data altogether, they break as soon as data models must be changed. This results in frequent and undesired pipeline outages.
Thankfully, storage and computation costs plummeted by a factor of millions in the past four decades, with bandwidth costs shrinking by a factor of thousands. This has led to the exponential growth of the cloud, the volume and the complexity of data, making transformation prior to loading obsolete.
Data integration technologies were not suffering the storage, computation and bandwidth constraints anymore. This meant organizations could load large bunches of untransformed data in the warehouse without worrying about costs and constraints. And guess what, they did exactly that.
This is why Extract-Load-Transform (ELT) processes have come to replace ETL. The idea here is to Extract data from source systems and directly Load it into the data warehouse without transformations taking place. The Transformation layer is handled in the data warehouse, through tools like dbt.
Regardless of the order in which these tasks are performed, the aim of ETL/ELT is to build a single source of truth for data, so that data teams can analyse their data without writing costly and tedious queries that require joining disparate data sources.
II- reverse ETL: enabling operational analytics
Reverse ETL is the exact inverse process of ETL covered above. Simply put, it's the process of copying data from the data warehouse to Saas products used by organizations.
Why would one want to do this? Well, companies today increasingly engage in operational analytics - an approach consisting in making data accessible to "operational" teams, for operational use cases (sales, marketing, ..). We distinguish it from the more classical approach of using data stored in the warehouse only for reporting and business intelligence. Instead of using data to influence long-term strategy, operational analytics informs strategy for the day-to-day operations of the business. To put it simply, it's putting the company's data to work so everyone in your organization can make smarter decisions.
Let's go back to our gold mine story. The barrier to operational analytics is that the cloud data warehouse is hardly accessible to business teams, as there is no pipeline allowing a flawless distribution of data from the warehouse to different cloud applications used by operational teams. As a result, data stored in the warehouse is used to create dashboard and BI reports, but that's about it. That's where reverse ETL comes into play.
Reverse ETL is the tool responding to the new practice of operational analytics. It can be seen as a bridge between your data warehouse and cloud applications. Reverse ETL tools move the data out of your warehouse to the SaaS products your team loves and uses. Here are a few use cases where Reverse ETL are useful:
- Customer Success Team want to send personalized emails based on usage of the product. The Data Science Team did an awesome customer segmentation based on complex modeling and it changes every week based on usage. You want these segments imported automatically in Intercom.
- Marketing team wants direct access to frequency of clients purchases in its marketing automation solution to create more powerful scenarios? Easy.
- Sales team would like to integrate lead score in the CRM to prioritise leads in Salesforce or Hubspot? That's a done deal with reverse ETL.
In order to make sure business teams use and understand all the data existing in the data warehouse flawlessly. We recommend using a data catalog such as Castor or Collibra (depending on your size). You can find a benchmark of all data catalogs tools here.
Sure, you could easily create BI reports and visualizations using this data in BI tools. But these insights will turn out to be much more powerful if they drive the daily operations of your teams across marketing, sales, finance, etc.
In short, with reverse ETL data stored in the data warehouse irrigates all the departments in the company and nourishes operational systems. It is a key tool allowing functional departments (sales, marketing, etc) to make day-to-day decisions backed by real-time data.
Another important benefit brought about by reverse ETL is data consistency. Although functional teams work different Saas products, they use the same data which brings about data continuity and consistency across the whole business.
Do you need a reverse ETL tool?
The core value of reverse ETL tools is their built-in connectors to operational systems. What if you decided to create your own API connectors between your data warehousing technology and your operational systems (HubSpot, Salesforce, ect.)?
At first sight, it might seem quite feasible. Let's say someone on the sales team wants lead scores from the data warehouse to be displayed in Salesforce. Easy. An engineer will look up Salesforce API, wrap his head around the endpoints at play. He will customise a solution that sends lead scores to a custom field in Salesforce. No big deal.
Well.. what happens when lead scoring changes, or when sales want more informations displayed in Salesforce?
The second the tool used by a specific team changes, the engineering team will have to build and maintain another one-off integration. And that's where challenges start to multiply:
- Creating API connectors manually can take days or weeks, even with a strong engineering team. We also bet your data engineers have better things to do.
- API endpoints are often unable to handle real-time data transfer. You can thus quickly expect to reach a rate limit.
- These connectors have to be constantly maintained by your team, to account for any changes in the underlying technologies on both ends.
Bottomline is, there is really no value add for your data team to be building and maintaining connectors for reverse ETL. Even if it ends up being cheaper than paying for a solution.
Choosing the right reverse ETL tool will get you access to pre-built connections between the different components of your data stack, sparing you the arduous task of building these connectors yourself. Now that you know you need an ETL tool, the question is: which one? We got you here too.
Which reverse ETL tool for your organization?
Below, you will find a Reverse ETL tools landscape, which can hopefully help you choose a Reverse ETL tool adapted to the needs of your company. We have classified the solutions across three dimensions: Whether the solution is SaaS or on-premise, whether it focuses primarily on empowering business or technical users, and according to the number of integrations proposed.
For a full Reverse ETL tool benchmark, and more classification criteria, click here.
More modern data stack benchmarks?
Find more benchmarks and analysis on the modern data stack here. We write about all the processes involved when leveraging data assets: from the modern data stack, to data teams composition, to data governance. Our blog covers the technical and the less technical aspects of creating tangible value from data.
At Castor, we are building a data documentation tool for the Notion, Figma, Slack generation. Or data-wise for the Fivetran, Looker, Snowflake, DBT aficionados. We designed our catalog to be easy to use, delightful and friendly.
Want to check it out? See how CastorDoc works with a free demo.
Subscribe to the Castor Blog
You might also like
Stay up-to-date with the latest best practices in data visualization, analytics, and ETL with CastorDoc's insightful guide.
Explore the process of building a data stack from scratch, covering essential components and considerations for organizations.
Fantastic tool for data discovery and documentation
“[I like] The easy to use interface and the speed of finding the relevant assets that you're looking for in your database. I also really enjoy the score given to each table, [which] lets you prioritize the results of your queries by how often certain data is used.”
Michal, Head of Data, Printify