Catalog Reverse ETL

By Louise de Leyritz from CastorDoc


As companies engage in operational analytics, their operational departments increasingly rely on data. This creates a need to distribute data from the warehouse to the various cloud applications that operational teams use.

This explains the explosion of Reverse ETL tools (internal, open-source, and SaaS) over the past two years. The trend is not going to stop, and we would rather bring visibility and structure to it soon.

At CastorDoc, we believe the first step toward structuring the Reverse ETL tools market is more transparency. For that reason, we have put together a list of all the Reverse ETL tools we have heard of.

This list is still exploratory and may contain errors or lack information. If you notice anything wrong, please reach out to us: louise@castordoc.com

In-depth analysis and evolution

Read the full breakdown by generation and market analysis of Reverse ETL here.

Deeper dive into Reverse ETL tools

What does each column in the benchmark below mean?

Variety of data sources: Can the Reverse ETL tool connect only to the data warehouse, or also to other databases? What about other sources, such as FTP files or spreadsheets?

Number of data sources: From how many applications can the Reverse ETL tool export data?

Segmentation: Can the tool use a source that results from merging other sources (such as an SSH file or a Google Sheet)? For example, the source could be a table of clients who have spent more than $1000 in the past 15 days. This source is built by merging two tables, and the Reverse ETL tool can then display it in Salesforce.
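To make the segmentation example concrete, here is a minimal sketch of how such a source could be built by merging two tables. It uses Python's built-in sqlite3 as a stand-in for the warehouse, and the tables `clients` and `orders`, their columns, and their rows are all hypothetical. Pushing the resulting rows to Salesforce is the Reverse ETL tool's job and is not shown.

```python
import sqlite3
from datetime import date, timedelta

# In-memory database standing in for the warehouse; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (client_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, client_id INTEGER,
                         amount REAL, order_date TEXT);
""")

today = date(2024, 1, 31)  # fixed "today" so the example is reproducible
conn.executemany(
    "INSERT INTO clients VALUES (?, ?)",
    [(1, "ana@example.com"), (2, "bo@example.com"), (3, "cy@example.com")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (10, 1, 700.0, (today - timedelta(days=3)).isoformat()),    # client 1
        (11, 1, 400.0, (today - timedelta(days=10)).isoformat()),   # client 1: 1100 in window
        (12, 2, 900.0, (today - timedelta(days=5)).isoformat()),    # client 2: only 900
        (13, 3, 2000.0, (today - timedelta(days=40)).isoformat()),  # client 3: outside window
    ],
)

# The segment: clients who spent more than $1000 in the past 15 days.
SEGMENT_SQL = """
    SELECT c.client_id, c.email, SUM(o.amount) AS spent_15d
    FROM clients AS c
    JOIN orders AS o ON o.client_id = c.client_id
    WHERE o.order_date >= ?
    GROUP BY c.client_id, c.email
    HAVING SUM(o.amount) > 1000
"""
cutoff = (today - timedelta(days=15)).isoformat()
segment = conn.execute(SEGMENT_SQL, (cutoff,)).fetchall()
print(segment)  # [(1, 'ana@example.com', 1100.0)]
```

ISO-formatted date strings compare correctly as text, which is why the `WHERE` clause can filter on them directly.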

Modeling: How does the solution query the data warehouse? Is it pure SQL, or does the solution offer an easy mode or no-code features such as drag and drop?

Integration with dbt: Can a dbt model be integrated directly with the solution?
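One way a tool could integrate with dbt is by reading the `manifest.json` artifact that dbt writes to its `target/` directory, and resolving a model name to the table it materializes. The sketch below assumes model nodes are keyed as `model.<project>.<model_name>` and carry `database`, `schema`, and `alias` fields; the project and model names are hypothetical, and the manifest is heavily trimmed.

```python
import json


def resolve_dbt_model(manifest: dict, project: str, model: str) -> str:
    """Return the fully qualified table a dbt model materializes to (simplified)."""
    node = manifest["nodes"][f"model.{project}.{model}"]
    # `alias` holds the relation name; dbt defaults it to the model name.
    return f'{node["database"]}.{node["schema"]}.{node["alias"]}'


# Trimmed-down stand-in for target/manifest.json; real manifests have many more fields.
manifest = json.loads("""
{"nodes": {"model.shop.high_value_clients": {
    "resource_type": "model", "name": "high_value_clients",
    "database": "analytics", "schema": "marts", "alias": "high_value_clients"}}}
""")

table = resolve_dbt_model(manifest, "shop", "high_value_clients")
print(table)  # analytics.marts.high_value_clients
```

A tool that works this way can follow a model across renames of its target schema without the sync configuration pointing at a hard-coded table.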

Custom API connector: Is it possible to customize the connectors provided with the solution?
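What "customizing a connector" involves differs from tool to tool. As a hypothetical sketch, a tool could expose an abstract `Destination` interface that users subclass to map warehouse rows onto the requests an in-house API expects. The interface, the `InternalCrmConnector` class, the endpoint URL, and the field names are all invented for illustration and do not reflect any real product's API; the sketch only builds requests and never sends them.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Destination(ABC):
    """Hypothetical connector interface; not the API of any real Reverse ETL tool."""

    @abstractmethod
    def build_requests(self, records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Turn warehouse rows into the HTTP requests the destination expects."""


class InternalCrmConnector(Destination):
    """Custom connector for a hypothetical in-house CRM with a JSON batch endpoint."""

    def __init__(self, endpoint: str, batch_size: int = 100):
        self.endpoint = endpoint
        self.batch_size = batch_size

    def build_requests(self, records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        reqs = []
        for start in range(0, len(records), self.batch_size):
            batch = records[start:start + self.batch_size]
            # Map warehouse column names onto the field names the CRM expects.
            payload = [{"externalId": r["client_id"], "email": r["email"]} for r in batch]
            reqs.append({"method": "POST", "url": self.endpoint, "body": json.dumps(payload)})
        return reqs


connector = InternalCrmConnector("https://crm.example.com/api/contacts", batch_size=2)
rows = [
    {"client_id": 1, "email": "ana@example.com"},
    {"client_id": 2, "email": "bo@example.com"},
    {"client_id": 3, "email": "cy@example.com"},
]
reqs = connector.build_requests(rows)
print(len(reqs))  # 2 batches: two rows, then one
```

Keeping request building separate from sending lets the tool handle retries, rate limits, and authentication in one place while users only write the mapping.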

Refresh frequency: How often is data synced from your warehouse to cloud apps? Every hour? Every minute? In real time? Can a sync be triggered via API, when specific events occur, or on demand (for example, every time a dbt model is modified)?
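The difference between scheduled and triggered syncs can be captured in a small decision function. The sketch below is a hypothetical `sync_is_due` helper, not taken from any specific tool: an explicit trigger (an API call or a "dbt model modified" event) starts a sync immediately, a missing interval means on-demand only, and otherwise the sync runs once the configured interval has elapsed.

```python
from datetime import datetime, timedelta
from typing import Optional


def sync_is_due(
    last_run: Optional[datetime],
    now: datetime,
    interval: Optional[timedelta],
    trigger_pending: bool = False,
) -> bool:
    """Hypothetical scheduler check: should a sync start now?"""
    # An explicit trigger (API call, "dbt model modified" event, manual click) always wins.
    if trigger_pending:
        return True
    # No interval configured means the sync only runs on demand.
    if interval is None:
        return False
    # Scheduled mode: run if the sync never ran, or if the interval has elapsed.
    return last_run is None or now - last_run >= interval


now = datetime(2024, 1, 31, 12, 0)
hourly = timedelta(hours=1)
print(sync_is_due(datetime(2024, 1, 31, 11, 30), now, hourly))  # False: only 30 minutes elapsed
print(sync_is_due(datetime(2024, 1, 31, 11, 0), now, hourly))   # True: a full hour elapsed
print(sync_is_due(None, now, None, trigger_pending=True))        # True: event-triggered run
```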

Incremental synchronization: When synchronizing the warehouse with cloud apps, can the solution sync only the data that has changed since the last export, or does it send all the data in the segment every time?
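A common way to implement incremental synchronization is to keep a fingerprint of each row from the previous sync and send only rows whose fingerprint is new or has changed. The sketch below assumes each row has a unique `id` column and stores one SHA-256 hash per row; the function names are hypothetical, and deleted rows are deliberately not handled.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def row_hash(row: dict) -> str:
    # Stable fingerprint of a row: serialize with sorted keys, then hash.
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()


def diff_rows(
    rows: List[dict], previous: Dict[str, str], key: str = "id"
) -> Tuple[List[dict], Dict[str, str]]:
    """Return rows added or changed since the previous sync, plus the new state to store."""
    current = {str(r[key]): row_hash(r) for r in rows}
    changed = [r for r in rows if previous.get(str(r[key])) != current[str(r[key])]]
    return changed, current


# First sync: no previous state, so every row is sent.
first = [{"id": 1, "plan": "free"}, {"id": 2, "plan": "pro"}]
sent_first, state = diff_rows(first, {})

# Second sync: row 1 changed, row 2 is unchanged, row 3 is new.
second = [{"id": 1, "plan": "pro"}, {"id": 2, "plan": "pro"}, {"id": 3, "plan": "free"}]
sent_second, state = diff_rows(second, state)
print(sent_second)  # [{'id': 1, 'plan': 'pro'}, {'id': 3, 'plan': 'free'}]
```

Detecting deletions would additionally require comparing the keys in `previous` against those in `current`.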

Security: Is your data synced through the solution's servers, or does it stay in your warehouse? When data stays in your warehouse, it is never copied to a third-party server, which protects it by default against many attacks. Is the solution compliant with GDPR, CCPA, and HIPAA? Version control means recording audit logs of every change made to your models or sync configurations. Data governance means being able to control who has access to certain models.
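Version control and data governance, as defined above, can be illustrated with a small in-memory registry: a per-model access list decides who may change a model, and every accepted change is appended to an audit log with the old and new definitions. `ModelRegistry` and its fields are hypothetical; a real tool would persist this state and tie it to its own user management.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Set


@dataclass
class ModelRegistry:
    """Hypothetical registry: per-model access lists plus an append-only audit log."""

    access: Dict[str, Set[str]] = field(default_factory=dict)  # model name -> allowed users
    configs: Dict[str, str] = field(default_factory=dict)      # model name -> current SQL
    audit_log: List[dict] = field(default_factory=list)

    def can_use(self, user: str, model: str) -> bool:
        # Governance: only users listed for a model may touch it.
        return user in self.access.get(model, set())

    def update_config(self, user: str, model: str, new_sql: str) -> None:
        if not self.can_use(user, model):
            raise PermissionError(f"{user} may not modify {model}")
        old_sql: Optional[str] = self.configs.get(model)
        # Version control: record who changed what, when, and the before/after state.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "model": model,
            "old": old_sql,
            "new": new_sql,
        })
        self.configs[model] = new_sql


registry = ModelRegistry(access={"high_value_clients": {"ana"}})
registry.update_config("ana", "high_value_clients", "SELECT client_id, email FROM clients")
try:
    registry.update_config("bo", "high_value_clients", "SELECT * FROM clients")
    denied = False
except PermissionError:
    denied = True
print(denied, len(registry.audit_log))  # True 1
```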

Observability: Does the solution show you how your syncs are performing overall? Can you easily identify when a sync fails and why? Can you get an alert when a sync fails?
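The three observability questions map onto three pieces of data: a history of sync runs, the reason each failed run failed, and an alert hook. The sketch below wraps a sync function in a hypothetical `run_with_monitoring` helper that records each run's status, row count, and error message, and calls an alert callback on failure. The sync names and the error message are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class SyncRun:
    sync_name: str
    started_at: datetime
    status: str = "running"
    rows_sent: int = 0
    error: Optional[str] = None


def run_with_monitoring(
    sync_name: str,
    sync_fn: Callable[[], int],
    alert: Callable[[SyncRun], None],
    history: List[SyncRun],
) -> SyncRun:
    """Run one sync, record its outcome in `history`, and call `alert` if it fails."""
    run = SyncRun(sync_name, started_at=datetime.now(timezone.utc))
    try:
        run.rows_sent = sync_fn()
        run.status = "succeeded"
    except Exception as exc:
        run.status = "failed"
        run.error = f"{type(exc).__name__}: {exc}"  # the "why" behind the failure
        alert(run)
    history.append(run)
    return run


history: List[SyncRun] = []
alerts: List[SyncRun] = []


def failing_sync() -> int:
    raise ValueError("destination rejected field 'email'")


run_with_monitoring("clients_to_crm", lambda: 42, alerts.append, history)
run_with_monitoring("orders_to_ads", failing_sync, alerts.append, history)
print([r.status for r in history])  # ['succeeded', 'failed']
print(alerts[0].error)              # ValueError: destination rejected field 'email'
```

In practice the alert callback would post to a chat channel or paging system; a list is used here so the behavior is easy to inspect.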

Additional comparisons and benchmark resources