Catalog of Data Modeling Tools

By Louise de Leyritz from CastorDoc


The raw data collected by companies is usually messy and unusable for analysis. It has to be transformed before it can support value-generating data analysis.

This explains the recent explosion of data transformation tools (internal, open-source, and SaaS). The trend is not going to stop, so we would rather bring visibility and structure to this market sooner than later.

At CastorDoc, we believe the first step toward structuring the data transformation tools market is more transparency. For that reason, we put together a list of all the data modeling tools we have heard of.

This list is still exploratory and may contain errors or lack information. Please reach out to us if you notice anything wrong: louise@castordoc.com

In-depth analysis and evolution

Read the full breakdown by generation and market analysis of data transformation here

Deeper dive into SQL Editors

What does each column in the benchmark below mean?

Deployment: Is the tool SaaS or open-source?

Classification: Is the tool exclusively used for transforming data (such as dbt), or is transformation part of a larger offering? For example, ETL tools transform data, but they also take care of the extract and load steps.

Security: This criterion notes whether the solution is compliant with specific regulations such as GDPR, HIPAA, etc.

Language: What is the scripting language used for data transformations? Scala, Python, SQL? Is the solution no-code?

Community: Is there a community built around the solution? Communities tend to be especially important with open-source tools, as they provide a great amount of support.

Data sources supported: Where does the solution run transformations? Does it support transformations in data warehouses? In databases?

Data quality checks: Can you test data quality with assertions? Built-in assertions check for uniqueness or null values, and custom assertions written in SQL can check any property of your data.

Version control: Can you easily track changes and restore previous versions of datasets?

Real-time query validation: The solution validates compiled queries against BigQuery in real time, enabling users to identify issues before running queries.

Real-time data transformation: Can you run SQL searches, aggregations, and joins as the data is generated?
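
To make the data quality checks column more concrete, here is a minimal sketch of how SQL assertions can work. It is not the implementation of any tool in the list. It assumes the common convention that an assertion is a query returning the rows that violate a rule, so an empty result means the check passes. The `orders` table, its columns, and the `failing_rows` helper are hypothetical, and SQLite stands in for a real data warehouse.

```python
import sqlite3


def failing_rows(conn, assertion_sql):
    """Run an assertion query and return the rows that violate the rule.

    Assumed convention: an empty list means the assertion passes.
    """
    return conn.execute(assertion_sql).fetchall()


# Hypothetical sample data, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, 11, 40.0), (2, 12, 15.0), (3, None, -5.0)],
)

assertions = {
    # Uniqueness check: any order_id that appears more than once.
    "order_id is unique": (
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1"
    ),
    # Null check: any row with a missing customer_id.
    "customer_id is not null": (
        "SELECT order_id FROM orders WHERE customer_id IS NULL"
    ),
    # Custom assertion written in SQL: amounts must not be negative.
    "amount is non-negative": (
        "SELECT order_id FROM orders WHERE amount < 0"
    ),
}

results = {name: failing_rows(conn, sql) for name, sql in assertions.items()}

for name, rows in results.items():
    status = "PASS" if not rows else "FAIL"
    print(status, name, rows)
```

With this sample data all three assertions fail, and each one reports the offending `order_id` values. Tools that offer data quality checks typically run assertions like these after each transformation.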

Additional comparisons and benchmark resources