Why does Castor x dbt make sense?
dbt enables data analysts and engineers to transform data in their warehouse using SQL. It offers the first building blocks of data documentation: lineage, dbt docs and quality tests. Yet there are several shortcomings when it comes to building a great documentation experience:
- Search: dbt search is not optimized for discovery. It is hard to find relevant assets quickly, if at all.
- Popularity: dbt resources are not ranked or organized by popularity. This makes it hard to know which resources matter most when you are not familiar with the data infrastructure.
- Democratization: the dbt interface, even dbt docs, is too technical for business users. Although dbt did an awesome job at democratizing data engineering, the marketing analyst or the legal team won't feel as comfortable on dbt as they would on Notion.
- Integration with the rest of the data stack: you don't want analysts to switch tools to document and search for data. You want a unified interface that gathers all the relevant tools, from ETL to BI.
How does it work?
The following information will be extracted from dbt daily and associated with the relevant dataset(s):
- link to dbt model code
- column and table descriptions (from dbt docs)
- dbt run status
- any downstream/upstream sources
- dataset owner
- dataset last-updated time
- dataset creation time
- dbt tests
- dbt run start/finish time
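
To make the list above more concrete, here is a minimal Python sketch of how several of these fields could be read from the artifact files dbt writes after a run (manifest.json and run_results.json). This illustrates the kind of metadata involved and is not a description of Castor's actual extraction. The function name summarize_models is hypothetical, and reading the owner from meta.owner is an assumption, since dbt models have no built-in owner field.

```python
from __future__ import annotations

from typing import Any


def summarize_models(
    manifest: dict[str, Any], run_results: dict[str, Any]
) -> dict[str, dict[str, Any]]:
    """Collect per-model metadata from parsed dbt artifacts (hypothetical helper)."""
    # Index run results by node id so each model can look up its own run.
    runs = {r["unique_id"]: r for r in run_results.get("results", [])}
    child_map = manifest.get("child_map", {})

    summary: dict[str, dict[str, Any]] = {}
    for node_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue
        run = runs.get(node_id, {})
        # dbt records one timing entry per phase; "execute" covers the run itself.
        execute = next(
            (t for t in run.get("timing", []) if t.get("name") == "execute"), {}
        )
        children = child_map.get(node_id, [])
        summary[node_id] = {
            "code_path": node.get("original_file_path"),
            "description": node.get("description", ""),
            "columns": {
                name: col.get("description", "")
                for name, col in node.get("columns", {}).items()
            },
            "upstream": node.get("depends_on", {}).get("nodes", []),
            # Test nodes have unique ids starting with "test."; split them out.
            "downstream": [c for c in children if not c.startswith("test.")],
            "tests": [c for c in children if c.startswith("test.")],
            # Assumption: the owner is stored under the model's `meta` block.
            "owner": node.get("meta", {}).get("owner"),
            "run_status": run.get("status"),
            "run_started_at": execute.get("started_at"),
            "run_finished_at": execute.get("completed_at"),
        }
    return summary
```

With a default dbt project layout, the two inputs would typically come from json.load on target/manifest.json and target/run_results.json. Timestamps such as the dataset's creation and last-updated times are not covered here; they would more likely come from the warehouse itself.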
Fantastic tool for data discovery and documentation
“[I like] The easy to use interface and the speed of finding the relevant assets that you're looking for in your database. I also really enjoy the score given to each table, [which] lets you prioritize the results of your queries by how often certain data is used.”
Michal, Head of Data, Printify