For the last twelve months we've been building Castor, and yes, according to industry standards, it's a data catalog.
Here is the problem: too many people believe that the end goal of a data catalog is to nicely document your data assets. Basically, you plug in a data catalog only after you've cleaned your warehouse and defined your data KPIs. Like the icing on the cake.
But no, we have a different vision. To bring visibility to the internet, we didn't organize it in clean folders. We plugged Google on top. If your data warehouse is messy, if it takes time to find the relevant data, if you have trouble trusting your data, don't spend weeks cleaning it: plug in a search engine. Castor is a powerful search engine meant to help you find and trust data assets.
At Castor, our end goal is to make data users more efficient in answering questions.
Thanks to Castor, they find, understand, and use their assets faster. No matter how messy your warehouse is, even if you do not have any documentation yet.
The best metaphor I've found so far deals with hiking in a forest. Without Castor, any new analyst gets dumped in the forest. If staffing is ok at that time, an experienced analyst comes along for a few hours to give them a quick tour, show where water can be found, where the grizzly sleeps... Then the buddy leaves and the new analyst is left alone, tasked with doing the job on their own. Yes, they can ask a question around from time to time, but that depends on the team's availability.
And with Castor? Your new analyst gets a fancy map of the forest, showing points of interest and paths frequently used. Oh, and this map is automatically updated. To add two metaphors on top of the first one, it's kind of like the Marauder's Map in Harry Potter, or the Age of Empires map when using cheat codes (I swear these are the last metaphors of this article).
So this map is amazing if you're in the darkest forest ever (like Fangorn Forest, oopsy, another super geeky metaphor), but it's also useful in our beautiful city of Paris, France.
How does that translate to the data world? Key features such as:
The messier your data is, the more value Castor can bring. Castor is a data exploration/discovery tool (leveraging data cataloging features, yes). If it is a mess, use cheat codes now, clean later.
We have some useful features to help Castor admins make their warehouse cleaner. The main one is that content is always prioritised by popularity. This puts the focus on popular content, so that documentation effort can be aligned with actual usage.
A reflex we've seen a lot is to document source tables, even if these are never used directly. We advise our clients to start with the top 10 most popular tables, listed in Castor. Of course, these tables, thanks to our very own SQL parser, are already enriched with lineage information and query history.
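To make the "start with the most popular tables" idea concrete, here is a minimal sketch of how popularity could be derived from a query history. This is not how Castor does it: the query log is made up, the function names are hypothetical, and a naive regex stands in for a real SQL parser (it will miss subqueries, CTEs, quoted identifiers and comments).

```python
import re
from collections import Counter

# Hypothetical query log; in practice this would come from your
# warehouse's query history.
QUERY_LOG = [
    "SELECT * FROM analytics.orders o JOIN analytics.customers c ON o.customer_id = c.id",
    "SELECT count(*) FROM analytics.orders",
    "SELECT id FROM raw.events",
    "select * from analytics.orders where amount > 10",
]

# Naive stand-in for a SQL parser: grab the identifier that follows
# FROM or JOIN. Assumption for illustration only.
TABLE_PATTERN = re.compile(r"\b(?:from|join)\s+([a-zA-Z_][\w.]*)", re.IGNORECASE)


def table_popularity(queries):
    """Count, for each table, the number of queries that reference it."""
    counts = Counter()
    for query in queries:
        # A set, so a table mentioned twice in one query counts once.
        tables = {name.lower() for name in TABLE_PATTERN.findall(query)}
        counts.update(tables)
    return counts


def top_tables(queries, n=10):
    """Return the n most referenced tables, ties broken by name."""
    counts = table_popularity(queries)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


if __name__ == "__main__":
    for table, hits in top_tables(QUERY_LOG):
        print(f"{hits:>3}  {table}")
```

The ranked list is a documentation to-do list: work from the top and stop when the tables are no longer queried enough to matter.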
Also, at first, some of our clients only wanted to show their neatest schemas in Castor, the well-structured and approved ones, and hide the ugly ones, regardless of how much these were used.
After a few weeks working together, another strategy emerged: we added all schemas back into Castor. Our clients tagged their officially approved tables and dashboards. Finally, they added redirects from popular, soon-to-be-deprecated ones to their new counterparts. Here again, Castor played its "map" role. This pattern is even stronger when clients are doing a data migration to a new warehouse.
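The tag-and-redirect strategy can be pictured as a tiny data structure. The sketch below is purely illustrative: `CatalogEntry`, its fields, the table names and `resolve` are all hypothetical, not Castor's data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogEntry:
    """One table or dashboard in a (hypothetical) catalog."""
    name: str
    approved: bool = False             # officially approved by the data team
    deprecated: bool = False           # still visible, but on its way out
    redirect_to: Optional[str] = None  # name of the new counterpart, if any


# Made-up entries: the old table stays listed, with a pointer to the new one.
CATALOG = {
    entry.name: entry
    for entry in [
        CatalogEntry("legacy.orders_v1", deprecated=True, redirect_to="analytics.orders"),
        CatalogEntry("analytics.orders", approved=True),
    ]
}


def resolve(catalog, name):
    """Follow redirects from an entry until reaching one with no redirect."""
    seen = set()
    entry = catalog[name]
    while entry.redirect_to is not None:
        if entry.name in seen:
            # Guard against A -> B -> A style loops.
            raise ValueError(f"redirect cycle at {entry.name}")
        seen.add(entry.name)
        entry = catalog[entry.redirect_to]
    return entry


if __name__ == "__main__":
    target = resolve(CATALOG, "legacy.orders_v1")
    print(f"legacy.orders_v1 -> {target.name} (approved: {target.approved})")
```

The point of keeping the deprecated entry in the catalog, rather than hiding it, is that users searching for the old popular name still land somewhere and get sent to the approved table.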
To put things simply: make your users more efficient, on your brand new, well-documented dbt models but also on that old production database that you never want to hear about.
I love hearing that sentence: "oh, we're in the middle of a migration from Redshift to Snowflake, we'll plug in Castor when we're done". Why do I love that sentence? Simply because I know by heart the arguments for plugging in Castor as soon as possible.
Remember the paragraph above? About "The messier your data... The more value". Could a warehouse be messier than during a data migration? We typically see clients use Castor to map new & old tables: their users can see all old and new content in the same place, with links between them. (Note: a reference to the best TV show ever is hidden in this paragraph.)
Are you laying down the foundation of a modern data stack in a company starting its data journey? Lucky you!! These are amazing times indeed. Plug in Castor, now. Why? Because the sooner you enable exploration and documentation, the less work and hassle it will be. It's super hard to climb that mountain when it's 8,000 meters high...
Enough of this self-promoting talk, I think you got it now 😉