The Ultimate Guide to Evaluating a Data Catalog

Data catalogs were introduced to help data people find and understand data. Before data catalogs existed, data engineers, data analysts, and data scientists worked blind, with no visibility into data sets, their content, their quality or their usefulness. Consequently, they spent most of their time trying to locate and understand data, often recreating data sets that already existed. These are the types of issues that data catalogs can help address.

Data catalogs began with the modest aim of managing data inventory and improving data discovery. Soon enough, they grew in functionality, popularity and importance. Modern data catalogs have considerably expanded their reach, and are now central to data stewardship and data governance. Data team leaders view data catalogs as strategically important and key drivers of analytic quality and data teams' productivity.

The thing is, the number of data cataloging tools has grown rapidly in recent years, and there are now many to choose from. Which one is right for you? That's what we will help you uncover today.

I - What is a data catalog?

Gartner, a research and advisory firm, defines a data catalog as follows:

“A data catalog creates and maintains an inventory of data assets through the discovery, description, and organization of distributed datasets. The data catalog provides context to enable data stewards, data/business analysts, data engineers, data scientists and other data consumers to find and understand relevant datasets for the purpose of extracting business value”

Gartner, Augmented Data Catalogs 2019

II - Choosing the right data catalog for you

1. Which features do you need?

The first step in choosing a data catalog is to understand exactly why you need one. As we mentioned already, data catalog vendors have multiplied in the past few years, and they cater to different needs. Are you looking for a data governance tool? A pure data discovery tool? You need to define exactly what you're looking for before going on a data catalog quest.

To this end, start by identifying your pain points, and then find which data catalog addresses them. The first exercise is thus to identify the top challenges that affect your team's productivity and to map them to data catalog features. To facilitate the task, we've done the mapping. Based on what bothers you, we'll tell you which data catalog features you should prioritize.

In this exercise, it's important that you get your team to speak. If you're leading a data team, make sure you understand what is blocking team members in their daily work and in their longer-term goals. They might have different pain points affecting their productivity. You want to pick a catalog that alleviates their frustrations and allows them to fulfill their responsibilities.

The pain points associated with data catalog features

Now that you have a clearer idea of the features you're interested in, rank them in order of priority.
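
One simple way to turn your team's feedback into a ranking is to count how many people asked for each feature. The sketch below is only an illustration: the roles, feature names, and votes are hypothetical placeholders, and a plain vote count is an assumption rather than a prescribed method.

```python
from collections import Counter

# Hypothetical survey results: each team member lists the catalog
# features that would relieve their biggest pain points.
team_votes = {
    "analyst": ["search", "documentation"],
    "data_engineer": ["lineage", "search"],
    "data_scientist": ["search", "lineage", "quality_scores"],
}


def rank_features(votes):
    """Return (feature, vote_count) pairs, most requested first."""
    counts = Counter(
        feature for features in votes.values() for feature in features
    )
    # Sort by vote count (descending), then alphabetically to break ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_features(team_votes)
for position, (feature, count) in enumerate(ranking, start=1):
    print(f"{position}. {feature} ({count} votes)")
```

If some pain points hurt more than others, you could replace the plain count with a weight per vote.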

2. Is your team going to use it?

You've now established which features you need in a data catalog, and you're ready to scan the market to find your ideal solution. Wait a second, we're not done yet. There are other considerations to take into account. Namely, think about what would make your team use the data catalog. The whole value of a data catalog resides in its usage. When people use the data catalog, documentation levels increase, the quality of data assets improves, and more people benefit from the catalog. Conversely, a poor user experience can easily turn into a vicious circle where no one adopts the catalog. That means not only poor-quality data assets, but also a wasted investment. So before you contract with a data catalog vendor, make sure your team actually likes the tool and plans to use it. We thus propose looking at the following four variables when evaluating a data catalog.

Driving adoption for your catalog - Image courtesy of CastorDoc

3. Understand the data catalog ecosystem

Once you have clearly defined what you're looking for in a data catalog, it's time to find your perfect match. This is no easy task, as there is a plethora of options to choose from. We've attempted to untangle the data catalog ecosystem to help you find the right fit. We found that data catalogs can be divided into three generations:

  • 1st generation: Basic software, similar to Excel, that syncs with your data warehouse.
  • 2nd generation: Software designed to help the data steward maintain data documentation (metadata), lineage, and transformations.
  • 3rd generation: Software designed to deliver business value to end-users automatically within hours of deployment. It then guides users to document in a collaborative, painless way.

Here is a brief listing of the pros and cons of each option.

Each generation has its own specificities. Go for the one that fits your data stack - Image from CastorDoc

Data catalog landscape

Below, you will find a data catalog landscape, which can hopefully help you choose a metadata management tool adapted to your needs.

*This is a brief attempt at classifying the tools on the market. If anything seems wrong, or if you don't see your data catalog and want to have it placed, feel free to reach out.

Data Catalog landscape - Image from CastorDoc

If you want to know more about vendors, their offerings, and the data catalog ecosystem, you will find our data catalog benchmark here.

4. Take demos from selected vendors

You have now selected a few catalogs that seem to match your pre-defined criteria and answer your business needs. It's time for the next step: take a demo.

If you sit through the demo as a passive viewer, you're unlikely to get much value out of it. You should participate actively and leave with a clear idea of how the data catalog software will help address your specific needs.

We encourage you to plan the key topics you want to cover and to share the features that matter most to you with the vendors in advance. This will ensure a much more tailored experience.

We thus propose setting an agenda beforehand that covers the following topics:

Cost of ownership

Price is obviously a concern when choosing catalog software. However, the real cost is often more than the price quoted by the vendor. Total cost of ownership covers what the software costs to purchase, implement and maintain.

Purchasing: Ensure you understand what's included in every pricing tier. Inquire about potential additional purchase charges, such as extra users.

Implementation: Inquire about implementation costs, as they can make a significant difference. For example, choosing an open source data cataloging solution will save you the purchasing cost, but will lead to significant implementation costs.

Maintenance: Make sure you clearly understand what the vendor charges after purchase, for updates for example. Even without updates, the software might be expensive to maintain. For example, legacy data catalogs (1st generation) often require a full-time engineering team to maintain the tool. Ensure that you factor these additional costs into the total cost of ownership.
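
The three cost components above can be added up over the number of years you expect to keep the tool. Here is a minimal sketch of that arithmetic. The class, field names, and every figure are hypothetical, invented only to show the calculation and not to suggest which kind of catalog is cheaper.

```python
from dataclasses import dataclass


@dataclass
class CatalogCosts:
    name: str
    annual_license: float      # purchasing: yearly fee, including extra users
    implementation: float      # one-off setup and integration cost
    annual_maintenance: float  # updates, hosting, engineering time per year

    def total_cost_of_ownership(self, years: int) -> float:
        """One-off implementation cost plus recurring costs over `years`."""
        recurring = self.annual_license + self.annual_maintenance
        return self.implementation + years * recurring


# Made-up figures for two hypothetical options.
options = [
    CatalogCosts("open_source", annual_license=0,
                 implementation=60_000, annual_maintenance=80_000),
    CatalogCosts("saas_vendor", annual_license=50_000,
                 implementation=10_000, annual_maintenance=5_000),
]

tco = {o.name: o.total_cost_of_ownership(years=3) for o in options}
for name, cost in tco.items():
    print(f"{name}: {cost:,.0f} over 3 years")
```

Replace the placeholder numbers with the quotes and internal estimates you collect during demos.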

Vendor support

What relationship will you have with the vendor after completing the purchase? Will you be on your own? If so, does that work for you? This is not a negligible question. Some Tesla owners, for example, love their car but have been frustrated by poor customer service, to the point of regretting their purchase. For this reason, ensure you have understood the following:

  1. Training conditions: How is your team going to learn how to use the catalog? Is training included for all users? If not, does it entail additional costs? Make sure you have a clear picture of how onboarding will work.
  2. Support: Ensure that you've understood the different levels of customer service (phone, email, chat) and their costs. Be sure to leave with a sense of the service logistics, such as whether customer service is available 24/7 or only during certain hours.

Data and privacy

Companies can lose serious amounts of money and customer trust following data security breaches. Be sure to understand exactly what data the vendor has access to, the kind of security the vendor uses for its databases, and what processes it has in place to keep your information safe.

We also advise that you attend the demo with stakeholders from different teams. This will allow you to gather the most comprehensive feedback, and thus choose a tool that suits all kinds of users. Finally, ensure that the data catalog is compatible with your current data infrastructure as well as with your vision and roadmap for the next 1-5 years.

We have also pulled together a more detailed checklist of data catalog assessment criteria that you can use during demos here:

5. More on what a data catalog is

A cloud data catalog connects to the cloud data warehouse and the cloud business intelligence sources. It helps an organization index all the metadata from these sources into a search engine. This enables users to view, write and read documentation about the data sources and to learn what exists in the cloud data warehouse and BI tools. The technical capabilities of a data catalog are to:

  • Help non-technical people understand how to use technical assets, thanks to the query history
  • Show the technical dependencies of a data asset through lineage reports
  • Give access to the knowledge base where KPIs (key performance indicators) and analytics metrics are defined
  • Support data users across the organization on the cloud data infrastructure
  • Report to the head of data and data managers on data-driven decision making and insights
  • Report which data products are used, and for which use cases
  • Improve cloud data discovery across the organization, so users learn which analyses and reports they can find
  • Facilitate data governance and metadata management
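
To make the idea of indexing metadata into a search engine more concrete, here is a toy sketch of a keyword index over asset names and descriptions. It is an assumption-laden illustration, not how any particular catalog works: the asset records, field names, and helper functions are all hypothetical, and a real catalog would collect this metadata from the warehouse and BI tools rather than from a hard-coded list.

```python
import re
from collections import defaultdict

# Hypothetical metadata records gathered from a warehouse and a BI tool.
assets = [
    {"name": "orders", "source": "warehouse",
     "description": "One row per customer order"},
    {"name": "customers", "source": "warehouse",
     "description": "Customer master data"},
    {"name": "revenue_dashboard", "source": "bi",
     "description": "Monthly revenue by customer segment"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each word to the set of asset names that mention it."""
    index = defaultdict(set)
    for record in records:
        text = record["name"] + " " + record["description"]
        for token in tokenize(text):
            index[token].add(record["name"])
    return index


def search(index, query):
    """Return the sorted names of assets that match every query word."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)


index = build_index(assets)
print(search(index, "monthly revenue"))
```

Real catalogs layer ranking signals on top of this kind of lookup, such as how often an asset is queried.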

About us

We write about all the processes involved when leveraging data assets: from the modern data stack to data teams composition, to data governance. Our blog covers the technical and the less technical aspects of creating tangible value from data.

At Castor, we are building a data documentation tool for the Notion, Figma, Slack generation. We designed our catalog software to be easy to use, delightful and friendly.

Want to check it out? Try CastorDoc for free with a 14-day demo.
