Guide to evaluating a data catalog

BONUS - You'll find an RFI Template for Data Catalogs


Data catalogs were introduced to help data people find and understand data. Before data catalogs existed, data engineers, data analysts, and data scientists worked blind, deprived of visibility into data sets, their content, their quality or their usefulness. Consequently, they spent most of their time trying to locate and understand data, often recreating data sets that already existed. These are the kinds of issues that data catalogs seek to address.

Data catalogs began with the modest aim of managing data inventory and improving data discovery. Soon enough, they grew in functionality, popularity and importance. Modern data catalogs have considerably expanded their reach, and are now central to data stewardship and data governance. Data team leaders view data catalogs as strategically important and key drivers of analytic quality and data teams' productivity.

The thing is, the selection of data cataloging tools has grown rapidly in recent years, and there are now many tools to choose from. Which one is right for you? That's what we help you uncover today.

I - What is a data catalog?

Gartner, a research and advisory firm, defines a data catalog as follows:

“A data catalog creates and maintains an inventory of data assets through the discovery, description, and organization of distributed datasets. The data catalog provides context to enable data stewards, data/business analysts, data engineers, data scientists and other data consumers to find and understand relevant datasets for the purpose of extracting business value”

Gartner, Augmented Data Catalogs 2019

II - Which data catalog for you?

1. Which features do you need?

The first step in choosing a data catalog is to understand exactly why you need one. As mentioned above, data catalog vendors have multiplied in recent years, and they cater to different needs. Are you looking for a data governance tool? A pure data discovery tool? Define exactly what you're looking for before going on a data catalog quest.

To this end, start by identifying your pain points, and then find which data catalog addresses them. The first exercise is thus to identify the top challenges that affect your productivity and to map them to data catalog features. To make the task easier, we've done the mapping: tell us what bothers you, and we'll tell you which data catalog features you should look for.

In this exercise, it's important to get your team to speak. If you're leading a data team, make sure you understand what bothers each team member, as they may have different pain points affecting their productivity. You want to pick a catalog that alleviates their frustrations and allows them to fulfill their mission.

What are the pain points solved with data catalog features?

Now that you have a clearer idea of the features you're interested in, rank them in order of preference.
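The mapping-and-ranking exercise can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pain points, the feature names, and the `PAIN_POINT_TO_FEATURES` mapping are hypothetical placeholders, not the mapping referred to in this article, and you would replace them with your own team's input.

```python
from collections import Counter

# Hypothetical mapping from pain points to catalog features (illustrative only).
PAIN_POINT_TO_FEATURES = {
    "cannot find datasets": ["search", "data discovery"],
    "do not trust the data": ["data quality", "lineage"],
    "unclear metric definitions": ["business glossary"],
    "duplicate datasets": ["search", "lineage"],
}


def rank_features(team_pain_points):
    """Count how often each feature is implied by the team's pain points.

    `team_pain_points` maps a team member to the pain points they reported.
    Returns (feature, votes) pairs, most requested first; ties are broken
    alphabetically so the output is deterministic.
    """
    votes = Counter()
    for pain_points in team_pain_points.values():
        for pain_point in pain_points:
            # Pain points missing from the mapping contribute no features.
            for feature in PAIN_POINT_TO_FEATURES.get(pain_point, []):
                votes[feature] += 1
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up survey answers from three team members.
    survey = {
        "analyst": ["cannot find datasets", "unclear metric definitions"],
        "engineer": ["duplicate datasets", "do not trust the data"],
        "scientist": ["cannot find datasets"],
    }
    for feature, count in rank_features(survey):
        print(f"{feature}: {count}")
```

Counting votes across the whole team, rather than ranking by one person's preference, reflects the advice to get every team member to speak.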

2. Is your team going to use it?

You've now established which features you need in a data catalog, and you're ready to scan the market to find your ideal catalog. Wait a second, we're not done yet. There are other considerations to take into account. Namely, think about what would make your team use the data catalog. The whole value of a data catalog resides in its usage. When people use the data catalog, documentation levels increase, the quality of data assets improves, and more people use the data catalog. Conversely, this can easily turn into a vicious circle where no one uses the catalog. In that case, not only do you have poor-quality data assets, but you've also wasted your money on a data catalog. So when you contract with a data catalog vendor, make sure your team actually likes the tool and plans to use it. We thus propose looking at the following four variables when evaluating a data catalog.

What are the features that optimize your data catalog's adoption?

3. Understand the data catalog ecosystem

Once you have clearly defined what you're looking for in a data catalog, it's time to find your perfect match. This is no easy task, as there is a plethora of options to choose from. We've attempted to untangle the data catalog ecosystem to help you find the perfect fit. We found that data catalogs can be divided into three generations:

  • 1st generation: basic software, similar to Excel, that syncs with your data warehouse.
  • 2nd generation: software designed to help the data steward in maintaining data documentation (metadata), lineage, and treatments.
  • 3rd generation: software designed to deliver business value to end users automatically, within hours of deployment. It then guides users to document data in a collaborative, painless way.

Here is a brief listing of the pros and cons of each option.

Each generation has its own characteristics. Go for the one that fits your data stack.

Data catalog landscape

Below, you will find a data catalog landscape, which can hopefully help you choose a metadata management tool adapted to your needs.

*This is a brief attempt at classifying the tools on the market. If anything seems wrong, or if you don't see your data catalog and want to have it placed, feel free to reach out.


If you want to know more about vendors, their offerings, and the data catalog ecosystem, you will find our data catalog benchmark here.

4. Take demos from selected vendors

You have now selected a few catalogs that seem to match your pre-defined criteria and answer your business needs. It's time for the next step: take a demo.

If you sit as a passive viewer during the demo, you're unlikely to get much value out of it. You should be participating actively and leave with a clear idea of how the data catalog software will help address your specific needs.

We encourage you to plan the key topics you want to cover and to share the features that matter most to you with the vendors in advance. This will ensure a much more tailored experience.

We thus propose setting an agenda beforehand that covers the following topics:

Cost of ownership

Price is obviously a concern when choosing catalog software. However, the real cost often goes beyond the price quoted by the vendor. Total cost of ownership covers how much the software costs to purchase, implement and maintain.

Purchasing: Ensure you have understood what's included in every pricing tier. Enquire about potential additional charges, such as extra users.

Implementation: Enquire about implementation costs, as they can make a significant difference. For example, choosing an open source data cataloging solution will save you the purchasing cost, but will lead to significant implementation costs.

Maintenance: Make sure you clearly understand what the vendor charges after the purchase, for updates for instance. Even without updates, the software might be expensive to maintain. For example, legacy data catalogs (1st generation) often require a full-time engineering team to maintain the tool. Ensure that you factor these additional costs into the total cost of ownership.
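To compare candidates on total cost of ownership rather than on list price, it can help to add up the three cost categories above over the period you plan to keep the tool. The sketch below is a simplified model under our own assumptions: the `CatalogCosts` fields are hypothetical, costs are treated as flat per year, and every figure in the usage example is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CatalogCosts:
    """Costs for one candidate catalog (hypothetical fields, flat yearly costs)."""
    name: str
    license_per_year: float        # purchasing: subscription or license fee
    extra_user_fee: float          # purchasing: yearly fee per additional user
    extra_users: int               # purchasing: number of additional users
    implementation_one_off: float  # implementation: setup and integration
    maintenance_per_year: float    # maintenance: updates, engineering time


def total_cost_of_ownership(costs: CatalogCosts, years: int) -> float:
    """Sum purchasing, implementation and maintenance costs over `years`."""
    if years < 1:
        raise ValueError("years must be at least 1")
    purchasing = (costs.license_per_year
                  + costs.extra_user_fee * costs.extra_users) * years
    maintenance = costs.maintenance_per_year * years
    return purchasing + costs.implementation_one_off + maintenance


if __name__ == "__main__":
    # Made-up figures: no license fee but heavy implementation and upkeep,
    # versus a paid license with light implementation and upkeep.
    open_source = CatalogCosts("open source", 0, 0, 0, 60_000, 40_000)
    vendor = CatalogCosts("vendor", 30_000, 500, 10, 5_000, 2_000)
    for candidate in (open_source, vendor):
        print(candidate.name, total_cost_of_ownership(candidate, years=3))
```

With these invented numbers the option with no purchasing cost ends up more expensive over three years, which is the kind of effect the total cost of ownership is meant to surface. Your own figures may of course point the other way.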

Vendor support

What relationship will you have with the vendor after completing the purchase? Will you be on your own? If so, does that work for you? This is not a negligible question. Plenty of Tesla owners love their car but have been so frustrated by poor customer service that they regret their purchase. For this reason, ensure you have understood the following:

  1. Training conditions: How is your team going to learn how to use the catalog? Is training included for all users? If not, does it entail additional costs? Make sure the onboarding path is clear.
  2. Support: Ensure that you've understood the different levels of customer service (phone, email) and their costs. Be sure to leave with a sense of the service logistics, such as whether customer service is available 24/7 or only during certain hours.

Data and privacy

Companies can lose serious amounts of money and customer trust following data security breaches. Be sure to understand exactly what data the vendor has access to, what kind of security the vendor uses for its databases and what processes it has in place to keep your information safe.

We also advise you to attend the demo with stakeholders from different teams. This will allow you to gather the most comprehensive feedback, and thus choose a tool that suits all kinds of users. Finally, ensure that the data catalog is compatible with your current data infrastructure as well as with your vision and roadmap for the next 1-5 years.

We have also pulled together a more detailed version of "what to check before/during a data catalog demo", should you be interested.

5. More on what a data catalog is

A cloud data catalog connects to the cloud data warehouse and the cloud business intelligence sources. It helps an organization index all the metadata from various sources into a search engine. This enables users to view, write and read documentation from the data source to learn what exists in the cloud data warehouse and BI tools. The technical capabilities of a data catalog are:

  • Help non-technical people understand how to use technical assets, thanks to the query history
  • Show the technical dependencies of a data asset through lineage reports
  • Give access to the knowledge base where KPIs (key performance indicators) and analytics metrics are defined
  • Provide support to data users across the organization on cloud data infrastructure
  • Report to the head of data and data managers on data-driven decision making and insights
  • Report which data products are used, and for which use cases
  • Improve cloud data discovery in enterprise organizations, so users can learn which technical analyses and reports exist
  • Facilitate data governance and metadata management
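To make the idea of indexing metadata from various sources into a search engine more concrete, here is a toy inverted index in Python. It is a sketch of the general technique only, not how any particular catalog is built: the `MetadataIndex` class, its methods, and the asset names are all hypothetical, and a real catalog would pull metadata from warehouse and BI connectors rather than from manual `add` calls.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MetadataIndex:
    """Toy inverted index over asset metadata from several sources."""

    def __init__(self):
        self._assets = {}               # asset id -> metadata dict
        self._index = defaultdict(set)  # token -> ids of assets containing it

    def add(self, asset_id, source, name, description=""):
        """Register an asset and index the words of its name and description."""
        self._assets[asset_id] = {
            "source": source,
            "name": name,
            "description": description,
        }
        for token in tokenize(f"{name} {description}"):
            self._index[token].add(asset_id)

    def get(self, asset_id):
        """Return the stored metadata for one asset."""
        return self._assets[asset_id]

    def search(self, query):
        """Return sorted ids of assets whose metadata contains every query word."""
        tokens = tokenize(query)
        if not tokens:
            return []
        matches = set.intersection(*(self._index.get(t, set()) for t in tokens))
        return sorted(matches)


if __name__ == "__main__":
    index = MetadataIndex()
    # Hypothetical assets from a warehouse and a BI tool.
    index.add("wh.orders_daily", "warehouse", "orders_daily", "Daily revenue per order")
    index.add("bi.revenue_dashboard", "bi", "Revenue dashboard", "Weekly revenue KPIs")
    index.add("wh.customers", "warehouse", "customers", "One row per customer")
    print(index.search("revenue"))
    print(index.search("daily revenue"))
```

Because warehouse tables and BI dashboards land in the same index, a single query surfaces related assets from both sources, which is the discovery benefit described above.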

Louise de Leyritz

Growth Analyst Intern

© 2021 Castor. All registered.