“For the proof of concept, we tested the tools with all profiles on the data team: data scientists, data analysts, product managers, engineers, and business intelligence.” Filipe Palma, Data Platform Product Manager, Printify.
Companies are leveraging data catalogs to better organize and manage their data assets so they can become more efficient and increase their competitive edge.
However, before companies can take advantage of the benefits of data catalogs, they must first go through a proof of concept (POC) process. A data catalog POC can help organizations understand the value a data catalog can provide and set them on the path to success.
From understanding the stakeholders involved to developing a timeline and budget, this article will provide the guidance needed to plan a successful proof of concept.
Before looking for any kind of tooling, you need to identify and quantify the challenges you are facing. Are you facing a behavioral problem or an issue linked to tooling? Is it a data team problem, or an organizational problem?
If you’re finding it hard to pin down the exact challenge you are facing, start by identifying the top three things that are causing data initiatives to fail in your organization. The best way to identify these is to conduct surveys and interviews across the organization. Usually, people in the trenches know exactly why their data projects fail.
Planning organizational requirements will allow you to pinpoint exactly what you are looking for in a data catalog. This step will prove extremely valuable when evaluating tools later in the process.
The most common challenges include the following:
Budget has to come early in the process of selecting a data catalog, for a simple reason: if there is no budget, or insufficient budget, there's no point wasting your time trying out solutions you won't be able to pay for. Make sure you have executive support before starting your data catalog adventure.
If you’re asked to prepare a budget for a data catalog, the way to go is to quantify your pain. If you’re looking for a data catalog, it usually means you are already dealing with internal inefficiencies you’d like to remove. Quantify these inefficiencies. How much do they cost you exactly? Putting a price on the challenges you’re facing will help you define how much you’re willing to pay to get rid of these inefficiencies.
We’ve broken down the ROI of implementing a data catalog by quantifying the yearly cost of major inefficiencies linked to onboarding, data discovery, and infrastructure costs (in this case, for a 10-person team). You can find the exact breakdown and justifications for these results in this article, or download our ROI calculator to personalize these metrics for your team.
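To make this concrete, here is a minimal sketch of how such a calculation can be structured. All figures are illustrative assumptions for a hypothetical 10-person team, not the numbers from our calculator; plug in your own salaries, time estimates, and storage costs.

```python
# Hypothetical ROI sketch for a 10-person data team.
# Every figure below is an assumption for illustration only.

TEAM_SIZE = 10
AVG_HOURLY_COST = 75          # assumed fully loaded cost per person, in USD

# Onboarding: assumed weeks each new hire loses rediscovering tribal knowledge
new_hires_per_year = 3
onboarding_weeks_lost = 4
onboarding_cost = new_hires_per_year * onboarding_weeks_lost * 40 * AVG_HOURLY_COST

# Data discovery: assumed hours per person per week spent hunting for the right table
discovery_hours_per_week = 3
working_weeks = 48
discovery_cost = TEAM_SIZE * discovery_hours_per_week * working_weeks * AVG_HOURLY_COST

# Infrastructure: assumed yearly spend on duplicate or unused tables and queries
infra_waste = 20_000

total_yearly_cost = onboarding_cost + discovery_cost + infra_waste
print(f"Onboarding:      ${onboarding_cost:,}")
print(f"Data discovery:  ${discovery_cost:,}")
print(f"Infrastructure:  ${infra_waste:,}")
print(f"Total yearly cost of inefficiencies: ${total_yearly_cost:,}")
```

Whatever numbers you use, the output of this exercise is the ceiling of what a data catalog is worth paying for in your organization.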
Before engaging in any sort of effort, make sure you have a sponsor. A sponsor, or champion, is the person leading the effort for implementing a data catalog in your organization. If you’re reading this article, the sponsor might very well be you.
Securing a sponsor is essential for success. The sponsor must have both the authority and the necessary resources in order to move the data catalog project forward.
The sponsor will be the person responsible for engaging vendors, sitting through demos, and issuing a Request for Information (RFI). They will also be responsible for guiding the implementation process.
In addition, the sponsor should be involved in the development of the data catalog project plan and in setting the overall project goals.
The sponsor should be an individual with an understanding of the organizational data strategy and objectives. They should be able to ensure that the data catalog implementation aligns with the data strategy.
Finally, the sponsor should be in a position to advocate for the data catalog implementation and have the support of key stakeholders within the organization.
Having a sponsor ensures that someone is putting time, energy, and focus into the data catalog project. Without focus, the project is unlikely to be brought to the finish line.
Choosing the right tool requires trying the right tool in the first place. There are a lot of data catalogs out there, and there are probably more that will be created in the coming years. It’s not surprising; the need for data catalogs keeps growing as more organizations are making data-driven decisions.
There are a lot of tools, offering a lot of different capabilities, which means you will have to do some research to decide which tools you want to try out.
Filipe Palma, Data Platform Product Manager at Printify, revealed that he searched through Medium and Reddit to gather best practices from other companies before choosing a tool.
At Castor, we put together a benchmark of all the data catalog solutions available out there. This comes in handy when mapping the different solutions.
Important: When conducting data catalog research, it’s important to understand exactly where your data lives. This will ensure you can choose the catalog with the right connections for you. You need to make sure you only try catalogs that have connectors to your databases and BI tools, or that have the ability to build one quickly.
Identify the stakeholders, and involve them in the decision process. You need to think about who will be the users of the data catalog. This involves the people both enriching the data catalog and consuming the documentation.
Stakeholders enriching the data catalog can be the data steward, the data team, or the engineering team. The data catalog consumers might be the BI team, the marketing team, or the data team. The catalog users will usually depend on your organization and the use cases you're looking to solve.
Regardless of who these users are, make sure they are involved in the decision process. Your data catalog is worthless without adoption. You need to get the green light from the people you’re expecting to use the tool.
Some data leaders only invite the core data team to the decision table, but not the wider audience. This explains the failure of many data documentation projects. The expected data catalog users might not be happy with the tool chosen by the core data team. When this is the case, the company usually struggles with adoption later in the process.
When thinking of who you should invite to the decision table, think of the three following groups:
Data team: The core data team comprises Data Engineers, Data Scientists, Data Analysts, Analytics Engineers, or anyone who reports to the Head of Data. They both enrich and consume the data catalog.
SQL writers: SQL writers include the core data team, but also people from other departments, such as sales ops, marketing analysts, or finance analysts. These people report to different departments, but they will also be regular consumers of the data catalog. Make sure they are brought into the decision-making process.
Whole company: The rest of the company might check the data catalog occasionally. They might want to check the status of a broken dashboard or understand how different data assets are related. When choosing a data catalog, make sure you bring someone representative of this group into the room.
Now that you have a sponsor, a budget, a list of tools, and the stakeholders sitting at the table, it’s time to think about the capabilities you’re looking for.
Are you looking for a data catalog with data governance capabilities such as access controls and policy management? Do you need a tool that provides excellent search and context around your data? Do you need a data lineage tool that lets you explore the flow of data? Do you need all these capabilities?
Data catalogs typically offer six core capabilities, with some tools being stronger in certain areas than others.
Prioritize the capabilities you need before trying different tools. This will help you evaluate the tools more accurately and keep you level-headed when making your choice.
We have pulled together a more detailed checklist of data catalog assessment criteria that you can download here.
Creating a successful POC requires careful planning and consideration of the desired outcome. Think about the desired outcomes and how you want to measure the impact of the POC.
What does success look like? What metrics will determine whether the tool passes the test or not?
Consider the time frame of the POC and what milestones need to be achieved to reach the desired outcome. What data points need to be collected and analyzed in order to measure the success of the POC?
Consider what resources are available to help you track and measure the progress, such as usage analytics and user feedback.
Finally, consider the cost of the POC and what return on investment is expected. All of these considerations will help you create an effective POC and measure the success of the trial.
This step comes down to adding precision to the previous one. Once you have identified the capabilities you need, attach a metric to each of them to make sure you can measure success.
Your goals might include: improving productivity, reducing onboarding time, eliminating duplicate data, saving storage, or reducing query running time. Regardless of what success looks like, it should be very clear before comparing different tools.
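One lightweight way to keep success criteria unambiguous is to write the baseline, targets, and POC results down side by side before the trial starts. The metric names and values below are hypothetical; substitute your own:

```python
# Illustrative sketch: comparing a pre-POC baseline against POC results.
# All metric names and numbers are assumptions, not recommended targets.

baseline = {
    "avg_minutes_to_find_dataset": 45,
    "onboarding_days": 30,
    "duplicate_tables": 120,
}

targets = {  # what "success" must look like at the end of the POC
    "avg_minutes_to_find_dataset": 10,
    "onboarding_days": 15,
    "duplicate_tables": 60,
}

poc_results = {  # measured during the trial
    "avg_minutes_to_find_dataset": 8,
    "onboarding_days": 14,
    "duplicate_tables": 70,
}

for metric, target in targets.items():
    passed = poc_results[metric] <= target
    print(f"{metric}: baseline={baseline[metric]}, target<={target}, "
          f"actual={poc_results[metric]} -> {'PASS' if passed else 'FAIL'}")
```

Agreeing on these thresholds with stakeholders up front prevents the "did it work?" debate from being settled by gut feeling after the fact.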
Conducting a data catalog POC requires FOCUS. Teams often want to achieve too much during their POC and end up achieving nothing at all.
During the POC, you cannot document everything or fulfill all your use cases; there is usually not enough time or resources for this. For this reason, don’t bite off more than you can chew.
Define an objective, a database, a set of users, and a set of use cases. Focus solely on these during the POC. This can mean simply documenting your most popular data assets, for example.
Before starting the data catalog POC, it is also important to set a specific timeline for the project. This timeline should be realistic and achievable, yet ambitious enough to cover the full scope of your use cases.
Having a timeline in place will help you stay on track and ensure that the project is completed in a timely manner. At the end of the timeline, you should have a clear understanding of whether your use cases have been validated or not.
Modern data catalogs take less than 30 minutes to fully deploy and set up. For this reason, we recommend running the POC for a period of 2-4 weeks. This should be enough time to decide whether a specific tool will bring the value you’re expecting.
If you don’t constrain the project scope and timing, you risk failing to invest the effort needed to judge whether a specific tool is worth it.
For the POC to be successful, you need to create some value in the data catalog software to accurately gauge if people will use it. We recommend conducting the POC using the two following steps:
Creating value in the tool prevents you from ending up in a situation where the POC failed, but you can’t tell whether it failed because the tools you tried are not suited or because you never created value in them.
You’ve gone through all of the previous steps and have found a match made in heaven. You’re ready to implement a data catalog for the long term. However, there is one last thing to think about: is this vendor a good partner for the future?
What relationship will you have with the vendor after completing the purchase? Will you be on your own? If so, does that work for you? This is not a negligible question.
For this reason, ensure you have understood the following:
Adopting a data catalog has long-term benefits, and that makes it an important decision. The planning phase should match the importance level of the decision.
A data catalog POC is a great way to test the waters and determine if a data catalog is the right solution for your organization. By taking the time to plan, research, and engage stakeholders, you can ensure you choose the right solution for your business.
By understanding the potential benefits and limitations of each tool, you can ensure that you are getting the most out of your investment.
There are a lot of data catalog tools out there, focusing on different verticals and solving different pain points. Understanding your challenges is key, as it will help you choose the tool that best solves your specific needs.
We write about all the processes involved when leveraging data assets: from the modern data stack to data team composition, to data governance. Our blog covers the technical and the less technical aspects of creating tangible value from data.
At Castor, we are building a data documentation tool for the Notion, Figma, Slack generation.
Or, data-wise, for the Fivetran, Looker, Snowflake, DBT aficionados. We designed our catalog software to be easy to use, delightful, and friendly.
Want to check it out? Reach out to us and we will show you a demo.