What if all the data knowledge held by stakeholders in your company could be centralized in one spot? And what if the departure of data experts didn't mean the downfall of the company's knowledge? Thankfully, this is not a far-fetched dream.
Despite the saturation of the data catalog landscape, data documentation continues to suffer from two critical issues: it's not maintained, and it's not accessible.
Not maintained: As data teams scale, the volume of data often becomes too large to document in the traditional way, where a single person or team is responsible for all the documentation.
Not accessible: In most organizations, documentation is either spread across tools and spreadsheets, or rotting in a static data catalog that no one uses.
The concept of collective intelligence in data documentation involves two distinct but interrelated processes: leveraging everyone’s knowledge to curate the documentation with crowdsourced insights, and then sharing the documentation in the right places so that people can benefit from this collected company knowledge.
I - Maintaining the documentation
In the first phase, you can leverage collective intelligence to build and maintain the documentation. This involves capturing everyone’s share of company knowledge and encouraging more people to get involved in the documentation process. When all team members can contribute to documentation efforts, they tend to feel more engaged in the process and are therefore more likely to adopt documentation practices.
To leverage collective intelligence to curate documentation, there are two things successful data teams do:
- Set the right framework for crowdsourcing the documentation.
- Make documenting more engaging.
We’ll examine these two solutions in more detail in this section.
“Documentation should be crowdsourced as much as possible, just like Wikipedia.” Osian Llwyd Jones, Head of Product, Data Platform, Stuart.
As data scales, it becomes impossible for a single person or team to handle documentation. To address this, documentation should be crowdsourced by allowing people who are close to the data to write the documentation.
To accomplish this, you need to have the right structure in place. Wikipedia provides a platform through which people can contribute tokens of knowledge. If the platform weren’t there, it would be impossible to crowdsource any knowledge.
The same goes for organizations. If you don’t have the right framework and platform, people will not contribute to your documentation effort. So, what does this structure look like in practice?
In its most basic form, it looks like assigning ownership of data assets, where a specific person or team is responsible for maintaining the documentation for a given set of tables.
Ownership is extremely powerful, for two reasons:
- It encourages teams that are close to the data to contribute their knowledge, which improves documentation quality and coverage.
- It allows stakeholders to identify the go-to person for a particular dataset. This prevents people from pinging 20 different colleagues when they’re looking for information.
For example, at Castor, Vivien owns the tables "accounts", "api_tokens", "column_description_suggestions", and "columns_joins". As the owner, he is accountable for maintaining the documentation of these tables and is considered the expert on them. We all know that any inquiries regarding these tables should be directed to him.
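To make the idea of ownership concrete, here is a minimal sketch of an ownership registry in Python. It is only an illustration: the in-memory dictionary and the helper names `owner_of` and `unowned` are hypothetical and do not reflect how Castor stores ownership. The table names are the ones from the example above, except "orders", which is invented to show a table without an owner.

```python
from typing import Dict, List, Optional

# Hypothetical ownership registry: table name -> owner.
# In a real catalog this would live in the tool, not in a dict.
OWNERS: Dict[str, str] = {
    "accounts": "Vivien",
    "api_tokens": "Vivien",
    "column_description_suggestions": "Vivien",
    "columns_joins": "Vivien",
}


def owner_of(table: str, owners: Dict[str, str] = OWNERS) -> Optional[str]:
    """Return the go-to person for a table, or None if nobody owns it yet."""
    return owners.get(table)


def unowned(tables: List[str], owners: Dict[str, str] = OWNERS) -> List[str]:
    """List the tables that still need an owner, in their original order."""
    return [t for t in tables if t not in owners]


if __name__ == "__main__":
    print(owner_of("accounts"))
    # "orders" is a made-up table used to show an ownership gap.
    print(unowned(["accounts", "orders"]))
```

A lookup like `owner_of` answers the "who do I ask?" question directly, while `unowned` surfaces the tables where documentation is most likely to be neglected.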
Making documentation engaging
“Documenting data can be a tedious and unappealing task” Osian Llwyd Jones, Head of Product, Data Platform, Stuart.
Documentation is not fun. People generally hate it. If you want more people to join the documentation dance, you need to make a little marketing effort. It’s your job to make it engaging and fun.
For example, at Castor, we felt that our documentation effort could benefit from a little push. We organized a one-hour data documentation contest where team members were tasked with documenting as many tables as possible. The result was astounding, and everyone had a blast participating in the contest.
II - Making documentation accessible
Once the documentation is done, it's time to share it with others. This allows them to benefit from the collective knowledge and insights that have gone into building the documentation.
When sharing documentation, keep in mind that there are two primary types of documentation users: information seekers and explorers.
The first type is on a quest for specific information, while the second prefers browsing through the documentation to get an overview of the data landscape, or to find a good starting place for data analysis.
To allow both of these groups to tap into the collective intelligence of the organization, documentation should be like an octopus: centralized at its core but with tentacles extending out to all corners, making it accessible and available everywhere.
Just like an octopus needs its head, your organization needs a centralized repository for data documentation. A headquarters to discover, organize and add clarity to your data. This repository should be accessible, and provide an interface that everyone can navigate through. The centralized place is great for explorers, who want to browse the documentation to get a better understanding of the data landscape, or who want to tap into the existing knowledge when starting on an analysis.
There are two key benefits to having a centralized repository of data documentation:
- Efficiency: When documentation is scattered across tools like Confluence, Google Docs, and Slack, people waste time searching for what they need, and sometimes give up altogether. When everything lives in one centralized location, like Castor, everyone knows where to look and can get on with their work.
- Information re-use: With documentation available in a centralized repository, stakeholders can see the analysis that has already been shared around specific areas. Existing work, such as popular queries, provides a great starting point for new analysis. In short, a central location makes important information easier to find and use, improving collaboration and the quality of work.
With Castor, building a centralized repository of data becomes effortless. It works by gathering documentation from all its scattered locations and storing it in a centralized, easily accessible and user-friendly tool.
Pushed back everywhere
Documentation, like an octopus, also needs its tentacles. While a centralized place is important, it's also crucial for documentation to be easily accessible in tools that stakeholders use daily.
If information seekers don't want to leave their native tool to access the information they need, they shouldn't have to. Documentation should be readily available within their reach. This local documentation is particularly useful for stakeholders who only need quick and precise pieces of information.
For instance, consider a scenario where a sales operations team is creating a dashboard in Looker and needs a quick explanation for a table field. They should be able to easily access the documentation within Looker, without having to leave the platform to search for it elsewhere. If they have to navigate to other tools to find the information, it creates a suboptimal data experience.
To leverage the collective intelligence of a company, documentation should be synced back to the tools used by stakeholders. This means that sales ops working in Looker, for example, should have access to the documentation created by an analytics engineer in dbt without having to switch between different tools. This approach ensures that everyone can tap into the collective knowledge of the company.
At Castor, we believe in the importance of syncing back documentation. That's why we've designed a data catalog with tentacles that not only pulls documentation from various sources into a centralized location but also syncs it back to all relevant tools. This allows stakeholders to access documentation from their native environment.
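The sync-back idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Castor's implementation: the central repository and each tool's local documentation are modeled as plain dictionaries, the column names and descriptions are invented, and `sync_back` is a hypothetical helper. A real integration would go through each tool's own API or configuration files.

```python
from typing import Dict, Tuple

ColumnKey = Tuple[str, str]  # (table, column)

# Central repository: the single source of truth (hypothetical content).
central_docs: Dict[ColumnKey, str] = {
    ("accounts", "created_at"): "Timestamp when the account was created.",
    ("accounts", "plan"): "Current subscription plan of the account.",
}

# What each tool currently displays for the columns it exposes (hypothetical).
tool_docs: Dict[str, Dict[ColumnKey, str]] = {
    "looker": {
        ("accounts", "created_at"): "",
        ("accounts", "plan"): "outdated text",
    },
    "dbt": {
        ("accounts", "created_at"): "Timestamp when the account was created.",
    },
}


def sync_back(
    central: Dict[ColumnKey, str],
    tools: Dict[str, Dict[ColumnKey, str]],
) -> Dict[str, int]:
    """Copy central descriptions into every tool; return updates per tool."""
    updates: Dict[str, int] = {}
    for tool, docs in tools.items():
        changed = 0
        for key in docs:
            desired = central.get(key)
            # Only touch columns the central repository actually documents.
            if desired is not None and docs[key] != desired:
                docs[key] = desired
                changed += 1
        updates[tool] = changed
    return updates


if __name__ == "__main__":
    print(sync_back(central_docs, tool_docs))
```

In this sketch the central repository always wins, so a description written once (for instance by an analytics engineer in dbt and collected centrally) ends up in front of a sales ops user in Looker without anyone copying it by hand.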
Data catalogs have been around for years, but they are still suffering from two issues:
Catalogs are not maintained because documentation is a painful task, and the documentation is not accessible because it usually lives siloed in separate tools.
To overcome these challenges, the key is to start leveraging collective intelligence in organizations, and partner with tools that help you do so.
To solve the maintenance problem, it is important to establish a proper structure for crowdsourcing documentation. This involves assigning ownership of data assets, and making documentation a more attractive task.
To solve the accessibility problem, you need to push the documentation back into all the tools your stakeholders use, while providing a centralized repository of knowledge.