In my previous article, I discussed data sharing, which is already a well-established concept. Data sharing refers to the practice of opening data access to all departments, so that each department can make data-driven decisions.
It is still too common for companies to jump into data-sharing initiatives without a proper plan, believing that simply granting increased access rights to business departments will be sufficient. This approach is misguided. In reality, data sharing is a complex undertaking that requires thoughtful planning and execution in order to be successful.
We propose three immutable guidelines to ensure the success of your data-sharing initiative:
- Thou shalt not compromise on data quality
- Thou shalt enrich the data with bountiful context
- Thou shalt provide the right interface for exploring the data
Data quality is the backbone of data sharing - it's a non-negotiable prerequisite. Without it, the data team can't do their job and no one else can access the data. Let's face it, if the data team can't use the data, why even bother sharing it with others? The second and third rules are all about getting that good quality data to the business teams. You can find a visual representation below.
Disregarding any of these rules will inevitably lead to failure, which we’d ideally like to avoid. Let's delve deeper into each.
The foundation of successful data sharing is to maintain the quality of the data you share with business units.
Data sharing is about equipping business units with the ability to make data-driven decisions. For this to happen, you must provide them with top-notch data.
When you share flawed data, people make poor decisions. This can lead to significant financial losses, missed opportunities, and damage to your company's reputation. More importantly, it can erode trust in your data and lead to a general loss of interest in data. If the plan is not to share first-rate data, then don't share data at all. Data sharing is an all-or-nothing effort: if it is not executed properly, it can be detrimental to your organization.
Data quality is the umbrella term encompassing all the factors influencing whether data can be relied upon for its intended use. There are several characteristics that define high-quality data, including but not limited to:
- Accuracy: The degree to which data correctly describes the real-world phenomenon it represents.
- Completeness: The data is complete and contains all the necessary information.
- Consistency: The data is consistent across different sources and platforms.
- Reliability: The data is up-to-date and relevant to the intended use case.
- Usability: The ease with which data can be understood and used by intended audiences to make informed decisions.
You can find more data quality metrics in Kevin Hu’s article about the topic.
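To make these characteristics more concrete, here is a minimal sketch of how a few of them could be turned into automated checks. The sample rows, field names, and business rules (no negative amounts, a fixed list of country codes, a 30-day freshness window) are all assumptions made up for illustration, not rules from any particular tool.

```python
from datetime import datetime, timedelta

# Hypothetical sample rows; the field names are illustrative only.
rows = [
    {"order_id": 1, "amount": 120.0, "country": "FR", "updated_at": datetime(2023, 1, 10)},
    {"order_id": 2, "amount": None, "country": "FR", "updated_at": datetime(2023, 1, 10)},
    {"order_id": 3, "amount": -5.0, "country": "France", "updated_at": datetime(2022, 6, 1)},
]


def completeness(rows, field):
    """Completeness: share of rows where `field` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows)


def accuracy_violations(rows):
    """Accuracy: flag rows breaking an assumed rule (amounts cannot be negative)."""
    return [r["order_id"] for r in rows
            if r.get("amount") is not None and r["amount"] < 0]


def consistency_violations(rows, allowed_countries):
    """Consistency: flag rows whose country code is not in the agreed list."""
    return [r["order_id"] for r in rows if r["country"] not in allowed_countries]


def stale_rows(rows, now, max_age):
    """Reliability (as defined above): flag rows that are no longer up to date."""
    return [r["order_id"] for r in rows if now - r["updated_at"] > max_age]


report = {
    "amount_completeness": completeness(rows, "amount"),
    "accuracy_violations": accuracy_violations(rows),
    "consistency_violations": consistency_violations(rows, {"FR", "DE"}),
    "stale_rows": stale_rows(rows, datetime(2023, 1, 15), timedelta(days=30)),
}
```

In practice, checks like these would run before the data is exposed to business teams, so that a failing report blocks the share rather than surprising its consumers.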
When you share data with these attributes, you’re increasing the odds of improved decision-making and efficiency. But that’s not all there is to data quality.
A good way of making sure your data meets the right quality standards is to implement data contracts.
Data contracts are an important component of any data democratization initiative. The data community has a love-hate relationship with them, but we think they are worth mentioning in a data-sharing conversation.
Data contracts are agreements between data producers and data consumers that outline the specific terms and conditions for sharing and using data. They can play an important role in ensuring data quality by setting clear expectations and guidelines for how the data should be handled.
For example, a data contract might specify that the data must respect some format, constraints, and semantic meanings before it is shared, or it might include clauses that require data to be regularly audited for quality.
Data contracts might include information such as:
- What data is being collected
- How often and how the data is being ingested
- Who owns and is responsible for the data (individual or team)
- Who has access to the data and at what level
- Security and governance measures, such as anonymization
Overall, data contracts can provide a framework for ensuring data quality by setting clear guidelines and expectations for how data should be handled and maintained. They can also help ensure that all parties involved are held accountable for maintaining the quality of the data. This way, data contracts can prevent flawed data from landing in the hands of operational teams.
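As an illustration, a data contract can be written down as a small, machine-checkable object. The sketch below is one possible shape under our own assumptions: the class names (`ColumnSpec`, `DataContract`), the fields, and the example dataset are hypothetical and do not follow any specific data contract standard.

```python
from dataclasses import dataclass


@dataclass
class ColumnSpec:
    """Expected format of one column (hypothetical structure)."""
    name: str
    dtype: type
    nullable: bool = False


@dataclass
class DataContract:
    """Agreement between a data producer and its consumers."""
    dataset: str                 # what data is being collected
    owner: str                   # who is responsible for the data
    ingestion_schedule: str      # how often the data is ingested
    allowed_readers: list        # who has access to the data
    anonymized_columns: list     # governance measures
    columns: list                # format constraints, as ColumnSpec items

    def validate(self, row):
        """Return a list of human-readable violations for one row."""
        errors = []
        for col in self.columns:
            value = row.get(col.name)
            if value is None:
                if not col.nullable:
                    errors.append(f"{col.name}: missing required value")
            elif not isinstance(value, col.dtype):
                errors.append(
                    f"{col.name}: expected {col.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
        return errors


# Example contract for an assumed "orders" dataset.
contract = DataContract(
    dataset="orders",
    owner="data-platform-team",
    ingestion_schedule="daily",
    allowed_readers=["finance", "marketing"],
    anonymized_columns=["customer_email"],
    columns=[
        ColumnSpec("order_id", int),
        ColumnSpec("amount", float),
        ColumnSpec("coupon", str, nullable=True),
    ],
)

good = contract.validate({"order_id": 1, "amount": 9.99})
bad = contract.validate({"order_id": "1", "amount": None})
```

Here `good` is an empty list, while `bad` reports one violation per broken expectation. A producer could run `validate` before publishing, so that rows violating the contract never reach consumers.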
Maintaining a high level of data quality is important, but it alone is not sufficient. The next step is to ensure that context is also provided.
Context is the second key to effectively implementing data sharing. Data without context is dangerous and worthless because it is left open to interpretation by various teams.
Let me tell you, this is not a safe bet. Different interpretations mean different conclusions, and ultimately mean incoherent reporting across departments. If you’re going to lead business teams in uncharted territory, give them a map. Context is the map.
People understand a dataset when they are aware of the needs this data will satisfy, its content, and its location. Once people have found the relevant dataset, they have done 10% of the job. They then need to go through a checklist of 10+ questions to make sure they understand the data they're using. People understand the data only when they can answer questions such as:
- Where does the data come from?
- Where does it flow and which tables does it feed downstream?
- Who owns it / who is responsible for it?
- What is the meaning of a given field in my domain?
- Why does it matter?
- When was the last time this table was updated?
- What are the upstream and downstream dependencies of this data?
- Is this production-quality data?
Context starts with documentation. All the shared data assets need to be documented for stakeholders to understand them. In practice, this means curating your data assets with column definitions, tags, owners, etc. When you document your data properly, people know where to find it and how to use it without having to reach out to someone else in the company.
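One way to keep documentation honest is to treat it as structured metadata and check it for gaps. The following is a minimal sketch under our own assumptions: `TableDoc` and `missing_context` are hypothetical names, and the fields simply mirror the questions listed earlier.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TableDoc:
    """Documentation record for one shared table (hypothetical structure)."""
    name: str
    description: str = ""                      # why does it matter?
    owner: str = ""                            # who is responsible for it?
    source: str = ""                           # where does the data come from?
    last_updated: Optional[str] = None         # when was it last updated?
    production_ready: Optional[bool] = None    # is this production-quality data?
    column_definitions: dict = field(default_factory=dict)
    tags: list = field(default_factory=list)


def missing_context(doc, columns):
    """List the questions a business user could not answer from `doc`."""
    gaps = []
    if not doc.description:
        gaps.append("description")
    if not doc.owner:
        gaps.append("owner")
    if not doc.source:
        gaps.append("source")
    if doc.last_updated is None:
        gaps.append("last_updated")
    if doc.production_ready is None:
        gaps.append("production_ready")
    for column in columns:
        if not doc.column_definitions.get(column):
            gaps.append(f"column:{column}")
    return gaps


doc = TableDoc(
    name="orders",
    description="One row per customer order",
    owner="data-team",
    column_definitions={"order_id": "Unique order identifier"},
)
gaps = missing_context(doc, ["order_id", "amount"])
```

For this example, `gaps` lists the source, the last update, the production status, and the undocumented `amount` column, which gives the data team a concrete to-do list before the table is shared.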
The second aspect of providing context is to have a robust data lineage capability. Data lineage is an extremely powerful transparency device. It enables people to understand how data assets are related. If something breaks upstream, data lineage allows everyone to understand what the consequences will be downstream, avoiding unpleasant surprises. Lineage can also assist stakeholders in identifying the source of data problems when they arise.
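To illustrate the idea, lineage can be modeled as a directed graph in which each asset points to the assets it feeds. The sketch below uses made-up table names and a plain breadth-first traversal; real lineage tools extract this graph from queries and pipelines, which is not shown here.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets it feeds.
lineage = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customer_ltv"],
    "marts.revenue": ["dashboard.weekly_revenue"],
    "marts.customer_ltv": [],
    "dashboard.weekly_revenue": [],
}


def downstream(graph, start):
    """Return every asset impacted if `start` breaks, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


def upstream(graph, target):
    """Return every asset `target` depends on, by walking the edges backwards."""
    reverse = {}
    for parent, children in graph.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream(reverse, target)


impacted = downstream(lineage, "staging.orders")
suspects = upstream(lineage, "dashboard.weekly_revenue")
```

`impacted` answers "what breaks downstream if this table breaks?", while `suspects` answers "where could the problem in this dashboard come from?".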
The third aspect of providing context is facilitating social discovery among stakeholders. This can be achieved by sharing information about how the data is being utilized. When people can see how their peers are using and querying the data, they are able to start with a stronger foundation and can learn from the insights and strategies of their colleagues. Social discovery allows teams to build on one another's knowledge and thus work more efficiently.
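A simple form of social discovery can be derived from a query log. The sketch below assumes a hypothetical log of (user, table) pairs and shows how to surface the most-used tables and the colleagues who query them most; the names and data are invented for illustration.

```python
from collections import Counter

# Hypothetical query log: one (user, table) pair per executed query.
query_log = [
    ("alice", "marts.revenue"),
    ("bob", "marts.revenue"),
    ("alice", "marts.revenue"),
    ("carol", "marts.customer_ltv"),
]


def popularity(log):
    """Count how many times each table was queried."""
    return Counter(table for _, table in log)


def top_users(log, table, n=3):
    """Return up to `n` users who query `table` most often."""
    counts = Counter(user for user, queried in log if queried == table)
    return [user for user, _ in counts.most_common(n)]


most_used = popularity(query_log).most_common(1)
experts = top_users(query_log, "marts.revenue")
```

Showing `most_used` next to search results helps newcomers pick the table their peers already trust, and `experts` tells them whom to ask.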
If you’re going to share data with anyone, you have to do it through the right interface. Not all team members have the same level of technical expertise and not all teams have the same data needs. It is essential to provide the right interface for the right team in order to make data accessible to all.
If you are documenting your data in dbt, you cannot expect the marketing team to fetch the documentation there. Context should be made available in tools that are user-friendly for business teams. There are two ways to go about this:
- One way to achieve this is by offering a tool that enables efficient search and navigation. The tool should be easy to use and understand, to ensure that non-technical team members are able to use it effectively. A data catalog is an example of such a tool that can be used to discover, understand and access data easily.
- Another approach to providing the right interface is by making data easily accessible within the tools that business teams already use. This approach involves delivering the data to the tools that are already familiar to the teams. Reverse ETL tools can be used for this purpose. By making the data findable within existing tools, teams can access the data they need without having to navigate new systems or learn new software.
No matter your approach, keep in mind that if you want to make your data available to all, you must cater to the business team's needs. Demanding they learn the technical team's tools and processes will only hinder your efforts.
Providing the right interface is crucial for democratizing data and making it accessible to all team members. It is important to consider the technical expertise and data needs of different teams when deciding on the right interface. By providing an easy-to-use tool or shipping data to existing tools, teams can access the data they need to make informed decisions and drive results.
In conclusion, data sharing is a powerful tool for driving data-driven decisions and fostering collaboration across departments. But it is a complex undertaking that requires thoughtful planning and execution in order to be successful.
We propose three immutable rules to ensure the success of your data-sharing initiative: 1) Maintaining data quality, 2) Providing rich context around the data, and 3) Providing the right interface for exploring the data.
Of course, data sharing involves privacy and security concerns that I have failed to mention in this article. My next article will be entirely dedicated to this topic!