“Trust is that nebulous thing that is hard to quantify but that everyone wants” - Kyle Kirwan, CEO of Bigeye.
As companies grow, they tend to invest more in democratizing and operationalizing data: more stakeholders are given access to data, and more data projects are created. As a result, the data warehouse becomes more chaotic and harder to navigate. And the more people have access to data, the greater the need for reliable information.
When people don’t trust that they’re getting the right data, they don’t trust any analysis based on this data. When data analysis does not look legitimate, it takes a lot of arguing and convincing before a decision can be reached. As a result, decision-making slows down. It is thus vital that stakeholders trust the data if that data is to create any business value.
You may have heard the saying “Trust is built in drops and lost in buckets”. Unfortunately, it applies to data too. If a person has been happy with the data for months and then the data turns out to be wrong, trust is broken and hard to rebuild. That is why it matters to maintain trust, even when data breaks.
There’s another issue, too: trust is intangible and almost impossible to quantify. The way out is transparency. Transparency is tangible and measurable, and it is the best possible stepping stone toward company-wide data trust.
This article is the output of two conversations we had about trust in the past month. We had the chance to host an in-person discussion with Ethan Aaron (Founder, Portable), Kyle Kirwan (Co-founder, Bigeye), Emily Hopper (Data Scientist, JW Player), and Tristan Mayer (Co-founder, Castor). A week later, we hosted a virtual roundtable with William Angel (Senior Technical Product Manager, Caribou), Nicky Parseghian (Sr. Director, IT Strategy and Services, SADA), and Xavier de Boisredon (Co-founder, Castor) about using transparency as a vehicle for building trust in organizations. These exchanges were incredibly rich and packed with practical insights that I have tried to summarize in this article.
This article is a guide to spotting when trust is eroding in your organization and to rebuilding it quickly through practical actions that improve transparency, a more actionable concept than trust itself.
Let’s dive in.
Ethan Aaron rightly noted that trust doesn't generate value by itself; it should be seen as a multiplier of what you already have. You need to be creating value already for trust to matter. Trust unlocks the value of assets that are already intrinsically valuable, but it doesn’t magically turn worthless data into gold.
Now that we have established this, let’s look at the signs that indicate trust erosion in an organization. There are three clear outward signs of data mistrust: Slack pings, dashboard rebuilding, and general disinterest. We’ll examine each in turn.
According to Tristan Mayer, a sure sign of trust erosion is a high number of Slack pings about data requests that could be answered if more context were available.
When employees don’t trust the data, they use the Data Engineering team as a data catalog, asking all sorts of data-related questions. These questions include, but are not limited to:
If your Data Engineering team constantly receives this type of question, trust needs to be built or restored. These questions reflect a lack of understanding of the data: stakeholders feel they must seek answers and validation from others before they can put a “trusted” stamp on a dataset.
Another sign that trust is eroding is when people start creating their own dashboards instead of using the ones available.
When people start seeing inconsistencies between tools or departments, they dig into the raw data to answer their questions themselves. This behavior shows that they would rather rely on their own skills than turn to more experienced members of the data team.
The problem with this approach is that it is not only time-consuming and labor-intensive but also error-prone. It produces poor-quality dashboards (or Excel sheets) that are unfit for use and that erode trust even further.
Finally, trust erosion can also be expressed by a disinterest in the data. This is a sign that people don’t think the data is valuable enough to warrant bothering you about it or asking you to make it better.
This is probably the worst sign, because disinterest shows up as complete silence, and complete silence is easily mistaken for a sign that everything is working as it should.
The best way to understand what this silence means is to look at data usage. If you use a data catalog, it is easy to look at usage per team and per employee. Because people turn to the catalog every time they need to find or understand a data asset, catalog usage is a good proxy for data usage in general.
If you have low data usage paired with silence, this is a sign that people are generally disinterested in the data you have produced.
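To make the usage check concrete, here is a minimal sketch that computes, for each team, the share of members who touched the catalog at least once. It assumes a hypothetical export of access events as `(team, user, asset)` tuples and a known headcount per team; real catalogs expose usage data in their own formats, and the 20% threshold is an arbitrary illustration. A team that shows up here and also never pings the data team fits the "silence" pattern described above.

```python
from collections import defaultdict
from collections.abc import Iterable

# Hypothetical shape of a catalog usage export: (team, user, asset).
Event = tuple[str, str, str]


def usage_by_team(events: Iterable[Event], headcount: dict[str, int]) -> dict[str, float]:
    """Share of each team's members who used the catalog at least once."""
    active: dict[str, set[str]] = defaultdict(set)
    for team, user, _asset in events:
        active[team].add(user)  # count each user once, however many assets they opened
    return {team: len(active[team]) / size for team, size in headcount.items() if size > 0}


def low_usage_teams(events: Iterable[Event], headcount: dict[str, int],
                    threshold: float = 0.2) -> list[str]:
    """Teams whose active-user share falls below an (illustrative) threshold."""
    shares = usage_by_team(events, headcount)
    return sorted(team for team, share in shares.items() if share < threshold)


events = [("sales", "ana", "orders"), ("sales", "ben", "orders"), ("finance", "cy", "revenue")]
headcount = {"sales": 4, "finance": 10, "marketing": 5}
print(low_usage_teams(events, headcount))
```

With these made-up numbers, sales has 50% active users while finance (10%) and marketing (0%) fall below the threshold and are flagged.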
If you have observed one or more of these signs, trust is likely an issue in your organization. Or you might be looking to build data trust in your company from the ground up. The next section dives into the practical side of enabling data trust. Our key recommendation: start with transparency.
Trust is great, but it's hard to pin down and capture. For this reason, transparency is the best place to start. Transparency is critical to help people find the data they need and understand the context behind it. Once people can find and understand the data, they start extracting value from it and begin to trust it. Transparency is a vehicle for achieving trust, which is the end goal.
Transparency is hard to achieve and requires deliberate effort. We've pinned down three levels of transparency, each building on the previous one to create higher levels of trust in data.
Disclaimer: transparent does NOT mean public. Transparency doesn’t mean that you should grant access to everyone in the company. As you improve transparency, you should ensure that access is managed properly and that the right data is shared with the right stakeholders.
The first level of transparency, as indicated by William Angel, involves having a common set of definitions and vocabulary that everyone in the company shares. This ensures that everyone in the company speaks the same language.
It is still very common for different departments of the SAME company to use different definitions of metrics such as “revenue” in their day-to-day operations. When people have different definitions, their reports say different things.
This is a problem for two reasons. First, when reports disagree, decision-making slows down: it’s impossible to make a decision if reports from different departments say different things.
Secondly, when definitions are unclear and ambiguous, people will use a definition that suits them and optimize their work for this preferred definition.
When two people optimize for different definitions of the same metric, they often end up disagreeing more and more.
Definitions impact where people put their focus and energy. As a leader, you should make sure everyone is on the same page regarding definitions to ensure everyone is working toward the same goals.
In practice, this can be achieved by building a business glossary shared by everyone in the company. This glossary can act as a reference point every time someone is calculating a metric or building a report.
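For teams that compute metrics in code, one way to picture the glossary as a reference point is a single registry where each term carries its agreed definition, an owner, and the one shared calculation. This is a minimal sketch, not a feature of any particular catalog: the `GLOSSARY` registry, the `metric` helper, and the "revenue net of refunds" definition are all hypothetical examples.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GlossaryTerm:
    definition: str                          # plain-language definition everyone agrees on
    owner: str                               # team accountable for the definition
    compute: Callable[[list[dict]], float]   # the single shared calculation


# Hypothetical entry: "revenue" defined as order amounts net of refunds.
GLOSSARY = {
    "revenue": GlossaryTerm(
        definition="Sum of order amounts, excluding refunded orders.",
        owner="finance",
        compute=lambda orders: sum(o["amount"] for o in orders if not o["refunded"]),
    ),
}


def metric(term: str, rows: list[dict]) -> float:
    """Compute a metric only through its glossary definition."""
    if term not in GLOSSARY:
        # Fail loudly instead of letting each report invent its own definition.
        raise KeyError(f"'{term}' is not defined in the business glossary")
    return GLOSSARY[term].compute(rows)


orders = [
    {"amount": 100.0, "refunded": False},
    {"amount": 40.0, "refunded": True},
    {"amount": 60.0, "refunded": False},
]
print(metric("revenue", orders))
```

Whichever department calls `metric("revenue", orders)` gets the same number, because there is only one definition to call.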
The second level of transparency relates to context. This means enriching your data assets with the right level of context, which enables people to understand the data.
If you don't contextualize your data before loading it into the data warehouse, it's not going to be very useful. The data you have is only pertinent if you put enough effort into producing and contextualizing it.
In practice, this means documenting and curating your data assets with column definitions, tags, owners, etc. You can automate a lot of this documentation using a data catalog, but this curation effort is essential if you’re seeking to provide context to stakeholders.
Emily Hopper from JW Player explained that since JW Player started documenting its data assets, people know where to find the data and how to use it without having to reach out to someone else on the team.
Another way to provide the right context around data is to invest in a strong data lineage tool. Modern data catalogs like Castor provide lineage capabilities within the catalog.
Data lineage is an extremely powerful transparency device. It enables people to understand how data assets are related. Let's say something breaks upstream. Data lineage allows everyone to understand what the consequences will be downstream, avoiding the unpleasant surprises that break trust.
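Under the hood, "what breaks downstream" is a reachability question on the lineage graph. The sketch below runs a breadth-first search from the broken asset and lists everything that depends on it, directly or indirectly. The adjacency-list format and the asset names are hypothetical; a lineage tool builds and queries this graph for you.

```python
from collections import deque


def downstream_impact(lineage: dict[str, list[str]], broken: str) -> list[str]:
    """Return every asset downstream of `broken`, in breadth-first order."""
    seen = {broken}
    impacted: list[str] = []
    queue = deque([broken])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # an asset fed by several paths is reported once
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


# Hypothetical lineage: each asset maps to the assets built directly from it.
lineage = {
    "raw.orders": ["stg.orders"],
    "stg.orders": ["mart.revenue", "mart.customers"],
    "mart.revenue": ["dash.finance_kpis"],
    "mart.customers": ["dash.finance_kpis", "dash.churn"],
}
print(downstream_impact(lineage, "raw.orders"))
```

If `raw.orders` breaks, this reports the staging table, both marts, and both dashboards, which is the list of people to warn before they notice the problem themselves.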
As Nicky Parseghian noted, data lineage is powerful because it allows you to “build trust in situations that would normally cost you trust”. By showing how one thing relates to another, lineage helps people trust the data even when things are breaking.
This second layer of transparency provides context and perspective. It increases stakeholders’ trust in the data by allowing people to see its full picture and understand it.
The third level of transparency is enabling social discovery. This means building transparency around what people in your company are doing with the data so that other employees can learn from their best practices.
When people know that trusted colleagues have queried and used the data, they can better infer that this dataset is trustworthy.
Social discovery can be achieved through the collaborative features of your data catalog such as query history, which allow you to understand and leverage the work of others.
Obtaining social validation that the data is valuable goes a long way toward building data trust.
Most of the trust-related questions we asked Nicky Parseghian, Sr. Director of IT Strategy and Services at SADA, were answered by a single phrase: “Keep it super simple”. Nicky is a big advocate of the KISS principle, which holds that systems should be as simple as possible and avoid unnecessary complexity.
This approach is the best way to start when you’re wondering where to begin with transparency and trust. Keep It Super Simple. Start by aligning with people and understanding their needs. Start very small and very light: begin by getting alignment on your most important business definitions, and document only your most popular tables. Narrow your focus. Once these basics are in place, slowly work towards building social discovery. With a strong foundation established, you can start enhancing definitions and broadening your focus.
We write about all the processes involved in leveraging data assets: from the modern data stack to data team composition to data governance. Our blog covers the technical and the less technical aspects of creating tangible value from data.
At Castor, we are building a data documentation tool for the Notion, Figma, Slack generation.
Or, data-wise, for the Fivetran, Looker, Snowflake, dbt aficionados. We designed our catalog software to be easy to use, delightful, and friendly.
Want to check it out? Reach out to us and we will show you a demo.