The data mesh paradigm advocates using data domains as a way to partition data into meaningful groups. Domains were initially introduced to help create a clear ownership structure and enable better data discovery.
While this approach can be effective, it can also be confusing. I’ve been reading about data mesh domains for a year and still cannot fully grasp the concept. So I’ve been wondering: can we achieve better ownership and discovery without having to toy with data domains?
To a certain extent, we can.
In this article, I aim to clarify the purpose of data domains and explore simpler alternative methods to achieve these goals.
At their core, data domains are simply a way to group data in an organization. Why would we want to partition data? There are three reasons:
👉 To make it easier for people to locate the data they need.
👉 To enable clear ownership assignment and accountability in case of issues.
👉 To provide necessary context for better understanding of the data.
The good news is that most companies never have to think about domains to tackle these three elements. So, stop reading every article you can find about data domains, because you probably don’t need them. Or at least, not yet.
To tackle the elements above, companies just need to find good ways of grouping data to facilitate ownership assignment and data discovery. Thankfully, there are two ways of grouping the data that your organization is already extremely familiar with: teams and sources.
These classifications are simple yet highly effective: they get 90% of the job done in terms of ownership, discovery, and understanding. By using these classifications properly, companies can improve data management without needing to introduce the rather complex concept of domains.
In this piece, we look at how grouping data according to teams and sources can address ownership and discovery challenges. We'll also explore when it may be necessary to introduce the concept of data domains, which you might need when your organization grows in complexity.
Start with teams
Partitioning your data based on teams will help you enable powerful data discovery and assign ownership for the context around this data. Let's take a closer look at how team-based partitioning can help with these two things.
Partitioning your data by team makes a lot of sense for attributing ownership over the data and ensuring that responsibilities are clearly defined.
In this piece, ownership refers to context ownership, not technical ownership. This means that when we say the marketing team owns marketing data, we mean that they are responsible for maintaining the documentation and context surrounding these assets.
Now that we’ve established this, there are three compelling reasons why grouping data by teams is the way to go:
👉 Ownership assignment made easy: Teams are extremely well-defined entities in organizations. They aren’t blurry and they don’t overlap, which makes assigning data, and therefore ownership, to a team incredibly straightforward.
👉 Better documentation: When teams own their data, the documentation around it suddenly makes more sense. Teams have a deep understanding of the processes, goals, and metrics they are responsible for, which makes them the best people to bring context to the data closest to their function.
👉 Clarity of responsibilities: Grouping data by teams clarifies who is responsible for bringing context to data assets. When a stakeholder has a question about marketing data, it will take them two seconds to realize they need to ping the marketing team, and not the owner of some obscure domain.
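To make the one-team-per-asset idea concrete, here is a minimal Python sketch of a catalog in which every asset carries exactly one owning team. The `Asset` class, the asset names, and the `TEAM_CONTACTS` channels are hypothetical and don’t reflect any particular catalog tool; "owner" here means context owner, as defined above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """A hypothetical catalog entry with exactly one context-owning team."""
    name: str
    team: str  # owns documentation and context, not the pipelines


# Hypothetical contact channels per team, for illustration only.
TEAM_CONTACTS = {
    "Marketing": "#marketing-data",
    "Sales": "#sales-ops",
}

CATALOG = [
    Asset("campaign_performance", team="Marketing"),
    Asset("lead_attribution", team="Marketing"),
    Asset("pipeline_forecast", team="Sales"),
]


def who_to_ask(asset_name: str, catalog: list[Asset]) -> str:
    """Return the contact of the team that owns the asset's context."""
    for asset in catalog:
        if asset.name == asset_name:
            return TEAM_CONTACTS[asset.team]
    raise KeyError(f"unknown asset: {asset_name}")


print(who_to_ask("lead_attribution", CATALOG))  # prints "#marketing-data"
```

Storing a single `team` string rather than a list mirrors the point above: because teams don’t overlap, every question about an asset resolves to exactly one place to ask.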
Partitioning the data by team also improves data discovery processes, and helps people find the data they need much faster.
Everyone in an organization is familiar with teams, so partitioning data by team makes data discovery a much more familiar process. This lowers the barrier to entry for less technical people who might be intimidated by data.
Partitioning data based on teams also caters to two types of individuals looking for data context: information seekers, who are searching for a specific piece of information, and explorers, who want to browse the data landscape.
For example, if an information seeker is looking for marketing data, they would know that the marketing team is responsible for that data and they just have to filter the data by Team → Marketing in order to find it. They also know that if they have trouble understanding the data, they can just ask the marketing team.
Similarly, if an explorer wants to understand the data strategy better, they may want to explore the type of data that the marketing team is handling. By browsing through the marketing team's dashboards and knowledge map, they will get a clearer picture of how the business operates.
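Both behaviors can be sketched as two small functions over the same catalog: a filter for the information seeker and a grouped view for the explorer. This is a minimal sketch; the field names and assets are illustrative assumptions, not the schema of a real tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """A hypothetical catalog entry: a table or dashboard owned by one team."""
    name: str
    kind: str  # "table" or "dashboard" in this sketch
    team: str


CATALOG = [
    Asset("campaign_performance", "dashboard", "Marketing"),
    Asset("web_sessions", "table", "Marketing"),
    Asset("ticket_backlog", "dashboard", "Customer Support"),
]


def filter_by_team(catalog: list[Asset], team: str) -> list[Asset]:
    """Information seeker: the equivalent of filtering on Team → Marketing."""
    return [asset for asset in catalog if asset.team == team]


def browse_by_team(catalog: list[Asset]) -> dict[str, list[str]]:
    """Explorer: every team's assets, listed under the team's name."""
    view = defaultdict(list)
    for asset in catalog:
        view[asset.team].append(f"{asset.kind}: {asset.name}")
    return dict(view)
```

Here `filter_by_team(CATALOG, "Marketing")` returns the two marketing assets, while `browse_by_team(CATALOG)` gives the explorer a per-team listing of dashboards and tables.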
Continue with sources
While partitioning your data by team is great for assigning ownership, it might not be sufficient for discovery purposes. A second way to partition data is by its source, another element business people are already familiar with.
Everyone is comfortable with data sources, because stakeholders use them to perform their daily tasks. For instance, salespeople may not be data experts, but they know Salesforce like the back of their hand. They should therefore be able to filter the data by source when exploring the warehouse.
Partitioning data based on its source also caters to the two types of people looking for documentation: information seekers and explorers.
For instance, when an information seeker is searching for specific information, they will have a good idea of which data source to look for. They can filter their search by the source, such as Salesforce, and easily find the information they need. This enhances the discovery experience and reduces the time required for data exploration.
Similarly, for an explorer who’s looking to gain a broader understanding of the data landscape, exploring the data through a source angle can be helpful. For example, they can determine which sources have the most relevant or trustworthy data for their specific needs, or identify redundant or incomplete data.
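A similar sketch works for sources. Assuming a hypothetical catalog where each asset records its source and an optional description, the code below filters by source for the information seeker. For the explorer, it summarizes documentation coverage per source and lists asset names that appear under more than one source. Coverage is only a rough stand-in for trustworthiness, and a shared name is a candidate for review, not proof of redundancy.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """A hypothetical catalog entry tagged with the system it comes from."""
    name: str
    source: str            # e.g. "Salesforce"
    description: str = ""  # an empty string means undocumented


CATALOG = [
    Asset("opportunities", "Salesforce", "One row per sales opportunity"),
    Asset("accounts", "Salesforce"),
    Asset("accounts", "HubSpot", "Company records from the marketing CRM"),
    Asset("contacts", "HubSpot", "Marketing contacts"),
]


def filter_by_source(catalog: list[Asset], source: str) -> list[Asset]:
    """Information seeker: restrict the search to a single source."""
    return [asset for asset in catalog if asset.source == source]


def source_overview(catalog: list[Asset]) -> dict[str, tuple[int, float]]:
    """Explorer: per source, the asset count and the share that is documented."""
    totals, documented = Counter(), Counter()
    for asset in catalog:
        totals[asset.source] += 1
        documented[asset.source] += bool(asset.description)
    return {src: (totals[src], documented[src] / totals[src]) for src in totals}


def possible_duplicates(catalog: list[Asset]) -> dict[str, set[str]]:
    """Explorer: asset names that appear under more than one source."""
    sources_by_name = defaultdict(set)
    for asset in catalog:
        sources_by_name[asset.name].add(asset.source)
    return {name: srcs for name, srcs in sources_by_name.items() if len(srcs) > 1}
```

With this sample data, `source_overview` reports that half of the Salesforce assets are documented, and `possible_duplicates` flags `accounts` as living in both Salesforce and HubSpot.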
This additional layer of partitioning, in combination with partitioning based on teams, can improve the efficiency of data discovery in the organization.
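Combining the two partitions amounts to faceted search: each facet that is set narrows the results, and an unset facet matches everything. Here is a minimal sketch, assuming hypothetical catalog entries tagged with one team and one source plus a free-text match on the asset name:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Asset:
    """A hypothetical catalog entry tagged with one team and one source."""
    name: str
    team: str
    source: str


CATALOG = [
    Asset("campaign_leads", "Marketing", "Salesforce"),
    Asset("web_sessions", "Marketing", "Website"),
    Asset("pipeline_forecast", "Sales", "Salesforce"),
]


def search(
    catalog: list[Asset],
    team: Optional[str] = None,
    source: Optional[str] = None,
    text: str = "",
) -> list[Asset]:
    """Return assets matching every facet that is set; None means "any"."""
    return [
        asset
        for asset in catalog
        if (team is None or asset.team == team)
        and (source is None or asset.source == source)
        and text.lower() in asset.name.lower()  # case-insensitive name match
    ]
```

For example, `search(CATALOG, source="Salesforce")` returns both Salesforce assets, while adding `team="Marketing"` narrows the result to `campaign_leads` alone.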
When do you need to introduce data domains?
Once you have partitioned your data along these two dimensions, you will have effectively addressed all the ownership issues and most of the challenges related to discovery and understanding.
However, some organizations operate in a more complex environment and may need to introduce an additional layer of complexity to accurately capture the intricacies of their business.
For these organizations, there comes a point where partitioning based on teams and sources alone may not be sufficient to ensure discoverability and ownership.
Let's say you work for Airbnb. The company has partitioned data ownership by team, such as marketing, engineering, and customer support. It has also grouped its data by source, distinguishing between website data and mobile app data.
But what if you need to know something about the pricing for experiences on the platform, like tours or cooking classes? This information does not neatly fit into any of the existing teams or data sources. It's not something that the marketing team owns or something that only comes from the website or app. What do you do then?
This is where the concept of domains comes in. In this case, Airbnb needs to create a new domain called "Experiences" to capture this data and ensure it's properly owned. By introducing this new level of categorization, they can better organize their data and ensure that important information doesn't fall through the cracks.
The key takeaway is that as companies grow and become more complex, they may need to introduce new levels of categorization to manage their data.
However, this new way of partitioning data should only come into play once you have information that spans multiple teams or sources and does not fall neatly into either category.
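This rule can be expressed as a simple check. The sketch below assumes that each hypothetical catalog entry records the set of teams and sources it touches. Anything that maps to exactly one team and one source stays in the existing partitions; everything else is flagged as a candidate for a domain such as "Experiences". The asset names are invented for illustration, and in practice people, not code, would decide which domain a flagged asset belongs to.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Asset:
    """A hypothetical catalog entry that may touch several teams or sources."""
    name: str
    teams: frozenset[str]         # teams that contribute context to the asset
    sources: frozenset[str]       # systems the data comes from
    domain: Optional[str] = None  # only set when teams and sources fall short


def fits_existing_partitions(asset: Asset) -> bool:
    """True when exactly one team and one source describe the asset."""
    return len(asset.teams) == 1 and len(asset.sources) == 1


def domain_candidates(catalog: list[Asset]) -> list[str]:
    """Names of assets that don't fall neatly into a team or a source."""
    return [a.name for a in catalog if not fits_existing_partitions(a)]


CATALOG = [
    Asset("web_bookings", frozenset({"Marketing"}), frozenset({"Website"})),
    Asset(
        "experience_pricing",
        frozenset({"Marketing", "Customer Support"}),
        frozenset({"Website", "Mobile app"}),
    ),
]

# Only the asset that fails the check receives a domain tag.
tagged = [
    a if fits_existing_partitions(a) else replace(a, domain="Experiences")
    for a in CATALOG
]
```

Keeping `domain` optional reflects the advice in this article: most assets never need one, and the extra layer appears only where teams and sources stop being enough.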
Decentralizing data can be a difficult process, but you should always aim to keep it simple. While data domains can be a powerful tool for handling complex business needs, you certainly do not need to introduce them early in your data journey.
In most cases, teams and sources are sufficient for partitioning data and ensuring ownership and discoverability. You should thus start with these more familiar concepts and gradually add complexity as needed.
The goal is not to introduce unnecessary complexity, but to provide a framework that helps teams work more effectively with data.
At Castor, we are building a data documentation tool for the Notion, Figma, Slack generation.
Or data-wise for the Fivetran, Looker, Snowflake, DBT aficionados. We designed our catalog software to be easy to use, delightful and friendly.
Want to check it out? Reach out to us and we will show you a demo.