“I like to compare Castor to an iPhone in the sense that it is very easy to use and the interface is extremely intuitive. I can give Castor to anyone in the company and I know that they won’t ask any questions.” Filipe Palma, Data Platform Product Manager, Printify.
Founded in 2015, Printify is a marketplace connecting online merchants to major print-on-demand and dropshipping manufacturers worldwide.
The company serves more than 2,000,000 merchants, most of which are individual businesses selling products to end customers through Printify.
At Printify, the data team is divided into four sub-teams: the BI team, the data analysis team, the data science team, and the data platform team.
Castor partnered with Printify starting in 2022 to help the company improve its user experience around data while sustaining its rapid growth.
In a recent interview, we spoke with Printify's data platform product manager, Filipe Palma, about the company's data discovery journey. A year into using Castor, Filipe shared how Printify is using data discovery to unlock new opportunities.
Castor helped Printify with two specific aspects of its business: data team productivity and collaboration around data. The company now intends to use Castor as a means to build company-wide metrics alignment.
Our conversation with Filipe covered the challenges that led Printify to look for a data catalog, the benchmarking and implementation process, and the results that followed.
“The quality of work was below our expectations. Data consumers were spending much time trying to understand which data to use, sometimes using the wrong datasets to produce their analysis” Filipe Palma, Data Platform Product Manager, Printify.
During the pandemic, Printify experienced exponential growth in both revenue and company size. New data stakeholders were onboarded monthly, and more employees were accessing the data.
Printify’s data environment grew rapidly. However, the lack of data governance started to degrade the experience of stakeholders working with data.
“We had a lot of information in our data warehouse, but we were not offering context to our consumers on how to use it,” Filipe said.
In practice, Filipe explained that data consumers were spending a lot of time trying to understand which data they should use for their analyses, and what the data meant.
Data stakeholders were overwhelming the data engineering team with day-to-day business questions. As Filipe explained, this was an unintended use of the data engineering team’s time, given that their role is to focus on development.
Stakeholders’ poor understanding of the data flooded many Slack channels with questions and led to a decline in overall productivity and quality of work.
Additionally, Printify realized that stakeholders needed to understand data lineage. The team wanted to understand how data was flowing within the company, yet Printify could not easily provide lineage for its datasets.
The company tried to quantify these challenges by conducting a survey on people’s experience with the data. On a scale of 1 to 5, people rated their satisfaction with data documentation at 2.3. This score reflected clear dissatisfaction with the level of context provided around data.
After reviewing the survey results, Printify decided to prioritize implementing a data catalog.
“One of the factors we considered was choosing a data catalog that was also a startup, so it could be more agile compared to legacy vendors” Filipe Palma, Data Platform Product Manager, Printify.
As they searched for a data catalog tool, Printify used Medium and Reddit to get ideas from other companies in the ecosystem.
Printify’s data catalog search was guided by a need for simplicity. The company wanted to avoid some of the more complex data catalog solutions on the market.
Data stakeholders at Printify wanted a simple solution to search the data and obtain the right context to leverage it. They also needed a tool that could provide visibility into the data lineage in a straightforward manner.
“In our opinion, this is where Castor shines, because the search capability is more straightforward than other solutions. People immediately understand how to use the tool because it’s extremely intuitive” Filipe said.
Printify evaluated three data catalog solutions. For the proof of concept, the company chose to test the tools with all its consumer profiles within the data team: data scientists, data analysts, product managers, engineers, and business intelligence. They evaluated the tools according to a pre-determined evaluation matrix.
Printify opted for a two-step implementation process:
At Printify, the efforts to enrich the data catalog are split between different teams. The major responsibility lies with Printify’s data steward, who maintains the data catalog. The data steward is in charge of gathering context from different teams and documenting data assets accordingly.
For data assets created by specific teams, the teams themselves are responsible for documenting their work. They work together with the data steward to guarantee the standardization of the documentation and definitions across the organization, Filipe explained.
“Before Castor, we were getting questions such as ‘Where can I find this data?’ or ‘What does this data mean?’ at least once a day in our Slack channels. Now, we barely have one every two weeks. Stakeholders now use Castor to answer these kinds of questions.” Filipe Palma, Data Platform Product Manager, Printify.
Since implementing Castor as a data catalog solution, Printify noticed clear improvements in productivity and collaboration.
First, in terms of productivity, implementing a data catalog empowered stakeholders to leverage data without having to rely on the data engineering team. This increased the productivity of both data stakeholders and the data engineering team. Stakeholders can now move quickly and accurately with the data, while the data engineering team can focus on producing data instead of answering data requests.
There are two indicators that suggest a strong increase in productivity at Printify.
Second, Castor also allowed Printify to build a culture of collaboration through two initiatives: identifying experts in specific data, and re-using existing analyses for new projects.
Castor makes it possible to assign ownership over data assets. For a specific dataset, one person or team can be assigned as the owner. At Printify, this feature has made it easy for stakeholders to identify the go-to person to qualify a specific data asset. Through ownership, Castor guides stakeholders to subject matter experts, which improves the quality of everyone’s work.
Printify is also using Castor as a central repository where stakeholders share analyses around specific areas. Everyone is able to find and re-use the information in Castor, such as popular queries. This improves collaboration and the quality of everyone’s work because it gives stakeholders a good starting point when beginning an analysis in a specific area.
Following these improvements around collaboration, Printify decided to take things one step further with Castor. The company is planning to use the tool as a single source of truth for business definitions, with the aim of aligning everyone around company metrics.
“We want Castor to be a dictionary for every definition and concept around the company. For example, if someone is wondering what the concept of ‘merchant’ or ‘order’ means, he will find the answer in Castor,” Filipe explained.