Prior to implementing a proper data catalog, establishing data lineage for just one column would take up to a full day
This customer story was contributed by Suzette Puente, Senior Analytics Engineering Manager at Color Health. Color Health revolutionizes healthcare by making preventive care more accessible and streamlined. From cardiovascular health to cancer prevention, Color Health aims to provide at-home tests, vaccinations, and tailored care suggestions based on a patient's health history and risk factors.
At Color Health, data fuels our mission to help everyone live the healthiest life science and medicine can offer. As the Senior Analytics Engineering Manager, I lead a subset of the data team that focuses on creating intuitive and user-friendly data sources, empowering stakeholders with access to valuable insights for making informed decisions. With our collection of health histories, risk factors, and other essential data points, we transform raw data into something meaningful that improves individualized healthcare and measures the impact of our product on large populations.
Our data team is composed of business analysts, data engineers, and analytics engineers. We're driven to make healthcare data as accessible and actionable as possible, not just for the data team, but for everyone from product managers to genetics counselors to lab specialists.
Because we're a smaller team than other departments, we have to be agile. Our scrappy operating model allows us to be efficient, yet flexible, in responding to the ever-changing landscape of healthcare data.
I - Challenge: Shortcomings in Color Health's Data Documentation
“From a development perspective, the absence of a catalog was costing us time—a lot of it. Prior to implementing a proper data catalog, establishing data lineage for just one column would take up to a full day.” Suzette Puente, Senior Analytics Engineering Manager, Color Health.
When I first joined Color Health, our internal data documentation practices were in their infancy. Our spreadsheet version of a data catalog, while well-intentioned, had become challenging to maintain. It didn't match our data-driven culture, and it wasn't as effective as it needed to be. As a new data scientist, I had a lot of questions about the data landscape and its dependencies that our existing system couldn't easily answer.
Color was scaling fast, especially during the pandemic, as we spearheaded COVID-19 testing initiatives. The data team was pushing out a wealth of data to meet the company's need for informed decision-making, and the lack of a structured data catalog posed a significant bottleneck. Stakeholders outside the data team, from product managers to lab specialists, were unable to self-serve with data, even though they were more than capable of doing so. Simple issues, like missing field definitions, became major roadblocks.
From a development perspective, the absence of a catalog was costing us time—a lot of it. Prior to implementing a proper data catalog, establishing data lineage for just one column would take up to a full day. Especially during early phases of new service offerings, it is common for our team to make daily model changes, so tracking lineage was consuming time that could have been better spent on more strategic tasks.
II - Solution: Choosing the right data catalog
"The UI in Castor was so much cleaner and inviting, making it user-friendly not just for my team but also for non-technical folks." Suzette Puente, Senior Analytics Engineering Manager, Color Health.
Selecting a data catalog was a decision with far-reaching implications for our data infrastructure. I initiated the process by identifying potential solutions that could integrate with our existing BI tools, like Metabase. CastorDoc emerged as a strong contender, aligning well with our tech stack.
We ran brief demos with other tools, but CastorDoc's UI was a game-changer. The focus on usability extended beyond the data team to non-technical users within the organization, aligning with our aim to make data accessible and impactful across departments.
Adoption rates serve as a key metric for assessing the value of a new tool. In our case, both high-level and more technical stakeholders outside of the data team have become frequent users of CastorDoc, including product and program managers who leverage the tool to customize existing reports for their specific needs. To further drive adoption, we introduced a series of 'Data Masterclass' tutorials, which led to a measurable increase in usage across different departments.
III - Impact: Improved Productivity and Streamlined Governance
“Building lineage for columns used to eat up a full day, every two weeks. That's a 10% productivity hit. Thanks to CastorDoc's automatic tracing, I've gained that time back.” Suzette Puente, Senior Analytics Engineering Manager, Color Health.
Integrating CastorDoc into our infrastructure had an important impact on how we handle data. From an operational standpoint, we've seen a drastic reduction in the time spent on lineage mapping. What used to take a day now takes mere minutes, freeing up valuable resources for other high-impact tasks, like generating insights about the patient experience and how our services can be improved.
Before CastorDoc, our collection of data assets was much messier. With CastorDoc, we now have a clear picture of report and dashboard volume, what's being used, and what's just sitting there. We decluttered by eliminating redundant and unused data assets, which makes it easier for our teams to focus on generating actionable insights. It's not just about lineage; it's about purposeful data usage. We've since archived approximately 50% of our data assets, which has greatly reduced search stress for our stakeholders.
Although documentation is often viewed as a tedious task, CastorDoc has transformed it into a more manageable process. It identifies gaps, prompts updates, and holds us accountable. The result is a disciplined approach to data governance that keeps our team aligned.
Proactive Operations – No More Fire Drills
Before, when there were changes to our production models, we'd find out the hard way. Now, with CastorDoc, we see the ripple effects instantly. This has enabled us to be proactive, fixing downstream issues before they become problems. It's made a big difference in how we operate and established data trust within the company.
Consistent naming conventions were a key concern for us at Color because they are the foundation of clear communication across the organization. CastorDoc makes it easy to search for fields and rigorously enforce naming standards. This level of consistency removes barriers to data usage and ensures that everyone is speaking the same 'data language.'
This quarter, we are leveraging CastorDoc's knowledge pages more effectively, linking them to other internal resources for a consolidated view. Additionally, we're exploring how best to document our business metrics within the tool and how we can make CastorDoc the starting point for information about our data.