In the first episode of "Data Couch," I had the privilege of chatting with Ethan Aaron and asking him about the considerations to keep in mind when building a data stack from scratch. The conversation was super insightful, so I decided to make an article out of it, to ensure this discussion can benefit those starting on their data journey.
What is the Modern Data Stack?
Data platforms transitioned from a "data stack" to what's now known as the "modern data stack." As we move forward, the distinction between legacy, modern, or any other categorization of a data stack becomes increasingly blurry. In earlier times, data analytics took place within tangible servers – actual physical machines. The primary challenge was how to condense vast amounts of data and transfer it into these confined spaces. Thus, the conventional data stack was an assortment of tools designed to facilitate this data migration.
However, with the advent of cloud data warehouses, the paradigm shifted. The constraints of physical storage vanished, allowing for a more seamless data integration into the cloud, unburdened by scalability issues. The pivotal transition was: are your analytics centered around a tangible server or are they cloud-based? This shift predominantly occurred roughly a decade ago.
Fast forward to today, and the modern data stack has proliferated into an array of tools, with some estimates suggesting numbers as high as 1400, as per Matt Turck’s analysis. These tools aim to streamline processes like data extraction, transformation, loading, governance, and visualization in the cloud.
Yet, as we ushered in 2023, we witnessed an economic downturn. Many past valuations are predicted to diminish, leading to the unfortunate downsizing of teams. Consequently, businesses are urged to recenter their priorities on tangible business outcomes. The prevalent sentiment among teams is less about the myriad of tools at their disposal and more about the tangible returns on their investments. The crux lies in whether their data initiatives drive revenue, cut costs, or minimize risks. If not, the underlying data infrastructure, whether legacy or modern, becomes useless.
Three different use cases for data
Overwhelmed by tool choices for your data platform? Cut the confusion—start with what your business actually needs. Value comes in three flavors: analytics, automation, and data products. Let your use case lead the way in picking the right infrastructure. Now, let's dig into each of these.
Analytics was previously perceived as simply aggregating data to foster improved decision-making, but its real essence lies in leverage. By providing a single individual with 10 dashboards, you empower them to influence various facets of the company.
This efficiency either allows for a leaner team or equips the existing team with amplified capabilities. Essentially, analytics is about harnessing data to amplify decision-making potential, offering the most direct route for many businesses to derive value from their data.
To establish an analytics-centric data platform, businesses should first identify pertinent data sources, such as a CRM. This data should then be moved to a central hub using ETL or ELT tools, stored and computed in a data warehouse, and finally visualized through dashboards that surface up-to-date insights.
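The source-to-dashboard flow described above can be sketched in a few lines. This is a minimal illustration, not a recommended setup: the `crm_deals` records, the `raw_deals` table, and the `load_raw` and `revenue_by_owner` helpers are all hypothetical, and an in-memory SQLite database stands in for a real cloud data warehouse.

```python
import sqlite3

# Hypothetical CRM export. In practice an ETL/ELT tool would pull this
# from the CRM's API; the field names here are assumptions.
crm_deals = [
    {"deal_id": 1, "owner": "ana", "stage": "won", "amount": 1200.0},
    {"deal_id": 2, "owner": "ana", "stage": "lost", "amount": 800.0},
    {"deal_id": 3, "owner": "ben", "stage": "won", "amount": 500.0},
]


def load_raw(conn, rows):
    """Load step: land the source records unchanged in a raw table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_deals ("
        "deal_id INTEGER PRIMARY KEY, owner TEXT, stage TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keeps the load idempotent if it is re-run.
    conn.executemany(
        "INSERT OR REPLACE INTO raw_deals "
        "VALUES (:deal_id, :owner, :stage, :amount)",
        rows,
    )
    conn.commit()


def revenue_by_owner(conn):
    """Transform step: the aggregate a dashboard tile would display."""
    cursor = conn.execute(
        "SELECT owner, SUM(amount) FROM raw_deals "
        "WHERE stage = 'won' GROUP BY owner ORDER BY owner"
    )
    return cursor.fetchall()


# SQLite in memory stands in for the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
load_raw(conn, crm_deals)
print(revenue_by_owner(conn))
```

The raw data is loaded first and aggregated afterwards inside the "warehouse", which is the ELT ordering; an ETL variant would aggregate before loading.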
Automation revolves around transforming manual tasks into automated processes. Consider the manual creation of Zendesk tickets for product issues. Moving from manual to automated ticket generation not only streamlines the process but also translates into tangible savings. The benefit can be quantified directly by comparing the time a task takes manually with the time it takes once automated.
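As a rough illustration of that comparison, the arithmetic can be written down as follows. The `annual_savings` function and every number passed to it are made up for the example; plug in your own volumes and costs.

```python
def annual_savings(tasks_per_month, manual_minutes, automated_minutes, hourly_cost):
    """Estimate yearly cost savings from automating a recurring task.

    All inputs are assumptions you supply: monthly task volume, minutes
    per task before and after automation, and loaded hourly labor cost.
    """
    minutes_saved = tasks_per_month * 12 * (manual_minutes - automated_minutes)
    return minutes_saved / 60 * hourly_cost


# Illustrative figures only: 400 tickets a month, 6 minutes by hand
# versus 1 minute of review when automated, at 40 per hour.
print(annual_savings(400, 6, 1, 40.0))  # 16000.0
```

This ignores the cost of building and maintaining the automation, which you would subtract to get a net figure.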
To put in place an automation use case, businesses should prioritize the efficient transfer of concise data sets. The essence of automation lies in its ability to trigger specific actions based on precise data events, such as account creation. While the data volumes might be smaller than in analytics, the emphasis is on real-time responsiveness and ensuring seamless event-driven processes.
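The event-driven pattern described here can be sketched with a small in-process dispatcher. This is an assumption-laden toy: the `on` and `emit` helpers, the `account_created` event name, and the payload fields are hypothetical, and a real setup would typically receive events through webhooks or a message queue and call the ticketing system's API instead of appending to a list.

```python
from collections import defaultdict

# Maps an event type to the functions that should run when it occurs.
_handlers = defaultdict(list)


def on(event_type):
    """Decorator that registers a function as a handler for an event type."""
    def register(fn):
        _handlers[event_type].append(fn)
        return fn
    return register


def emit(event_type, payload):
    """Run every handler registered for the event; return how many ran."""
    handlers = _handlers.get(event_type, [])
    for handler in handlers:
        handler(payload)
    return len(handlers)


# Stand-in for a ticketing system; a real handler would call its API.
created_tickets = []


@on("account_created")
def open_onboarding_ticket(payload):
    created_tickets.append(
        {"subject": f"Onboard {payload['account_id']}", "requester": payload["email"]}
    )


# A small, precise event triggers the action as soon as it arrives.
emit("account_created", {"account_id": "acc_42", "email": "sam@example.com"})
print(created_tickets)
```

Note how little data moves: one small payload per event, which matches the emphasis on responsiveness over volume.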
Data products, in this sense, are not internal tools labeled as 'data products'. Instead, they are tangible products powered by data that customers are willing to pay for. The value proposition is straightforward: if the data-driven product is profitable, it's adding value to your business.
For those gearing up to establish a data product-centric platform, it's pivotal to seek platforms that prioritize seamless integration with customer-facing applications, robust authentication mechanisms, and agile development capabilities. Given the direct interaction with customers, the platform should be scalable, highly secure, and offer tools that facilitate rapid prototyping and rigorous testing. In short, select a platform that balances technical robustness with a user-focused approach.
In essence, while there's a foundational similarity in gathering and processing data across these use cases, the emphasis, challenges, and thus tools differ based on what your business objectives are.
A common misconception is that the size of your company determines its data infrastructure. That's not the case: infrastructure needs do not scale in direct proportion to headcount. A large company might be very immature in terms of its data journey, just as a small startup could be advanced in its data operations.
The key lies in understanding where your company stands in its data journey. Starting small, focusing on a specific data source or system, and validating insights with key stakeholders can provide a clear roadmap. It's crucial to avoid distractions and remain aligned with the company's core objectives and maturity level. By emphasizing the data journey stage and crafting strategies that mirror organizational maturity, you can create a robust, efficient, and value-driven data platform.
II - Picking the Right Tools for Your Data Needs
Once you are settled on your use case, it’s time to pick your tools. Below are some considerations to keep in mind when choosing data tools.
1. Prioritize Value Creation
Let's start simple. Before diving headfirst into the world of data tools, think about what you truly need. Ask yourself: "Will this tool help my organization reach its goals? Does it add real value?" For example, if you're looking to unlock an analytics use case, you'll probably want a tool with good visualization, data extraction, and SQL features.
2. Identify Key Blockers
Once you've pinned down what you need, think about the challenges in your way. If getting data to the right people seems like a major roadblock, then perhaps a data catalog is what you're after. The idea is to identify those hurdles that are stopping you from getting the most value from your data.
3. Do Your Homework
Once you've identified a potential tool category, go the extra mile. Conduct a thorough Google search to compile a list of the top tools in that category. The tools' official websites are a useful starting point, but remember that they may not always offer a complete picture.
4. Engage Directly
The best way to truly understand a tool's potential is through direct engagement. Many companies offer free trials, so take advantage of them to gain first-hand experience. Additionally, consider scheduling demos. Even if you're in the early stages of consideration, these demos can provide invaluable insights. An intensive approach ensures a holistic understanding of the tool landscape.
5. Conduct Comparative Analysis
Now that you've done your research and tried a few out, it's time to compare. See how each tool stacks up against what you need. Once you have, you'll be in a great spot to pick a few top contenders for a closer look. If you’re looking for a place to start, I pulled together a comparative analysis of the data tools for each category of the modern data stack.
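One simple way to structure such a comparison is a weighted scoring matrix. The sketch below is only an illustration: the criteria, the weights, the 1 to 5 scores, and the names "Tool A" and "Tool B" are placeholders, and the `rank_tools` helper is hypothetical.

```python
# How much each criterion matters to you (higher means more important).
weights = {"fits_use_case": 5, "ease_of_setup": 3, "cost": 2}

# Scores from 1 (poor) to 5 (excellent), filled in after trials and demos.
scores = {
    "Tool A": {"fits_use_case": 4, "ease_of_setup": 3, "cost": 5},
    "Tool B": {"fits_use_case": 5, "ease_of_setup": 4, "cost": 2},
}


def rank_tools(scores, weights):
    """Return (tool, weighted total) pairs, best first."""
    totals = {
        tool: sum(weights[criterion] * tool_scores[criterion] for criterion in weights)
        for tool, tool_scores in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(rank_tools(scores, weights))  # [('Tool B', 41), ('Tool A', 39)]
```

The totals are only as good as the weights, so it helps to agree on them with stakeholders before scoring any tool.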
Creating business value from data is fundamentally about alignment—your infrastructure should directly reflect your company's core objectives and its current stage in the data journey. Be it enhancing analytics, automating processes, or developing data products, each decision should strategically support your ultimate business goals.