I remember my first days as a data scientist at Ubisoft in San Francisco. Ubisoft is a video game company, the creator of Assassin's Creed, Watch Dogs, The Division, Far Cry, and more. I was so excited to look into players' in-game behavior. Haven't you always dreamt of knowing how many steps people take on average in Assassin's Creed Valhalla? Or what the most popular song on Just Dance is? Well, I am the curious type and was dying to answer those questions all day long. Yet I am also the lazy type, so I was looking to leverage work done by others. I hate redoing work that has already been done.
Long story short, I spent my first weeks exploring all the Tableau dashboards that had already been built. I ended up really disappointed, because BI tools weren't optimized for content consumption, especially at scale or in a self-service data organization. I spent days finding the dashboards I was interested in.
A dashboard is a graphical representation of tabular data.
Dashboards are really useful for monitoring business health on a day-to-day basis. You get a great deal of information at a glance. It sounds (and is!) great.
But as data grows, more people create more dashboards, and things quickly become a mess.
Important things are not obvious anymore. Finding relevant analysis gets tricky. People are lost and start hating their BI tool.
I was surprised by a few things regarding BI tools. Here's the gist:
I couldn't explore dashboards efficiently
Last year, I met Arnaud de Turkheim and Amaury Dumoulin, heads of data at Payfit and Qonto. As data managers, they felt overwhelmed when dealing with BI tools. In a few words, the more they deployed self-service BI, the more they lost control of, and trust in, their dashboards.
That was the dark side of dashboarding. The good news is that there's a good side to it. Let me take you through the various good practices I've seen in customer interviews.
BI tools are very powerful, which means people can certainly find ways to create a confusing environment for end-users. Below are a few common good practices that can help you deploy self-service BI tools without losing control.
🔬 Set up simple governance from day 1
One folder per team, one subfolder called playground, and to get out of the playground, the data team needs to:
🚰 Use 'popularity' to filter out dashboards
Frequently archive dashboards that are not used, but reassure data people that they can easily be restored.
You can also add a [deprecated] prefix to dashboard and visualisation names to mark them for deletion.
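To make this concrete, here is a minimal Python sketch of a popularity-based clean-up pass. It assumes you can export usage statistics from your BI tool; the `Dashboard` fields, the 90-day idle window, and the `flag_unused` helper are all hypothetical choices for illustration, not the API of any particular tool.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta

DEPRECATED_PREFIX = "[deprecated] "


@dataclass(frozen=True)
class Dashboard:
    # Hypothetical fields; real BI tools expose usage stats under other names.
    name: str
    views_last_90_days: int
    last_viewed: date


def flag_unused(dashboards, today, max_idle_days=90, min_views=1):
    """Return a new list where unused dashboards carry the [deprecated] prefix."""
    cutoff = today - timedelta(days=max_idle_days)
    flagged = []
    for d in dashboards:
        unused = d.last_viewed < cutoff or d.views_last_90_days < min_views
        # Only rename once, so the pass can be re-run safely.
        if unused and not d.name.startswith(DEPRECATED_PREFIX):
            d = replace(d, name=DEPRECATED_PREFIX + d.name)
        flagged.append(d)
    return flagged
```

Nothing is deleted here: renaming is reversible, which fits the idea of reassuring people that an archived dashboard can be put back.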
🧶 Data modelling outside of BI tools
Transform most of your data inside the data warehouse with dbt. This will help you:
🎯 Find the right mix: centralized vs embedded data team
Each of these has advantages and drawbacks.
Centralized data teams are easier to manage and organize. But they are also slower as you need to communicate your business needs more accurately.
Embedded data teams are more agile, more operational in their analysis, and are closer to business needs: they get things done quickly but not in a scalable way. It is really hard to keep a unified source of truth when it comes to data and vision.
👮 Have a clear process for how and where to save content
If your instance is a Wild West where users save content anywhere they please, content redundancy, difficult clean-up, and everlasting confusion are likely to ensue. Set a standard content hierarchy that defines how and where you'd like users to save their content, to help keep everyone on the same page.
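A standard like this is easier to keep when it can be checked automatically. The Python sketch below validates a save path against the "one folder per team, one playground subfolder" convention mentioned earlier. The team names and the `check_save_path` helper are hypothetical, and it assumes your BI tool can list content as slash-separated paths.

```python
ALLOWED_TEAMS = {"finance", "marketing", "product"}  # hypothetical team folders


def check_save_path(path: str) -> list[str]:
    """Return the problems found in a content path; an empty list means it is fine."""
    parts = path.strip("/").split("/")
    if len(parts) < 2:
        return ["content must live inside a team folder"]

    problems = []
    team, *rest = parts
    if team not in ALLOWED_TEAMS:
        problems.append(f"unknown team folder: {team!r}")
    # Under a team folder we accept "<name>" or "playground/<name>", nothing deeper.
    if len(rest) > 2 or (len(rest) == 2 and rest[0] != "playground"):
        problems.append("only a 'playground' subfolder is allowed under a team folder")
    return problems
```

For example, `check_save_path("finance/playground/draft")` returns an empty list, while a dashboard saved at the top level is reported as living outside any team folder.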
🦻 Collaborative feedback is powerful
You want useful dashboards to be in the spotlight. It's good for morale: BI analysts feel rewarded for their hard work and it is easier to navigate across hundreds of dashboards.
🕵️‍♀️ Implement tests and quality checks
Every BI analyst dreads the idea of top managers asking "This dashboard seems off, what's going on?" and realizing the data pipeline feeding the dashboard broke.
You want to be the first to know about the problem. You will also want to know why and when your dashboard broke.
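As an illustration, here is a minimal Python sketch of two common checks on a table feeding a dashboard: a row-count check and a freshness check. The thresholds and the `check_table` helper are assumptions for this example; in practice you would read `row_count` and `last_loaded_at` from your warehouse and might rely on the tests built into a tool such as dbt instead.

```python
from datetime import datetime, timedelta


def check_table(row_count, last_loaded_at, now,
                max_age=timedelta(hours=24), min_rows=1):
    """Return human-readable failures for one source table; empty means healthy."""
    failures = []
    if row_count < min_rows:
        failures.append(f"expected at least {min_rows} rows, found {row_count}")
    age = now - last_loaded_at
    if age > max_age:
        failures.append(f"data is stale: last loaded {age} ago (limit {max_age})")
    return failures
```

Running this on a schedule and alerting on any non-empty result tells you both that something broke and roughly when, before a manager notices.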
Having a good data governance strategy is a combination of three things:
Missing one part could make data governance a real pain to enforce.
A powerful search on top of your BI tool to help users browse through thousands of dashboards.
Show when the dashboard was last refreshed next to the documentation.
Prioritize results and data assets based on people's consumption. Never end up looking at last year's deprecated MRR dashboard.
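A rough Python sketch of the idea: match dashboards on their name, then order the hits by how much they are actually viewed. The `(name, views)` tuples and the `search` function are hypothetical, and a real search would use a richer relevance signal than a substring match.

```python
def search(dashboards, query):
    """dashboards: list of (name, views_last_30_days) tuples.

    Returns the names that contain the query, most viewed first.
    """
    q = query.lower()
    hits = [(name, views) for name, views in dashboards if q in name.lower()]
    # Sort by views descending, then by name so ties are stable.
    hits.sort(key=lambda h: (-h[1], h[0]))
    return [name for name, _ in hits]
```

With this ordering, a rarely opened dashboard still shows up in the results, but below the one people use every day.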
'I only trust what I can see'! Knowing which tables and queries are used to create a dashboard is essential to make sure you can trust it. By understanding downstream impacts, data engineers can communicate with the BI team to ensure they won't break important dashboards. Read more on data lineage use-cases here.
The problem with dashboards: you don't see what data flows in, you just see the output. Unless you write tests and quality checks, you won't know your dashboard is wrong until it's too late.
Save time and money by removing data assets that are no longer relevant. This improves the overall state and readability of your BI tool.
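To show what "understanding downstream impacts" can look like, here is a small Python sketch that walks a lineage graph breadth-first. The `LINEAGE` mapping and its table and dashboard names are made up for illustration; in a real setup this graph would be extracted from query logs or from your transformation tool.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that read from it.
LINEAGE = {
    "raw.payments": ["analytics.invoices"],
    "analytics.invoices": ["analytics.mrr", "dashboard: Billing health"],
    "analytics.mrr": ["dashboard: MRR overview"],
}


def downstream(asset, lineage):
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen, queue, order = {asset}, deque([asset]), []
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

Before changing `raw.payments`, a data engineer could call `downstream("raw.payments", LINEAGE)` and see that two dashboards depend on it, which is the cue to warn the BI team first.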
Make sure every relevant person receives a notification when they should give their input.
Communicate with data people close to the data sources so that no relevant information gets lost in a Slack thread or during a coffee break.
Having an overview of data warehouse and BI tool rights can help you detect unlawful data usage.
Now, the question is simple.
Castor is the best tool to deal with dashboard rot. It enables data managers to scale a self-service analytics stack without losing control of data assets (data warehouse and BI tools).