Data governance sounds boring, and no one ever wants to deal with it. Yet once you dive in, you can make it painless and, against all odds, quite fun!
Let me take you through the things you can work on to make your Data Privacy Officer's life easier:
Auto-classification of PII: your customers might give you their email, birthdate, credit card information, or, worse, access to their medical data. You can use machine learning algorithms to automatically scan for and detect those PII fields. Of course, if you auto-classify, mark each tag as machine-generated so people know they can correct it when it is wrong.
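To make the idea concrete, here is a minimal sketch of an auto-classifier. It swaps the machine learning model for simple regular expressions, which is an assumption made to keep the example short. The pattern set, the `classify_column` name, and the 0.8 threshold are all hypothetical. Note that every tag it produces is marked `"source": "auto"` so a human knows it can be corrected.

```python
import re

# Hypothetical patterns for illustration; a real system would likely rely on
# a trained model and a much richer set of detectors.
PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "credit_card": re.compile(r"^(?:\d[ -]?){13,16}$"),
    "birthdate": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def classify_column(sample_values, threshold=0.8):
    """Return a tag dict if enough sampled values match one PII pattern, else None."""
    values = [v for v in sample_values if v]  # ignore empty strings and None
    if not values:
        return None
    for tag, pattern in PII_PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v.strip()))
        ratio = hits / len(values)
        if ratio >= threshold:
            # "source" records that a machine assigned the tag, not a person.
            return {"tag": tag, "source": "auto", "confidence": ratio}
    return None


if __name__ == "__main__":
    print(classify_column(["ann@example.com", "bob.smith@mail.org"]))
    print(classify_column(["hello", "world"]))
```

Running the classifier on a sample of values per column, rather than on the full table, keeps the scan cheap. The confidence score gives reviewers a way to prioritize which automatic tags to check first.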
Automate data access with tags: data consumers are numerous, and designing a good access control policy for them is never obvious. How about defining access based on custom tags such as business-related metadata, technical metadata, or security classifications? This lets you build a fine-grained access system without having to maintain a huge number of individual access rights.
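As a rough sketch of what tag-based access could look like, assume each asset carries a set of tags and each role is granted a set of tags. The asset names, role names, and the rule "a role needs every tag on the asset" are illustrative assumptions, not the behavior of any particular warehouse.

```python
# Hypothetical tags on data assets and grants on roles.
ASSET_TAGS = {
    "sales.orders": {"domain:sales"},
    "sales.customers": {"domain:sales", "pii"},
    "hr.salaries": {"domain:hr", "confidential"},
}

ROLE_GRANTS = {
    "sales_analyst": {"domain:sales"},
    "privacy_officer": {"domain:sales", "domain:hr", "pii", "confidential"},
}


def can_access(role, asset):
    """A role may read an asset only if it is granted every tag on that asset."""
    asset_tags = ASSET_TAGS.get(asset)
    if asset_tags is None:
        return False  # unknown assets are denied by default
    return asset_tags <= ROLE_GRANTS.get(role, set())


if __name__ == "__main__":
    print(can_access("sales_analyst", "sales.orders"))     # granted
    print(can_access("sales_analyst", "sales.customers"))  # denied: missing "pii"
```

With this design, onboarding a new table means tagging it once, and onboarding a new role means granting it a handful of tags, instead of writing one rule per role and table pair.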
Automate lineage generation: if you are working with modern data warehouses and data visualization tools, chances are that you can automate lineage generation with a SQL parser, by reading which tables each query reads from and writes to.
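Here is a deliberately naive sketch of that idea. It uses two regular expressions instead of a real SQL parser, so it only handles simple `CREATE TABLE ... AS SELECT` and `INSERT INTO ... SELECT` statements. CTEs, subqueries, and expressions such as `EXTRACT(... FROM ...)` would confuse it. The `extract_lineage` name is hypothetical.

```python
import re

# The table or view being written to.
TARGET_RE = re.compile(
    r"\b(?:create\s+(?:or\s+replace\s+)?(?:table|view)|insert\s+into)\s+([\w.]+)",
    re.IGNORECASE,
)
# Tables being read from.
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql):
    """Return (target, set_of_sources) for a simple statement, or None."""
    target = TARGET_RE.search(sql)
    if target is None:
        return None  # not a statement that writes to a table
    target_name = target.group(1).lower()
    sources = {name.lower() for name in SOURCE_RE.findall(sql)}
    sources.discard(target_name)  # a table is not its own upstream
    return target_name, sources


if __name__ == "__main__":
    query = """
        CREATE TABLE analytics.revenue AS
        SELECT o.amount, c.country
        FROM sales.orders o
        JOIN sales.customers c ON o.customer_id = c.id
    """
    print(extract_lineage(query))
```

Feeding the warehouse's query history through a function like this yields one edge per source and target pair, which is enough to draw a table-level lineage graph. Column-level lineage needs a proper parser that understands the select list.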
Build a propagation algorithm: if you managed to get through the first steps, you are now able to classify PII, tag data assets, and build lineage programmatically. This means you can automatically ensure that every table or column derived from a column tagged as sensitive inherits the same classification and security controls.
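One way to sketch the propagation is a breadth-first walk over the lineage graph, pushing tags from each column to everything derived from it. The data shapes below are assumptions: `lineage` maps a downstream column to its upstream columns, and `tags` maps a column to its set of tags.

```python
from collections import deque


def propagate_tags(lineage, tags):
    """Return a new tag mapping where every derived column inherits upstream tags.

    lineage: dict mapping a downstream column to the set of its upstream columns.
    tags:    dict mapping a column to its set of tags (left unmodified).
    """
    # Invert the lineage so we can walk from upstream to downstream.
    children = {}
    for downstream, upstreams in lineage.items():
        for upstream in upstreams:
            children.setdefault(upstream, set()).add(downstream)

    result = {col: set(col_tags) for col, col_tags in tags.items()}
    queue = deque(col for col, col_tags in result.items() if col_tags)
    while queue:
        col = queue.popleft()
        for child in children.get(col, ()):
            inherited = result.setdefault(child, set())
            if not result[col] <= inherited:
                inherited |= result[col]
                queue.append(child)  # revisit descendants only when tags grew
    return result


if __name__ == "__main__":
    lineage = {
        "stg.customers.email": {"raw.customers.email"},
        "mart.contacts.email": {"stg.customers.email"},
        "mart.orders.amount": {"raw.orders.amount"},
    }
    print(propagate_tags(lineage, {"raw.customers.email": {"pii"}}))
```

Because a column is re-queued only when its tag set actually grows, the walk terminates even if the lineage graph contains cycles.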
Make sure no one ever accesses sensitive data they shouldn't, whether now, in the past, or in the future: I reckon that maintaining a record of all data users by hand can be a pain. What if you could build an audit log of all users and match it against the roles defined in the warehouse? How about automatically flagging past accesses that are not consistent with the new data policies?
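The audit idea can be sketched as a replay of the access log against the current policy. Everything below is hypothetical: the log entry fields (`user`, `asset`, `at`), the role and tag mappings, and the rule that a role must hold every tag on the asset it read.

```python
def flag_violations(access_log, role_of_user, allowed_tags_by_role, asset_tags):
    """Return the log entries that the current policy would not have allowed."""
    violations = []
    for entry in access_log:
        role = role_of_user.get(entry["user"])
        allowed = allowed_tags_by_role.get(role, set())
        # Tags on the asset that the user's role is not granted.
        missing = asset_tags.get(entry["asset"], set()) - allowed
        if missing:
            violations.append(
                {**entry, "role": role, "missing_tags": sorted(missing)}
            )
    return violations


if __name__ == "__main__":
    log = [
        {"user": "ann", "asset": "sales.customers", "at": "2021-03-01"},
        {"user": "bob", "asset": "sales.customers", "at": "2021-03-02"},
    ]
    flagged = flag_violations(
        access_log=log,
        role_of_user={"ann": "privacy_officer", "bob": "sales_analyst"},
        allowed_tags_by_role={
            "privacy_officer": {"domain:sales", "pii"},
            "sales_analyst": {"domain:sales"},
        },
        asset_tags={"sales.customers": {"domain:sales", "pii"}},
    )
    for entry in flagged:
        print(entry)
```

Re-running the same check whenever the policy changes is what surfaces historical accesses that were fine under the old rules but would be refused today.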
Well, I agree that this is a lot of work for a few data engineers, and Data Privacy Officers often don't have the budget to build all this automation. But, god, it makes life so much easier.
We write about all the processes involved when leveraging data assets: from the modern data stack to data teams composition, to data governance. Our blog covers the technical and the less technical aspects of creating tangible value from data.
At Castor, we are building a data documentation tool for the Notion, Figma, Slack generation. We designed our catalog software to be easy to use, delightful and friendly.