The Data Exchange Podcast: Steve Touw on why data governance needs to go from the boardroom into code.
As the volume and importance of data grow within organizations, so does interest in tools that help them strategically utilize, manage, and unlock their data resources. This week’s guest is Steven (Steve) Touw, cofounder and CTO of Immuta, a startup that builds tools to help companies address data governance, data discovery, data privacy, and security. Our conversation covered recent trends in data access governance, data engineering (including highlights from Immuta’s recent data engineering survey), and emerging topics such as DataOps and data cascades.
We discussed the challenges facing companies tasked with implementing data access governance as the volume of data, the number of data sources and platforms, and the number of data users within companies all continue to grow rapidly. I asked Steve what the key features of modern data governance systems are:
- ❛ The two big ones are scalability and stability. How can I build as few rules as possible to do what I need to do, in an understandable way, so anyone in the organization can understand what’s going on? I think some of the legacy open source solutions that exist today are very hard to scale. And you can get into situations where, if the one person who knew how to build all these rules gets hit by a bus, you’re in big trouble; and if you’re not that one person, you’re afraid to make any changes. We see a lot of customers in that situation before they move to us.
… So I said scalability; the other piece is stability, and stability is absolutely foundational. We have actually seen a huge movement toward what we call policy as code. Just as you have infrastructure as code, you want to be able to manage your data access policies the same way you manage your data pipelines. You can “source control” your policy definitions, manage pull requests against them, and have people approve changes. If you source-control your data governance policies, you can understand and track changes over time. We allow you to take this approach of expressing your policy state in declarative files, source-controlling those, and pushing them to Immuta to represent that state.
We’ve seen huge demand for that because everyone wants to build this into their DataOps tools. This applies not only to data access control but also to data quality, which has similar stability requirements in these organizations. The other thing that helps drive this is metadata: to make this all work, you need to know everything about all your systems in a single place. So we consider ourselves that metadata hub, if you will, a centralized version of the metadata.
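The policy-as-code workflow Steve describes can be sketched in a few lines. The example below is a hypothetical illustration, not Immuta’s actual API or file format: policies are declared as data (the kind of thing you would keep in a version-controlled file), and a `plan` function computes the additions and removals needed to reconcile the deployed state with the declared state, in the same spirit as infrastructure-as-code tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A declarative data-access rule: what action applies to which
    column of which table, for which user role. (Hypothetical schema.)"""
    table: str
    column: str
    action: str  # e.g. "mask" or "allow"
    role: str


def plan(current: set, desired: set) -> dict:
    """Diff the deployed policy state against the state declared in
    source control, returning the changes an engine would apply."""
    return {
        "add": sorted(desired - current, key=str),
        "remove": sorted(current - desired, key=str),
    }


# Policies currently enforced by the engine
current = {
    Policy("claims", "ssn", "mask", "analyst"),
}

# Desired state, as declared in a version-controlled policy file
desired = {
    Policy("claims", "ssn", "mask", "analyst"),
    Policy("claims", "dob", "mask", "analyst"),
}

changes = plan(current, desired)
```

Because the desired state lives in files under source control, every change arrives as a reviewable diff: pull requests, approvals, and history all come for free, which is exactly the stability property described above.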
Related content and resources:
- A video version of this conversation is available on our YouTube channel.
- What is DataOps?
- Data Cascades: Why we need feedback channels throughout the machine learning lifecycle
- The Growing Importance of Metadata Management Systems
- One Simple Chart: Data Engineering jobs in the U.S.
- Abe Gong: “Data quality is key to great AI products and services”
- Ryan Wisnesky: “The Mathematics of Data Integration and Data Quality”
- Jian Pei: “Pricing Data Products”
This post and episode are part of a collaboration between Gradient Flow and Immuta. See our statement of editorial independence.
[Image by Ben Lorica]