The Data Exchange Podcast: Jeremy Stanley on key dimensions and features of modern data quality and data monitoring solutions.
This week’s guest is Jeremy Stanley, co-founder and CTO of Anomalo, a startup building SaaS tools to help companies with data quality (DQ). Prior to Anomalo, Jeremy was VP of Data Science at Instacart. Anomalo just announced a $33M Series A Round on the same day this podcast went live. Based on our upcoming data engineering survey, DQ is one of the most important challenges facing data teams today.
I’ve been a data person for a long time. I’ve worked on data in everything from insurance applications, to advertising technology, to personalization, to all the logistics and fun challenges at Instacart. If you work with data long enough, you begin to appreciate that all of the fancy things that you can do with machine learning and analytics, they’re all only as good as the quality of the data going into those processes and into those models. I often found myself either personally or with the teams that I was a part of, at these different companies building solutions in-house, to be able to monitor the quality of the data that we were depending on.
At Instacart, there was a table we were using to launch new markets from. That table was based upon people coming to our website, and trying to sign up and putting their zip code in. This table got switched and stopped being updated, and yet, we were still making decisions on it. So for six or nine months, we were using stale data about zip codes. … and it cost us a lot of growth. Fixing that data quality issue was one of the biggest things that we did for growth in my tenure at Instacart.
Highlights in the video version:
- Data Validation and Data Quality
- Data Engineering Key Challenges: Data Quality and Validation
- No Code, Monitoring, and Rich Visualization
- Who uses Anomalo?
- Overlap between what Anomalo does vs. Data Discovery/Data Catalog
- DataOps: Monitoring, Automation, and Incident Response
- Modern Cloud Data Warehouse
- Challenges of Building This Solution and False Positives
- Domain Knowledge, Human Knowledge, Machine Learning Approach and Scalability
- Modern Data Stack
- SaaS Platforms Need Role-Based Access Controls
- Anomalo Success Stories
- Increased Use of Data, Importance of Quality, and Master Data Management
- Structured Data and JSON
- Data Quality Solutions and Automated Repair
- Multi-Cloud and Centralizing Data in a Single Data Warehouse
- Data Quality and ML: Unsupervised vs. Supervised
- Reasons to Consider Anomalo
Related content:
- A video version of this conversation is available on our YouTube channel.
- “Data Quality Unpacked”
- “Taking Low-Code and No-Code Development to the Next Level”
- “Model Monitoring Enables Robust Machine Learning Applications”
- “Why you should build your AI Applications with Ray”
- Abe Gong: “Data quality is key to great AI products and services”
- Sean Taylor: “Changes to the data science role and to data science tools”
- Rumman Chowdhury: “The State of Responsible AI”
- Chris White: “Towards a next-generation dataflow orchestration and automation system”
Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.
[Photo: Matrix from Piqsels.]