The Data Exchange Podcast: Jesse Anderson, Ben Lorica, and Jenn Webb on new tools, processes, and best practices for building high-impact data applications.
This week our special correspondent and editor Jenn Webb organized a mini-panel consisting of Jesse Anderson, Managing Director at the Big Data Institute, and me. Jesse is the author of the recent book “Data Teams: A Unified Management Model for Successful Data-Focused Teams”. The conversation focused on key areas in data engineering.
We discussed a range of topics including:
- Streaming and real-time technologies and their applications to business intelligence and analytics.
- Data management, particularly the rise of cloud data warehouses and lakehouses.
- The importance of “Ops” tools and processes, as exemplified by the growing interest in MLOps and DataOps.
- The growing interest in workflow orchestration and workflow management tools like Apache Airflow and next-generation frameworks like Prefect, Dagster, Temporal, and more.
Jesse Anderson:
If we think about our Maslow’s hierarchy of needs of data engineering, we’ve now moved on from, “I just need to store some data”, into “just storing that data isn’t enough”, to “now I need to make sure that I have a well-orchestrated automation for that data to be cleaned and then have your ML pipeline trained”. This is the level of maturity that we’ve gotten to; there’s definitely this push within the companies to hit this level of maturity. … Automation is a key part of what they want. They want reproducibility, that there is a button push, and it is done the same way each time, rather than here, we kick off this one-off thing that maybe somebody messes up on.
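To make the "button push" idea concrete, here is a minimal sketch in plain Python of a pipeline that cleans data and then trains on it, with the steps declared once and always run in the same order. It is only an illustration under stated assumptions: the names `clean`, `train`, `PIPELINE`, and `run_pipeline` are hypothetical, the "model" is a stand-in that just computes a mean, and none of this reflects the API of Airflow, Prefect, Dagster, or Temporal.

```python
from statistics import mean


def clean(rows):
    """Drop missing entries and convert the remaining values to floats."""
    return [float(r) for r in rows if r not in (None, "")]


def train(values):
    """Stand-in for model training: 'fit' a constant model (the mean)."""
    return {"model": "constant", "prediction": mean(values)}


# Hypothetical pipeline definition: steps are declared once, in a fixed order.
PIPELINE = [clean, train]


def run_pipeline(raw, steps=PIPELINE):
    """The 'button push': feed each step's output into the next step."""
    result = raw
    for step in steps:
        result = step(result)
    return result


raw_rows = ["1.0", "", "3.0", None, "5.0"]
first = run_pipeline(raw_rows)
second = run_pipeline(raw_rows)
print(first == second)  # same input, same steps, same result
```

Because the steps live in one definition rather than in someone's memory, every run repeats them identically. Real orchestration tools add scheduling, retries, and monitoring on top of this basic idea.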
Related content:
- A video version of this conversation is available on our YouTube channel.
- What is DataOps?
- Model Monitoring Enables Robust Machine Learning Applications
- Jesse Anderson: “A Unified Management Model for Successful Data-Focused Teams”
- Chris White: “Towards a next-generation dataflow orchestration and automation system”
- Travis Addair: “The Future of Machine Learning Lies in Better Abstractions”
- Denise Gosnell: “How graph technologies are being used to solve complex business problems”
- Zhe Zhang: “How Technology Companies Are Using Ray”
- Abe Gong: “Data quality is key to great AI products and services”
2021 Data Engineering Survey
The 2021 Data Engineering Survey is now open and we need your help. The survey takes less than 5 minutes to fill out and we’ll share the report of the survey findings with you. You’ll also be entered in a drawing for free copies of the Data Teams book and other prizes.