The Data Exchange Podcast: Chris White on building an engine that enables arbitrary proprietary workflows.
In this episode, our managing editor Jenn Webb and I speak with Chris White, CTO of Prefect, a startup building tools to help companies build, monitor, and manage dataflows. Prefect originated from lessons Chris and his co-founder learned while they were at Capital One, where they were early users of, and contributors to, related projects like Apache Airflow.
Data pipelines have become critical as data and machine learning become essential to most modern software applications and systems. In this episode we do a deep dive into what makes data pipelines challenging, as well as the essential components of modern workflow management and orchestration solutions. Chris explains some of the design choices they made with Prefect, the current status of the open source project, and the near-term roadmap. Anyone building data and machine learning products and services will want to listen to this episode: it is a great overview of important areas in modern data engineering and data infrastructure.
Testability, version control, all of that should just be a default assumption of whatever you’re doing. And so we do. We did build Prefect with those elements as first classes. And that’s the reason for example, just comparing with Airflow, that you can run a workflow in an interactive terminal with Prefect, whereas with Airflow, you have to spin up a scheduler and do all this other stuff. The idea is that you can write tests for it, you can mock your tasks, you can mock your test states, you can test your retry functionality, we have a lot of dev tooling around that. We haven’t, you know, made a big deal about it, because the business problem we’re solving is slightly different from that. And that is just what I think everyone expects from any software tool is the ability to do those things.
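To make the testing idea in the quote concrete, here is a minimal sketch in plain Python. It deliberately does not use Prefect's API: `run_with_retries`, `run_workflow`, and the three task names are hypothetical helpers invented for illustration. It only shows the general pattern Chris describes, where a workflow runs in-process, tasks are swapped for mocks, and retry behavior is checked with ordinary assertions.

```python
from unittest import mock


def run_with_retries(task, max_retries=2):
    """Call task() and retry on any exception, up to max_retries extra attempts.

    Returns (result, attempts). Hypothetical helper, not part of any library.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > max_retries:
                raise  # retries exhausted: surface the last error


def run_workflow(extract, transform, load, max_retries=2):
    """Run a three-step workflow in the current process, no scheduler needed."""
    raw, extract_attempts = run_with_retries(extract, max_retries)
    cleaned = transform(raw)
    load(cleaned)
    return {"extract_attempts": extract_attempts, "loaded": cleaned}


# Mocked tasks: extract fails once with a transient error, then succeeds.
flaky_extract = mock.Mock(
    side_effect=[ConnectionError("temporary outage"), [1, 2, 3]]
)
fake_load = mock.Mock()

result = run_workflow(
    extract=flaky_extract,
    transform=lambda rows: [r * 10 for r in rows],
    load=fake_load,
)

# The retry path and the downstream call can both be asserted directly.
assert result["extract_attempts"] == 2
fake_load.assert_called_once_with([10, 20, 30])
```

Because everything runs as ordinary function calls, the same checks can be pasted into an interactive session or a unit test file without standing up any extra services.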
Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.
Related content:
- A video version of this conversation is available on our YouTube channel.
- “Combine the development experience of a laptop with the scale of the cloud”
- “What is DataOps?”
- Travis Addair: “The Future of Machine Learning Lies in Better Abstractions”
- Ryan Wisnesky: “The Mathematics of Data Integration and Data Quality”
- Steve Touw: “Injecting Software Engineering Practices and Rigor into Data Governance”
- Reza Hosseini and Albert Chen: “Building a flexible, intuitive, and fast forecasting library”
2021 Data Engineering Survey
The 2021 Data Engineering Survey is now open and we need your help. The survey takes less than 5 minutes to fill out and we’ll share the report of the survey findings with you. You’ll also be entered in a drawing for free copies of the Data Teams book and other prizes.
[Image from pxhere.]