Towards a next-generation dataflow orchestration and automation system

The Data Exchange Podcast: Chris White on building an engine that enables arbitrary proprietary workflows.


In this episode, our managing editor Jenn Webb and I speak with Chris White, CTO of Prefect, a startup building tools that help companies build, monitor, and manage dataflows. Prefect grew out of lessons Chris and his co-founder learned at Capital One, where they were early users of, and contributors to, related projects such as Apache Airflow.

Let us know your favorite workflow orchestration tool: take the 2021 Data Engineering Survey to get a free copy of the results and be entered into a drawing for a free Data Teams book and other prizes.

Data pipelines have become critical as data and machine learning grow essential to most modern software applications and systems. In this episode we take a deep dive into what makes data pipelines challenging, and into the essential components of modern workflow management and orchestration solutions. Chris explains some of the design choices behind Prefect, the current status of the open source project, and the near-term roadmap. Anyone building data and machine learning products and services will want to listen to this episode: it is a great overview of important areas in modern data engineering and data infrastructure.

Chris White

Testability, version control, all of that should just be a default assumption of whatever you're doing. And so we did build Prefect with those elements as first-class citizens. That's the reason, for example, comparing with Airflow, that you can run a workflow in an interactive terminal with Prefect, whereas with Airflow you have to spin up a scheduler and do all this other stuff. The idea is that you can write tests for it: you can mock your tasks, you can mock your test states, you can test your retry functionality, and we have a lot of dev tooling around that. We haven't made a big deal about it, because the business problem we're solving is slightly different. That is just what I think everyone expects from any software tool: the ability to do those things.
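The testing workflow Chris describes (running a flow in-process, mocking a task, and asserting on retry behavior) can be illustrated without any scheduler. What follows is a minimal, library-free sketch under stated assumptions: `Task`, `run_flow`, and the `"Success"`/`"Failed"` state strings are hypothetical stand-ins chosen for illustration, not Prefect's actual API.

```python
import time
from typing import Callable, Dict
from unittest.mock import Mock


class Task:
    """Hypothetical task wrapper (not Prefect's API): retries a callable on failure."""

    def __init__(self, fn: Callable[[], object], max_retries: int = 0,
                 retry_delay: float = 0.0) -> None:
        self.fn = fn
        self.max_retries = max_retries
        self.retry_delay = retry_delay
        self.attempts = 0

    def run(self) -> object:
        # Call fn until it succeeds or we exceed max_retries extra attempts.
        while True:
            self.attempts += 1
            try:
                return self.fn()
            except Exception:
                if self.attempts > self.max_retries:
                    raise
                time.sleep(self.retry_delay)


def run_flow(tasks: Dict[str, Task]) -> Dict[str, str]:
    """Hypothetical in-process runner: executes tasks in order, records a final state."""
    states: Dict[str, str] = {}
    for name, task in tasks.items():
        try:
            task.run()
            states[name] = "Success"
        except Exception:
            states[name] = "Failed"
    return states


# Mock a task that fails once with a transient error, then succeeds.
flaky = Mock(side_effect=[ConnectionError("transient"), "ok"])
states = run_flow({"load": Task(flaky, max_retries=1)})
print(states, flaky.call_count)
```

Because the flow runs as ordinary Python, a plain `unittest.mock.Mock` with a `side_effect` list is enough to simulate a transient failure and check that the retry count behaves as configured, with no scheduler or database involved.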



Related content:

2021 Data Engineering Survey

The 2021 Data Engineering Survey is now open and we need your help. The survey takes less than 5 minutes to fill out, and we'll share a report of the findings with you. You'll also be entered in a drawing for free copies of the Data Teams book and other prizes.

[Image from pxhere.]