The Data Exchange Podcast: Michel Tricot on key components and features of modern data integration solutions.
This week’s guest is Michel Tricot, co-founder and CEO of Airbyte, a startup behind the popular open source project with the same name. While still a relatively young project, Airbyte is a favorite among data and platform engineers who are responsible for building and maintaining data integration systems within companies.
As we observed in recent posts on DataOps and Metadata Management Systems, several trends are nudging companies towards modernizing their data infrastructure: data volume and the number of data sources have exploded; the number of teams and personas that depend on data continues to grow; these new users of data expect a certain amount of reliability and freshness; and regulators and users are increasingly concerned with issues related to data privacy and security.
I always go back to how we think about building data stacks. All these earlier data integration solutions are trying to be a full, comprehensive, end-to-end solution. They try to own 100% of your data value chain. And that works up to a point. In general, you start by getting a solution internally that works for 60% of your use cases. Then you need to do something that is a little bit outside of how these solutions were designed, and you have to do a huge hack.
… Airbyte has a few key components. First are the connectors, and the protocol that powers these connectors: how do you build a connector, how do you package it, how do you exchange data between sources and destinations, and all the tooling that goes into this. Secondly, we have dashboards and APIs to control and orchestrate all these data pipelines. You have different ways to deploy: on a single node, or you can deploy it on Kubernetes. Finally, people have written guides on how to actually put things like login, security, and access control on top of open source Airbyte.
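To make the connector-and-protocol idea concrete, here is a minimal sketch of a source and a destination that only share a message format. It assumes newline-delimited JSON messages with RECORD and STATE types, loosely modeled on the publicly documented Airbyte protocol. The names toy_source and toy_destination are hypothetical and are not part of Airbyte's actual SDK.

```python
import json
import time
from typing import Dict, Iterable, Iterator


def toy_source(rows: Iterable[Dict], stream: str) -> Iterator[str]:
    """Hypothetical source: emit one JSON line per row, then a checkpoint."""
    count = 0
    for row in rows:
        count += 1
        yield json.dumps({
            "type": "RECORD",
            "record": {
                "stream": stream,
                "data": row,
                "emitted_at": int(time.time() * 1000),  # epoch milliseconds
            },
        })
    # A STATE message lets a later sync resume from where this one stopped.
    yield json.dumps({"type": "STATE", "state": {"data": {"rows_read": count}}})


def toy_destination(lines: Iterable[str]) -> Dict:
    """Hypothetical destination: group records by stream, keep the last state."""
    records: Dict[str, list] = {}
    state = None
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            record = message["record"]
            records.setdefault(record["stream"], []).append(record["data"])
        elif message["type"] == "STATE":
            state = message["state"]["data"]
    return {"records": records, "state": state}


# The two sides know nothing about each other beyond the message format.
result = toy_destination(toy_source([{"id": 1}, {"id": 2}], "users"))
```

Because the two functions agree only on the message shape, either side could be swapped for a different system without changing the other, which is the decoupling the connector protocol is meant to provide.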
Highlights in the video version:
Origin Story of Airbyte
Modern Data Integration Solution
Maintaining Multiple Solutions and Building a Product
Parallel System and Connectors
Transformation and Control
EL Solution, EL Infrastructure, and Open Source
Singer and Airbyte Connectors
Dashboard and Orchestration
Airbyte Cloud, and Schema Evolution
License Change and New Contribution Model with the Community
Unstructured Data and Reverse ETL
Data Production Downstream and Performance
Airbyte Users: Streaming and Batch
Democratization of Data and Data Sources
ML, Dashboards, and Monitoring
Data Quality, Metadata, and Data Discovery
Airbyte is Hiring
- A video version of this conversation is available on our YouTube channel.
- “What is DataOps?”
- “The Growing Importance of Metadata Management Systems”
- Chris White: “Towards a next-generation dataflow orchestration and automation system”
- Jesse Anderson: “What’s new in data engineering”
- Travis Addair: “The Future of Machine Learning Lies in Better Abstractions”
[Photo by geralt on Pixabay.]