Data and Machine Learning Platforms at Shopify

Azeem Ahmed on the evolution of Shopify’s data and machine learning platforms, and the power of the lakehouse architecture.


Azeem Ahmed is Director of Engineering at Shopify, where he leads the team that builds the primitives and APIs used by all data scientists, machine learning engineers, and members of Shopify’s engineering team. Prior to Shopify, Azeem led data and analytics infrastructure teams at LinkedIn and Consensys. Our conversation focused on the evolution and design of data and machine learning platforms within Shopify. Azeem and I also discussed broader trends, including the rise of modern data platforms and the maturation of data lakehouses.


Azeem Ahmed:

We think about three large primitives: the ingest interface, the transform interface, and the publish interface. All of these apply to “data sets,” which could be tables, models, reports, dashboards, and all the other things that you mentioned. When you think of ingest, transform, and publish, these are all operating on top of storage. We are building the lakehouse architecture: our storage is GCS, with the Iceberg table format plus Parquet. … Trino is our query engine.
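To make the three primitives concrete, here is a minimal sketch of ingest, transform, and publish operating on named data sets over one shared storage layer. Everything in it is an assumption for illustration: the `Dataset` and `Storage` names, the method signatures, and the in-memory dictionary standing in for GCS, Iceberg, and Parquet are hypothetical and are not Shopify's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout; an illustration, not Shopify's API.
Row = Dict[str, object]


@dataclass
class Dataset:
    name: str
    kind: str  # e.g. "table", "model", "report", "dashboard"
    rows: List[Row] = field(default_factory=list)


class Storage:
    """In-memory stand-in for the shared storage layer."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Dataset] = {}
        self.published: List[str] = []

    def ingest(self, name: str, kind: str, rows: List[Row]) -> Dataset:
        # Bring external data into storage as a named data set.
        ds = Dataset(name, kind, list(rows))
        self._datasets[name] = ds
        return ds

    def transform(self, source: str, target: str,
                  fn: Callable[[Row], Row]) -> Dataset:
        # Derive a new data set by applying fn to every row of the source.
        src = self._datasets[source]
        out = Dataset(target, src.kind, [fn(r) for r in src.rows])
        self._datasets[target] = out
        return out

    def publish(self, name: str) -> Dataset:
        # Mark a stored data set as available to consumers.
        ds = self._datasets[name]  # raises KeyError if the name is unknown
        self.published.append(name)
        return ds


storage = Storage()
storage.ingest("orders_raw", "table",
               [{"id": 1, "cents": 1250}, {"id": 2, "cents": 399}])
storage.transform("orders_raw", "orders",
                  lambda r: {"id": r["id"], "dollars": r["cents"] / 100})
orders = storage.publish("orders")
```

The point of the sketch is only that all three operations share one storage layer and one notion of a data set, so a table, a model, or a report can move through the same interfaces.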

… Where I think Ray is different, and where it excels, is that it is built around the idea of a unit of compute that you need to scale. It is not about taking a piece of data that you need to distribute everywhere and process; it is about a unit of compute that you need to distribute and parallelize. That’s really useful in ML training, reinforcement learning, deep learning, and other tasks that are CPU heavy and GPU heavy.
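The "unit of compute" idea can be sketched with nothing but the Python standard library. The example below is a stand-in, not Ray itself: it uses a local thread pool to show the programming model, in which a function (here a hypothetical toy training run, `train_one`) is the thing being fanned out, rather than a data set being partitioned. In Ray, the analogous step is to decorate the function with `@ray.remote`, launch it with `.remote(...)`, and collect results with `ray.get(...)`, with the scheduler placing the work across a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def train_one(learning_rate: float) -> float:
    """Hypothetical unit of compute: a toy 'training run' that
    minimizes (w - 3)^2 by gradient descent and returns the final loss."""
    w = 0.0
    for _ in range(100):
        grad = 2 * (w - 3.0)  # derivative of (w - 3)^2 with respect to w
        w -= learning_rate * grad
    return (w - 3.0) ** 2


learning_rates: List[float] = [0.01, 0.1, 0.5]

# Fan the same function out over several configurations. A thread pool only
# illustrates the model; it does not give real CPU parallelism in CPython,
# whereas a system like Ray would run these tasks on separate workers.
with ThreadPoolExecutor(max_workers=3) as pool:
    losses = list(pool.map(train_one, learning_rates))

# Pick the configuration with the lowest final loss.
best_lr = min(zip(losses, learning_rates))[1]
```

Each call is independent and carries its own small input, which is what makes this shape of workload (hyperparameter sweeps, reinforcement learning rollouts, and similar) a natural fit for scaling compute rather than data.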

We’re early in our experiment with Ray; I think we’re about six months into it now. We now have a system where users can run a command from the command line and get a Jupyter Notebook. They have our library, which they can use to spin up a workspace, and the workspace gives them a cluster. After that, they can write Python.

[Image: “Lamps Bazaar Vintage Lantern Lanterns” from Maxpixel.]