
Why You Should Optimize Your Deep Learning Inference Platform

The Data Exchange Podcast: Yonatan Geifman and Ran El-Yaniv on the benefits that accrue from using an inference acceleration platform.


Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS.

In this episode of the Data Exchange, I speak[1] with Yonatan Geifman, CEO and co-founder of Deci, and with Ran El-Yaniv, Chief Scientist and co-founder of Deci and Professor of Computer Science at Technion. As companies deploy machine learning and deep learning in critical products and services, the number of predictions their models must serve can easily reach millions per day (and even hundreds of trillions in the case of Facebook).

These “prediction services” continue to grow in importance – 80% of content on Netflix is discovered through recommendations – and thus companies need to build platforms that serve predictions to an ever-growing number of users and services. Deci builds tools to help companies accelerate and scale their inference platforms to meet the requirements of their specific application and use case. They do so through an array of tools that looks at inference holistically and systematically:

[Image: Inference Acceleration Stack, from Deci.ai, used with permission.]
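To make the scale issue a bit more concrete, here is a minimal sketch, in Python with PyTorch, of how one might measure a model's baseline inference latency and throughput before applying any acceleration. This is not Deci's tooling; the ResNet-50 stand-in model, batch size, and run counts are illustrative assumptions.

# Minimal latency/throughput benchmark sketch (illustrative only, not Deci's tooling).
import time
import torch
import torchvision.models as models

model = models.resnet50(weights=None).eval()  # stand-in model, randomly initialized
batch = torch.randn(8, 3, 224, 224)           # assumed batch size and input shape

with torch.no_grad():
    for _ in range(5):                        # warm-up iterations
        model(batch)

    n_runs = 50
    start = time.perf_counter()
    for _ in range(n_runs):
        model(batch)
    elapsed = time.perf_counter() - start

latency_ms = elapsed / n_runs * 1000
throughput = n_runs * batch.shape[0] / elapsed
print(f"mean latency: {latency_ms:.1f} ms/batch, throughput: {throughput:.1f} images/s")

Numbers like these, measured on the target hardware, are the baseline an inference acceleration platform tries to improve.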

Download a complete transcript of this episode by filling out the form below:




[1] This post and episode are part of a collaboration between Gradient Flow and Deci. See our statement of editorial independence.

[Photo by Robynne Hu on Unsplash]
