Reinforcement Learning For the Win

The Data Exchange Podcast: Nicolas Hohn on challenges and best practices for using RL and machine learning in the enterprise.


Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS.

This week’s guest is Nicolas (Nic) Hohn, Chief Data Scientist, McKinsey/QuantumBlack Australia. Nic led a team of data scientists charged with helping the America’s Cup-winning team, Emirates Team New Zealand, test new designs for hydrofoils – important sailing-boat components that could be modified within rules set by the race organizers. More precisely, the QuantumBlack team used Ray RLlib to design an AI agent that could learn to sail a boat of a given design at optimal speed, and this agent proved crucial during the design process.

Nic Hohn is part of an outstanding speaker lineup at the 2021 Ray Summit, a FREE virtual conference that brings together developers, machine learning practitioners, data scientists, DevOps, and cloud-native architects interested in building scalable data & AI applications.

As many listeners may be aware, I’ve recently written a series of posts describing applications of RL within companies (see: “Enterprise Applications of Reinforcement Learning: Recommenders and Simulation Modeling” and “Applications of Reinforcement Learning: recent examples from large U.S. companies”). Given Nic’s position working with a wide variety of companies, I asked him about how companies can incorporate RL into products and services:

    Nic Hohn: ❛ You need to have the right circumstances for RL to be the right answer, as opposed to trying to make RL the answer to any type of problem. In my experience, some of the key ingredients are highly complex optimizations, usually involving a sequence of actions over time. I think RL is also really good when you only learn the consequences of your actions further in time. These are situations where the reward might not actually be instant, for instance. And so keeping this very broad lens, I think the examples … could be dynamic supply chains, or it could be next-best-action models, or complex optimization of a telecommunication network, where your needs will vary depending on how the users are interfacing with the network.

    In all those situations, I think I would strongly advise against using RL as the starting point. But if you find that traditional techniques are limiting – it could be that you need a lot of heuristics in your optimizations to make them converge, or it could be that you can only run them up to a certain scale, and then your optimization no longer works at all – there are many reasons why you might want to go further. But you do need to do that hard work first, so that you have a clear idea of what you’re solving for and a good idea of your optimization function. Only then do you bring in deep RL.

    I think in terms of the barriers to entry, a lot of the challenges we see currently are really around getting access to a good simulation environment. RL right now only works if you have a simulation environment – you can call it a digital twin, you can call it something else, it doesn’t matter. You need some environment with which your agent can interface. Sometimes the challenge with the environment is that people think they need the highest-fidelity simulator, otherwise they think it’s not worth it. I would argue against that; I think it really depends on the problem you’re solving. Sometimes all you want to know is whether you should be choosing Option A or Option B, because you have a whole bunch of other constraints. So in that case, maybe a simpler simulator is enough.
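To make the agent–environment interface Nic describes concrete, here is a minimal sketch in plain Python. Everything in it is illustrative and not from the episode: the toy simulator, its dynamics, and the "Option A vs. Option B" actions are invented stand-ins for a real digital twin, and the reward is deliberately delayed until the end of the episode to mirror the "reward might not be instant" point. A real project would typically wrap such an environment in the Gym/Gymnasium API and train against it with Ray RLlib rather than hand-rolled policies.

```python
import random


class ToySimulator:
    """Illustrative stand-in for a simulation environment ("digital twin").

    The agent only needs this interface: reset() to start an episode,
    step(action) to advance the simulation and receive feedback.
    """

    GOAL = 10.0  # arbitrary finish line for the toy dynamics

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        # action 1 = "Option B" (faster drift), action 0 = "Option A" (slower)
        drift = 1.0 if action == 1 else 0.5
        self.state += drift + self.rng.uniform(-0.1, 0.1)
        done = self.state >= self.GOAL
        # Delayed reward: the agent learns the consequence of its
        # actions only when the episode ends.
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy):
    """Roll out one episode with a fixed policy; return (total_reward, steps)."""
    obs = env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
        steps += 1
    return total, steps


env = ToySimulator()
reward_b, steps_b = run_episode(env, lambda obs: 1)  # always choose Option B
reward_a, steps_a = run_episode(env, lambda obs: 0)  # always choose Option A
print(f"Option B finished in {steps_b} steps, Option A in {steps_a}")
```

Even a simulator this crude answers the coarse question in the quote – whether Option A or Option B is the better choice – without any claim to high fidelity; that is the sense in which "a simpler simulator is enough" for some problems.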

Download a complete transcript of this episode by filling out the form below:

Related content and resources:



Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.


[Image: America’s Cup World Series San Francisco by Phil Uhl on Wikimedia.]