Deep Learning in the Sciences

The Data Exchange Podcast: Bart Ramsundar on DeepChem, MoleculeNet, and machine learning in the life sciences.

Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS.

In this episode of the Data Exchange I speak with Bharath (“Bart”) Ramsundar, author and open source developer. While in graduate school, Bart created DeepChem, an open source project that aims to democratize deep learning for science. DeepChem was historically developed for researchers in the life sciences, so the working examples in its tutorials draw from areas like chemistry and bioinformatics.

Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies.

Researchers in other branches of science (e.g., physics and astronomy) have long embraced machine learning and big data management systems. In fact, I remember that during the early days of Hadoop and MPP databases, creators and vendors of big data systems approached research labs known to possess massive amounts of data. And some of the most popular open source projects in the Python data ecosystem (scikit-learn, numpy) came out of the scientific computing community.

I wanted to get the perspective of someone familiar with applications of ML and data management technologies in the life sciences, so Bart and I discussed a range of topics.

Bart described applications of DeepChem and MoleculeNet and highlighted two areas where these projects have been influential:

    ❛ The single biggest thing in DeepChem is we host the MoleculeNet benchmark suite, which is roughly like ImageNet for molecules. In DeepChem we built out infrastructure, so anyone who imports DeepChem can auto-load a MoleculeNet dataset for benchmarking. This has had a pretty wide impact; I think a lot of the work in the machine learning for molecules field uses MoleculeNet to do benchmarking work.

    I think the second biggest impact we had was probably popularizing Graph Neural Networks in the molecular machine learning field. You know, when we started, I think we were the only real quality implementation of graph convolutions in open source. Recently, that’s begun to change with things like PyTorch Geometric and Deep Graph Library (DGL). So we’re also partnering with some of the DGL developers to increasingly swap over from our hand written kernels to their optimized kernels. But I think that’s another area where we had a good bit of impact.

Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.


[Image: Molecular model of Penicillin by Dorothy Hodgkin from Wikimedia.]