The Data Exchange Podcast: Bart Ramsundar on DeepChem, MoleculeNet, and machine learning in the life sciences.
In this episode of the Data Exchange I speak with Bharath (“Bart”) Ramsundar, author and open source developer. While in graduate school, Bart created DeepChem, an open source project that aims to democratize deep learning for science. DeepChem was historically developed for researchers in the life sciences, so the working examples in its tutorials draw from areas like chemistry and bioinformatics.
Researchers in other branches of science (e.g., physics and astronomy) have long embraced machine learning and big data management systems. In fact, I remember that during the early days of Hadoop and MPP databases, creators and vendors of big data systems approached research labs known to possess massive amounts of data. And some of the most popular open source projects in the Python data ecosystem (scikit-learn, numpy) came out of the scientific computing community.
I wanted to get the perspective of someone familiar with applications of ML and data management technologies in the life sciences. Bart and I discussed:
- DeepChem and another open source project he co-founded: MoleculeNet, a benchmark suite that facilitates the development of molecular algorithms. DeepChem version 2.4.0 just came out and we discussed some of the new features.
- The growth in applications of machine learning – and specifically deep learning – in the life sciences. There are headline-grabbing projects like AlphaFold, but there are many other less familiar examples.
- A book he co-authored with colleagues at Stanford – Deep Learning for the Life Sciences: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More.
- Creating and growing an open source software project.
Bart described applications of DeepChem and MoleculeNet and highlighted two areas where these projects have been influential:
- ❛ The single biggest thing in DeepChem is we host the MoleculeNet benchmark suite, which is roughly like ImageNet for molecules. In DeepChem we built out infrastructure, so anyone who imports DeepChem can auto load a MoleculeNet dataset for benchmarking. This has had a pretty wide impact, I think a lot of the work in the machine learning for molecules field uses MoleculeNet to do benchmarking work.
I think the second biggest impact we had was probably popularizing Graph Neural Networks in the molecular machine learning field. You know, when we started, I think we were the only real quality implementation of graph convolutions in open source. Recently, that’s begun to change with things like PyTorch Geometric and Deep Graph Library (DGL). So we’re also partnering with some of the DGL developers to increasingly swap over from our hand written kernels to their optimized kernels. But I think that’s another area where we had a good bit of impact.
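For readers who have not seen a graph convolution before, here is a minimal sketch of the idea Bart refers to. It treats a molecule as a graph (atoms as nodes, bonds as edges) and updates each atom's feature vector by averaging over the atom and its bonded neighbors. This is an illustration only, not DeepChem's, PyTorch Geometric's, or DGL's implementation: the function names, the mean aggregation, the one-hot atom features, and the identity weight matrix are all assumptions chosen to keep the example small.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def build_neighbors(num_atoms: int, bonds: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Map each atom index to itself plus the atoms it is bonded to."""
    neighbors = {i: [i] for i in range(num_atoms)}  # include a self-loop
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


def graph_conv(features: List[Vector],
               bonds: List[Tuple[int, int]],
               weights: List[Vector]) -> List[Vector]:
    """One illustrative graph convolution step (hypothetical helper).

    For every atom: average the feature vectors of the atom and its
    neighbors, multiply by a weight matrix (dim_in x dim_out), apply ReLU.
    """
    neighbors = build_neighbors(len(features), bonds)
    dim_in = len(features[0])
    dim_out = len(weights[0])
    updated = []
    for atom in range(len(features)):
        nbrs = neighbors[atom]
        # Mean aggregation over the atom and its bonded neighbors.
        agg = [sum(features[n][k] for n in nbrs) / len(nbrs) for k in range(dim_in)]
        # Linear transform followed by ReLU.
        out = [sum(agg[k] * weights[k][j] for k in range(dim_in)) for j in range(dim_out)]
        updated.append([max(0.0, v) for v in out])
    return updated


# Heavy atoms of ethanol (C-C-O), with assumed one-hot features [is_carbon, is_oxygen].
ethanol_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
ethanol_bonds = [(0, 1), (1, 2)]
identity_weights = [[1.0, 0.0], [0.0, 1.0]]  # placeholder; a real layer learns these

updated_features = graph_conv(ethanol_features, ethanol_bonds, identity_weights)
for index, vector in enumerate(updated_features):
    print(index, vector)
```

After one step, the middle carbon's features already reflect that it sits next to an oxygen. Stacking several such steps lets information travel further across the molecule. Libraries with optimized kernels exist to make this fast on large batches of molecules.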
Related content and resources:
- A video version of this conversation is available on our YouTube channel.
- Navigate the road to Responsible AI
- Key AI and Data Trends for 2021
- Neil Thompson: “The Computational Limits of Deep Learning”
- Rumman Chowdhury: “The State of Responsible AI”
- Viral Shah: “A programming language for scientific machine learning and differentiable programming”
- Mayank Kejriwal: “Building and deploying knowledge graphs”
[Image: Molecular model of Penicillin by Dorothy Hodgkin from Wikimedia.]