Jinsung Yoon and Sercan Arik on a new, state-of-the-art neural architecture that is capable of representing diverse data modalities.
Jinsung Yoon (Senior Research Scientist) and Sercan Arik (Staff Research Scientist and Manager) are part of the Google team behind EHR-Safe, a set of tools for generating highly realistic and privacy-preserving synthetic Electronic Health Records.
Anonymizing data with conventional methods can be a tedious and expensive process. The use of synthetic data opens up new possibilities for data sharing. Two properties are essential for synthetic data to be useful:
- The synthesized data are of high fidelity (e.g., a diagnostic model trained on them achieves downstream performance similar to one trained on real data).
- The synthesized data meet certain privacy measures (i.e., they do not reveal a patient’s identity).
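As a rough illustration of how these two properties might be checked, the sketch below compares a model trained on real records with one trained on synthetic records, both scored on held-out real data, and then counts synthetic rows that exactly duplicate a real row. Everything here is an assumption made for illustration: the toy record generator, the nearest-centroid "diagnostic model", and the helper names (`fidelity_gap`, `exact_copy_rate`) are hypothetical and are not taken from EHR-Safe, and an exact-copy check is only a crude proxy for a real privacy measure.

```python
import random

def make_records(n, seed):
    """Toy stand-in for patient records: a list of (features, label) pairs."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Two numeric features whose means shift with the label.
        features = [rng.gauss(2.0 * label, 1.0), rng.gauss(-1.0 * label, 1.0)]
        records.append((features, label))
    return records

def fit_centroids(records):
    """Hypothetical 'diagnostic model': one mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in records:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(centroids, features):
    # Assign the label whose centroid is closest to the record.
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))

def accuracy(centroids, records):
    hits = sum(predict(centroids, f) == y for f, y in records)
    return hits / len(records)

def fidelity_gap(real_train, synthetic_train, real_test):
    """Accuracy on real test data: trained-on-real minus trained-on-synthetic."""
    real_acc = accuracy(fit_centroids(real_train), real_test)
    synth_acc = accuracy(fit_centroids(synthetic_train), real_test)
    return real_acc - synth_acc

def exact_copy_rate(real, synthetic):
    """Fraction of synthetic records that exactly duplicate a real record."""
    real_rows = {tuple(f) for f, _ in real}
    return sum(tuple(f) in real_rows for f, _ in synthetic) / len(synthetic)

real_train = make_records(500, seed=0)
# Assumption: a fresh draw from the same toy distribution stands in for
# the output of a synthetic data generator.
synthetic_train = make_records(500, seed=1)
real_test = make_records(500, seed=2)

gap = fidelity_gap(real_train, synthetic_train, real_test)
copies = exact_copy_rate(real_train, synthetic_train)
print(f"fidelity gap: {gap:.3f}, exact-copy rate: {copies:.3f}")
```

A gap near zero suggests the synthetic records are about as useful as the real ones for this particular task, while a nonzero copy rate would be an immediate red flag. A realistic privacy evaluation would need to go well beyond exact matches.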
In this episode we discuss their new, state-of-the-art neural architecture that is capable of representing diverse data modalities while maintaining data privacy. While EHR-Safe targets a very specific domain and data artifact (Electronic Health Records), we explore possible extensions to structured data in other domains (e.g., financial services), as well as extensions to different data types (visual data and text).
Interview highlights – key sections from the video version:
- Electronic Health Records, and data formats covered by EHR-Safe
- What problems and challenges does EHR-Safe address?
- Implementation details
- Stationarity assumptions
- Data augmentation
- What developments in machine learning led to EHR-Safe?
- Evaluating synthetic data
- Limitations of synthetic data
- Synthetic data in the context of other tools for confidential computing
- Synthetic data for text (doctor’s notes) and medical images
- How the healthcare community has reacted to EHR-Safe
- How tools developed for EHR-Safe may translate to other domains beyond healthcare
- An update from Sercan on the use of deep learning for tabular data problems
- Trends to watch in synthetic data generation
Related content:
- A video version of this conversation is available on our YouTube channel.
- Sercan Arik: Neural Models for Tabular Data
- FREE Report: 2023 Trends in Data, Machine Learning, and AI
- Yashar Behzadi: Synthetic data technologies can enable more capable and ethical AI
- Parisa Rashidi: Machine Learning in Healthcare
- Gabriela Zanfir-Fortuna and Andrew Burt: Preparing for the Implementation of the EU AI Act and Other AI Regulations
- Peter Norvig and Alfred Spector: Data Science and AI in Context
- Jian Pei: Pricing Data Products