Generating high-fidelity and privacy-preserving synthetic data

Jinsung Yoon and Sercan Arik on a new, state-of-the-art neural architecture that is capable of representing diverse data modalities.


Jinsung Yoon (Senior Research Scientist) and Sercan Arik (Staff Research Scientist and Manager) are part of the Google team behind EHR-Safe, a set of tools for generating highly realistic and privacy-preserving synthetic Electronic Health Records.

Jinsung Yoon and Sercan Arik will be delivering a keynote at the Healthcare NLP Summit, a FREE online conference and the biggest gathering of NLP practitioners.

Anonymizing data with conventional methods can be a tedious and expensive process. The use of synthetic data opens up new possibilities for data sharing. Two properties are essential for synthetic data to be useful:

The synthesized data are of high fidelity (e.g., they give similar downstream performances when a diagnostic model is trained on them).
The synthesized data meet certain privacy measures (i.e., they do not reveal a patient’s identity).
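
To make the two properties concrete, here is a minimal sketch on toy data. It is not the EHR-Safe method: the records, the nearest-centroid "diagnostic model", and the helper names (`make_records`, `fit_centroids`, `min_distance_to_real`) are all assumptions made for illustration. Fidelity is approximated by training on synthetic records and testing on real ones, then comparing against a model trained on real records. Privacy is approximated by a crude proxy: how close the nearest synthetic record sits to any real record.

```python
import math
import random


def make_records(rng, n, shift=0.0):
    """Toy 2-feature records: label 1 is centered near (2, 2), label 0 near (0, 0)."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label + shift
        rows.append(([rng.gauss(center, 0.5), rng.gauss(center, 0.5)], label))
    return rows


def fit_centroids(rows):
    """Hypothetical 'diagnostic model': one mean feature vector per label."""
    centroids = {}
    for label in {y for _, y in rows}:
        pts = [x for x, y in rows if y == label]
        centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return centroids


def accuracy(centroids, rows):
    """Fraction of rows whose nearest centroid matches the true label."""
    hits = sum(
        min(centroids, key=lambda c: math.dist(centroids[c], x)) == y
        for x, y in rows
    )
    return hits / len(rows)


def min_distance_to_real(synthetic, real):
    """Smallest distance from any synthetic record to any real record."""
    return min(math.dist(s, r) for s, _ in synthetic for r, _ in real)


rng = random.Random(0)
real_train = make_records(rng, 200)
real_test = make_records(rng, 200)
# Stand-in for a generator's output; a real system would sample from a trained model.
synthetic = make_records(rng, 200, shift=0.05)

# Fidelity: train on synthetic, test on real, compare with train on real.
acc_real = accuracy(fit_centroids(real_train), real_test)
acc_synth = accuracy(fit_centroids(synthetic), real_test)
fidelity_gap = acc_real - acc_synth

# Privacy proxy: a distance near zero suggests a real record may have been copied.
closest = min_distance_to_real(synthetic, real_train)

print(f"real-trained accuracy:      {acc_real:.3f}")
print(f"synthetic-trained accuracy: {acc_synth:.3f}")
print(f"fidelity gap:               {fidelity_gap:.3f}")
print(f"closest synthetic-to-real:  {closest:.4f}")
```

A small fidelity gap suggests the synthetic data preserve the signal a downstream model needs. The distance check is only a first sanity test, and a serious privacy evaluation would use stronger measures.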

In this episode we discuss their new, state-of-the-art neural architecture that is capable of representing diverse data modalities while maintaining data privacy. While EHR-Safe targets a very specific domain and data artifact (Electronic Health Records), we explore possible extensions to structured data in other domains (e.g., financial services), as well as extensions to different data types (visual data and text).

FREE Online conference: April 4-5

Related content:

A video version of this conversation is available on our YouTube channel.
Sercan Arik: Neural Models for Tabular Data
FREE Report: 2023 Trends in Data, Machine Learning, and AI
Yashar Behzadi: Synthetic data technologies can enable more capable and ethical AI
Parisa Rashidi: Machine Learning in Healthcare
Gabriela Zanfir-Fortuna and Andrew Burt: Preparing for the Implementation of the EU AI Act and Other AI Regulations
Peter Norvig and Alfred Spector: Data Science and AI in Context
Jian Pei: Pricing Data Products
