2017
DOI: 10.1101/159756
Preprint

Privacy-preserving generative deep neural networks support clinical data sharing

Abstract: Though it is widely recognized that data sharing enables faster scientific progress, the sensible need to protect participant privacy hampers this practice in medicine. We train deep neural networks that generate synthetic participants closely resembling study participants. Using the SPRINT trial as an example, we show that machine-learning models built from simulated participants generalize to the original dataset. We incorporate differential privacy, which offers strong guarantees on the likelihood …


Cited by 84 publications (95 citation statements)
References 25 publications
“…For example, to protect health records, synthetic medical datasets can be published instead of the real ones, using generative models trained on sensitive real-world medical datasets [3,6]. To provide a formal privacy guarantee, [2] trains GANs under the constraint of differential privacy [5] to protect against common privacy attacks. Although the architecture of our proposed framework looks similar to GANs, there are key structural and logical differences from other existing frameworks.…”
Section: Related Work and Discussion (mentioning)
Confidence: 99%
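Training under the constraint of differential privacy, as this excerpt describes, is commonly done DP-SGD style: clip each example's gradient to a fixed L2 norm, aggregate, and add Gaussian noise calibrated to that clipping bound. A minimal numpy sketch of the gradient-sanitization step (the clip norm and noise multiplier values here are illustrative, not taken from the cited work):

```python
import numpy as np

def sanitize_gradients(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """DP-SGD-style gradient aggregation: clip each per-example gradient
    to L2 norm `clip_norm`, sum, add Gaussian noise scaled by the clipping
    bound, and average over the batch."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clipping bound
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    # Noise standard deviation is proportional to the clipping bound
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
noisy_mean = sanitize_gradients(grads)
```

Because each example's influence on the update is bounded by `clip_norm`, the added noise yields a quantifiable (epsilon, delta) privacy guarantee for the discriminator's updates; libraries such as TensorFlow Privacy and Opacus implement this mechanism with proper privacy accounting.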
“…Unlike privacy-preserving works that only hide users' identity by sharing population data produced with generative models for data synthesis [2,9], our solution concerns sensitive information included in a single user's data. There are, however, some methods that transform only selected temporal sections of sensor data corresponding to predefined sensitive activities [11,12]; our framework, in contrast, concurrently eliminates private information from every section of the data while preserving the utility of the shared data.…”
Section: Introduction (mentioning)
Confidence: 99%
“…A possible short-term solution is to use generative adversarial networks with differential privacy to generate synthetic data with the statistical properties of real health-care data; however, models trained on synthetic data might not be as accurate as those trained on clinical data. 5 A long-term solution to data challenges must focus on generating high-quality, deidentified data primarily for research purposes. Efforts such as the US National Institutes of Health's All of Us Research Program and the UK Biobank have generated databases that are accessible to researchers globally.…”
Section: Practical Guidance On Artificial Intelligence For Health-care (mentioning)
Confidence: 99%
“…Finally, we experiment with a transit dataset, which we denote as TRANSIT in the rest of the paper. 5 Due to a nondisclosure agreement, we are unable to provide specific details about the dataset; however, we can report that the TRANSIT dataset includes the transit history of passengers in the network (with |D| = 1,200,000); here, I represents the set of m = 342 stations in a public transportation network.…”
Section: Methods (mentioning)
Confidence: 99%
“…For VAE, the number of hidden units is set to 200 with a single-layer encoder and decoder, and a bi-dimensional latent space. We also used the rectifier activation function (ReLU) for all neurons and the Adam optimizer [27].…”
(Footnote 5: experiments using this dataset do not appear in the ICDM'17 version of the paper.)
Section: Methods (mentioning)
Confidence: 99%
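The VAE configuration this excerpt describes (200 hidden units, single-layer encoder and decoder, a bi-dimensional latent space, ReLU activations) can be sketched as a numpy forward pass. The input dimension and weight initialization below are illustrative assumptions, and the Adam training loop is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_latent = 342, 200, 2  # hidden/latent sizes from the excerpt; n_in is illustrative

# Encoder: one ReLU layer, then linear heads for the latent mean and log-variance
W_enc = rng.normal(0, 0.05, (n_in, n_hidden))
W_mu = rng.normal(0, 0.05, (n_hidden, n_latent))
W_logvar = rng.normal(0, 0.05, (n_hidden, n_latent))
# Decoder: one ReLU layer mapping the latent code back to the input dimension
W_dec1 = rng.normal(0, 0.05, (n_latent, n_hidden))
W_dec2 = rng.normal(0, 0.05, (n_hidden, n_in))

def relu(x):
    return np.maximum(x, 0.0)

def vae_forward(x):
    h = relu(x @ W_enc)
    mu, logvar = h @ W_mu, h @ W_logvar
    # Reparameterization trick: sample z differentiably from N(mu, exp(logvar))
    z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)
    # Sigmoid output so each reconstructed feature lies in (0, 1)
    x_hat = 1.0 / (1.0 + np.exp(-(relu(z @ W_dec1) @ W_dec2)))
    return x_hat, mu, logvar

x = rng.random((4, n_in))
x_hat, mu, logvar = vae_forward(x)
```

In practice the weights (and biases, omitted here) would be fitted with Adam by minimizing the reconstruction loss plus the KL divergence between the 2-D latent posterior and a standard normal prior.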