Privacy preservation and evaluation in machine learning
Authors
Pedersen, Joseph
Issue Date
2022-08
Type
Electronic thesis
Thesis
Language
en_US
Keywords
Decision sciences and engineering systems
Abstract
This thesis aims to improve methods for preserving privacy in machine learning, for both generative and predictive models, and for evaluating the privacy lost. Creating useful machine learning models requires training data, but doing so poses a privacy risk to the subjects whose records are used. First, we address the problem of making sensitive data that must remain in a secured environment available for education or research, by replacing it with synthetic data that resembles the original closely enough to train accurate predictive models. Next, we consider the scenario in which researchers want to release models trained on sensitive data without those models revealing, with high confidence, which data were used for training. We investigate two approaches and draw connections between them: making models differentially private, and protecting against membership inference attacks. We derive tight, componentwise bounds on the loss of a Wasserstein GAN, as well as new bounds on the norm of the loss of the gradient penalty term, and use them in a novel algorithm for a differentially private WGAN-GP. We evaluate this algorithm by using it to synthesize three real medical datasets and then using the synthetic datasets to replicate published medical studies. We find that the algorithm suffers less mode collapse than its non-differentially-private counterpart. We also develop a framework for formally analyzing the worst-case privacy attack scenario. Within this framework, we prove several lower bounds on the accuracy an attacker achieves against model trainers that overfit or are insufficiently random. We further prove that any sample that is learned from incurs a risk of privacy loss, and that under certain assumptions black-box attacks are optimal. Motivated by these theoretical analyses, we propose a novel protection method and demonstrate that it can improve the privacy of already well-protected models while simultaneously increasing their accuracy.
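The abstract summarizes rather than reproduces the algorithms, but the general mechanisms it refers to can be sketched. Below is a minimal, hypothetical Python illustration of two of them: a DP-SGD-style clip-and-noise gradient step, the standard mechanism a differentially private WGAN-GP critic update builds on, and a black-box loss-threshold membership inference test. All function names, constants, and the specific clipping scheme are illustrative assumptions, not the thesis's actual method.

```python
# Minimal sketch (assumptions, not the thesis's algorithm): a DP-SGD-style
# clip-and-noise gradient step plus a loss-threshold membership test.
import numpy as np

def dp_average_gradient(per_example_grads, clip_norm=1.0,
                        noise_multiplier=1.1, rng=None):
    """Clip each per-example gradient to L2 norm `clip_norm`, average,
    then add Gaussian noise calibrated to the clipping bound."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    avg = np.mean(clipped, axis=0)
    # Noise scale sigma * C / n for an averaged batch gradient.
    std = noise_multiplier * clip_norm / len(clipped)
    return avg + rng.normal(0.0, std, size=avg.shape)

def loss_threshold_member(loss, threshold):
    """Black-box membership guess: flag a record as a training member
    if the model's loss on it falls below `threshold` (low loss on a
    record is evidence the model overfit to it)."""
    return loss < threshold
```

The second function reflects the connection the abstract draws: lower loss on training members than on held-out records is exactly the overfitting signal that the framework's lower bounds on attacker accuracy formalize.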
Description
August 2022
School of Engineering
Publisher
Rensselaer Polytechnic Institute, Troy, NY