
    Privacy preservation and evaluation in machine learning

    Author
    Pedersen, Joseph
    ORCID
    https://orcid.org/0000-0002-6223-9848
    View/Open
    Pedersen_rpi_0185E_12075.pdf (859.5 KB)
    Other Contributors
    Bennett, Kristin P.; Wallace, William A., 1935-; Mitchell, John E.; Guyon, Isabelle
    Date Issued
    2022-08
    Subject
    Decision sciences and engineering systems
    Degree
    PhD
    Terms of Use
    This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
    URI
    https://hdl.handle.net/20.500.13015/6261
    Abstract
    This thesis aims to improve methods for preserving privacy in machine learning, for both generative and predictive models, and to evaluate the privacy lost. Creating useful machine learning models requires the use of training data, but doing so poses a privacy risk to the subjects whose records are used. First, we address the problem of making sensitive data that must stay in a secured environment available for education or research by replacing it with sufficiently faithful synthetic data that can be used to train accurate predictive models. Next, we consider the scenario in which researchers want to release models trained on sensitive data without the models revealing, with high confidence, which data were used for training. Two approaches are investigated and connected: making models differentially private, and protecting against membership inference attacks. We derive tight, componentwise bounds on the loss of a Wasserstein GAN, as well as new bounds on the norm of the loss of the gradient penalty term, and use these in a novel algorithm for a differentially private WGAN-GP. We evaluate this algorithm by using it to synthesize three real medical datasets and using the synthetic datasets to replicate published medical studies. We find that the algorithm suffers less mode collapse than its non-differentially-private counterpart. We also develop a framework for formally analyzing the worst-case privacy-attack scenario. Within this framework, we prove several lower bounds on the accuracy of an attacker against model trainers that overfit or are insufficiently random. We also prove that learning from any sample incurs a risk of privacy loss, and that under certain assumptions black-box attacks are optimal. Motivated by these theoretical analyses, we propose a novel protection method that, as we demonstrate, can improve the privacy of already well-protected models while simultaneously increasing their accuracy.
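
    The differentially private WGAN-GP described in the abstract follows the general clip-and-noise pattern of differentially private training (as in DP-SGD). The sketch below is a minimal illustration of that generic pattern, not the thesis's algorithm: it assumes PyTorch, the names dp_critic_step, clip_norm, noise_multiplier, and gp_weight are illustrative, and the thesis's tight componentwise loss bounds, which would replace the generic clipping constant, are not reproduced here. One critic update computes a per-example WGAN-GP loss, clips each per-example gradient to a fixed L2 norm, sums, adds Gaussian noise calibrated to that norm, and steps.

    import torch

    def dp_critic_step(critic, real_batch, fake_batch, optimizer,
                       clip_norm=1.0, noise_multiplier=1.1, gp_weight=10.0):
        """One DP-SGD-style critic update: per-example clipping + Gaussian noise."""
        optimizer.zero_grad()
        params = list(critic.parameters())
        accum = [torch.zeros_like(p) for p in params]
        n = real_batch.size(0)
        for i in range(n):  # microbatches of size 1 yield per-example gradients
            real = real_batch[i:i + 1]
            fake = fake_batch[i:i + 1].detach()
            # WGAN critic loss: the critic minimizes fake score minus real score
            loss = critic(fake).mean() - critic(real).mean()
            # gradient penalty at a random interpolate between real and fake
            eps = torch.rand(1, *([1] * (real.dim() - 1)))
            interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
            g_interp = torch.autograd.grad(critic(interp).sum(), interp,
                                           create_graph=True)[0]
            loss = loss + gp_weight * (g_interp.norm(2) - 1.0) ** 2
            per_ex = torch.autograd.grad(loss, params)
            # clip the per-example gradient to L2 norm <= clip_norm
            total = torch.sqrt(sum(g.pow(2).sum() for g in per_ex))
            scale = torch.clamp(clip_norm / (total + 1e-12), max=1.0)
            for a, g in zip(accum, per_ex):
                a.add_(scale * g)
        # Gaussian noise scaled to the clipping norm privatizes the summed gradient
        for p, a in zip(params, accum):
            noise = torch.randn_like(a) * noise_multiplier * clip_norm
            p.grad = (a + noise) / n
        optimizer.step()

    Tracking the cumulative privacy loss (epsilon, delta) over many such updates requires a privacy accountant (for example, moments accounting); that bookkeeping is omitted from the sketch.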
    Description
    August 2022; School of Engineering
    Department
    Dept. of Industrial and Systems Engineering
    Publisher
    Rensselaer Polytechnic Institute, Troy, NY
    Relationships
    Rensselaer Theses and Dissertations Online Collection
    Access
    Restricted to current Rensselaer faculty, staff and students in accordance with the Rensselaer Standard license. Access inquiries may be directed to the Rensselaer Libraries.
    Collections
    • RPI Theses Online (Complete)
