Stochastic first order methods for distributed composite optimization with differential privacy

Sutcher-Shepard, Colin
Electronic thesis
Applied mathematics
ABSTRACT

Distributed optimization has gained much attention in recent years as the world generates ever more data and machine learning methods become able to solve many new problems. As the desire to train models on large data sets spreads to more fields, it is more important than ever to have fast and secure methods for solving distributed optimization problems. In this work we address three fundamental concerns of distributed optimization.

In Chapter 1, we provide motivation and an overview of recent work on distributed optimization.

In Chapter 2, we introduce the Async-Parallel Adaptive stochastic gradient Method (APAM). APAM is designed to solve non-convex, smooth optimization problems. The method allows asynchronous updates to the model during training, thus avoiding delays caused by lagging compute nodes. We first give a brief overview of important work on adaptive stochastic gradient methods, then prove convergence of APAM in both convex and non-convex settings. We also present empirical experiments showing the improvement APAM offers on practical problems.

In Chapter 3, we discuss the Federated Learning framework for training a common model among several disparate parties. In this setting, different clients hold their own data privately, and only model parameters, rather than data or gradients, can be communicated. We introduce the Federated Proximal Gradient Method (FedPGM) for solving non-convex and non-differentiable problems in the federated setting, and we prove convergence of FedPGM under both full and partial client participation.

In Chapter 4, we review key ideas of Differential Privacy, a popular framework for ensuring privacy while training a model. Differential Privacy gives rigorous privacy guarantees for the individuals included in the training set when the trained model is released publicly. We give a new analysis of the Gaussian Mechanism, a popular method in the machine learning community. Our analysis shows that one can use dynamic noise schedules during training, rather than a fixed level of additive noise at each iteration. We also demonstrate the effectiveness of our technique with experiments on several data sets.

Finally, we present concluding remarks in Chapter 5, with a brief summary of the contents of this dissertation.
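The abstract does not spell out the dissertation's analysis, but the classical Gaussian Mechanism it builds on is standard: clip each gradient to bound its sensitivity, then add Gaussian noise scaled by the clipping bound. A minimal illustrative sketch follows, in which the clipping bound `C`, the decaying schedule `sigmas`, and all function names are hypothetical choices for illustration, not the thesis's actual schedule or accounting.

```python
import numpy as np

def clip(g, C):
    """Scale a gradient so its L2 norm is at most C (bounds sensitivity)."""
    norm = np.linalg.norm(g)
    return g * min(1.0, C / norm) if norm > 0 else g

def gaussian_mechanism(g, C, sigma, rng):
    """Release a clipped gradient with additive Gaussian noise of std sigma * C."""
    return clip(g, C) + rng.normal(0.0, sigma * C, size=g.shape)

# Hypothetical dynamic noise schedule: the noise multiplier sigma_t varies
# with the iteration t instead of staying fixed. The actual schedule and its
# privacy accounting are the subject of the dissertation, not reproduced here.
rng = np.random.default_rng(0)
C = 1.0
sigmas = [2.0 / np.sqrt(t + 1) + 0.5 for t in range(5)]
g = np.ones(3)
noisy_grads = [gaussian_mechanism(g, C, s, rng) for s in sigmas]
```

Under a fixed schedule every iteration would use the same `sigma`; a dynamic schedule spends the privacy budget unevenly across iterations, which is the degree of freedom the new analysis is said to justify.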
School of Science
Full Citation: Rensselaer Polytechnic Institute, Troy, NY