Stochastic first order methods for distributed composite optimization with differential privacy

Sutcher-Shepard, Colin
Other Contributors
Chakraborty, Supriyo
Bennett, Kristin
Lai, Rongjie
Xu, Yangyang
Issue Date
Applied mathematics
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
Full Citation
ABSTRACT
Distributed optimization has gained much attention in recent years as the world generates ever more data. At the same time, machine learning methods are able to solve many new problems. As training models on large data sets becomes a concern in more fields, it is more important than ever to have fast and secure methods for solving distributed optimization problems. In this work we address three fundamental concerns of distributed optimization. In Chapter 1, we provide motivation and an overview of recent works on distributed optimization. In Chapter 2, we introduce the Async-Parallel Adaptive stochastic gradient Method (APAM) algorithm. APAM is designed to solve non-convex, smooth optimization problems. It allows asynchronous updates to the model during training, thus avoiding delays caused by lagging compute nodes. We first give a brief overview of important works on adaptive stochastic gradient methods, then prove convergence of APAM in both convex and non-convex settings. Empirical experiments are also given to show the improvement of APAM on practical problems. In Chapter 3, we discuss the Federated Learning framework for training a common model among several disparate parties. In this setting, different clients hold their own data privately, and only model parameters, rather than data or gradients, can be communicated. We introduce the Federated Proximal Gradient Method (FedPGM) algorithm for solving non-convex and non-differentiable problems in the federated setting. We prove convergence of FedPGM for both the full and the partial client participation cases. In Chapter 4, we review key ideas of Differential Privacy, a popular method for ensuring privacy while training a model. Differential Privacy gives rigorous privacy guarantees to the individuals included in the training set when the trained model is released publicly. We give a new analysis of the Gaussian Mechanism, a popular method in the machine learning community. Our analysis shows that one can use dynamic noise schedules in training, rather than a fixed level of additive noise in each iteration. We also show the effectiveness of our technique with experiments on several data sets. Finally, we present our concluding remarks in Chapter 5, where we give a brief summary of the contents of this dissertation.
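As a rough illustration of the idea of a dynamic noise schedule, the following minimal sketch runs noisy gradient descent in which the Gaussian noise level changes from one iteration to the next. It is not the dissertation's algorithm or analysis: the function names, the geometric decay schedule, the gradient clipping step, and the toy scalar loss are all assumptions made for this example, and no privacy accounting is performed.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [v * scale for v in vec]


def noisy_mean_gradient(per_example_grads, clip_norm, sigma, rng):
    """Gaussian mechanism on a clipped gradient sum (illustrative only).

    Each per-example gradient is clipped to norm clip_norm, the clipped
    gradients are summed, independent N(0, (sigma * clip_norm)^2) noise is
    added to every coordinate, and the result is divided by the batch size.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, v in enumerate(clip(grad, clip_norm)):
            total[i] += v
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma * clip_norm)) / n for t in total]


def decaying_schedule(sigma0, decay, num_steps):
    """Hypothetical schedule: the noise multiplier shrinks geometrically."""
    return [sigma0 * decay ** t for t in range(num_steps)]


def train(data, sigmas, lr=0.1, clip_norm=1.0, seed=0):
    """Noisy gradient descent on the toy loss 0.5 * (w - x)^2, scalar w.

    sigmas holds one noise multiplier per iteration, so a constant list
    gives fixed noise and any other list gives a dynamic schedule.
    """
    rng = random.Random(seed)
    w = 0.0
    for sigma in sigmas:
        grads = [[w - x] for x in data]  # per-example gradients of the loss
        g = noisy_mean_gradient(grads, clip_norm, sigma, rng)
        w -= lr * g[0]
    return w


if __name__ == "__main__":
    data = [0.8, 1.0, 1.2, 0.9, 1.1]
    fixed = [1.0] * 50
    dynamic = decaying_schedule(sigma0=2.0, decay=0.95, num_steps=50)
    print("fixed noise:  ", train(data, fixed))
    print("dynamic noise:", train(data, dynamic))
```

Passing the noise multipliers as a plain list keeps the schedule separate from the update rule, so fixed and dynamic schedules can be compared without touching the training loop. Which schedules preserve a given privacy budget is the kind of question the analysis in Chapter 4 addresses; this sketch does not answer it.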
School of Science
Dept. of Mathematical Sciences
Rensselaer Polytechnic Institute, Troy, NY
Rensselaer Theses and Dissertations Online Collection
Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 license. No commercial use or derivatives are permitted without the explicit approval of the author.