Accelerated stochastic gradient methods with adaptive techniques and distributed computing

Yan, Yonggui
Other Contributors
Lai, Rongjie
Mitchell, John
Ji, Qiang
Xu, Yangyang
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
Stochastic gradient methods (SGMs) are widely used to solve stochastic optimization problems because, as first-order methods, they are simple and computationally efficient. However, vanilla SGMs converge slowly, which has prompted the development of many adaptive variants that speed up convergence. As data volumes grow exponentially, processing all the data on a single machine within a reasonable amount of time has become increasingly difficult, and running multiple machines in parallel has become an affordable and effective way to reduce computing time. Despite the popularity of adaptive techniques and parallelization for large-scale data processing, the analysis of adaptive SGMs and distributed methods is often restricted to problems that are unconstrained or have easy-to-project constraints, to convex problems, or to nonconvex but smooth problems. Many applications fall outside these settings and remain under-explored, such as Neyman-Pearson classification, fairness-constrained classification, phase retrieval, and sparsity-regularized deep learning. To fill these gaps, my research aims to accelerate SGMs by applying adaptive techniques in these uncharted settings and to adapt the methods to distributed computing. I have proposed three methods. The first solves expectation-constrained convex stochastic programs with an accelerated primal-dual SGM. The second tackles nonconvex (and possibly nonsmooth) programs with an accelerated SGM in a centralized distributed system where derivatives are computed on stale variables. The third solves nonconvex stochastic composite problems in decentralized distributed systems with heterogeneous data distributions. These methods have the potential to extend SGMs to a wider range of challenging problems and to reduce computation time on large-scale datasets.
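As a rough illustration of the second setting mentioned in the abstract (an adaptive SGM whose gradients are computed on stale variables), the following Python sketch may help. It is not the algorithm from the thesis, which the abstract does not specify. The AMSGrad-style update, the scalar least-squares toy problem, the fixed delay, and all names and parameter values (`make_samples`, `stale_adaptive_sgd`, `delay`, `lr`, `beta1`, `beta2`) are assumptions chosen only for illustration.

```python
import random
from collections import deque


def make_samples(n=200, x_true=2.0, seed=1):
    """Hypothetical toy data: pairs (a, b) with b = a * x_true exactly."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a = rng.uniform(0.5, 1.5)
        samples.append((a, a * x_true))
    return samples


def stale_adaptive_sgd(samples, delay=3, steps=2000, lr=0.05,
                       beta1=0.9, beta2=0.99, eps=1e-8, seed=0):
    """Minimize the mean of 0.5 * (a * x - b)**2 over a scalar x.

    Each stochastic gradient is evaluated at the iterate from `delay`
    steps earlier, mimicking a worker that reads a stale variable.
    The step is scaled by an AMSGrad-style running maximum of the
    second-moment estimate (an assumed, generic adaptive rule).
    """
    rng = random.Random(seed)
    x = 0.0
    m = 0.0      # first-moment (momentum) estimate
    v = 0.0      # second-moment estimate
    v_hat = 0.0  # running maximum of v
    # Holds the last delay + 1 iterates; history[0] is the stalest one.
    history = deque([x] * (delay + 1), maxlen=delay + 1)
    for _ in range(steps):
        a, b = rng.choice(samples)
        x_stale = history[0]
        g = a * (a * x_stale - b)  # sample gradient at the stale iterate
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        v_hat = max(v_hat, v)
        x -= lr * m / (v_hat ** 0.5 + eps)
        history.append(x)  # the oldest iterate is dropped automatically
    return x
```

Calling `stale_adaptive_sgd(make_samples(), delay=3)` runs the sketch with a staleness of three iterations, and setting `delay=0` recovers the same update without staleness, which makes the effect of the delay easy to compare on this toy problem.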
School of Science
Dept. of Mathematical Sciences
Rensselaer Polytechnic Institute, Troy, NY
Rensselaer Theses and Dissertations Online Collection
Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 license. No commercial use or derivatives are permitted without the explicit approval of the author.