Non-convex optimizations for machine learning with theoretical guarantee: robust matrix completion and neural network learning

Zhang, Shuai
Other Contributors
Yazici, Birsen
Tajer, Ali
Mitchell, John
Wang, Meng
Issue Date
Electrical engineering
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute (RPI), Troy, NY. Copyright of original work retained by author.
Abstract
Despite recent progress in machine learning, most learning systems are still treated as “black boxes” whose performance cannot be understood or derived analytically. With growing public concern about safety and privacy, designing explainable learning systems has become a new trend in machine learning. Many machine learning problems are formulated as minimizing (or maximizing) a loss function. Since real data are most likely generated by non-linear models, the loss function is generally non-convex. Unlike in convex optimization, gradient descent algorithms can become trapped in spurious local minima when solving non-convex problems. It is therefore challenging to provide explainable algorithms for non-convex optimization problems. This thesis studies two popular non-convex problems: (1) low-rank matrix completion and (2) neural network learning.

In low-rank matrix completion (MC), the objective is to recover a low-rank matrix from partial observations that may contain significant errors. The MC problem is non-convex because of the low-rank constraint. However, the low-rank structure does not capture the temporal correlations present in some time series, e.g., in power system monitoring, magnetic resonance imaging, and array signal processing. As a result, low-rank MC cannot recover a column or row that is entirely lost. This thesis proposes a new model, termed multichannel Hankel matrices, to characterize the intrinsic low-dimensional structure of such multichannel time series. Exploiting this model, several projected gradient-based algorithms are developed to solve non-convex MC problems with fully lost or corrupted columns.

In neural network learning, a reliable learned model requires a small generalization error, which in turn requires both a small training error and a small generalization gap.
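To make the multichannel Hankel idea concrete, the following NumPy sketch builds a block Hankel matrix from a multichannel series and checks two properties. The construction (block (i, j) of the Hankel matrix is the sample vector X[:, i + j]), the helper names block_hankel and numerical_rank, the window length n1 = 10, and the synthetic signal made of two damped sinusoids are all assumptions for illustration, not the exact model or algorithm of the thesis. In this toy setup the 4 x 60 data matrix has full row rank, so it offers no low-rank structure to exploit, whereas its 40 x 51 Hankel matrix has rank 4 (one per complex exponential mode). Losing every channel at one time instant wipes out a whole column of the data matrix but only part of each affected Hankel column.

```python
import numpy as np


def block_hankel(X: np.ndarray, n1: int) -> np.ndarray:
    """Block Hankel matrix of an (nc x n) multichannel series X (assumed construction).

    Block (i, j) is the column X[:, i + j], so the result has shape
    (nc * n1) x (n - n1 + 1). Works for float data and boolean masks.
    """
    nc, n = X.shape
    n2 = n - n1 + 1
    H = np.zeros((nc * n1, n2), dtype=X.dtype)
    for i in range(n1):
        # Block row i holds the samples X[:, i], X[:, i + 1], ..., X[:, i + n2 - 1].
        H[i * nc:(i + 1) * nc, :] = X[:, i:i + n2]
    return H


def numerical_rank(M: np.ndarray, rel_tol: float = 1e-8) -> int:
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


# Hypothetical 4-channel series driven by two damped sinusoids (four complex modes).
rng = np.random.default_rng(0)
t = np.arange(60)
modes = np.stack([
    np.exp(-0.01 * t) * np.cos(0.3 * t),
    np.exp(-0.01 * t) * np.sin(0.3 * t),
    np.exp(-0.02 * t) * np.cos(0.7 * t),
    np.exp(-0.02 * t) * np.sin(0.7 * t),
])
X = rng.standard_normal((4, 4)) @ modes   # 4 channels x 60 time samples
H = block_hankel(X, n1=10)                # 40 x 51 block Hankel matrix

print("rank of X:", numerical_rank(X), "of at most", min(X.shape))
print("rank of H:", numerical_rank(H), "of at most", min(H.shape))

# Lose every channel at time t0, so column t0 of X is fully unobserved.
t0 = 30
mask = np.ones(X.shape, dtype=bool)
mask[:, t0] = False
mask_H = block_hankel(mask, n1=10)

print("column t0 of X observed anywhere:", bool(mask[:, t0].any()))
print("Hankel columns touched by the loss:", int((~mask_H).any(axis=0).sum()))
print("every Hankel column still partly observed:", bool(mask_H.any(axis=0).all()))
```

Because each time sample is repeated along an anti-diagonal of blocks, the lost column t0 is spread over n1 Hankel columns, each of which keeps its other observed samples; this is the intuition for why a Hankel-based model can still constrain fully lost columns.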
For neural networks, classic generalization theories state that bounding the generalization gap requires more training samples than the model complexity. However, even with that many samples, solving the training problem is not guaranteed to find a local minimum with a small training error, because the objective function is highly non-convex. Studying convergence to the global optimum when training neural networks is therefore vital and challenging. This thesis provides a convergence analysis to the global optimum for one-hidden-layer neural networks with Gaussian inputs when the number of samples exceeds the model complexity. It also presents the minimum number of training samples required to guarantee zero generalization error for various neural network architectures.

In some cases, however, adequate training samples are not available because reliable data are difficult to generate. This thesis therefore further explores learning with a limited number of training samples, focusing on network pruning and self-training algorithms. Self-training arises naturally from a semi-supervised framework, which leverages abundant unlabeled data to improve learning when labeled data are limited. Network pruning, in contrast, is mainly inspired by the recent Lottery Ticket Hypothesis (LTH), which claims that a well-chosen pruned network achieves faster convergence and higher test accuracy than the original dense network.
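The one-hidden-layer setting can be illustrated with a small teacher-student experiment. The NumPy sketch below assumes a ReLU network with fixed output weights 1/K, noiseless labels produced by a hypothetical teacher W_star, standard Gaussian inputs, and N = 2000 samples, far more than the K·d = 30 trainable parameters. The student starts near the teacher (a stand-in for the good initialization often assumed in local convergence analyses) and is trained by plain gradient descent on the squared loss. It is an illustrative sketch, not the thesis's algorithm or proof technique; it only shows the distance to the global optimum shrinking in this regime.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, N = 10, 3, 2000   # input dim, hidden neurons, samples (N far exceeds K * d = 30)


def forward(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """f(x) = (1/K) * sum_k relu(w_k . x); the 1/K output weights are fixed (assumption)."""
    return np.maximum(X @ W.T, 0.0).mean(axis=1)


def loss_and_grad(W: np.ndarray, X: np.ndarray, y: np.ndarray):
    """Squared loss 0.5 * mean((f(x) - y)^2) and its gradient with respect to W."""
    pre = X @ W.T                                   # N x K pre-activations
    resid = np.maximum(pre, 0.0).mean(axis=1) - y   # N residuals
    loss = 0.5 * np.mean(resid ** 2)
    # dL/dw_k = mean_i resid_i * (1/K) * 1[w_k . x_i > 0] * x_i
    grad = ((resid[:, None] * (pre > 0)) / K).T @ X / len(y)   # K x d
    return loss, grad


# Hypothetical teacher network and Gaussian inputs; labels are noiseless (realizable).
W_star = rng.standard_normal((K, d))
X = rng.standard_normal((N, d))
y = forward(W_star, X)

# Student starts near the teacher, i.e., inside a neighborhood of the global optimum.
W = W_star + 0.1 * rng.standard_normal((K, d))
loss0, _ = loss_and_grad(W, X, y)
err0 = np.linalg.norm(W - W_star)

lr = 1.0
for _ in range(500):
    _, g = loss_and_grad(W, X, y)
    W -= lr * g

loss, _ = loss_and_grad(W, X, y)
err = np.linalg.norm(W - W_star)
print(f"loss {loss0:.2e} -> {loss:.2e}, ||W - W*|| {err0:.3f} -> {err:.3f}")
```

Starting the student far from the teacher, or using far fewer samples, may instead stall at a spurious local minimum, which is the difficulty that motivates a convergence analysis in the first place.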
School of Engineering
Dept. of Electrical, Computer, and Systems Engineering
Rensselaer Polytechnic Institute, Troy, NY
Rensselaer Theses and Dissertations Online Collection
Restricted to current Rensselaer faculty, staff and students in accordance with the Rensselaer Standard license. Access inquiries may be directed to the Rensselaer Libraries.