Scalable cost-efficient techniques for machine learning via sketching
Authors
Hu, Dong
Issue Date
2024-08
Type
Electronic thesis
Thesis
Language
en_US
Keywords
Computer science
Abstract
The ever-increasing size and complexity of modern datasets present significant challenges for machine learning theorists and experimentalists, as the computational resources and time required to process and analyze these datasets are immense. Randomized Numerical Linear Algebra (randNLA) techniques offer a promising solution, enabling a quantifiable trade-off between accuracy and computation time. Sketching and sampling methods, central to modern algorithm design, serve as powerful techniques for compressing high-dimensional datasets into lower-dimensional representations while preserving properties of interest. This dissertation aims to leverage the advantages of randNLA techniques, with a primary focus on sketching methods, in common machine learning models to address the challenges posed by large-scale, high-dimensional datasets. It investigates the following areas: (1) designing sparse graph-based sketching methods for fast numerical linear algebra computations; (2) obtaining low-rank approximations for matrix completion using sketching techniques, including the proposal of a novel two-modality sampling NoisyCUR algorithm; (3) developing convergent and efficient tensor decomposition algorithms through adaptive sketching with fast yet reliable sketching dimension computation; and (4) proposing a unified convex optimization framework for obtaining an optimal sampling probability distribution in reduced label complexity settings for Kernel Ridge Regression (KRR) models.
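As a rough illustration of the accuracy versus computation trade-off that sketching offers, the following minimal sketch applies a dense Gaussian sketching matrix to an overdetermined least-squares problem. It is a generic textbook-style example, not one of the methods developed in the thesis; the function name `sketched_least_squares`, the synthetic data, and the sketch size of 400 rows are all assumptions chosen for illustration.

```python
import numpy as np


def sketched_least_squares(A, b, sketch_rows, seed=0):
    """Approximately solve min_x ||A x - b||_2 using a Gaussian sketch.

    Hypothetical helper for illustration only: the n-row problem is
    compressed to sketch_rows rows before it is solved.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Dense Gaussian sketching matrix S of shape (sketch_rows, n),
    # scaled so that S.T @ S equals the identity in expectation.
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    # Solve the much smaller sketched problem min_x ||S A x - S b||_2.
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x


# Synthetic tall problem (assumed data): 5000 rows, 20 unknowns.
rng = np.random.default_rng(1)
A = rng.standard_normal((5000, 20))
x_true = rng.standard_normal(20)
b = A @ x_true + 0.01 * rng.standard_normal(5000)

# Reference solution on the full data.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
# Approximate solution from a 400-row sketch.
x_sketch = sketched_least_squares(A, b, sketch_rows=400)

# Compare residuals on the ORIGINAL problem; a ratio close to 1 means
# the sketch preserved the property of interest (the least-squares fit).
res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
residual_ratio = res_sketch / res_exact
print(f"residual ratio (sketched / exact): {residual_ratio:.3f}")
```

Increasing `sketch_rows` moves the residual ratio toward 1 at the price of a larger sketched problem, which is the quantifiable trade-off the abstract refers to. A dense Gaussian sketch is used here only because it is simple to state; sparse sketches are typically much cheaper to apply.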
Description
August 2024
School of Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY