Show simple item record

dc.rights.license: CC BY-NC-ND. Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author.
dc.contributor: Carothers, Christopher D.
dc.contributor: Drineas, Petros
dc.contributor: Zaki, Mohammed J., 1971-
dc.contributor: Shephard, Mark S.
dc.contributor: Mitchell, John E.
dc.contributor.author: Iyer, Chander Jayaraman
dc.date.accessioned: 2021-11-03T08:59:17Z
dc.date.available: 2021-11-03T08:59:17Z
dc.date.created: 2018-07-27T14:56:12Z
dc.date.issued: 2018-05
dc.identifier.uri: https://hdl.handle.net/20.500.13015/2169
dc.description: May 2018
dc.description: School of Science
dc.description.abstract: Modern technological advancements and innovation have led to an explosive growth of data in various domains, ranging from physics and the biological sciences to economics and the social sciences. Research on mathematical libraries has been on the leading edge of the high-performance computing (HPC) community's effort to address the imposing set of challenges posed by Big Data. Primary among these challenges are the need for asynchronous communication and the need to bridge the gap between computing power and network bandwidth. This has led to the advent of randomization in math libraries for developing scalable algorithms for large-scale linear algebra problems. In this dissertation, we focus on the design, implementation, and analysis of randomized algorithms for scalable mining of terabyte-sized matrices and beyond. We focus on three fundamental problems that are pervasive throughout large-scale data analytics and where randomized numerical linear algebra (NLA) algorithms have shown significant impact over state-of-the-art approaches: least-squares regression, low-rank approximation, and kernel ridge regression.
dc.description.abstract: This dissertation is divided into three parts. In part I, we explore the behavior of randomized matrix algorithms based on the Blendenpik algorithm in a distributed-memory setting. We show that a variant of the algorithm that uses a batchwise transformation leads to an implementation that is not only faster than state-of-the-art implementations of baseline least-squares solvers, but also scales to much larger matrix sizes. In particular, we show that a Blendenpik-based algorithm can solve least-squares regression problems for dense terabyte-sized (and larger) input matrices as well as sparse ill-conditioned matrices, outperforming state-of-the-art least-squares solvers in performance and scalability while demonstrating comparable numerical stability on established accuracy metrics.
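As a rough illustration of the sketch-and-precondition idea behind Blendenpik-style least-squares solvers, the small Python/NumPy sketch below builds a right preconditioner from the QR factorization of a randomly sketched copy of the input matrix and then solves the preconditioned problem with LSQR. It uses a dense Gaussian sketch for brevity (Blendenpik itself uses a randomized unitary transform followed by row sampling), and it is a single-node toy rather than the distributed-memory implementation studied in the dissertation; the function name and parameters are illustrative assumptions.

    import numpy as np
    from scipy.linalg import qr, solve_triangular
    from scipy.sparse.linalg import lsqr, LinearOperator

    def sketch_precondition_lstsq(A, b, oversample=4, seed=0):
        """Toy sketch-and-precondition least-squares solve (Blendenpik-like)."""
        m, n = A.shape
        rng = np.random.default_rng(seed)
        s = oversample * n                            # sketch size, s << m
        S = rng.standard_normal((s, m)) / np.sqrt(s)  # Gaussian sketch operator
        R = qr(S @ A, mode='economic')[1]             # R factor of the sketched matrix
        # Solve min ||A R^{-1} y - b||_2 with LSQR, then recover x = R^{-1} y.
        ARinv = LinearOperator(
            (m, n),
            matvec=lambda y: A @ solve_triangular(R, y),
            rmatvec=lambda z: solve_triangular(R, A.T @ z, trans='T'))
        y = lsqr(ARinv, b, atol=1e-10, btol=1e-10)[0]
        return solve_triangular(R, y)

Because A R^{-1} is well conditioned with high probability, the LSQR iteration count stays small; on a tall dense test matrix the result should agree with numpy.linalg.lstsq to several digits.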
dc.description.abstract: In part II of the dissertation, we explore the behavior of randomized block iterative solvers for computing low-rank approximations of dense terabyte-sized matrices. We are particularly interested in the behavior of randomized block iterative solvers on matrices with clustered singular values. We analyze the scalability and numerical stability of our block iterative solvers and demonstrate the performance of these randomized solvers for varying spectral gaps. Experiments with large-scale real-world datasets show high-quality approximations for the kernel PCA problem while achieving significant speedups over state-of-the-art direct solvers.
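For intuition, here is a minimal single-node sketch of a randomized block iterative (subspace iteration) low-rank approximation in the style of Halko et al.; the orthonormalized power iterations are what help when singular values are clustered and spectral gaps are small. This is an illustrative stand-in, not the distributed solvers analyzed in the dissertation; the function name and default parameters are assumptions.

    import numpy as np

    def randomized_subspace_iteration(A, k, oversample=10, n_iter=4, seed=0):
        """Toy randomized block (subspace) iteration for a rank-k approximation."""
        m, n = A.shape
        rng = np.random.default_rng(seed)
        # Start from a block of k + oversample random vectors.
        Q = np.linalg.qr(A @ rng.standard_normal((n, k + oversample)))[0]
        for _ in range(n_iter):
            # Re-orthonormalize after every multiply for numerical stability.
            Q = np.linalg.qr(A.T @ Q)[0]
            Q = np.linalg.qr(A @ Q)[0]
        # Rank-k factors from the small projected matrix B = Q^T A.
        U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
        return (Q @ U_small)[:, :k], s[:k], Vt[:k]

Increasing n_iter sharpens the approximation when the spectrum decays slowly, at the cost of additional passes over A.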
dc.description.abstract: In part III of the dissertation, we explore the behavior of large-scale kernel approximations that use the Nyström approach to solve the kernel ridge regression (KRR) problem. We demonstrate the scalability of one such Nyström approximation approach, based on the FALKON algorithm, and contrast it with other state-of-the-art approaches to the KRR problem.
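The following toy Python sketch shows the basic Nyström approximation to kernel ridge regression: the solution is restricted to the span of a random subset of "landmark" training points, shrinking the linear system from n x n to m x m for m landmarks. FALKON additionally solves this system with a preconditioned conjugate-gradient iteration to reach much larger scales; the direct solve, the RBF kernel choice, and all names and parameters below are assumptions made for a small, self-contained illustration.

    import numpy as np
    from scipy.linalg import solve

    def rbf_kernel(X, Y, gamma=1.0):
        """Gaussian (RBF) kernel matrix between the rows of X and Y."""
        sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq_dists)

    def nystrom_krr_fit(X, y, n_landmarks=100, lam=1e-3, gamma=1.0, seed=0):
        """Toy Nystrom kernel ridge regression (direct solve, no preconditioning)."""
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(X), size=n_landmarks, replace=False)
        Z = X[idx]                            # landmark (Nystrom center) points
        Knm = rbf_kernel(X, Z, gamma)         # n x m cross-kernel
        Kmm = rbf_kernel(Z, Z, gamma)         # m x m landmark kernel
        # Normal equations of min ||Knm a - y||^2 + n * lam * a^T Kmm a.
        alpha = solve(Knm.T @ Knm + len(X) * lam * Kmm, Knm.T @ y, assume_a='pos')
        return Z, alpha

    def nystrom_krr_predict(X_test, Z, alpha, gamma=1.0):
        return rbf_kernel(X_test, Z, gamma) @ alpha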
dc.language.iso: ENG
dc.publisher: Rensselaer Polytechnic Institute, Troy, NY
dc.relation.ispartof: Rensselaer Theses and Dissertations Online Collection
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.subject: Computer science
dc.title: Randomized algorithms for mining massive matrices: design & implementation at terascale and beyond
dc.type: Electronic thesis
dc.type: Thesis
dc.digitool.pid: 178919
dc.digitool.pid: 178920
dc.digitool.pid: 178921
dc.rights.holder: This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
dc.description.degree: PhD
dc.relation.department: Dept. of Computer Science

