Author
Aggour, Kareem Sherif
Other Contributors
Yener, Bülent, 1959-; Gittens, Alex; Carothers, Christopher D.; Subramaniyan, Arun K.;
Date Issued
2019-05
Subject
Computer science
Degree
PhD;
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.;
Abstract
Over the past two decades, there has been a dramatic increase in the volume and variety of data generated in almost every scientific discipline. To enable the efficient storage and processing of these massive datasets, a variety of fault-tolerant, scalable distributed storage and processing platforms have been popularized---most famously, Hadoop MapReduce and Spark. Novel distributed algorithms are being developed to take full advantage of these platforms, including scalable variants of algorithms such as Canonical Polyadic Decomposition (CPD), an unsupervised learning technique frequently used in data mining and machine learning applications to discover latent factors in a class of multimodal datasets referred to as tensors. Current research in scalable CPD algorithms has focused almost exclusively on the analysis of large sparse tensors, however.;
This research addresses the complementary need for efficient, scalable algorithms to decompose the large dense tensors that arise in many signal processing and anomaly detection applications. To that end, we developed a progression of algorithms designed for MapReduce settings that incorporate combinations of regularization and sketching to operate efficiently on dense, skewed tensors. The first MapReduce CPD algorithm utilizes an Alternating Least Squares (ALS) strategy that is mathematically equivalent to the classical sequential CPD-ALS algorithm. A second algorithm was then developed in which regularization and sketching work in tandem to accelerate and stabilize tensor decompositions. Prior research had demonstrated the benefits of applying either regularization or sketching to CPD-ALS, but to our knowledge this work is the first to demonstrate the utility of using both together, outperforming the use of either technique alone. However, this algorithm requires the manual selection of the sketching rate and regularization hyperparameter values.;
We next developed two novel algorithms that employ online learning-based approaches to dynamically select the sketching rate and regularization parameters at runtime, further optimizing CP decompositions while simultaneously eliminating the burden of manual hyperparameter selection. This work is the first to intelligently choose the sketching rate and regularization parameters at each iteration of a CPD algorithm to balance the trade-off between minimizing runtime and maximizing decomposition accuracy. On both synthetic and real data, we observed that for noisy tensors, our intelligent CPD algorithm produces decompositions of accuracy comparable to the exact distributed CPD-ALS algorithm in less time, often half the time. For ill-conditioned tensors, given the same time budget, the intelligent CPD algorithm produces decompositions with significantly lower relative error, often yielding an order of magnitude improvement.;
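For readers unfamiliar with the CPD-ALS algorithm named in the abstract, the following is a minimal sequential NumPy sketch of a rank-R CP decomposition of a dense three-way tensor via alternating least squares. It is an illustration only, not the distributed, regularized, or sketched variants developed in the thesis; the names `cp_als`, `unfold`, and `khatri_rao` are the author of this note's own, and the iteration count and initialization are arbitrary choices.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: row (i*J + j), column r holds U[i,r]*V[j,r]."""
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, U.shape[1])

def unfold(X, mode):
    """Mode-n matricization: move the given mode to the rows, flatten the rest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, rank, n_iters=500, seed=0):
    """Rank-R CP decomposition of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iters):
        # Each mode update is a linear least-squares solve with the
        # other two factor matrices held fixed.
        # A sketched variant would row-sample the unfolding and Khatri-Rao
        # product; a regularized variant would replace pinv(G) with
        # pinv(G + lam * np.eye(rank)).
        A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

Each update exploits the identity that the Gram matrix of a Khatri-Rao product is the Hadamard (element-wise) product of the factor Gram matrices, which keeps the per-iteration solves at rank-by-rank size.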
Description
May 2019; School of Science
Department
Dept. of Computer Science;
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection;
Access
Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.;