    Intelligent and scalable algorithms for canonical polyadic decomposition

    Author
    Aggour, Kareem Sherif
    View/Open
    179622_Aggour_rpi_0185E_11477.pdf (7.536Mb)
    Other Contributors
    Yener, Bülent, 1959-; Gittens, Alex; Carothers, Christopher D.; Subramaniyan, Arun K.
    Date Issued
    2019-05
    Subject
    Computer science
    Degree
    PhD
    Terms of Use
    This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
    URI
    https://hdl.handle.net/20.500.13015/2385
    Abstract
    Over the past two decades, there has been a dramatic increase in the volume and variety of data generated in almost every scientific discipline. To enable the efficient storage and processing of these massive datasets, a variety of fault-tolerant, scalable distributed storage and processing platforms have been popularized, most famously Hadoop MapReduce and Spark. Novel distributed algorithms are being developed to take full advantage of these platforms, including scalable variants of algorithms such as Canonical Polyadic Decomposition (CPD), an unsupervised learning technique frequently used in data mining and machine learning applications to discover latent factors in a class of multimodal datasets referred to as tensors. However, current research in scalable CPD algorithms has focused almost exclusively on the analysis of large sparse tensors.

    This research addresses the complementary need for efficient, scalable algorithms to decompose large dense tensors that arise in many signal processing and anomaly detection applications. To that end, we developed a progression of algorithms designed for MapReduce settings that incorporate combinations of regularization and sketching to efficiently operate on dense, skewed tensors. The first MapReduce CPD algorithm utilizes an Alternating Least Squares (ALS) strategy that is mathematically equivalent to the classical sequential CPD-ALS algorithm. A second algorithm was then developed that features regularization and sketching working in tandem to accelerate and stabilize tensor decompositions. Prior research had demonstrated the benefits of applying either regularization or sketching to CPD-ALS, but to our knowledge this work is the first to demonstrate the utility of using both together, outperforming the use of either technique alone. However, this algorithm requires the manual selection of the sketching and regularization hyperparameter values.

    We next developed two novel algorithms that employ online learning-based approaches to dynamically select the sketching rate and regularization parameters at runtime, further optimizing CP decompositions while simultaneously eliminating the burden of manual hyperparameter selection. This work is the first to intelligently choose the sketching rate and regularization parameters at each iteration of a CPD algorithm to balance the trade-off between minimizing the runtime and maximizing the decomposition accuracy. On both synthetic and real data, we observed that for noisy tensors, our intelligent CPD algorithm produces decompositions of accuracy comparable to the exact distributed CPD-ALS algorithm in less time, often half the time. For ill-conditioned tensors, given the same time budget, the intelligent CPD algorithm produces decompositions with significantly lower relative error, often yielding an order of magnitude improvement.
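
    The abstract names ALS, regularization, and sketching without giving algorithmic detail, so the following is only an illustrative single-machine NumPy sketch of how those three ingredients can fit together. It is not the thesis's MapReduce algorithm. The assumptions are a three-way dense tensor, Tikhonov (ridge) regularization, and uniform sampling without replacement as the sketch; the function names and the parameters `sketch_rate` and `lam` are hypothetical and fixed by hand here.

```python
import numpy as np


def unfold(X, mode):
    """Mode-n unfolding; the remaining axes are flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def sampled_khatri_rao(U, V, idx):
    """Rows `idx` of the Khatri-Rao product of U and V, without forming it.

    Row (j * V.shape[0] + k) of the full product equals U[j] * V[k].
    """
    j, k = np.divmod(idx, V.shape[0])
    return U[j] * V[k]


def cp_als_sketched(X, rank, sketch_rate=0.5, lam=1e-3, n_iter=50, seed=0):
    """Ridge-regularized CP-ALS on a 3-way tensor with uniformly sampled rows.

    Assumption: uniform sampling stands in for whatever sketch the thesis
    uses. lam > 0 keeps the R x R normal equations well-posed.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    unfoldings = [unfold(X, n) for n in range(3)]
    for _ in range(n_iter):
        for n in range(3):
            # The two factors held fixed, in ascending mode order.
            U, V = [factors[m] for m in range(3) if m != n]
            n_rows = U.shape[0] * V.shape[0]
            s = min(n_rows, max(rank, int(np.ceil(sketch_rate * n_rows))))
            idx = rng.choice(n_rows, size=s, replace=False)
            Z = sampled_khatri_rao(U, V, idx)      # (s, rank)
            Xs = unfoldings[n][:, idx]             # (dim_n, s)
            G = Z.T @ Z + lam * np.eye(rank)       # regularized Gram matrix
            # Solve F @ G = Xs @ Z for the updated factor F (G is symmetric).
            factors[n] = np.linalg.solve(G, (Xs @ Z).T).T
    return factors


def relative_error(X, factors):
    """Frobenius-norm relative error of the rank-R reconstruction."""
    A, B, C = factors
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, B, C = (rng.standard_normal((d, 3)) for d in (12, 10, 8))
    X = np.einsum("ir,jr,kr->ijk", A, B, C)
    est = cp_als_sketched(X, rank=3, sketch_rate=0.5, lam=1e-6, n_iter=100)
    print("relative error:", relative_error(X, est))
```

    In this sketch a smaller `sketch_rate` makes each least-squares solve cheaper but noisier, and a larger `lam` stabilizes the solve at the cost of bias. That is the runtime/accuracy trade-off the abstract says the later algorithms tune automatically at each iteration.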
    Description
    May 2019; School of Science
    Department
    Dept. of Computer Science
    Publisher
    Rensselaer Polytechnic Institute, Troy, NY
    Relationships
    Rensselaer Theses and Dissertations Online Collection
    Access
    Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.
    Collections
    • RPI Theses Online (Complete)