
    Nonconvex nonsmooth optimization for bioinformatics

    Author
    Shabbeer, Amina
    View/Open
    170927_Shabbeer_rpi_0185E_10215.pdf (52.80Mb)
    Other Contributors
    Bennett, Kristin P.; Yener, Bülent, 1959-; Mitchell, John E.; Magdon-Ismail, Malik
    Date Issued
    2013-12
    Subject
    Computer science
    Degree
    PhD
    Terms of Use
    This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
    URI
    https://hdl.handle.net/20.500.13015/1057
    Abstract
    In this thesis, we investigate nonconvex nonsmooth optimization problems that arise in bioinformatics and general visual data analytics tasks. This work is motivated by a need for tools that provide a view of the genetic diversity in the Mycobacterium tuberculosis complex (MTBC) population by extracting information from molecular epidemiological data. Such tools are crucial for the effective tracking and control of tuberculosis (TB). We survey classification tools that group MTBC strains into genetic families and visualization tools for molecular epidemiology of TB. We develop TB-Lineage, a classification tool that employs Bayesian networks and domain knowledge of signature patterns in DNA fingerprint data for MTBC major lineage classification. However, there is no consensus on MTBC lineage and sublineage definitions amongst experts from the perspective of both phylogenetic analysis and epidemiology. Understanding the evolutionary history and genetic relatedness of strains is essential to developing more accurate lineage definitions. We create spoligoforests as a tool for visualization of biogeographic diversity of MTBC that accurately represents both the underlying evolutionary relationships and the genetic distances between strains, thus creating new insights into lineage definitions.
    Generating such a graph embedding involves addressing two challenges: (i) preserving proximity relations as measured by some embedding objective, and (ii) simultaneously optimizing an aesthetic criterion, namely no edge crossings in the embedding, to create a clear representation of the underlying graph structure. We propose a new approach to generating such an embedding that optimizes for multiple criteria. The method uses the theorems of the alternative to express the condition for no edge crossings as a system of nonlinear inequality constraints. This approach has an intuitive geometric interpretation closely related to support vector machine classification. While edge-crossing minimization can be utilized in conjunction with any optimization-based embedding objective, here we demonstrate the approach on multidimensional scaling by modifying the stress majorization algorithm to include penalties for edge crossings. We use an alternating approach to solve this nonconvex problem, iteratively performing two steps: computing the layout for a given set of constraints, and altering the constraints based on the new embedding. We provide a detailed analysis of the convergence of this algorithm.
    The alternating direction method of multipliers (ADMM) is proposed as a method for efficiently handling a large number of nonsmooth constraints. MAA+, an iterative solution using ADMM, is described for generating graph embeddings that adhere to non-intersection constraints between edges. We generate spoligoforests for all strains of TB observed in patients diagnosed with TB in the U.S. from 2006 to 2010. We also developed a standalone tool for drawing spoligoforests. The method is also demonstrated on a suite of randomly generated graphs with corresponding Euclidean distances that have planar embeddings with high stress.
    The proposed edge-crossing constraints and iterative penalty algorithm can be readily adapted to other supervised and unsupervised optimization-based embedding or dimensionality reduction methods. The constraints can be generalized to remove intersections of general convex polygons, including node-edge and node-node intersections. Host-pathogen maps that visualize relationships between MTBC strains and host groups are proposed as a testbed for minimizing intersections between nodes of arbitrary shape and size. MAA+ is applied to variations of the proximity preservation and edge-crossing minimization problem to create low-stress embeddings with no overlaps between nodes, edges and subgraphs.
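The geometric idea in the abstract, that two edges do not cross exactly when a line separates them with a margin (much as in support vector machine classification), can be illustrated with a small sketch. This is not the thesis's MAA+ implementation. The function names `stress`, `find_separator` and `crossing_penalty`, the unweighted stress formula, the hinge-style penalty and the candidate-axis search are all assumptions chosen for illustration, and the sketch only applies to pairs of non-degenerate 2D edges that share no endpoint.

```python
import math


def stress(pos, dist):
    """Raw, unweighted MDS stress (an assumed form): the sum over node pairs
    of (layout distance - target distance) squared."""
    n = len(pos)
    return sum((math.dist(pos[i], pos[j]) - dist[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def find_separator(edge_a, edge_b):
    """Search for (w, b) with w.x + b >= 1 on edge_a and <= -1 on edge_b.

    Candidate directions are the normal and the direction of each segment.
    Returns None when no candidate separates the two segments.
    """
    axes = []
    for p, q in (edge_a, edge_b):
        dx, dy = q[0] - p[0], q[1] - p[1]
        axes += [(-dy, dx), (dx, dy)]
    for u in axes:
        proj_a = [u[0] * x + u[1] * y for x, y in edge_a]
        proj_b = [u[0] * x + u[1] * y for x, y in edge_b]
        a_lo, a_hi = min(proj_a), max(proj_a)
        b_lo, b_hi = min(proj_b), max(proj_b)
        if a_lo > b_hi:        # edge_a lies strictly above edge_b along u
            gap, mid, sign = a_lo - b_hi, (a_lo + b_hi) / 2, 1.0
        elif b_lo > a_hi:      # edge_b lies strictly above edge_a along u
            gap, mid, sign = b_lo - a_hi, (b_lo + a_hi) / 2, -1.0
        else:
            continue           # projections overlap, try the next axis
        scale = sign * 2.0 / gap   # rescale so the margin is exactly 1
        return (scale * u[0], scale * u[1]), -scale * mid
    return None


def crossing_penalty(edge_a, edge_b, w, b):
    """Hinge-style penalty: zero iff the line (w, b) separates the edges
    with margin 1, positive otherwise."""
    value = lambda pt: w[0] * pt[0] + w[1] * pt[1] + b
    return (sum(max(0.0, 1.0 - value(pt)) for pt in edge_a)
            + sum(max(0.0, 1.0 + value(pt)) for pt in edge_b))


# Two parallel edges: a separator exists and the penalty vanishes.
bottom, top = ((0, 0), (1, 0)), ((0, 1), (1, 1))
w, b = find_separator(bottom, top)
print(crossing_penalty(bottom, top, w, b))

# Two diagonals of a square cross, so no separator is found.
print(find_separator(((0, 0), (1, 1)), ((0, 1), (1, 0))))
```

In an alternating scheme of the kind the abstract describes, one step would hold the separators `(w, b)` fixed while lowering stress plus a weighted sum of such penalties over the node positions, and the other step would hold the layout fixed while updating the separators. The weighting and the update rules are not specified in the abstract and are left out here.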
    Description
    December 2013; School of Science
    Department
    Dept. of Computer Science
    Publisher
    Rensselaer Polytechnic Institute, Troy, NY
    Relationships
    Rensselaer Theses and Dissertations Online Collection
    Access
    Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.
    Collections
    • RPI Theses Online (Complete)
