Author
Shabbeer, Amina
Other Contributors
Bennett, Kristin P.; Yener, Bülent, 1959-; Mitchell, John E.; Magdon-Ismail, Malik;
Date Issued
2013-12
Subject
Computer science
Degree
PhD;
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.;
Abstract
In this thesis, we investigate nonconvex nonsmooth optimization problems that arise in bioinformatics and general visual data analytics tasks. This work is motivated by a need for tools that provide a view of the genetic diversity in the Mycobacterium tuberculosis complex (MTBC) population by extracting information from molecular epidemiological data. Such tools are crucial for the effective tracking and control of tuberculosis (TB). We survey classification tools that group MTBC strains into genetic families and visualization tools for molecular epidemiology of TB. We develop TB-Lineage, a classification tool that employs Bayesian Networks and domain knowledge of signature patterns in DNA fingerprint data for MTBC major lineage classification. However, there is no consensus on MTBC lineage and sublineage definitions amongst experts from the perspective of both phylogenetic analysis and epidemiology. Understanding the evolutionary history and genetic relatedness of strains is essential to developing more accurate lineage definitions. We create spoligoforests as a tool for visualization of biogeographic diversity of MTBC that accurately represents both the underlying evolutionary relationships and the genetic distances between strains, thus providing new insights into lineage definitions.
Generating such a graph embedding involves addressing two challenges: (i) preserving proximity relations as measured by some embedding objective, and (ii) simultaneously optimizing an aesthetic criterion, no edge crossings in the embedding, to create a clear representation of the underlying graph structure. We propose a new approach to generating such an embedding that optimizes for multiple criteria. The method uses the theorems of the alternative to express the condition for no edge crossings as a system of nonlinear inequality constraints. This approach has an intuitive geometric interpretation closely related to support vector machine classification. While edge-crossing minimization can be used in conjunction with any optimization-based embedding objective, here we demonstrate the approach on multidimensional scaling by modifying the stress majorization algorithm to include penalties for edge crossings. We use an alternating approach to solve this nonconvex problem, iteratively performing two steps: computing the layout for a given set of constraints, and altering the constraints based on the new embedding. We provide a detailed analysis of the convergence of this algorithm.
The Alternating Direction Method of Multipliers (ADMM) is proposed as a method for efficiently handling a large number of non-smooth constraints. MAA+, an iterative solution using ADMM, is described for generating graph embeddings that adhere to non-intersection constraints between edges. We create spoligoforests for all strains of TB observed in patients diagnosed with TB in the U.S. from 2006 to 2010. We also developed a standalone tool for drawing spoligoforests. The method is also demonstrated on a suite of randomly generated graphs with corresponding Euclidean distances that have planar embeddings with high stress.
The proposed edge-crossing constraints and iterative penalty algorithm can be readily adapted to other supervised and unsupervised optimization-based embedding or dimensionality reduction methods. The constraints can be generalized to remove intersections of general convex polygons, including node-edge and node-node intersections. Host-pathogen maps that visualize relationships between MTBC strains and host groups are proposed as a testbed for minimizing intersections between nodes of arbitrary shape and size. MAA+ is applied to variations of the proximity preservation and edge-crossing minimization problem to create low-stress embeddings with no overlaps between nodes, edges and subgraphs.;
Description
December 2013; School of Science
Department
Dept. of Computer Science;
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection;
Access
Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.;