Probabilistic models for phylogenetic classification of Mycobacterium tuberculosis complex genotyping data
Author
Blondin, James
Other Contributors
Bennett, Kristin P.; Drineas, Petros; Yener, Bülent, 1959-
Date Issued
2013-05
Subject
Computer science
Degree
MS
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.; Attribution-NonCommercial-NoDerivs 3.0 United States
Abstract
This thesis presents a semi-supervised hierarchical Bayesian network to classify strains of Mycobacterium tuberculosis complex (MTBC) into a three-tier set of genetic lineages and sublineages. MTBC is the causative agent of the infectious disease tuberculosis (TB), which resulted in over 1.4 million deaths in 2011. Two main types of DNA fingerprinting techniques, spacer oligonucleotide typing (spoligotyping) and mycobacterial interspersed repetitive units (MIRUs), are regularly used by public health officials and TB researchers to track and control TB.
The model and algorithms presented in this thesis use spoligotype and MIRU data, combined from multiple heterogeneous data sources labeled by different experts, to classify MTBC isolates into a hierarchical phylogenetic structure. The model is trained on over 117,064 isolate DNA fingerprints collected by the United States Centers for Disease Control and Prevention, the SITVITWEB database at Institut Pasteur de Guadeloupe, and the MIRU-VNTRplus collection of MTBC strains. The model achieves high classification accuracy, confirming many well-established lineages at all hierarchy levels, and provides visualizations of spoligotype and MIRU signatures for each lineage. In addition, the model discovers some inconsistencies in MTBC labels between data sources and suggests possible resolutions of these inconsistencies. After further study and refinement, this approach will form the basis for a new tool for MTBC lineage identification freely available online.
Description
May 2013; School of Science
Department
Dept. of Computer Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection
Access
CC BY-NC-ND. Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author.