Probabilistic models for phylogenetic classification of Mycobacterium tuberculosis complex genotyping data

Blondin, James
Other Contributors
Bennett, Kristin P.
Drineas, Petros
Yener, Bülent, 1959-
Issue Date
Computer science
Terms of Use
Attribution-NonCommercial-NoDerivs 3.0 United States
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
This thesis presents a semi-supervised hierarchical Bayesian network to classify strains of the Mycobacterium tuberculosis complex (MTBC) into a three-tier set of genetic lineages and sublineages. MTBC is the causative agent of the infectious disease tuberculosis (TB), which resulted in over 1.4 million deaths in 2011. Two main types of DNA fingerprinting techniques, spacer oligonucleotide typing (spoligotyping) and mycobacterial interspersed repetitive units (MIRUs), are regularly used by public health officials and TB researchers to track and control TB.
The model and algorithms presented in this thesis use spoligotype and MIRU data, combined from multiple heterogeneous data sources labeled by different experts, to classify MTBC isolates into a hierarchical phylogenetic structure. The model is trained on over 117,064 isolate DNA fingerprints collected by the United States Centers for Disease Control and Prevention, the SITVITWEB database at Institut Pasteur de Guadeloupe, and the MIRU-VNTRplus collection of MTBC strains. The model achieves high classification accuracy, confirms many well-established lineages at all hierarchy levels, and provides visualizations of spoligotype and MIRU signatures for each lineage. In addition, the model discovers some inconsistencies in MTBC labels between data sources and suggests possible resolutions of these inconsistencies. After further study and refinement, this approach will form the basis for a new tool for MTBC lineage identification, freely available online.
May 2013
School of Science
Dept. of Computer Science
Rensselaer Polytechnic Institute, Troy, NY
Rensselaer Theses and Dissertations Online Collection
CC BY-NC-ND. Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author.