Probabilistic models for phylogenetic classification of Mycobacterium tuberculosis complex genotyping data
Authors
Blondin, James
Issue Date
2013-05
Type
Electronic thesis
Thesis
Language
ENG
Keywords
Computer science
Alternative Title
Abstract
This thesis presents a semi-supervised hierarchical Bayesian network to classify strains of the Mycobacterium tuberculosis complex (MTBC) into a three-tier set of genetic lineages and sublineages. MTBC is the causative agent of the infectious disease tuberculosis (TB), which resulted in over 1.4 million deaths in 2011. Two main types of DNA fingerprinting techniques, spacer oligonucleotide typing (spoligotyping) and mycobacterial interspersed repetitive units (MIRUs), are regularly used by public health officials and TB researchers to track and control TB.
The model and algorithms presented in this thesis use spoligotype and MIRU data combined from multiple heterogeneous data sources, labeled by different experts, to classify MTBC isolates into a hierarchical phylogenetic structure. The model is trained on over 117,064 isolate DNA fingerprints collected by the United States Centers for Disease Control and Prevention, the SITVITWEB database at Institut Pasteur de Guadeloupe, and the MIRU-VNTRplus collection of MTBC strains. The model achieves high classification accuracy, confirming many well-established lineages at all hierarchy levels, and provides visualizations of spoligotype and MIRU signatures for each lineage. In addition, the model discovers some inconsistencies in MTBC labels between data sources and suggests possible resolutions of these inconsistencies. After further study and refinement, this approach will form the basis for a new tool for MTBC lineage identification freely available online.
Description
May 2013
School of Science
Full Citation
Publisher
Rensselaer Polytechnic Institute, Troy, NY