Probabilistic models for phylogenetic classification of Mycobacterium tuberculosis complex genotyping data

dc.contributor: Bennett, Kristin P.
dc.contributor: Drineas, Petros
dc.contributor: Yener, Bülent, 1959-
dc.contributor.author: Blondin, James
dc.date.accessioned: 2021-11-03T07:58:02Z
dc.date.available: 2021-11-03T07:58:02Z
dc.date.created: 2013-09-09T14:12:15Z
dc.date.issued: 2013-05
dc.description: May 2013
dc.description: School of Science
dc.description.abstract: This thesis presents a semi-supervised hierarchical Bayesian network that classifies strains of the Mycobacterium tuberculosis complex (MTBC) into a three-tier hierarchy of genetic lineages and sublineages. MTBC is the causative agent of tuberculosis (TB), an infectious disease that caused over 1.4 million deaths in 2011. Public health officials and TB researchers routinely use two main types of DNA fingerprinting to track and control TB: spacer oligonucleotide typing (spoligotyping) and mycobacterial interspersed repetitive units (MIRUs).
dc.description.abstract: The model and algorithms presented in this thesis combine spoligotype and MIRU data from multiple heterogeneous data sources, labeled by different experts, to classify MTBC isolates into a hierarchical phylogenetic structure. The model is trained on over 117,064 isolate DNA fingerprints collected by the United States Centers for Disease Control and Prevention, the SITVITWEB database at the Institut Pasteur de Guadeloupe, and the MIRU-VNTRplus collection of MTBC strains. The model achieves high classification accuracy and confirms many well-established lineages at all levels of the hierarchy. It also provides visualizations of the spoligotype and MIRU signatures of each lineage. In addition, the model identifies some inconsistencies in MTBC labels between data sources and suggests possible resolutions. After further study and refinement, this approach will form the basis of a new tool for MTBC lineage identification, freely available online.
dc.description.degree: MS
dc.digitool.pid: 167028
dc.digitool.pid: 167029
dc.digitool.pid: 167030
dc.identifier.uri: https://hdl.handle.net/20.500.13015/838
dc.language.iso: ENG
dc.publisher: Rensselaer Polytechnic Institute, Troy, NY
dc.relation.department: Dept. of Computer Science
dc.relation.ispartof: Rensselaer Theses and Dissertations Online Collection
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 United States
dc.rights.holder: This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
dc.rights.license: CC BY-NC-ND. Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author.
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.subject: Computer science
dc.title: Probabilistic models for phylogenetic classification of Mycobacterium tuberculosis complex genotyping data
dc.type: Electronic thesis
dc.type: Thesis
Files
Original bundle
Name: 167029_Blondin_rpi_0185N_10088.pdf
Size: 1.28 MB
Format: Adobe Portable Document Format