Support vector machine-based algorithms on cytochrome P450 (CYP) isoform specificity study

Zhen, Zhuo
Other Contributors
Bennett, Kristin P.
Magdon-Ismail, Malik
Yener, Bülent, 1959-
Breneman, Curt M.
Issue Date
Computer science
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
Full Citation
Specifically, knowing which CYP isoforms are responsible for metabolizing a small molecule is vital for avoiding drug toxicity and increasing drug efficiency. Consequently, a large number of in silico models have been constructed to predict the responsible CYPs, a task known as the CYP specificity study. Various machine learning algorithms have been applied, and the Support Vector Machine (SVM) algorithm has been found to perform well because of its low sensitivity to over-fitting. In addition, the theory and objective function of the SVM algorithm are easy to understand and to modify for further development. One problem in CYP specificity studies is that the correlation among different CYP isoforms has always been overlooked. To address this, the concept of multi-task classification was introduced in this thesis, on the premise that exploiting the correlation among CYPs can improve prediction performance and provide additional information on the relationships among CYP isoforms. Accordingly, an SVM-based multi-task classification algorithm (MT-SVM) was implemented. The MT-SVM algorithm and its modified version demonstrated the importance of correlation information among CYPs during model training, yielding higher prediction accuracies than the standard SVM. Both the standard SVM and MT-SVM are supervised learning algorithms, so unlabeled data points must be excluded, which may make the training set less representative. To address this problem, a semi-supervised SVM (S3VM) based on deterministic annealing was implemented and applied to the CYP specificity study. The modeling results showed that S3VM can overcome the over-training problem and increase prediction accuracy when the labeled training set is limited in size.
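To make the multi-task idea concrete, the following is a minimal sketch of one common way to couple several SVM tasks, assuming the shared-plus-task-specific weight decomposition of regularized multi-task learning (Evgeniou and Pontil): each task t (for example, one CYP isoform) predicts with weights w0 + v[t], where w0 is shared by all tasks and v[t] is task specific. This is not necessarily the MT-SVM formulation used in the thesis. The function names `train_mt_svm` and `predict`, the parameters `lam_shared` and `lam_task`, and the choice of full-batch subgradient descent on the primal hinge loss (instead of a QP solver) are all hypothetical, chosen for illustration.

```python
import random
from typing import Dict, List, Tuple

# (feature vector, label in {-1, +1}); a hypothetical stand-in for molecular descriptors
Sample = Tuple[List[float], int]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_mt_svm(tasks: Dict[str, List[Sample]], lam_shared: float = 0.01,
                 lam_task: float = 0.1, lr: float = 0.05, epochs: int = 200):
    """Full-batch subgradient descent on a multi-task linear SVM objective.

    Minimizes (1/n) * sum of hinge losses over all tasks
              + lam_shared * ||w0||^2 + lam_task * sum_t ||v[t]||^2.
    A larger lam_task shrinks the task-specific parts, forcing the tasks
    to share more of w0 (i.e. it assumes stronger correlation among tasks).
    """
    dim = len(next(iter(tasks.values()))[0][0])
    n = sum(len(data) for data in tasks.values())
    w0 = [0.0] * dim
    v = {t: [0.0] * dim for t in tasks}
    b = {t: 0.0 for t in tasks}
    for _ in range(epochs):
        # Gradients of the two L2 regularization terms.
        g0 = [2 * lam_shared * a for a in w0]
        gv = {t: [2 * lam_task * a for a in v[t]] for t in tasks}
        gb = {t: 0.0 for t in tasks}
        for t, data in tasks.items():
            w = [a + c for a, c in zip(w0, v[t])]
            for x, y in data:
                if y * (dot(w, x) + b[t]) < 1:  # hinge is active for this sample
                    for j in range(dim):
                        g0[j] -= y * x[j] / n  # the shared part receives every task's signal
                        gv[t][j] -= y * x[j] / n  # the task part receives only its own
                    gb[t] -= y / n
        for j in range(dim):
            w0[j] -= lr * g0[j]
        for t in tasks:
            for j in range(dim):
                v[t][j] -= lr * gv[t][j]
            b[t] -= lr * gb[t]
    return w0, v, b


def predict(model, task: str, x: List[float]) -> int:
    w0, v, b = model
    w = [a + c for a, c in zip(w0, v[task])]
    return 1 if dot(w, x) + b[task] >= 0 else -1


if __name__ == "__main__":
    # Two toy "isoforms" whose decision rules are similar but not identical.
    toy = {
        "A": [([2.0, 0.0], 1), ([1.5, 0.5], 1), ([-2.0, 0.0], -1), ([-1.5, -0.5], -1)],
        "B": [([1.0, 1.0], 1), ([2.0, 0.5], 1), ([-1.0, -1.0], -1), ([-2.0, -0.5], -1)],
    }
    model = train_mt_svm(toy)
    print(predict(model, "A", [3.0, 0.0]), predict(model, "B", [-3.0, -1.0]))
```

Setting `lam_task` very large approaches a single SVM shared by all isoforms, while setting `lam_shared` very large approaches independent per-isoform SVMs; the intermediate regime is where correlation among tasks can help.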
December 2014
School of Science
Dept. of Computer Science
Rensselaer Polytechnic Institute, Troy, NY
Rensselaer Theses and Dissertations Online Collection
Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.