Support vector machine-based algorithms on cytochrome P450 (CYP) isoform specificity study
Authors
Zhen, Zhuo
Issue Date
2014-12
Type
Electronic thesis
Thesis
Language
ENG
Keywords
Computer science
Abstract
Knowing which CYP isoforms are responsible for metabolizing a small molecule is vital for avoiding drug toxicity and increasing drug efficiency. Many in silico models have therefore been constructed to predict the responsible CYPs, a task known as the CYP specificity study. Different machine learning algorithms have been used, and the Support Vector Machine (SVM) algorithm has been found to be superior because of its low sensitivity to over-fitting. In addition, the theory and objective function of the SVM algorithm are easy to understand and to modify for further development. One problem in the CYP specificity study is that the correlation among different CYP isoforms is usually overlooked. To address this, the multi-task classification concept is introduced in this thesis. Utilizing the correlation among CYPs is expected to improve model prediction performance and to provide extra information on the relationships among different CYP isoforms. An SVM-based multi-task classification algorithm (MT-SVM) was therefore implemented. The MT-SVM algorithm and its modified version demonstrated the importance of correlation information among CYPs in the model training process: prediction accuracies increased compared with the standard SVM. The standard SVM and MT-SVM are both supervised learning algorithms, so unlabeled data points must be eliminated, which may make the training set less representative. To solve this problem, a deterministic annealing based semi-supervised SVM (S3VM) was implemented and applied to the CYP specificity study. The modeling results showed that S3VM is able to fix the over-training problem and increase prediction accuracy when the size of the labeled training data set is limited.
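The abstract does not give the MT-SVM objective, so the following is only a minimal sketch of one common way to share information across tasks, not the thesis's implementation. It assumes a linear model in which each isoform (task) t classifies with the weight vector w0 + v[t], where w0 is shared by all tasks and v[t] is task-specific, trained by subgradient descent on the hinge loss. The function names, hyperparameters, and the two-feature toy data are all hypothetical.

```python
import random

def train_mt_svm(tasks, dim, lam_shared=0.01, lam_task=0.1,
                 lr=0.05, epochs=200, seed=0):
    """Train a linear multi-task SVM sketch (assumed formulation).

    tasks: list of data sets, one per task; each is a list of (x, y)
           pairs with x a list of `dim` floats and y in {-1, +1}.
    Task t scores a sample with the weights w0 + v[t].
    """
    rng = random.Random(seed)
    w0 = [0.0] * dim                      # weights shared by all tasks
    v = [[0.0] * dim for _ in tasks]      # task-specific offsets
    samples = [(t, x, y) for t, data in enumerate(tasks) for x, y in data]
    for _ in range(epochs):
        rng.shuffle(samples)
        for t, x, y in samples:
            score = sum((w0[j] + v[t][j]) * x[j] for j in range(dim))
            active = y * score < 1.0      # hinge loss is non-zero
            for j in range(dim):
                g = -y * x[j] if active else 0.0
                # The hinge subgradient is the same for w0 and v[t];
                # only the L2 penalties differ.
                w0[j] -= lr * (lam_shared * w0[j] + g)
                v[t][j] -= lr * (lam_task * v[t][j] + g)
    return w0, v

def predict(w0, v_t, x):
    """Return +1 or -1 for sample x under one task's weights."""
    score = sum((a + b) * xi for a, b, xi in zip(w0, v_t, x))
    return 1 if score >= 0 else -1

# Hypothetical toy data: two related tasks standing in for two isoforms.
TASKS = [
    [([1.0, 0.0], 1), ([2.0, 0.5], 1), ([-1.0, 0.0], -1), ([-2.0, -0.5], -1)],
    [([1.5, 1.0], 1), ([0.5, 0.0], 1), ([-1.5, -1.0], -1), ([-0.5, 0.0], -1)],
]

w0, v = train_mt_svm(TASKS, dim=2)
for t, data in enumerate(TASKS):
    correct = sum(predict(w0, v[t], x) == y for x, y in data)
    print(f"task {t}: {correct}/{len(data)} training samples correct")
```

In this sketch the ratio of lam_task to lam_shared controls how strongly the tasks are tied together: a large lam_task pushes every v[t] toward zero, so all tasks end up using nearly the same classifier, while a small one lets each task behave almost like an independent SVM.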
Description
December 2014
School of Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY