Prediction Of HIV-1 Protease Cleavage Site And Classification Of Membrane Proteins

Posted on: 2017-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhang
Full Text: PDF
GTID: 2310330512469709
Subject: Bioinformatics
Abstract/Summary:
HIV-1 protease inhibitors are an important component of the clinical treatment of AIDS. Correctly identifying HIV-1 protease cleavage sites in protein molecules refines our understanding of the protease's specificity, which is an important step toward designing efficient HIV inhibitors. In this work, we collected an HIV-1 protease cleavage site dataset containing 5830 samples and characterized each residue of each sequence with 531 physicochemical properties of amino acids. Each sample therefore had 4248 initial features (8 residue positions × 531 properties), many of which are bound to be irrelevant or redundant, so feature selection was needed. Minimal Redundancy Maximal Relevance (mRMR) is a feature selection method that accounts for redundancy among features while selecting relevant ones. However, when the dependent variable is binary and the features are continuous, its relevance measure (the t-score) and its redundancy measure (|R|) are not comparable; in addition, it needs cross-validation on the training set to determine the termination condition. We developed a new feature selection method, dCor-share, based on distance correlation (dCor) and relevance share. dCor-share makes the relevance and redundancy measures comparable under nonlinear conditions and terminates feature introduction automatically. Averaged over five independent predictions using Support Vector Classification (SVC) on the features retained by dCor-share, the method used the fewest features and achieved better prediction performance than the other models and the related literature.
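To make the relevance measure more concrete, the following is a minimal sketch of the sample distance correlation between one continuous feature and a label vector, written with NumPy. It illustrates only the standard dCor statistic; the relevance-share step and the stopping rule of dCor-share are not described in the abstract and are not reproduced here. The function names `_centered_distances` and `distance_correlation` are hypothetical helpers chosen for this illustration.

```python
import numpy as np


def _centered_distances(v):
    """Pairwise Euclidean distance matrix of v, double-centered."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation between x and y (value in [0, 1])."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()      # squared distance covariance
    dvar_x = (A * A).mean()     # squared distance variance of x
    dvar_y = (B * B).mean()     # squared distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        # A constant input carries no dependence information.
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))
```

Because the same statistic can be computed between a feature and the binary label (relevance) and between two features (redundancy), both quantities live on the same 0 to 1 scale, which is the comparability property the abstract attributes to dCor. Unlike Pearson's |R|, it also responds to nonlinear dependence: for a symmetric `x` and `y = x**2`, the Pearson correlation is essentially zero while the distance correlation is clearly positive.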
Analyzing the 33 features common to the five repeated experiments, we found that the three residue sites P1, P1' and P2' were the most important, although the other sites also contributed and cannot be ignored. Whether an amino acid sequence can be cleaved by HIV-1 protease was affected mostly by alpha and turn propensities and by hydrophobicity.

Membrane proteins, with their unique structures, complex categories and special functions, are known to be important drug targets. Predicting the type of a membrane protein helps in understanding its functional characteristics. In this work, we used a membrane protein dataset containing 7582 samples divided into 8 categories, and used Hierarchical Clustering (HC) to convert the multi-class problem into several binary classification problems. We applied Principal Component Analysis (PCA) to the 531 physicochemical properties of amino acids and used geostatistics to extract association features with a step of 25, which resolved the problem of unequal sequence lengths and gave 475 initial features per sequence. We used dCor-share for feature selection and SVC for prediction. The results showed that membrane protein prediction based on dCor-share and Hierarchical Clustering effectively improved prediction precision and outperformed the reference models and the results reported in the literature.

dCor-share has strong application prospects for high-dimensional feature selection in classification problems.
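The idea of turning variable-length sequences into fixed-length feature vectors can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: it assumes the geostatistical statistic is the empirical semivariogram, reads "a step of 25" as lags 1 through 25, and takes each sequence to be already encoded as one numeric track per principal component. The names `lag_features` and `sequence_features` are hypothetical.

```python
import numpy as np


def lag_features(track, max_lag=25):
    """Empirical semivariogram of a 1-D numeric track at lags 1..max_lag.

    Assumption: the semivariogram is used here as a stand-in for the
    geostatistical association feature, which the abstract does not specify.
    """
    z = np.asarray(track, dtype=float)
    feats = np.zeros(max_lag)
    for h in range(1, max_lag + 1):
        if len(z) > h:
            diff = z[h:] - z[:-h]
            feats[h - 1] = 0.5 * np.mean(diff ** 2)
        # Lags longer than the sequence are left at 0.
    return feats


def sequence_features(tracks, max_lag=25):
    """Concatenate lag features over all component tracks of one sequence.

    tracks: array of shape (n_components, sequence_length).
    Returns a vector of length n_components * max_lag, whatever the
    sequence length.
    """
    tracks = np.atleast_2d(np.asarray(tracks, dtype=float))
    return np.concatenate([lag_features(t, max_lag) for t in tracks])
```

Since the output length depends only on the number of tracks and on `max_lag`, two proteins of different lengths map to vectors of the same size, which is what allows a standard classifier such as SVC to be trained on them.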
Keywords/Search Tags: Feature selection, HIV-1 protease, Membrane protein, Redundancy share, Distance correlation, Hierarchical classification