Research On Protein Metal And Radical Ion-Binding Sites Prediction By Sequence Information

Posted on: 2020-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
GTID: 2370330596970882
Subject: Computer system architecture
Abstract/Summary:
Since the completion of the Human Genome Project, proteomics has focused on the study of protein interactions and recognition. Docking is one of the main approaches for studying the interaction and recognition between proteins. It provides an important theoretical reference for studying the sequence-structure-function relationship of proteins in cells, protein interactions, protein complex prediction, and computer-aided drug design. Well-established sequence-based methods already exist for predicting protein-protein and protein-ligand ion binding sites. In particular, as the measurement accuracy of protein structures continues to improve, researchers can obtain higher-precision three-dimensional protein structures, from which accurate binding-site prediction models can be constructed.

Feature extraction and selection are an important step in representing the data successfully and a key prerequisite for an effective downstream model. To express protein sequence information more effectively, this thesis studies how to represent information on protein metal ion and free radical ion binding from two angles: feature extraction and feature selection. Fourteen kinds of features are extracted from the protein sequence, including the PSSM scoring matrix, secondary structure, amino acid composition, CKSAAP structural information, solvent accessible surface area, and positive and negative charge. All features are then concatenated into a high-dimensional, sparse matrix that expresses the characteristic information. A weighted feature selection method (WFS) is proposed to select features. Feature selection removes redundant and irrelevant features, which further reduces the dimensionality and the running time. Because the datasets processed in this thesis differ greatly in sample size, four methods are used to maximize the quality of feature selection: chi-square test, sfm, random forest, and WFS feature selection. The selection method is chosen dynamically for each dataset, and experiments show that this strategy has a positive effect.

After feature extraction, in order to construct a more effective prediction model, this thesis proposes a multi-classifier dynamic selection integration model based on the Tscore and a classifier inconsistency metric. The model comprises two phases: single classifier sorting and multi-classifier dynamic selection integration. In the first phase, each single classifier in the classifier pool is trained on the training set and receives a Tscore, and all classifiers are then sorted in descending order of this score. In the second phase, single classifiers are selected one by one, in sorted order, and added to the integrated classifier. If the Tscore of the current integrated classifier is greater than the score of the previous step and the classifier inconsistency metric is greater than a certain threshold, the classifier is added and the process continues; otherwise, the integration stops.

Finally, the proposed method is applied to the prediction of protein metal ion and free radical ion binding sites. Experiments on a public dataset yield better prediction performance, and a comparison with classical prediction algorithms verifies the effectiveness of the proposed method.
Keywords/Search Tags:Protein Sequence, Site Prediction, Feature Extraction and Selection, Classifier Dynamic Selection Integration
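
The two-phase dynamic selection described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: the abstract does not define the Tscore or the inconsistency metric, so plain accuracy stands in for the Tscore, the share of differing labels stands in for the inconsistency metric, members are combined by averaging their positive-class probabilities, and the names `dynamic_select` and `min_disagreement` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def to_labels(probs: Sequence[float]) -> List[int]:
    """Threshold positive-class probabilities at 0.5."""
    return [1 if p >= 0.5 else 0 for p in probs]


def score(probs: Sequence[float], truth: Sequence[int]) -> float:
    """Stand-in for the Tscore (assumption): plain accuracy."""
    labels = to_labels(probs)
    return sum(a == b for a, b in zip(labels, truth)) / len(truth)


def average(members: List[Sequence[float]]) -> List[float]:
    """Combine members by averaging their probabilities (assumption)."""
    return [sum(col) / len(members) for col in zip(*members)]


def disagreement(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in inconsistency metric (assumption): share of differing labels."""
    la, lb = to_labels(a), to_labels(b)
    return sum(x != y for x, y in zip(la, lb)) / len(la)


def dynamic_select(
    pool: Dict[str, Sequence[float]],
    truth: Sequence[int],
    min_disagreement: float = 0.1,  # hypothetical threshold value
) -> Tuple[List[str], float]:
    """Return the selected classifier names and the final ensemble score."""
    # Phase 1: rank single classifiers by score, best first.
    ranked = sorted(pool, key=lambda name: score(pool[name], truth), reverse=True)

    # Phase 2: start from the best classifier and add the rest in ranked order.
    selected = [ranked[0]]
    best = score(pool[ranked[0]], truth)
    for name in ranked[1:]:
        current = average([pool[n] for n in selected])
        trial = average([pool[n] for n in selected + [name]])
        trial_score = score(trial, truth)
        diverse = disagreement(pool[name], current) > min_disagreement
        if trial_score > best and diverse:
            selected.append(name)
            best = trial_score
        else:
            break  # stop the integration at the first rejected candidate
    return selected, best


# Toy validation data: true labels and per-classifier probabilities.
truth = [1, 0, 1, 0, 1, 0]
pool = {
    "A": [0.9, 0.1, 0.9, 0.1, 0.9, 0.6],
    "B": [0.6, 0.4, 0.6, 0.6, 0.3, 0.2],
    "C": [0.2, 0.8, 0.3, 0.3, 0.6, 0.4],
}
print(dynamic_select(pool, truth))
```

With this toy pool, A and B are selected: B corrects A's single error and disagrees with it often enough to pass the threshold, while C is rejected because adding it does not raise the score further. In practice the score would be computed on held-out data rather than on the data the classifiers were trained on.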