Prediction Of Protein Structure Based On Multi-Information Fusion

Posted on: 2019-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: L F Lou
Full Text: PDF
GTID: 2310330566465861
Subject: Statistics
Abstract/Summary:
In recent years, with the rapid development of sequencing technologies, the number of protein sequences has been increasing exponentially, and traditional experimental methods can hardly keep up with the massively growing volume of protein data. There is therefore a need to develop theoretical and computational methods to explore the relationship between protein structure and function. Bioinformatics is an interdisciplinary field that builds on biology and uses information science and technology to acquire biological information and process biological data. By analyzing and processing data, it yields more biological knowledge and a deeper understanding of the biological world. Protein structure prediction is one of the most active research areas in bioinformatics and is important for advancing biology.

The main contents of this paper are as follows. The methods of protein feature extraction and the machine learning algorithms are reviewed in detail. The feature extraction methods include amino acid composition, polypeptide composition, pseudo-amino acid composition, the position-specific scoring matrix, average chemical shift and the auto-correlation function. The machine learning algorithms include the support vector machine, the naive Bayes classifier, the K-nearest neighbor algorithm and the linear discriminant analysis classifier. Feature extraction and classification algorithms are important factors that affect the prediction of protein structure, and this review provides the theoretical support for the study.

By fusing multiple kinds of information from protein sequences, a new method for predicting protein structural classes is proposed. First, we used the auto covariance transformation of the position-specific scoring matrix (ACCPSSM) to extract evolutionary information, extracted protein sequence information based on Chou's pseudo amino acid composition (PseAAC), and extracted secondary structure information using the PORTER online service. Multidimensional scaling was then used to reduce the dimension of the feature vector extracted by ACCPSSM, secondary structure information was combined with the protein sequence to extract average chemical shift information, and the three kinds of extracted information were fused. Finally, the optimal feature vectors were input to a support vector machine. The method was evaluated with the jackknife test on three low-similarity datasets, 25PDB, 1189 and 640, and compared with previous methods. The results indicate that the proposed method can effectively improve the prediction accuracy of protein structural class.

Based on the theory of wavelet denoising, this paper also presents a novel method for predicting protein structural class. First, the features of the protein sequence are extracted using Chou's pseudo amino acid composition (PseAAC). The extracted feature information is then denoised with a two-dimensional (2D) wavelet. Finally, the optimal feature vectors are input to a support vector machine (SVM) classifier to predict protein structural classes. We obtained significant predictive results using the jackknife test on the three low-similarity protein structural class datasets 25PDB, 1189 and 640, and compared our method with previous methods. The results indicate that the proposed method can be a reliable tool for predicting protein structural class, especially for low-similarity sequences.
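To illustrate two of the feature types named above, here is a minimal Python sketch of amino acid composition and of an auto covariance transform applied to the columns of a position-specific scoring matrix. It is not the thesis's implementation. The function names, the `max_lag` parameter, and the specific auto covariance formula (mean-centered products of column values `lag` positions apart, averaged over the sequence) are assumptions based on the commonly used definition, and the PSSM is assumed to be already available as a list of rows.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Return the fraction of each standard amino acid in the sequence."""
    # Non-standard symbols (for example 'X') are ignored in this sketch.
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return {aa: residues.count(aa) / total for aa in AMINO_ACIDS}


def auto_covariance(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Auto covariance of each PSSM column for lags 1..max_lag.

    pssm has one row per residue and one column per amino acid type.
    The result is a flat vector of length n_columns * max_lag,
    ordered by column and then by lag (an assumed layout).
    """
    length = len(pssm)
    if max_lag < 1 or max_lag >= length:
        raise ValueError("need 1 <= max_lag < number of PSSM rows")
    n_cols = len(pssm[0])
    features: List[float] = []
    for j in range(n_cols):
        column = [row[j] for row in pssm]
        mean = sum(column) / length
        for lag in range(1, max_lag + 1):
            # Average product of mean-centered scores `lag` residues apart.
            total = sum(
                (column[i] - mean) * (column[i + lag] - mean)
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features
```

For a PSSM with 20 columns and `max_lag = 10`, `auto_covariance` returns a 200-dimensional vector whatever the sequence length, which is what makes such a transform convenient as input to a fixed-size classifier such as an SVM.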
Keywords/Search Tags: protein structure prediction, position-specific scoring matrix, average chemical shifts, pseudo-amino acid composition, wavelet denoising, support vector machine