
Research On Recognition Method Of Protein Folding Pattern Based On Multi-source Information Fusion

Posted on: 2018-06-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Kon
Full Text: PDF
GTID: 1310330533463163
Subject: Computer application technology
Abstract/Summary:
Analyzing the structure and biological function of proteins from the massive sequence data of the post-genome era is a great challenge. The protein folding pattern is an important concept describing the spatial topological structure of proteins. Developing methods that recognize protein folding patterns from amino acid sequences is regarded as an important step toward predicting protein structure. Furthermore, such methods can accelerate protein structure identification and promote research on the relationship between protein structure and function. According to the granularity of the folding pattern, protein folding pattern recognition comprises two sub-problems: protein structural class prediction and protein fold recognition. However, existing methods usually study the two sub-problems independently and ignore the hierarchical relationship between protein structural class and protein fold. This thesis focuses on protein folding pattern recognition, studying feature representation from multi-source information and the design of multi-source information fusion classification systems based on the theories and methods of modern pattern recognition. The work begins with the protein structural class problem and ends with a hierarchical protein fold recognition method that fuses protein structural class information.

Firstly, a protein structural class prediction method based on sequence evolutionary modes of pseudo amino acid composition is proposed. To exploit the sequence evolutionary conservation information hidden in the position specific scoring matrix, six kinds of sequence evolutionary modes of pseudo amino acid composition are proposed. These features describe the global and local order information among the amino acid residues of a protein sequence from several aspects. To integrate the various features, feature-level and decision-level information fusion classification systems are designed, respectively. Experimental results on two low-similarity protein datasets show that the proposed method is superior to other similar methods proposed recently.

Secondly, a secondary structure based protein structural class prediction method is proposed. Most existing secondary structure based features merely describe the content information of protein secondary structure. Novel feature representation methods are therefore proposed to characterize the cooperativity and interaction among amino acid residues during the formation of various stable secondary structures, as well as the spatial arrangement of the secondary structures. A two-layer information fusion classification system is designed to integrate the various features, in which information fusion is performed in both the feature space and the decision space. Experimental results on two low-similarity protein datasets show that the proposed method is very useful for accurate prediction of protein structural class. Moreover, the proposed method is essential for achieving good performance on the α/β and α+β proteins.

Thirdly, a hierarchical protein fold recognition method that fuses protein structural class information is proposed. Following the hierarchical relationship between protein structural class and protein fold, a hierarchical classification architecture based on soft classification is designed. Experimental results show that the proposed architecture can greatly weaken the negative effect of misclassifications of protein structural class. Moreover, it helps considerably to improve recognition performance on protein folds whose structural classes are difficult to identify.

Finally, the hierarchical protein fold recognition method is realized by nesting various features (sequence evolutionary conservation features, secondary structure based features, amino acid sequence and physicochemical features) and classification algorithms (support vector machine and random forest) into the hierarchical classification architecture. Experimental results show that the proposed method is an effective computational tool for protein fold recognition and performs better than recent competing methods.
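
The abstract does not spell out how the soft classification based hierarchy combines its two levels. One common reading, given here only as an illustrative sketch, is that the fold score is the structural class probability multiplied by the fold probability within that class, summed over classes, instead of committing to a single predicted class first. The function names, the fold labels, and all probability values below are hypothetical and are not taken from the thesis.

```python
# Hypothetical sketch of hard vs. soft hierarchical fold recognition.
# Assumption: a class-level classifier yields P(class | sequence) and one
# fold-level classifier per class yields P(fold | class, sequence).
# All names and numbers are illustrative, not results from the thesis.


def soft_hierarchical_scores(class_probs, fold_probs_given_class):
    """Return P(fold | sequence) = sum over classes of P(class) * P(fold | class)."""
    scores = {}
    for cls, p_cls in class_probs.items():
        for fold, p_fold in fold_probs_given_class[cls].items():
            scores[fold] = scores.get(fold, 0.0) + p_cls * p_fold
    return scores


def soft_hierarchical_prediction(class_probs, fold_probs_given_class):
    """Pick the fold with the highest class-weighted score."""
    scores = soft_hierarchical_scores(class_probs, fold_probs_given_class)
    return max(scores, key=scores.get)


def hard_hierarchical_prediction(class_probs, fold_probs_given_class):
    """Commit to the most likely class, then pick the best fold inside it."""
    best_class = max(class_probs, key=class_probs.get)
    folds = fold_probs_given_class[best_class]
    return max(folds, key=folds.get)


# Made-up example: the class-level classifier is nearly undecided between
# the alpha/beta and alpha+beta classes.
class_probs = {"alpha/beta": 0.45, "alpha+beta": 0.55}
fold_probs = {
    "alpha/beta": {"TIM barrel": 0.9, "Rossmann fold": 0.1},
    "alpha+beta": {"ferredoxin-like": 0.6, "beta-grasp": 0.4},
}

print(hard_hierarchical_prediction(class_probs, fold_probs))
print(soft_hierarchical_prediction(class_probs, fold_probs))
```

In this made-up case the hard scheme is locked into the alpha+beta branch by a narrow class-level margin, while the soft scheme still lets a confident fold-level prediction in the other branch win. That is one way a class-level misclassification can be prevented from propagating to the fold level.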
Keywords/Search Tags:bioinformatics, machine learning, protein folding pattern, hierarchical architecture, position specific scoring matrix, secondary structure