
Research On Prediction Of Protein Supersecondary Structures

Posted on: 2010-01-23   Degree: Doctor   Type: Dissertation
Country: China   Candidate: D S Zou
GTID: 1100360275474205   Subject: Computer software and theory
Abstract/Summary:
Proteomics is becoming an important research domain in the life sciences as the post-genome era arrives. Protein structure prediction is currently one of the main challenges in proteomics research. Successful prediction of protein structure will help us better understand protein folding mechanisms and protein function. It will also be an important aid to related industries such as biomedical engineering and agricultural biotechnology. The 3D structure of a protein is tightly associated with its specific functions, and supersecondary structures are the basic building blocks of protein 3D structures. Because it is very difficult to predict 3D structures directly from primary structures, secondary and supersecondary structures can serve as the link between primary structures and 3D structures. There is thus a need to predict the supersecondary structures of proteins.

Existing methods for protein structure prediction mainly target secondary structures, and methods for supersecondary structure prediction are quite few. Secondary structure prediction is the first step and a precondition of supersecondary structure prediction. The approach used to encode amino acids strongly influences the accuracy of secondary structure prediction, so different encoding approaches need to be compared and analyzed; such an analysis provides a basis for choosing a suitable encoding. Existing methods for supersecondary structure prediction are limited in how they represent the features of structural motifs: only the basic amino acid composition is considered, so the feature representation is incomplete. In addition, classification methods for supersecondary structure prediction need further exploration.

Using machine learning techniques, this paper studies supersecondary structure prediction. Firstly, this paper analyzes amino acid encoding approaches for secondary structure prediction. 
Secondly, this paper studies the prediction of β-hairpins, a special supersecondary structure that occurs frequently in proteins. Finally, the research on this special supersecondary structure is extended to general supersecondary structures. The main contributions of this paper are summarized as follows.
①The influence of different amino acid encoding approaches on secondary structure prediction is analyzed using support vector machines. The encoding approach is an important factor affecting the accuracy of protein structure prediction. A protein secondary structure prediction model is built with support vector machines as the classifier. Five encoding approaches, ENCOrth, ENCFive, ENCCodBas, ENCCodExt and ENCProf, are compared, as are different kernel functions of the support vector machines. The experimental results show that the ENCProf approach, which contains more evolutionary information, gives a clearly higher Q3 accuracy than the other four approaches.
②A new feature representation method for β-hairpins is proposed. The measure of diversity and the increment of diversity are used to represent the feature information of protein sequence patterns. Basic amino acid composition, dipeptide composition and amino acid composition distribution are combined to represent the compositional features. Each β-β motif is represented as an 18-dimensional feature vector, which is used as the input of the classifier. The method is trained and tested on the ArchDB40 dataset (3088 proteins), the Kumar dataset (2088 proteins) and the CASP6 dataset (63 proteins). Support vector machines are used as the classifier. The experimental results show that higher overall prediction accuracies are obtained.
③Using the β-hairpin feature representation proposed above, the increment of diversity combined with quadratic discriminant analysis is applied, for the first time, to predict β-hairpin motifs. 
The method is trained and tested on the ArchDB40, Kumar and CASP6 datasets. The experimental results show that higher overall prediction accuracies are obtained. These results show that the new feature representation for β-hairpins proposed in this paper is very effective.
④The feature representation approach for the special supersecondary structure (β-hairpins) is extended to represent the features of general supersecondary structures. The measure of diversity and the increment of diversity are used to represent the feature information of protein sequence patterns. Basic amino acid composition, dipeptide composition and amino acid composition distribution are combined to represent the compositional features. Each supersecondary structure motif is represented as a 36-dimensional feature vector, which is used as the input of the classifier. Support vector machines are used as the classifier. The method is trained and tested on the ArchDB40 dataset, which contains 9180 β-β motifs, 5737 β-α motifs, 6378 α-β motifs and 4176 α-α motifs. The experimental results show that higher overall prediction accuracies are obtained on both the training dataset and the independent test dataset.
⑤The increment of diversity combined with quadratic discriminant analysis is applied, for the first time, to predict supersecondary structure motifs. The method is trained and tested on the ArchDB40 dataset. The experimental results show that higher overall prediction accuracies are obtained on both the training dataset and the independent test dataset. These results show that the feature representation for general supersecondary structures, extended from that for the special supersecondary structure (β-hairpins), is very effective.
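The abstract names ENCOrth without defining it. A common reading is the orthogonal (one-hot) encoding, in which each residue becomes a 20-bit vector containing a single 1. The minimal Python sketch below assumes that reading. It also assumes a 13-residue sliding window and an extra 21st bit for window positions outside the chain. These are illustrative choices, not the dissertation's settings. The sketch also shows how the Q3 score in ① is computed: Q3 is the fraction of residues whose three-state label (H, E, C) is predicted correctly.

```python
# Sketch only: orthogonal (one-hot) window encoding and Q3 accuracy.
# Assumed, not taken from the dissertation: window size 13 and a 21st
# "padding" bit for positions beyond the chain ends or non-standard residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
BITS_PER_RESIDUE = len(AMINO_ACIDS) + 1  # 20 residues + 1 padding bit


def orthogonal_window(seq, center, window=13):
    """Flat 0/1 feature vector describing the residues around seq[center]."""
    half = window // 2
    features = []
    for pos in range(center - half, center + half + 1):
        bits = [0] * BITS_PER_RESIDUE
        if 0 <= pos < len(seq) and seq[pos] in INDEX:
            bits[INDEX[seq[pos]]] = 1
        else:
            bits[-1] = 1  # outside the chain or unknown residue
        features.extend(bits)
    return features


def q3(predicted, observed):
    """Fraction of residues whose three-state label (H, E, C) is correct."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


if __name__ == "__main__":
    vector = orthogonal_window("MKTAYIAKQR", center=0)
    print(len(vector))               # 13 positions x 21 bits = 273
    print(q3("HHHEECC", "HHHECCC"))  # 6 of 7 residues correct
```

Profile-based encodings such as ENCProf would replace each one-hot block with a row of evolutionary (profile) values. The window mechanics stay the same.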
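The abstract uses the measure of diversity and the increment of diversity (ID) but gives no formulas. The sketch below assumes the definitions commonly used in the ID literature. For a count vector n_1, ..., n_s with total N, the measure of diversity is D = N ln N - Σ n_i ln n_i. The increment of diversity is ID(X, Y) = D(X + Y) - D(X) - D(Y). ID is 0 when X and Y have proportional compositions and grows as they diverge. The helper names `composition` and `dipeptide_composition` are hypothetical, and the toy sequences are made up. The abstract does not say how these values are assembled into the 18- or 36-dimensional vectors, so the example only computes one ID value per feature type.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def xlogx(x):
    """x ln x, with the convention 0 ln 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0


def diversity(counts):
    """Measure of diversity: D = N ln N - sum(n_i ln n_i)."""
    return xlogx(sum(counts)) - sum(xlogx(n) for n in counts)


def increment_of_diversity(x_counts, y_counts):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); 0 for proportional sources."""
    merged = [a + b for a, b in zip(x_counts, y_counts)]
    return diversity(merged) - diversity(x_counts) - diversity(y_counts)


def composition(seq):
    """Counts of the 20 standard amino acids (hypothetical helper)."""
    counts = Counter(seq)
    return [counts[aa] for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Counts of the 400 overlapping residue pairs (hypothetical helper)."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return [counts[dp] for dp in DIPEPTIDES]


if __name__ == "__main__":
    # Toy "standard" source, e.g. pooled counts from known motifs of one class.
    standard = "KVTVNGKTYEVKVDGKTVEV"
    query = "RVEVNGKKYTVEV"
    print(increment_of_diversity(composition(query), composition(standard)))
    print(increment_of_diversity(dipeptide_composition(query),
                                 dipeptide_composition(standard)))
```

In a classifier, a query segment would typically be compared against one standard source per class. The resulting ID values then become features, with smaller values indicating a closer match.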
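The abstract does not spell out how quadratic discriminant analysis is set up. The sketch below is textbook Gaussian QDA. Each class k gets a mean vector μ_k, a full covariance matrix Σ_k and a prior π_k. A feature vector x is assigned to the class with the largest g_k(x) = -½ ln|Σ_k| - ½ (x - μ_k)ᵀ Σ_k⁻¹ (x - μ_k) + ln π_k. Several details are assumptions made for illustration: the small ridge added to each covariance, the pure-Python Cholesky factorization, and the made-up two-dimensional data. In the setting of ③ and ⑤, the inputs would presumably be ID-based feature vectors.

```python
import math


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, mu, ridge=1e-3):
    """Sample covariance plus an (assumed) diagonal ridge to keep it positive definite."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [a - m for a, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


class QDA:
    """Gaussian quadratic discriminant analysis with one full covariance per class."""

    def fit(self, X, labels):
        self.params = {}
        for c in set(labels):
            rows = [x for x, y in zip(X, labels) if y == c]
            if len(rows) < 2:
                raise ValueError(f"class {c!r} needs at least two samples")
            mu = mean_vector(rows)
            L = cholesky(covariance(rows, mu))
            log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))
            self.params[c] = (mu, L, log_det, math.log(len(rows) / len(X)))
        return self

    def discriminant(self, x, c):
        """g_c(x) = -0.5 ln|S_c| - 0.5 (x - mu_c)^T S_c^-1 (x - mu_c) + ln prior_c."""
        mu, L, log_det, log_prior = self.params[c]
        z = forward_solve(L, [a - m for a, m in zip(x, mu)])  # z = L^-1 (x - mu)
        mahalanobis = sum(v * v for v in z)                   # squared Mahalanobis distance
        return -0.5 * log_det - 0.5 * mahalanobis + log_prior

    def predict(self, x):
        return max(self.params, key=lambda c: self.discriminant(x, c))


if __name__ == "__main__":
    # Made-up 2-D points standing in for ID-based feature vectors.
    X = [[0.1, 0.2], [0.0, -0.1], [-0.2, 0.1], [0.2, 0.0],
         [3.0, 3.1], [2.8, 3.3], [3.2, 2.9], [3.1, 3.2]]
    y = ["beta-beta"] * 4 + ["alpha-alpha"] * 4
    model = QDA().fit(X, y)
    print(model.predict([0.0, 0.1]), model.predict([3.0, 3.0]))
```

Unlike linear discriminant analysis, each class keeps its own covariance matrix. The decision boundaries can therefore be curved, at the cost of estimating more parameters per class.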
Keywords/Search Tags:β-hairpins, Supersecondary Structure, Support Vector Machines, Measure of diversity, Quadratic Discriminant