Study Of Protein Secondary Structure Prediction Methods

Posted on: 2005-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: H X Zhang
Full Text: PDF
GTID: 2190360122997299
Subject: Operational Research and Cybernetics
Abstract/Summary:
With the completion of the Human Genome Project, a vast amount of genetic information has been collected and a great number of protein sequences have been determined. As of Apr. 13, 2004, the SWISS-PROT data bank held 148516 protein sequence entries. A protein can perform its function only when it folds into a specific shape, so researchers study protein structures in order to understand the relationship between structure and function. Studying protein structures is therefore a key task and aim of the post-genome era.

So far, the PDB data bank holds only 25176 protein structures, so structures also need to be studied theoretically. Unfortunately, it is very difficult to predict tertiary structure directly from the primary sequence. However, the number of ways in which secondary structures combine has been found to be limited. An effective approach is therefore to predict the secondary structure first and then to predict the tertiary structure from it. In this sense, protein secondary structure prediction is both a bridge and a key step in protein structure prediction.

The main work of this thesis is summarized as follows:

1. Protein secondary structure prediction has been developing for about 40 years, since the 1960s. Many methods have been proposed, but they are based on different datasets, different definitions of secondary structure and different evaluation indexes, so they cannot be compared fairly. Moreover, these factors strongly affect the results. There is an urgent need to evaluate the different methods under a uniform standard, both to identify the best ones and to advance research in the field. In this work a uniform standard was chosen and 10 main methods were evaluated: GOR I, PROF, GOR IV, NNPREDICT, PHDsec, SSpro v 2.0, PSIPRED, PREDATOR, SOPMA and APSSP2. This work is complex and time-consuming, and no one has carried out such an evaluation before.

2. FDOD is a new measure of information discrepancy. In this thesis it is applied to protein secondary structure prediction for the first time, with encouraging results. The FDOD method has two advantages. First, the complete information set describes a sequence more accurately than other methods do. Second, the FDOD function is based on the concept of entropy from information theory: its input vectors are probability distributions and it involves only summation, so it places no limit on the dimension of the input vectors. FDOD is well suited to predicting protein secondary structure.

3. The Artificial Neural Network (ANN) is an important method. Qian and Sejnowski first used it to predict protein secondary structure in 1988, and it has improved quickly since then. In this thesis a novel BP network is used to predict protein secondary structure. However, because of limited time and experience, the author obtained only a preliminary result, and the approach should be studied further in future work.

4. The Support Vector Machine (SVM) is a new machine learning method. It was first applied successfully to protein secondary structure prediction by Prof. Sun Zhi-rong and co-workers in China. However, processing large amounts of high-dimensional data is very costly in both time and memory, and Incremental Learning (IL) may solve this problem. A novel algorithm is proposed that combines the characteristics of SVM with the processing scheme of IL. The results show that the time required is reduced by about half while the accuracy falls only slightly (by less than 1%). The new SVM method is therefore efficient for protein secondary structure prediction.
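
The point made in item 1, that methods can only be compared fairly under one definition of secondary structure and one evaluation index, can be illustrated with a small Python sketch. The abstract does not say which standard was chosen, so everything here is an assumption made for illustration: the 8-to-3 state reduction (H, G, I to helix; E, B to strand; everything else to coil) is one commonly used mapping of DSSP states, the index is the per-residue three-state accuracy usually called Q3, and the names `reduce_states`, `q3`, `method_a` and `method_b` as well as the toy strings are hypothetical.

```python
# Assumed reduction of 8-state DSSP codes to 3 states (one common choice,
# not necessarily the one used in the thesis).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_states(dssp: str) -> str:
    """Map a DSSP string to helix (H), strand (E) or coil (C)."""
    # Any code not listed above (T, S, '-', ...) is treated as coil.
    return "".join(DSSP_TO_3.get(state, "C") for state in dssp)


def q3(observed: str, predicted: str) -> float:
    """Percentage of residues whose 3-state label is predicted correctly."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Toy data: one observed structure and two hypothetical predictions.
observed = reduce_states("HHHGGTTEEBS-")
predictions = {
    "method_a": "HHHHHCCEEECC",
    "method_b": "HHHCCCCEECCC",
}

# Every method is scored against the same reduced reference string.
scores = {name: q3(observed, pred) for name, pred in predictions.items()}
print(scores)
```

Scoring every method against the same reduced reference is what makes the numbers comparable. Changing the mapping in `DSSP_TO_3` changes the reference string and therefore every score, which is why results reported under different definitions cannot be set side by side.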
Keywords/Search Tags: Protein Secondary Structure Prediction, Post-genome Era, Bioinformatics, Artificial Neural Network (ANN), Support Vector Machine (SVM), FDOD function