Protein Secondary Structure Prediction Based On The Hidden Markov Model

Posted on: 2013-01-25  Degree: Master  Type: Thesis
Country: China  Candidate: J H Lin  Full Text: PDF
GTID: 2210330374962869  Subject: Biological Information Science and Technology
Abstract/Summary:
In bioinformatics, predicting protein spatial structure and function are very important tasks, and protein secondary structure prediction is very helpful for both. Using amino acid sequences whose secondary structures are known, the main objective of protein secondary structure prediction is to establish an optimal model relating those amino acid sequences to their secondary structures, and then to use this model to predict the secondary structure of a new amino acid sequence. Protein secondary structure prediction is both a pattern recognition problem and an optimization problem. Normally, if the prediction accuracy is higher than 80%, the results can be used to better predict protein spatial structure. An efficient protein secondary structure prediction algorithm is therefore very helpful for protein spatial structure and function prediction.

The Hidden Markov Model (HMM) is a typical statistical method: a probability model that represents a problem through its parameters and is used to describe the statistical characteristics of a stochastic process. Many scholars in China and abroad have used this method to predict protein secondary structure and have achieved a great deal, but the accuracy rarely exceeds 80%. One of the main reasons is that the Baum-Welch algorithm, which is used to estimate the HMM parameters, has shortcomings such as a tendency to become trapped in local optima, premature convergence, and slow convergence. In the field of intelligent optimization, however, there are many algorithms with good global optimization ability, such as the Particle Swarm Optimization (PSO) algorithm.
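The abstract does not describe how PSO is coupled with Baum-Welch, so the following is only a minimal sketch of the basic PSO update itself, applied to a toy function. The names `pso_minimize` and `sphere`, the inertia and acceleration coefficients, and the bounds are illustrative assumptions, not taken from the thesis. In an HMM setting the objective could plausibly be the negative log-likelihood of the training sequences as a function of the model parameters, but that is also an assumption.

```python
import random


def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic PSO: each particle is pulled toward its own best position
    and toward the best position found by the whole swarm."""
    rng = random.Random(seed)
    lo, hi = bounds

    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests and the global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the coordinate to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val

    return gbest, gbest_val


if __name__ == "__main__":
    best, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
    print(best, best_val)
```

Because many particles explore the search space at once and share the best position found, this kind of search depends less on a single starting point than a purely local re-estimation procedure does.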
This dissertation studies how to use the PSO algorithm to improve the Baum-Welch algorithm, in order to improve the accuracy of protein secondary structure prediction by HMM. Specifically, it focuses on the following studies:

Firstly, this dissertation compares the performance of the 3-state HMM and the 7-state HMM on the CB513 data set. A scaling factor is used to overcome the underflow problem during training, and a single base probability is defined to initialize the symbol output matrix. The simulation results show that the 7-state HMM is better than the 3-state HMM, but the overall accuracy Q3 of both is less than 60%.

Secondly, the basic PSO algorithm is used to improve the Baum-Welch algorithm. The simulation results show that Q3 reaches more than 75%; the accuracies for helix (H) and coil (C), QH and QC, each improve to some degree and reach more than 70%; and the accuracy for sheet (E), QE, improves greatly and also reaches about 70%.

Thirdly, an improved PSO algorithm is used to improve the Baum-Welch algorithm. The simulation results show that the 7-state HMM is substantially better than the 3-state HMM, that Q3 is almost 79%, and that QH and QC reach more than 80%.

Lastly, a comparative evaluation is made between the improved algorithm and many other kinds of methods, which verifies that this hybrid algorithm has a certain advantage in predicting protein secondary structure.

The results of this dissertation show that it is effective to use the PSO algorithm to improve the Baum-Welch algorithm. The improved algorithm can overcome weaknesses of the Baum-Welch algorithm such as its tendency to become trapped in local optima and premature convergence, and so it can improve the accuracy of HMM for the protein secondary structure prediction problem.
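The abstract does not give formulas for its accuracy measures. In their standard form, Q3 is the fraction of all residues whose three-state label (H, E, or C) is predicted correctly, and QH, QE, and QC are the fractions of residues truly in each state that are predicted correctly. A minimal sketch under that assumption follows; the function name `q3_scores` and the example strings are hypothetical.

```python
def q3_scores(true_ss, pred_ss, classes="HEC"):
    """Per-residue accuracy for three-state secondary structure strings.

    Returns a dict with "Q3" (overall) and "QH", "QE", "QC" (per class).
    Assumes the standard definitions, not formulas from the thesis.
    """
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted strings must have equal length")
    if not true_ss:
        raise ValueError("sequences must not be empty")

    pairs = list(zip(true_ss, pred_ss))

    # Overall accuracy: correctly predicted residues over all residues.
    scores = {"Q3": sum(t == p for t, p in pairs) / len(pairs)}

    # Per-class accuracy: correct predictions among residues truly in class c.
    for c in classes:
        n_true = sum(t == c for t, _ in pairs)
        n_hit = sum(t == c and p == c for t, p in pairs)
        scores["Q" + c] = n_hit / n_true if n_true else float("nan")

    return scores


if __name__ == "__main__":
    true_ss = "HHHHEEECCC"  # hypothetical observed structure
    pred_ss = "HHHCEECCCC"  # hypothetical prediction
    print(q3_scores(true_ss, pred_ss))
```

In this hypothetical example, 8 of 10 residues match, so Q3 is 0.8, while QH is 3/4, QE is 2/3, and QC is 3/3.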
Keywords/Search Tags: Bioinformatics, Amino acid sequence, Protein secondary structure prediction, Hidden Markov Model, Particle Swarm Optimization algorithm