The Design And Implementation Of Protein Class Prediction System Based On Intelligent Computing

Posted on: 2014-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: J J Liu
Full Text: PDF
GTID: 2268330425981145
Subject: Computer technology
Abstract/Summary:
With the completion of the Human Genome Project, protein-related data have grown rapidly, but the gap between the amount of protein sequence data and the amount of structure data keeps widening. There is therefore an urgent need for a fast and accurate tool to predict the tertiary structure of proteins. This thesis introduces the theory behind protein structure prediction, covering feature extraction methods for amino acid sequences, the design of classification models, and the choice of intelligent optimization algorithms. Building on this theoretical work, it develops a protein analysis tool intended to be stable, fast, and easy to use. The system can carry out large-scale, high-throughput, automated analysis of protein data. The protein structure prediction system consists of a feature extraction module, a classification module, an evaluation module, and others. It is built on the Microsoft Visual Studio 2008 platform and implemented in C#. The research proceeds as follows:

(1) Feature extraction from amino acid sequences. Feature extraction is a prerequisite step of protein tertiary structure prediction: it translates amino acid sequence data into numeric vectors of fixed dimension. Different feature extraction methods perform differently depending on the data set and the classification model. This thesis uses the seven-class model, the dipeptide composition model, the tripeptide frequency method, and the distribution composition model to extract features of amino acid sequences from different perspectives; different feature extraction methods can also be fused to improve prediction accuracy.

(2) Building the classification model. By analyzing the useful information extracted from amino acid sequences and summarizing the rules it reveals, the model predicts the structure of amino acid sequences whose structure is unknown. Because the information used for protein secondary structure prediction is high-dimensional and computationally expensive to process, choosing classification models that balance time efficiency and prediction accuracy is essential. This thesis uses several current mainstream classification models: the artificial neural network (ANN), the back-propagation (BP) neural network, and the K-nearest neighbor classifier. Neural networks are self-organizing, self-learning, and adaptive, and they handle the nonlinear optimization problems of bioinformatics well, so a neural network model is chosen first to predict protein structure. To avoid falling into local optima, particle swarm optimization and gradient descent are used to train the parameters. The K-nearest neighbor algorithm is a classification method based on a distance measure; it is direct, requires no prior statistical knowledge and no explicit training phase, and has become an important nonparametric classification method. To address the shortcomings of the K-nearest neighbor algorithm, this thesis proposes a weighted K-nearest neighbor algorithm and applies it to protein structure prediction with good results.

(3) Design and implementation of the protein structure prediction system. An intelligent-computing-based protein structure prediction system is designed and implemented in C# on the VS2008 platform. It provides data upload, feature extraction, and structure prediction modules, and the application was tested and refined.
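
To make the pipeline in steps (1) and (2) concrete, the following is a minimal sketch of dipeptide composition feature extraction followed by a weighted K-nearest neighbor vote. It is written in Python for brevity, whereas the thesis system itself is implemented in C#. The abstract does not state the weighting scheme, so the inverse-distance weights used here are an assumption, as are the function names, the Euclidean distance, and the toy sequences and class labels, which are placeholders rather than real proteins or results.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
DIPEPTIDE_SET = set(DIPEPTIDES)


def dipeptide_composition(sequence):
    """Map a sequence of any length to a fixed 400-dimensional frequency vector."""
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Pairs containing non-standard residue codes are ignored.
    counts = Counter(p for p in pairs if p in DIPEPTIDE_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def weighted_knn_predict(train_vectors, train_labels, query, k=3, eps=1e-9):
    """Classify `query` by an inverse-distance-weighted vote of its k nearest neighbors.

    Assumption: weight = 1 / (distance + eps). The thesis may weight differently.
    """
    neighbors = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = {}
    for distance, label in neighbors[:k]:
        votes[label] = votes.get(label, 0.0) + 1.0 / (distance + eps)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Toy, made-up sequences and labels, for illustration only.
    training = [
        ("AEELLKKAEELLKK", "all-alpha"),
        ("LEEALKKLEEALKK", "all-alpha"),
        ("VTVTVKVYVTVTV", "all-beta"),
        ("TVYVKVTVYVTVK", "all-beta"),
    ]
    vectors = [dipeptide_composition(seq) for seq, _ in training]
    labels = [label for _, label in training]
    query = dipeptide_composition("AEELLKKLEEALKK")
    print(weighted_knn_predict(vectors, labels, query, k=3))
```

Weighting each neighbor by the inverse of its distance lets one very close neighbor outvote several distant ones, which is one common way to reduce the sensitivity of plain K-nearest neighbor to the choice of K. Vectors from several feature extraction methods could be concatenated before classification to sketch the fusion mentioned in step (1).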
Keywords/Search Tags: feature extraction, prediction of protein structure, K-nearest neighbor, ANN, BP neural network