
Research On Hierarchical Classification Method For Prediction Of Protein Function

Posted on: 2011-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
Full Text: PDF
GTID: 2120360305455248
Subject: Bioinformatics
Abstract/Summary:
Protein function prediction has advanced in recent years as techniques from several disciplines have progressed. However, because of the rapid growth in the amount of protein sequence data and the complexity of protein structure, it remains a challenging problem. Understanding protein function helps to explain the mechanisms by which diseases form, to design more effective drugs, and to guide better diagnosis and treatment. Protein function prediction therefore has important practical significance.

Against the background of bioinformatics, this thesis studies protein function prediction based on hierarchical classification. Proteins are organic compounds that are abundant in every cell, a particular category of biological macromolecule, and one of the two most important classes of substance in a living organism (the other being nucleic acids). The more complex a living organism is, the more structures and functions its proteins have, and nearly all biological phenomena are expressed through proteins. A basic task in exploring the mystery of life is therefore to understand protein structure and function in detail. The thesis examines the relationship between protein structure and function and the characteristics of each. It points out that proteins are an important material basis of all human life and have a variety of physiological functions: enzyme catalysis, signal reception and transmission, transport and storage of substances, structural support and coordination, nutrient storage, and immune function. This work introduces the structure and function of G protein-coupled receptors (GPCRs), an important class of proteins that carry out signal transduction in human cells in response to external stimuli. A GPCR is a component of the process by which the cell surface receives one kind of signal and converts it into other types of signal inside the cell.
The extracellular N-terminus and three loops of a GPCR interact with a specific signal molecule (the ligand). When a signal molecule binds to a GPCR, the ligand-bound receptor changes shape, which triggers a chain reaction inside the cell and makes the cell respond to the signal. GPCRs are the targets of 12 of the 20 best-selling drugs, so research on the function of these proteins has important practical significance. The experimental data in this thesis are from the GPCR family.

Because protein structure is complex and protein function is varied, proteins can be classified in many different ways, and we argue that a hierarchical description of function is more helpful for understanding it. We analyze the differences between hierarchical classification and flat classification, discuss the problems specific to hierarchical classification in detail, and point out how differences in prediction depth affect hierarchical classification. A review of earlier methods shows that most protein function prediction still relies on flat classification. The few methods that do classify protein function data hierarchically still reduce the hierarchical problem to a flat classification problem at each stage before making a prediction.

We propose using the immune particle swarm optimization algorithm to optimize the choice of classifier at every node of a hierarchical data set. The method works as follows. First, the immune particle swarm optimization algorithm randomly generates a swarm of particles (antibodies). Each particle (antibody) represents a combination of classifiers: the dimension of the particle equals the number of nodes in the classifier tree, and each dimension is assigned a classifier at random.
Each particle is a classifier tree, and the classifier at each node is generated without reference to that node's local data. Second, each particle (antibody), which represents an entire classifier tree, is trained on data of known class and then evaluated on a validation set. The validation accuracies obtained at the individual nodes are averaged, and the resulting value is the fitness of the particle (antibody), that is, of the entire classifier tree. The fitness therefore reflects the classification performance of the whole classifier tree rather than that of a single classifier at an individual node. Third, consider how each particle (antibody) is updated. The position and velocity of a particle (antibody) are influenced not only by the particle's current state but also by its own best solution so far and by the best solution of the swarm held in the immune memory library. The algorithm therefore takes full account of the interaction between classifiers at different nodes. Finally, we combine the algorithm with immune mechanisms, including antibody diversity, immune memory, and concentration-based inhibition. These mechanisms preserve the diversity of the particle (antibody) swarm, so that the update process does not become trapped in a local minimum, and they allow a fuller consideration of the interaction between classifiers at different nodes.

Simulation experiments on the GPCR family yielded classification accuracy values at three levels of the hierarchy, and they show that the advantage of this optimization is more pronounced at the deeper levels of hierarchical data sets.

Using the immune particle swarm optimization algorithm to improve the performance of the entire classifier tree on hierarchical data sets still has shortcomings.
The main idea of the algorithm rests on the particle (antibody) encoding and on the update process of the particle (antibody) swarm. As a result, it considers the classification accuracy of the entire classifier tree and the interaction among the classifiers at all nodes, but not the specific interaction between a parent node and its direct child nodes.

In future work, hierarchical classification for protein function prediction can still be improved in the following respects:
1) Try other optimization algorithms for hierarchical classification of protein function.
2) Design a more rational performance evaluation for the hierarchical classification method.
3) Study another type of hierarchical protein function data set, one organized as a directed acyclic graph (DAG).
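
The particle encoding and tree-level fitness described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. All function names are hypothetical, the `node_accuracy` callback stands in for training and validating a classifier at a node, and the discrete update rule (each dimension keeps its value, copies the particle's own best, or copies the memory best) is an assumption, since the abstract does not say how position and velocity are updated for classifier choices. The immune mechanisms (diversity, concentration-based inhibition) are left out.

```python
import random
from statistics import mean


def random_particle(n_nodes, n_types, rng):
    """One particle encodes one classifier tree: a classifier-type index per node."""
    return [rng.randrange(n_types) for _ in range(n_nodes)]


def tree_fitness(particle, node_accuracy):
    """Fitness of the whole tree: mean validation accuracy over all nodes."""
    return mean(node_accuracy(node, clf) for node, clf in enumerate(particle))


def update_particle(particle, personal_best, memory_best, rng,
                    p_keep=0.4, p_personal=0.3):
    """Assumed discrete update: per dimension, keep the current classifier,
    copy the particle's own best, or copy the best held in memory."""
    new = []
    for cur, pb, mb in zip(particle, personal_best, memory_best):
        r = rng.random()
        if r < p_keep:
            new.append(cur)
        elif r < p_keep + p_personal:
            new.append(pb)
        else:
            new.append(mb)
    return new


def optimize(n_nodes, n_types, node_accuracy,
             n_particles=10, n_iters=30, seed=0):
    """Search for the classifier assignment with the best tree-level fitness."""
    rng = random.Random(seed)
    swarm = [random_particle(n_nodes, n_types, rng) for _ in range(n_particles)]
    personal = [p[:] for p in swarm]
    personal_fit = [tree_fitness(p, node_accuracy) for p in swarm]
    best_idx = max(range(n_particles), key=lambda i: personal_fit[i])
    best, best_fit = personal[best_idx][:], personal_fit[best_idx]
    for _ in range(n_iters):
        for i in range(n_particles):
            swarm[i] = update_particle(swarm[i], personal[i], best, rng)
            fit = tree_fitness(swarm[i], node_accuracy)
            if fit > personal_fit[i]:
                personal[i], personal_fit[i] = swarm[i][:], fit
            if fit > best_fit:
                best, best_fit = swarm[i][:], fit
    return best, best_fit


if __name__ == "__main__":
    # Toy accuracy table (made-up numbers): rows are nodes, columns are classifier types.
    table = [[0.5, 0.9], [0.8, 0.6], [0.7, 0.95]]
    print(optimize(3, 2, lambda node, clf: table[node][clf]))
```

Because the fitness is averaged over every node, a particle is rewarded for the performance of the tree as a whole, which is the point the abstract makes against choosing each node's classifier in isolation.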
Keywords/Search Tags: Immune Particle Swarm Optimization Algorithm, Prediction of Protein Function, Hierarchical Classification