Predicting Protein Subcellular Localization Using The Algorithm Of Increment Of Diversity Combined With Artificial Neural Network

Posted on: 2015-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Wu
GTID: 2298330431478609
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the post-genome era, life science research has developed at high speed and biological data have expanded rapidly. Bioinformatics arose in response to the scale and complexity of these data. The biological function of a protein depends on its subcellular location: only a protein in the correct location can perform its function. If a protein is transported to the wrong subcellular location, cell function and the organism as a whole can be significantly affected. In addition, knowing the subcellular localization of a protein provides information about its function and structure. This helps researchers understand disease mechanisms and has important applications in drug development and cell medicine. Predicting protein subcellular localization has therefore become one of the main research topics of bioinformatics in the post-genome era.

From a biological point of view, a protein's sequence determines its structure, and its structure determines its function. Bioinformatics methods for subcellular localization prediction use intelligent algorithms and the primary amino acid sequence to predict the subcellular location. Such methods generally involve four steps. First, an objective and effective data set is established. Second, an appropriate feature extraction method is chosen to encode the protein sequences. Third, intelligent algorithms are used to build effective classifiers, which are then used to predict subcellular localization. Fourth, the classifiers are evaluated according to the prediction results.

This thesis introduces the basic theory of protein subcellular localization, protein feature extraction methods, increment of diversity, and neural networks.
We propose a protein subcellular localization prediction method based on increment of diversity and neural networks. The study focuses on combining increment of diversity, neural networks, and protein feature extraction methods to obtain high prediction accuracy.

To predict protein subcellular localization, protein sequences must first be converted into numerical information through feature extraction so that a computer can process them. The choice of feature extraction method is very important for prediction accuracy. Feature extraction methods include amino acid composition (AAC), dipeptide composition, amino acid hydration composition (AAHC), pseudo amino acid composition (PseAA), physical and chemical composition (PCC), encoding based on grouped weight (EBGW), and N-terminal signal (NTS), among others.

In this thesis, increment of diversity is used as the feature extraction method. Common protein feature extraction methods are first used to extract protein features. These features are then input to increment of diversity, which yields discrete finite coefficients that serve as the protein features. The discrete finite coefficients obtained from several common feature extraction methods are also fused. The results show that an effective fusion of multiple features performs better than single features, and that the N-terminal signal plays a key role in protein subcellular localization.

Classification methods usually include the K-nearest neighbor method (KNN), the Bayesian network, the artificial neural network (ANN), and the flexible neural tree (FNT), among others. However, protein subcellular localization is a typical multi-class classification problem, for which these classifiers are not directly suitable. The problem is therefore converted into binary classification problems, which can then be handled by commonly used classifiers.
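
The increment-of-diversity feature step described above can be illustrated with a minimal Python sketch. It assumes the commonly used definitions D(X) = N log N - sum(n_i log n_i) for the diversity of a count vector and ID(X, Y) = D(X + Y) - D(X) - D(Y). The abstract does not state the exact formula, log base, or class profiles used in the thesis, so the natural logarithm, the function names, and the toy sequences below are illustrative assumptions only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac_counts(sequence):
    """Amino acid composition as raw counts, one entry per residue type."""
    counts = Counter(sequence.upper())
    return [counts[a] for a in AMINO_ACIDS]

def diversity(counts):
    """D(X) = N*log(N) - sum(n_i*log(n_i)), treating 0*log(0) as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)

def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller means more similar profiles."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)

def id_features(sequence, class_profiles):
    """One ID value per class profile; this vector is what a classifier would see."""
    query = aac_counts(sequence)
    return [increment_of_diversity(query, profile) for profile in class_profiles]

# Toy, made-up sequences standing in for two hypothetical location classes.
class_a = aac_counts("MKKLLAALLKKA")
class_b = aac_counts("DDEESSTTDDEE")
features = id_features("MKLLAKKA", [class_a, class_b])
```

In this toy case the query shares residues only with the first profile, so its first ID value is the smaller of the two. Fusing several feature types, as the abstract describes, would amount to concatenating such ID vectors computed from different encodings.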
In this thesis, we use the SNL6 data set. We use error correcting output coding (ECOC) to classify multi-label sequences, with artificial neural networks (ANN) as the base classifiers. The optimization algorithm is particle swarm optimization (PSO), and our method achieves good prediction results.
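
As a rough illustration of how ECOC turns a multi-class problem into binary ones, the sketch below assigns each class a binary codeword, derives the binary training targets for one base classifier from a column of the code matrix, and decodes a vector of base-classifier outputs by minimum Hamming distance. The four class names and the 7-bit exhaustive code are assumptions for illustration. The abstract does not give the code matrix used in the thesis, and the ANN base classifiers and PSO step are not modeled here.

```python
# Hypothetical code matrix: one 7-bit codeword per class (placeholder class names).
# Any two codewords differ in 4 positions, so a single wrong bit can be corrected.
CODE_MATRIX = {
    "class_0": (1, 1, 1, 1, 1, 1, 1),
    "class_1": (0, 0, 0, 0, 1, 1, 1),
    "class_2": (0, 0, 1, 1, 0, 0, 1),
    "class_3": (0, 1, 0, 1, 0, 1, 0),
}

def binary_targets(labels, column):
    """Binary training targets for the base classifier assigned to one column."""
    return [CODE_MATRIX[label][column] for label in labels]

def hamming(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(bits):
    """Return the class whose codeword is closest to the predicted bit vector."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], bits))

# Targets that the base classifier for column 2 would be trained on.
labels = ["class_0", "class_2", "class_3"]
targets_col2 = binary_targets(labels, 2)

# Seven base-classifier outputs with one bit flipped relative to class_2's codeword.
predicted = decode((1, 0, 1, 1, 0, 0, 1))
```

In a full pipeline, each of the seven bits would come from a separately trained binary classifier; here the bit vector is supplied directly to keep the example self-contained.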
Keywords/Search Tags: predicting subcellular localization, feature extraction, increment of diversity (ID), artificial neural network (ANN), error correcting output coding (ECOC), particle swarm optimization (PSO)