Research And Application On Protein Subcellular Localization Prediction

Posted on: 2018-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Hong
GTID: 2370330575967041
Subject: Agriculture
Abstract/Summary:
Protein subcellular localization prediction is a fundamental part of bioinformatics: it supports the functional characterization of unknown sequences and the analysis of genome annotation, and it can greatly improve target identification during drug discovery. With newly discovered proteins accumulating rapidly and biological databases being updated continually, determining the subcellular localization of a protein experimentally is both laborious and time-consuming, which highlights the demand for an effective computational approach. We used computational tools to improve the accuracy and efficiency of subcellular localization predictors.

It has been shown that combinations of different feature extraction methods and classification models perform markedly differently on different datasets. We used two procedures to arrive at the best prediction combination. (1) To obtain the optimal feature extraction method, principal component analysis (PCA) was employed to convert the high-dimensional Gene Ontology (GO) feature into a low-dimensional PCAGO feature through a set of linear transformations. In comparative experiments between methods based on a single type of feature and methods based on mixed features, fusing the PCAGO feature with the pseudo amino acid composition (PseAAC) feature proved superior to the other features. (2) To build the optimal combined prediction model, we chose SVM, BP neural networks and KNN for comparative tests using the best GO-based hybrid encoding methods. The SVM prediction algorithm combined with the mixed feature extraction method (PCAGO and PseAAC) obtained the best results in most cases.

Although most proteins function in only one specific subcellular location, some proteins exist in more than one location simultaneously. Multi-location proteins were discovered only a few years ago, however, so the biological data available for computational experiments are still insufficient. Technical barriers cannot be neglected either. As a result, the use of machine learning to predict the subcellular localization of multi-location proteins began only in recent years.

To improve the prediction of multi-location protein subcellular localization, we present a model called CL-RBF, derived from the traditional RBF neural network for multi-label learning (ML-RBF). The enhancements are as follows. First, ML-RBF training uses classic K-means to generate the centers of the hidden layer, but the centroids obtained by K-means clustering are unstable and subject to randomness. We therefore introduced the silhouette coefficient to determine the optimal number of centroids in the second layer. Second, previous approaches optimized the clustering only within a single label and neglected the interaction between centroids of different labels. In this thesis, we take into account that two centroids generated from two different labels should lie farther apart when fewer samples cover both labels. Third, we put forward an adaptive gradient descent algorithm that adjusts the parameters to compensate for errors arising during model training. Finally, we formulated a way to refine the prediction results based on the optimized clustering algorithm: the training samples that do not belong to label L are also clustered, and the final adjustment is made by analyzing the distances among the training samples, the hidden centers obtained for label L, and the cluster centers not belonging to label L.

Compared with previously introduced methods for bacterial protein subcellular localization prediction, the new predictor is more powerful and flexible. In addition, to make the research results convenient to inspect and analyze, we established two user-friendly web servers for bacterial proteins, based on the SVM model and the improved ML-RBF model.
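The abstract states that the silhouette coefficient was used to choose the number of K-means centroids for the hidden layer, but it gives no implementation details. The following is a minimal sketch of that general idea, not the thesis's actual procedure. For a sample i, the silhouette value is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other members of its own cluster and b(i) is the smallest mean distance to the members of any other cluster. The candidate k with the highest mean s(i) is kept. The toy 2-D data, the function names, and the deterministic farthest-point seeding (used here only to keep the sketch reproducible) are all assumptions made for illustration.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding (an assumption, not the thesis's method): start at
    points[0], then repeatedly add the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=100):
    """Plain Lloyd iterations; returns (centers, labels)."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster happens to become empty.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)

def mean_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0  # the silhouette is undefined for a single cluster
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # by convention s(i) = 0 for a singleton cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(p, points[j]) for j in idx) / len(idx)
                for lab, idx in clusters.items() if lab != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_values):
    """Return (score, k, centers) for the k with the highest mean silhouette."""
    best = None
    for k in k_values:
        centers, labels = kmeans(points, k)
        score = mean_silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, centers)
    return best

# Toy data: three well-separated 3x3 grids of points (purely illustrative).
offsets = [(dx, dy) for dx in (-0.4, 0.0, 0.4) for dy in (-0.4, 0.0, 0.4)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]

best_score, best_k, best_centers = choose_k(points, range(2, 6))
print(best_k, round(best_score, 3))
```

In an RBF network the centers returned for the chosen k would serve as the hidden-layer prototypes. In a multi-label setting such as ML-RBF, this selection would presumably be run once per label on the samples carrying that label, though the abstract does not confirm this.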
Keywords/Search Tags: subcellular localization, feature of protein sequence, principal component analysis, RBF neural networks