Research On Prediction Of Sequence-based Multilocus Subcellular Localization

Posted on: 2018-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Tian
GTID: 2310330515969294
Subject: Computer software and theory
Abstract/Summary:
The function of a protein is closely related to its location in the cell: to function correctly, a newly synthesized protein must be transported to a specific organelle (that is, a subcellular location). Predicting protein subcellular localization can help determine the function of an unknown protein and understand protein interactions and various biological processes, and it is also important for understanding the pathogenesis of disease. Traditional experimental techniques such as subcellular fractionation, green fluorescent protein (GFP) fusion, mass spectrometry and isotope affinity tagging can determine subcellular localization precisely, but these experiments are expensive and time-consuming.

In recent years, with the rapid growth of biological data, bioinformatics has developed quickly, and more and more researchers apply computational techniques to open biological problems. Using machine learning to predict protein subcellular localization is one such active topic, and it is the main subject of this work. After years of effort, machine-learning approaches to this problem have produced a series of significant results: various computational methods have been developed, prediction accuracy has risen steadily, and many prediction platforms have appeared, providing valuable information for the subsequent analysis of protein function. There is, however, still room for improvement, roughly in three respects:
(1) Most existing methods apply only to binary-classification data, yet many proteins reside in more than one subcellular location, so classifiers that can handle the multi-label problem are needed.
(2) Although some methods introduce multi-label learning techniques, the "multiplicity degree" of the data they handle is low.
(3) Some classifiers use Gene Ontology (GO) features to improve accuracy, but the feature dimension is very large and the extraction process is tedious.

After thoroughly comparing and analyzing existing protein subcellular localization prediction algorithms, we propose corresponding improvements to address the shortcomings of current classifiers, described in four parts: data set construction, feature extraction, the prediction algorithm, and result evaluation. Our data set is derived from the widely recognized tool iLoc-Animal; its category "multiplicity degree" reaches 1.8922, and it covers 20 location categories in total. For features, we use amino acid composition (AAC) together with a feature called LIFT, avoiding the tedious and time-consuming extraction of GO features. After comparing several multi-label prediction algorithms, we chose the multi-label k-nearest neighbor (ML-KNN) algorithm as our classifier. Performance was tested with ten-fold cross-validation on five indicators (Precision, Accuracy, Recall, Absolute-True and Absolute-False) and compared with the classical iLoc-Animal algorithm. The experimental results show that our method achieves an Accuracy of 74.35% and an Absolute-True of 71.17%, significantly higher than iLoc-Animal's Accuracy (62.28%) and Absolute-True (45.62%); every other evaluation index is also higher than that of iLoc-Animal. In addition, our prediction method is simple and responds quickly. We hope this work will be helpful to research on protein subcellular localization prediction.
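The abstract names amino acid composition (AAC) as one of its two features. As a minimal sketch, assuming the usual definition (the fraction of each of the 20 standard residues in the sequence), AAC can be computed as below. The function name `aac` and the choice to skip non-standard letters such as X or U are illustrative assumptions, not details taken from the thesis.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids, one-letter codes in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> List[float]:
    """Return the 20-dimensional amino acid composition (AAC) of a sequence.

    Entry i is the fraction of residues equal to AMINO_ACIDS[i]. Letters
    outside the 20 standard residues (X, B, U, ...) are skipped, which is an
    assumption of this sketch rather than something the thesis specifies.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


vec = aac("MKKLLA")
print(vec[AMINO_ACIDS.index("K")])
```

Because the vector has a fixed length of 20 regardless of sequence length, it gives every protein a comparable representation at very low cost, which is the contrast the abstract draws with high-dimensional GO features.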
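The abstract does not describe how its "LIFT" feature is built. Assuming it refers to the label-specific-features idea of LIFT (multi-label learning with Label specIfic FeaTures), a minimal sketch for a single location label looks like this: cluster the proteins that have the label and those that lack it separately with k-means, then re-represent any protein by its distances to all cluster centres. The names `kmeans`, `lift_mapping` and the `ratio` parameter are hypothetical, and how the thesis combines such features with AAC is not stated in the abstract.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means with randomly chosen initial centres; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _dist(p, centroids[c]))
            clusters[nearest].append(p)
        # New centre = mean of its members; an empty cluster keeps its old centre.
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments have stabilised
            break
        centroids = updated
    return centroids


def lift_mapping(X: List[Vector], y: List[int], ratio: float = 0.1,
                 seed: int = 0) -> Callable[[Vector], Vector]:
    """Hypothetical label-specific feature map for ONE label, LIFT style.

    X holds base feature vectors (e.g. AAC); y[i] is 1 if protein i carries the
    label. Positives and negatives are clustered separately into m centres each,
    and a vector is re-represented by its distances to all 2*m centres.
    """
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples for this label")
    smaller = min(len(pos), len(neg))
    m = min(smaller, math.ceil(ratio * smaller))  # ratio > 0 is assumed
    centres = kmeans(pos, m, seed=seed) + kmeans(neg, m, seed=seed)
    return lambda x: [_dist(x, c) for c in centres]
```

Since one mapping is built per label, each of the 20 locations would get its own feature space under this scheme; that per-label tailoring is the main design idea behind LIFT.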
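The classifier chosen in the abstract is multi-label k-nearest neighbor (ML-KNN). The sketch below follows the standard ML-KNN formulation: for each label it estimates a smoothed prior and the likelihood of seeing j label-carrying proteins among the k neighbours, then applies the maximum a posteriori rule. It is an illustrative pure-Python version, not the thesis's implementation; the class name `MLKNN`, the defaults `k=10` and `s=1.0`, and the use of Euclidean distance are assumptions.

```python
import math
from typing import List, Optional, Sequence


def _neighbours(X: Sequence[Sequence[float]], x: Sequence[float], k: int,
                skip: Optional[int] = None) -> List[int]:
    """Indices of the k training vectors nearest to x (Euclidean); `skip` excludes one index."""
    ranked = sorted((math.dist(x, xi), i) for i, xi in enumerate(X) if i != skip)
    return [i for _, i in ranked[:k]]


class MLKNN:
    """Minimal ML-KNN: kNN label counts combined with per-label Bayesian posteriors."""

    def __init__(self, k: int = 10, s: float = 1.0):
        self.k, self.s = k, s  # s is the Laplace smoothing constant

    def fit(self, X, Y):
        """X: feature vectors; Y: 0/1 label vectors, one column per location."""
        self.X, self.Y = X, Y
        n, q, k, s = len(X), len(Y[0]), self.k, self.s
        # Smoothed prior probability that a protein carries each label.
        self.prior = [(s + sum(y[lab] for y in Y)) / (2 * s + n) for lab in range(q)]
        # c1[lab][j] (c0[lab][j]): training proteins with (without) label lab whose
        # k neighbours contain exactly j proteins carrying that label.
        c1 = [[0] * (k + 1) for _ in range(q)]
        c0 = [[0] * (k + 1) for _ in range(q)]
        for i, x in enumerate(X):
            nb = _neighbours(X, x, k, skip=i)  # leave the protein itself out
            for lab in range(q):
                j = sum(Y[t][lab] for t in nb)
                (c1 if Y[i][lab] else c0)[lab][j] += 1
        # Smoothed likelihoods P(j neighbours carry lab | protein has / lacks lab).
        self.like1 = [[(s + c[j]) / (s * (k + 1) + sum(c)) for j in range(k + 1)] for c in c1]
        self.like0 = [[(s + c[j]) / (s * (k + 1) + sum(c)) for j in range(k + 1)] for c in c0]
        return self

    def predict_proba(self, x) -> List[float]:
        """Posterior probability of each label for a new feature vector x."""
        nb = _neighbours(self.X, x, self.k)
        probs = []
        for lab, p1 in enumerate(self.prior):
            j = sum(self.Y[t][lab] for t in nb)
            has, lacks = p1 * self.like1[lab][j], (1 - p1) * self.like0[lab][j]
            probs.append(has / (has + lacks))
        return probs

    def predict(self, x) -> List[int]:
        """Assign a label when its posterior exceeds 0.5 (the MAP rule)."""
        return [1 if p > 0.5 else 0 for p in self.predict_proba(x)]
```

Each label is decided independently, so a protein can receive several locations at once, which is exactly the multi-label behaviour the abstract asks for. Training cost is dominated by the neighbour search, which is why kNN-style methods stay simple and quick on modest data sets.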
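Performance in the abstract is measured with ten-fold cross-validation. A minimal sketch of the index splitting, assuming a single random shuffle with a fixed seed (the thesis does not say how folds were formed, and `k_fold_indices` is a hypothetical helper), is:

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation over n samples.

    Samples are shuffled once with a fixed seed, then dealt round-robin into k
    folds; each fold serves as the test set exactly once.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    for f in range(k):
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, sorted(folds[f])
```

In use, one would loop `for train, test in k_fold_indices(len(X)):`, fit the classifier on the `train` proteins, predict the `test` proteins, and pool or average the resulting scores. Dealing indices round-robin keeps fold sizes within one of each other.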
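The five indicators listed in the abstract are not defined there. Assuming they follow the example-based multi-label definitions used in the iLoc series (where Precision and Recall are also called "Aiming" and "Coverage"), each protein's true location set is compared with its predicted set and the per-protein scores are averaged. The function below is a hypothetical sketch of those definitions, not code from the thesis; `num_labels` is the total number of location categories.

```python
from typing import Dict, List, Set


def multilabel_metrics(true_sets: List[Set[int]], pred_sets: List[Set[int]],
                       num_labels: int) -> Dict[str, float]:
    """Example-based multi-label metrics averaged over N proteins.

    true_sets[i] / pred_sets[i] are the true / predicted location sets of protein i;
    num_labels is the total number of location categories (M).
    """
    n = len(true_sets)
    totals = dict.fromkeys(
        ["Precision", "Recall", "Accuracy", "Absolute-True", "Absolute-False"], 0.0)
    for t, p in zip(true_sets, pred_sets):
        inter, union = len(t & p), len(t | p)
        totals["Precision"] += inter / len(p) if p else 0.0  # share of predictions that are right
        totals["Recall"] += inter / len(t) if t else 0.0     # share of true locations found
        totals["Accuracy"] += inter / union if union else 1.0  # both empty: exact match
        totals["Absolute-True"] += 1.0 if t == p else 0.0    # whole set must match exactly
        totals["Absolute-False"] += (union - inter) / num_labels  # wrong labels out of M
    return {name: value / n for name, value in totals.items()}
```

Absolute-True is the strictest of the five, since a protein only counts when every location is predicted and no extra one is added; Absolute-False is an error rate, so lower is better.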
Keywords/Search Tags: Subcellular Localization, Multi-label Learning, Features, Multi-label k-Nearest Neighbor, Classifier