
Research On Protein Subcellular Localization Prediction Method Based On Reduced Representation Of Amino Acid And Position-specific Scoring Matrix

Posted on: 2016-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: S J Yan
Full Text: PDF
GTID: 2308330467473436
Subject: Biology
Abstract/Summary:
Predicting where proteins localize within cells by theoretical calculation or statistical methods has become an important research topic, because it accelerates protein structural and functional annotation. Protein subcellular localization prediction is important for studying protein function, protein interactions and their regulatory mechanisms. The research results not only contribute to the study of protein-protein interactions and the development of new drugs, but also offer new ideas for proteome information analysis and algorithm design.

Currently, research on protein subcellular localization focuses on the following aspects: (1) constructing or selecting a valid benchmark dataset for training and testing the prediction model; (2) formulating an effective mathematical expression that truly reflects the essence of the sequences to be predicted; (3) introducing or developing a powerful prediction method; (4) choosing a reasonable validation method to evaluate the accuracy of the prediction method objectively; (5) establishing a user-friendly web server for the predictor. In this thesis, we mainly focus on feature extraction, selection and fusion to study the prediction of protein subcellular localization with machine learning. The main work is as follows:

Firstly, we propose a feature representation method based on the position-specific scoring matrix (PSSM). Three new kinds of feature representation are obtained based on PSSM, evolutionary distance, domain content and family gathering. We construct a fusion model from these three representations and adopt principal component analysis (PCA) to select the key information from the fusion model. We also discuss the effects of different parameters on the experimental results. The experimental and comparison results show the effectiveness of the proposed method.

Secondly, we propose a feature representation that uses four physicochemical and structural properties of amino acids and describes the local and global information of a sequence by ‘component’, ‘transition’ and ‘distribution’. We also put forward a new feature representation (NSBH), a numerical statistical characteristic based on hydrophilic amino acids. We compare the prediction results of these methods and of the fusion method using the classification algorithms KNN, SVM and BP. The results show that the fusion method with SVM achieves better prediction accuracy.

Finally, we implement a graphical user interface (GUI) for the related algorithms in MATLAB. Here, we mainly describe the process of designing, compiling and packaging the GUI software, with specific examples. We also describe the installation and usage of the software in detail. Users can validate the corresponding algorithms and choose among them according to their needs.
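
The ‘component’, ‘transition’ and ‘distribution’ descriptors mentioned above can be illustrated with a minimal Python sketch. It is not the thesis implementation (which was written in MATLAB), and the abstract does not name the four properties or their amino acid groupings. The three-class hydrophobicity grouping in `GROUPS`, the function names `reduce_sequence` and `ctd_features`, and the five distribution checkpoints are all assumptions chosen for illustration, in the style of the commonly used composition/transition/distribution scheme.

```python
from itertools import combinations

# Assumed reduced alphabet: a three-class hydrophobicity grouping.
# Illustrative only; the thesis's own properties and groupings are not
# given in the abstract.
GROUPS = {
    "P": "RKEDQN",   # polar
    "N": "GASTPHY",  # neutral
    "H": "CLVIMFW",  # hydrophobic
}
AA_TO_CLASS = {aa: label for label, aas in GROUPS.items() for aa in aas}
LABELS = sorted(GROUPS)  # fixed feature order: ['H', 'N', 'P']


def reduce_sequence(seq):
    """Map each residue to its class label, skipping unknown residues."""
    return [AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS]


def ctd_features(seq):
    """Return 3 component + 3 transition + 15 distribution values."""
    reduced = reduce_sequence(seq)
    n = len(reduced)
    if n == 0:
        raise ValueError("sequence has no recognised residues")

    # Component: fraction of residues that fall in each class.
    component = [reduced.count(c) / n for c in LABELS]

    # Transition: fraction of adjacent pairs that switch between two
    # given classes, in either direction.
    pairs = list(zip(reduced, reduced[1:]))
    transition = []
    for a, b in combinations(LABELS, 2):
        switches = sum(1 for x, y in pairs if {x, y} == {a, b})
        transition.append(switches / len(pairs) if pairs else 0.0)

    # Distribution: relative sequence position of the first, 25%, 50%,
    # 75% and last occurrence of each class (0.0 if the class is absent).
    distribution = []
    for c in LABELS:
        positions = [i + 1 for i, x in enumerate(reduced) if x == c]
        if not positions:
            distribution.extend([0.0] * 5)
            continue
        k = len(positions)
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            idx = max(int(frac * k) - 1, 0)
            distribution.append(positions[idx] / n)

    return component + transition + distribution
```

Calling `ctd_features("MKVLAAGIR")` returns a 21-value vector for one property. Under these assumptions, repeating the procedure for each of the four properties and concatenating the results would give the kind of fixed-length input that classifiers such as KNN or SVM expect.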
Keywords/Search Tags: subcellular localization, position-specific scoring matrix (PSSM), graphical user interface (GUI), support vector machine (SVM)