
The Prediction of Human Protein Subcellular Localization

Posted on: 2020-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Shen
GTID: 2504306518963239
Subject: Computer Technology and Engineering
Abstract/Summary:
Protein subcellular localization is an important research direction in computational biology. It aims to determine where proteins are located in the cell under normal conditions, and the prediction of human protein subcellular localization is an important branch of it. This research helps us understand protein function and has important applications in fields such as drug design. With the exponential growth in the number of known proteins, the traditional approach of wet-lab experimental verification can no longer meet demand. A variety of high-accuracy, low-cost computational prediction methods now exist, and they greatly improve the efficiency of determining protein location. The core of a computational method is its classification model, which is built by training on a large amount of data. In this thesis we introduce two methods for predicting protein subcellular location, each using different protein information.

In the second chapter, a prediction method based on multi-kernel SVM is proposed. Three different feature matrices are extracted from the physicochemical properties and evolutionary information of protein sequences, covering both global and local sequence features. The best kernel function is selected for each feature type, and the resulting kernels are combined by kernel fusion into a new kernel matrix, on which the SVM classification model is built. The method relies mainly on the idea of kernel fusion to combine several different features. The results show that kernel fusion can improve prediction accuracy, and that the extracted features should cover as many aspects of the protein as possible.

In the third chapter, we first summarize recent methods for predicting the subcellular location of human proteins, introduce nine typical prediction methods that use sequence information or Gene Ontology (GO) information, and give the external websites of these prediction methods. We then improve on these methods and propose a new integrated prediction method based on Gene Ontology. The existing multi-label data sets for human protein subcellular localization are analyzed statistically. The data set with 3106 samples, the most frequently used one, is selected, and a new data set is constructed from the August 2018 version of the SWISS-PROT database. The prediction methods described in the third chapter are evaluated on these two data sets. The new method uses the hierarchical relationships between GO terms. The experiments consider how different kinds of GO terms and homologous proteins affect the results, and compare the prediction results under these different settings. In addition, constructing the feature matrix from GO semantic similarity is also tried, and four ways of constructing the semantic similarity matrix are compared.

Although research in this field has made some progress, there is still room for improvement. First, in feature extraction, the use of GO information needs further exploration, and the semantic similarity matrix is one direction worth trying. Second, on the classification side, an algorithm that handles multi-label classification directly should be built: binary classifiers can give satisfactory prediction results, but they ignore the relationships between labels, which may lose predictive information. Further gains in prediction accuracy will require taking all of these factors into account.
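The kernel fusion step of the second chapter can be illustrated with a minimal sketch. The abstract does not say how the kernels are combined, so the weighted sum used here is an assumption, as are the kernel choices per feature view, the uniform weights, the toy feature values, and all function names. Only the Python standard library is used.

```python
import math


def linear_kernel(X):
    """Gram matrix of pairwise dot products between the rows of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def rbf_kernel(X, gamma):
    """Gram matrix of exp(-gamma * squared Euclidean distance)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
        for xi in X
    ]


def fuse_kernels(kernels, weights):
    """Combine same-sized kernel matrices as a weighted sum (assumed fusion rule)."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    n = len(kernels[0])
    fused = [[0.0] * n for _ in range(n)]
    for K, w in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * K[i][j]
    return fused


# Hypothetical toy data: three feature views of the same three proteins.
physchem = [[0.1, 0.9], [0.4, 0.6], [0.8, 0.2]]
evolution = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 1.0, 0.2]]
local = [[0.3], [0.3], [0.7]]

# One kernel per view; which kernel suits which view is an assumption here.
kernels = [
    rbf_kernel(physchem, gamma=1.0),
    linear_kernel(evolution),
    rbf_kernel(local, gamma=2.0),
]
fused = fuse_kernels(kernels, weights=[1 / 3, 1 / 3, 1 / 3])
```

A fused matrix of this kind could then be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. A non-negative weighted sum of valid kernels is itself a valid kernel, which is why this combination rule is a common starting point.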
Keywords/Search Tags: Human protein subcellular localization, Multi-label, Gene Ontology, Protein sequence information