Study Of Multiple Information Fusion In Protein Subcellular Localization Prediction

Posted on: 2017-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: H M Xu
Full Text: PDF
GTID: 2180330482980740
Subject: Bioinformatics
Abstract/Summary:
Proteins must reside in specific regions of the cell, such as the mitochondria, nucleus, or cytoplasm, in order to take part in normal life activities. Subcellular localization information provides useful clues for predicting protein function, and it also informs studies of protein interactions and evolution. However, experimental localization methods are time-consuming and laborious, and experiments alone cannot keep pace with the rapid growth of biological sequence data. To speed up the annotation of protein structure and function, more and more researchers have adopted computational methods for predicting protein subcellular localization. This thesis focuses on the information-processing problem: the extraction, fusion, and prediction of protein information. The main work is as follows:

1. We review the research progress in protein subcellular localization prediction over recent decades, mainly the extraction of features from protein sequences and the classification algorithms used for prediction. Common feature representations include amino acid composition, pseudo-amino acid composition, evolutionary information (the position-specific scoring matrix, PSSM), Gene Ontology (GO) annotations, functional domain (FD) information, and others. The main classification algorithms are the support vector machine (SVM) and K-nearest neighbor (KNN).

2. We propose a prediction method for protein subcellular localization based on evolutionary information, conservation information, and the position-specific scoring matrix. The evolutionary and conservation information of a protein are derived from its amino acid sequence. The PSSM is then divided according to a split ratio. A comparison of different split ratios shows that the golden ratio gives better results than the other ratios tested, so the golden ratio is used to divide the PSSM. Composition information and position information are computed from each segment. Finally, the fusion of these representations is used to predict the subcellular localization of apoptosis proteins. The accuracies on the ZD98 and CL317 datasets are 98.98% and 91.11%, respectively.

3. We propose a prediction method based on consensus sequence composition and Gene Ontology information. For bacterial proteins, features are extracted from the original sequence and the consensus sequence, together with the physicochemical properties of amino acids and GO annotation information. Principal component analysis (PCA) is used for feature selection, and a support vector machine (SVM) is used for prediction. The accuracies on the Gram-positive and Gram-negative datasets are 96.15% and 95.95%, respectively. Compared with existing methods, the proposed methods achieve better results.
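
The golden-ratio split of the position-specific scoring matrix described in item 2 can be illustrated with a minimal Python sketch. The abstract does not give the exact procedure, so several things here are assumptions: that the matrix is split along the sequence (row) axis, that the longer segment comes first, and that a per-column mean stands in for the "composition information" of each segment. The function names `split_pssm`, `column_means`, and `segment_features` are hypothetical and not taken from the thesis.

```python
# Illustrative sketch only; not the thesis's implementation.
GOLDEN_RATIO = (5 ** 0.5 - 1) / 2  # about 0.618


def split_pssm(pssm, ratio=GOLDEN_RATIO):
    """Split a PSSM (one row per residue) into two row segments.

    Assumption: the cut is made along the sequence axis and the first
    segment holds roughly `ratio` of the residues.
    """
    if len(pssm) < 2:
        raise ValueError("need at least two residues to split")
    cut = round(len(pssm) * ratio)
    cut = min(max(cut, 1), len(pssm) - 1)  # keep both segments non-empty
    return pssm[:cut], pssm[cut:]


def column_means(segment):
    """Mean score of each column over the rows of one segment."""
    n_cols = len(segment[0])
    return [sum(row[j] for row in segment) / len(segment) for j in range(n_cols)]


def segment_features(pssm, ratio=GOLDEN_RATIO):
    """Concatenate the per-segment column means into one feature vector."""
    head, tail = split_pssm(pssm, ratio)
    return column_means(head) + column_means(tail)


if __name__ == "__main__":
    # Toy matrix with 10 residues and 2 columns (a real PSSM has 20 columns).
    toy_pssm = [[i, 2 * i] for i in range(10)]
    print(segment_features(toy_pssm))
```

With a real PSSM of 20 columns this yields a 40-dimensional vector per protein. The position information mentioned in the abstract would need an additional descriptor that the abstract does not specify, so it is left out of this sketch.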
Keywords/Search Tags: Subcellular localization prediction, Pseudo-amino acid composition, Position-specific scoring matrix, Support vector machine