
Research On Several Sequence Information Extraction Methods And Subcellular Location Prediction Of Proteins

Posted on: 2017-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
GTID: 2180330482980739
Subject: Bioinformatics
Abstract/Summary:
Research on protein subcellular localization is closely related to understanding cellular processes, and it plays an important role in studying disease mechanisms and developing new drugs. Traditional biochemical experiments are time-consuming and costly. Thus, machine-learning-based prediction has become an important research topic in current bioinformatics. At present, the key problems in protein subcellular localization prediction are feature information extraction and the construction of classification algorithms. This thesis focuses mainly on feature information extraction, and the main contents are as follows:

(1) Protein sequence information extraction and subcellular location prediction based on the amino acid index. The physicochemical properties of amino acids directly affect the function and structure of proteins. Using an amino acid index, a protein sequence can be transformed into a numerical sequence. We propose the AAF algorithm to extract feature information from this numerical sequence, which yields a feature vector of dimension m+2. We discuss the influence of the parameter m on the experimental results. Compared with the traditional ACF algorithm, AAF improves prediction accuracy by about 10%.

(2) Protein subcellular localization prediction based on Gapped k-mer. Amino acid composition is one of the simplest information extraction methods. However, when it is extended to polypeptides, the feature vector dimension becomes too high and the prediction results are unsatisfactory. This thesis proposes a feature information extraction algorithm based on Gapped k-mer. Reducing the amino acid alphabet and inserting gaps into polypeptides not only lowers the dimension but also improves the accuracy. When the Gapped k-mer features are fused with GO (Gene Ontology) information, the prediction accuracy on the Gneg1456 dataset reaches 93.28%.

(3) Protein sequence information extraction and subcellular localization prediction based on the position specific scoring matrix (PSSM). The PSSM, obtained by searching a protein database with PSI-BLAST, represents the evolutionary and conservation information of proteins. To extract feature information, we propose two transformations of the PSSM: one based on average evolutionary information, the other based on position relativity. Experimental results on the NNPSL datasets show that both transformations improve the accuracy.
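The Gapped k-mer idea in (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The six-group reduced alphabet (`GROUPS`), the function names, and the defaults `k=2`, `gap=1` are assumptions chosen for illustration only. The sketch maps each residue to a group letter, then counts k-mers whose letters are separated by `gap` skipped positions.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet: representative letter -> member residues.
# This grouping is an assumption for illustration, not the one used in the thesis.
GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine / proline
}
REDUCE = {aa: rep for rep, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each standard residue to its group letter; skip unknown symbols."""
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def gapped_kmer_features(seq, k=2, gap=1):
    """Return normalized frequencies of gapped k-mers over the reduced alphabet.

    Consecutive letters of each k-mer are taken `gap` positions apart
    (gap=0 gives ordinary contiguous k-mers).
    """
    reduced = reduce_sequence(seq)
    step = gap + 1
    span = (k - 1) * step + 1  # window length covered by one gapped k-mer
    counts = Counter(
        reduced[i:i + span:step] for i in range(len(reduced) - span + 1)
    )
    total = sum(counts.values())
    keys = ["".join(p) for p in product(sorted(GROUPS), repeat=k)]
    return [counts[key] / total if total else 0.0 for key in keys]


if __name__ == "__main__":
    vec = gapped_kmer_features("MKVLAAGDE", k=2, gap=1)
    print(len(vec))  # 6**2 = 36 features instead of 20**2 = 400
```

With six groups, a 2-mer feature vector has 36 entries rather than the 400 needed for the full 20-letter alphabet, which is the dimension reduction the paragraph describes. Whether accuracy also improves depends on the grouping and the classifier, which this sketch does not address.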
Keywords/Search Tags:protein subcellular localization prediction, amino acid index, classifier, position specific scoring matrix, support vector machine