
Analysis Of Protein Evolution Relationship And Identifying Antimicrobial Peptides Based On Sequence

Posted on: 2019-05-04 | Degree: Master | Type: Thesis
Country: China | Candidate: H D Chen | Full Text: PDF
GTID: 2370330590973890 | Subject: Computer Science and Technology
Abstract/Summary:
Proteins play an important role in life processes and are closely related to many diseases. With the rapid development of sequencing technology, the amount of protein sequence data has grown explosively, but because of limited manpower and material resources, the analysis of protein structure and function has progressed relatively slowly. Traditional experimental methods can effectively identify the structure and function of proteins, but they are complex and consume a great deal of time and effort. Based on the primary structure of proteins, this thesis applies machine learning to two problems: protein remote homology detection (together with fold recognition) and the prediction of antimicrobial peptides. The first is approached from the perspective of protein evolutionary relationships derived from primary structure, by introducing more evolutionary information. The second is approached by transforming the prediction of antimicrobial peptide function into a multi-label problem that exploits the correlation between labels.

To solve protein remote homology detection and fold recognition efficiently, this thesis introduces evolutionary information based on the protein sequence profile and improves the quality of the profile information. Two feature extraction methods, Dekmer-Top and Dekmer-MSA, are proposed, which extract protein evolutionary information from the profile in different ways. Feature extraction often leads to the curse of dimensionality, so a reduced alphabet is used to control the explosive growth of the feature dimension. Furthermore, two denoising methods are proposed to enhance the quality of the generated features and further improve prediction performance.

To improve on the performance of existing methods for identifying antimicrobial peptides, a two-level predictor, CHDAMP, is proposed. The first level distinguishes whether a protein is an antimicrobial peptide (AMP), and the second level labels the functional activities of each antimicrobial peptide, which is a multi-label problem. A multi-label method, RAKELECC, is proposed, which considers label correlation from two different perspectives and improves the predictive performance of the predictor. An updated antimicrobial peptide dataset was established, containing 8,100 non-AMP samples, 2,700 AMP samples and 8 functional categories, which is 1,821 more AMP samples and 3 more functional categories than the previous APD3 dataset. To address the class imbalance in antimicrobial peptide datasets, an oversampling method, NML-SMOTE, is proposed for multi-label datasets. It reasonably expands the functional categories that have few samples, which reduces the prediction bias of classifiers caused by dataset imbalance. On the benchmark dataset, the Hamming Loss is reduced to 0.1527 and the Subset Accuracy reaches 0.5006.
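
For readers unfamiliar with the two multi-label metrics reported in the abstract, the following is a minimal sketch of their standard definitions. It is not the thesis's evaluation code: the function names and the toy label matrices are illustrative assumptions, and the numbers have no relation to the reported 0.1527 and 0.5006. Hamming Loss is the fraction of individual label slots predicted incorrectly (lower is better), while Subset Accuracy is the fraction of samples whose entire label set is predicted exactly (higher is better).

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (sample, label) slots where prediction and truth differ."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    wrong = sum(
        t != p
        for row_true, row_pred in zip(y_true, y_pred)
        for t, p in zip(row_true, row_pred)
    )
    return wrong / (n_samples * n_labels)


def subset_accuracy(y_true: Sequence[Sequence[int]],
                    y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose full label vector is predicted exactly."""
    exact = sum(
        list(row_true) == list(row_pred)
        for row_true, row_pred in zip(y_true, y_pred)
    )
    return exact / len(y_true)


# Toy data (hypothetical): 4 peptides, 3 functional labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print(hamming_loss(y_true, y_pred))     # 2 wrong slots out of 12
print(subset_accuracy(y_true, y_pred))  # 2 exact matches out of 4
```

Subset Accuracy is the stricter of the two, since a single wrong label makes the whole sample count as a miss, which is why it can sit near 0.5 even when the Hamming Loss is low.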
Keywords/Search Tags: Protein remote homology detection, Fold recognition, Identifying antimicrobial peptides, Oversampling method, Multi-label method