
Research on Glutarylation Site Prediction Methods Based on PU Learning

Posted on: 2024-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z D Qi
GTID: 2530307295451834
Subject: Computer Science and Technology
Abstract/Summary:
Proteins are polymeric organic compounds composed of amino acids and are the foundation of life activities. They are synthesized in cells by linking amino acids in sequence, but the resulting chains cannot perform their biological functions directly; they must first undergo a series of chemical modifications, known as protein post-translational modifications (PTMs), to acquire specific functions. Glutarylation is a PTM that plays an irreplaceable role in many cellular functions, so accurately identifying glutarylated substrates and their glutarylation sites is of considerable research significance. Many computational methods for predicting glutarylation sites have been proposed in recent years, but they still have limitations. A major challenge is class imbalance, compounded by noisy data and by the uncertainty of non-glutarylation sites: a site without a glutarylation label is unlabeled, not a confirmed negative. Selecting reliable negative samples from the unlabeled samples is therefore important. This thesis proposes two glutarylation site prediction models based on PU (positive-unlabeled) learning algorithms. The main research content is as follows:

(1) A PU learning prediction method, FCCCSR_Glu, is designed that selects reliable negative samples by clustering a core set. First, FCCCSR_Glu extracts feature information by combining multi-view feature encoding methods: CKSAAP (composition of k-spaced amino acid pairs), AAC (amino acid composition), BLOSUM62, and AAF (amino acid factor). To retain the most informative features, incremental feature selection is then performed based on the Gini index. Next, the proposed FCCCSR algorithm selects reliable negative samples from the unlabeled samples. Its main idea is to select a core set from the positive samples as the structural support data of the positive class, cluster this core set, and use the clustering results to select reliable negative samples from the unlabeled samples. This alleviates the information loss caused by random undersampling and balances the data while preserving global information and structure as far as possible. Finally, XGBoost is used as the classifier, and its parameters are optimized with the differential evolution (DE) algorithm.

(2) A PU learning prediction method, WGAN-GP_Glu, is constructed that selects reliable negative samples with an improved WGAN-GP. The model consists of three modules: reliable negative sample selection, deep feature extraction, and glutarylation site prediction. The reliable negative sample selection module uses an improved WGAN-GP named Reliable WGAN-GP, which comprises two generators, G1 and G2, and a discriminator D. Positive samples are fed to the discriminator as real data; the generators produce fake data from randomly selected unlabeled samples and pass it to the discriminator; the generators and the discriminator are trained adversarially in alternation; and the discriminator then selects reliable negative samples to balance the data. The deep feature extraction module combines a convolutional neural network, a bidirectional long short-term memory network, and an attention mechanism to extract deep features. Finally, a glutarylation site prediction module built from three fully connected layers predicts the class of each sample.
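
To make two of the sequence encodings named in method (1) concrete, here is a minimal Python sketch of AAC and CKSAAP applied to a peptide window centered on a candidate site. It is an illustration under stated assumptions, not the thesis's implementation: the function names, the default `k_max = 2`, the handling of padding characters, and the choice to normalize by the number of valid pairs are all assumptions, since the abstract does not specify them.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(window: str) -> list[float]:
    """Amino acid composition: frequency of each of the 20 residues.

    Characters outside the 20 standard residues (e.g. padding) are ignored.
    """
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def cksaap(window: str, k_max: int = 2) -> list[float]:
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each gap k, count the pairs (window[i], window[i + k + 1]) and
    divide by the number of valid pairs (an assumed normalization).
    """
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs that touch a padding character
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


def encode(window: str) -> list[float]:
    """Concatenate the two views into one feature vector (hypothetical helper)."""
    return aac(window) + cksaap(window)
```

Calling `encode` on a window yields 20 AAC values followed by 400 values per gap, so 1220 features with the assumed `k_max = 2`. In the pipeline the abstract describes, vectors like this would be combined with the BLOSUM62 and AAF views before Gini-based feature selection.
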
Keywords/Search Tags:PU learning, Glutarylation site prediction, Selecting reliable negative samples