Recognition Of Protein Folding Based On Machine Learning

Posted on: 2022-04-16  Degree: Master  Type: Thesis
Country: China  Candidate: X H Yang
GTID: 2480306548996999  Subject: Mathematics
Abstract/Summary:
With the advent of the big data era, large-scale data are increasingly used to better understand research problems, and in biomedicine big data likewise provides more material for research. Protein fold recognition aims to provide effective and convenient research data for proteomics by analyzing protein sequences, and this work is of great significance to medical research. This thesis studies three protein datasets: the DD, RDD, and TG datasets. It applies machine learning to protein fold recognition, and its main contributions are as follows:

1. A random forest based method, called RF-fold. First, four feature extraction methods are applied: Detrended Cross-Correlation Analysis (DCCA), Pseudo Amino Acid Composition (PseAAC), Pairwise Frequency (PF1), and bi-gram representations. The feature vectors produced by these four methods are fused into a single feature space that combines their information. Second, Linear Fisher Discriminant Analysis (LFDA) is used to further select features from the extracted protein sequence information, reducing redundant or unnecessary information and selecting the most effective feature subset from the multi-feature data. Finally, the dimensionality-reduced features are input into a random forest (RF) classifier for protein fold recognition and prediction. This method achieves higher prediction accuracy on both the DD training set and the TG test set.

2. A Bagging-based ensemble classifier method, called BAG-fold. First, four feature extraction methods are used to extract features from the data: Pseudo Position-Specific Scoring Matrix (PsePSSM), Secondary Structure (SS), Encoding Based on Grouped Weight (EBGW), and Detrended Cross-Correlation Analysis (DCCA). These four kinds of feature information are combined into a mixed feature space. Second, LFDA is used to reduce redundant information and select the optimal feature subset. Finally, the dimensionality-reduced features are input into a Bagging ensemble classifier for protein fold recognition and prediction. This method achieves higher prediction accuracy on both the DD training set and the RDD test set.
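Both RF-fold and BAG-fold use DCCA features, but the abstract does not say how a protein sequence is turned into numerical series or which box sizes are used. The following is a minimal sketch of the standard DCCA cross-correlation coefficient, assumed here rather than taken from the thesis: each series is integrated into a profile, overlapping boxes of n + 1 points are detrended with a least-squares line, and the averaged detrended covariance is normalized to lie in [-1, 1]. The function names are hypothetical, and the residue-to-property values in the usage section are made-up placeholders, not published scales.

```python
import math
from typing import List, Sequence


def _profile(series: Sequence[float]) -> List[float]:
    """Cumulative sum of deviations from the mean (the integrated series)."""
    mean = sum(series) / len(series)
    running, profile = 0.0, []
    for value in series:
        running += value - mean
        profile.append(running)
    return profile


def _detrend(segment: Sequence[float]) -> List[float]:
    """Residuals after removing a least-squares straight line from one box."""
    size = len(segment)
    t_mean = (size - 1) / 2
    y_mean = sum(segment) / size
    sxx = sum((t - t_mean) ** 2 for t in range(size))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(segment)]


def _detrended_covariance(x: Sequence[float], y: Sequence[float], n: int) -> float:
    """Detrended covariance averaged over all N - n overlapping boxes of n + 1 points."""
    rx, ry = _profile(x), _profile(y)
    boxes = len(x) - n
    total = 0.0
    for i in range(boxes):
        ex = _detrend(rx[i:i + n + 1])
        ey = _detrend(ry[i:i + n + 1])
        total += sum(a * b for a, b in zip(ex, ey)) / (n + 1)
    return total / boxes


def dcca_coefficient(x: Sequence[float], y: Sequence[float], n: int) -> float:
    """DCCA cross-correlation coefficient of two equal-length series at box size n."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if not 2 <= n < len(x):
        raise ValueError("box size n must satisfy 2 <= n < len(x)")
    fxy = _detrended_covariance(x, y, n)
    fxx = _detrended_covariance(x, x, n)
    fyy = _detrended_covariance(y, y, n)
    # Raises ZeroDivisionError if either profile is exactly linear in every box.
    return fxy / math.sqrt(fxx * fyy)


if __name__ == "__main__":
    # Hypothetical per-residue property values, for illustration only.
    prop_a = {"A": 0.6, "G": 0.0, "L": 1.0, "K": -1.0, "E": -0.7}
    prop_b = {"A": 0.3, "G": 0.1, "L": 0.5, "K": 0.9, "E": 0.8}
    seq = "ALKEGLAKELGA"
    x = [prop_a[r] for r in seq]
    y = [prop_b[r] for r in seq]
    print(dcca_coefficient(x, y, 4))
```

Because the same normalization appears in the numerator and the denominator, the choice of dividing by n + 1 (rather than another box-size factor) does not change the coefficient. A feature vector could then be formed from coefficients over several property pairs and box sizes, though which ones the thesis uses is not stated.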
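RF-fold also uses bi-gram representations, which the abstract names without defining. One common construction, assumed here rather than confirmed by the thesis, computes them from an L x 20 position-specific scoring matrix (PSSM) P as B[m][n] = sum over i of P[i][m] * P[i+1][n], giving 400 values per protein regardless of sequence length. The helper name `pssm_bigram` is hypothetical; it accepts any column count so it can be checked on a tiny matrix, and any PSSM scaling step that would normally precede it is omitted.

```python
from typing import List, Sequence


def pssm_bigram(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Flattened bi-gram matrix B[m][n] = sum_i pssm[i][m] * pssm[i + 1][n].

    For an L x 20 PSSM this yields 400 features, independent of L.
    """
    if len(pssm) < 2:
        raise ValueError("need at least two residues")
    width = len(pssm[0])
    if any(len(row) != width for row in pssm):
        raise ValueError("all PSSM rows must have the same length")
    features = []
    for m in range(width):
        for n in range(width):
            # Accumulate over consecutive residue pairs (i, i + 1).
            features.append(sum(pssm[i][m] * pssm[i + 1][n] for i in range(len(pssm) - 1)))
    return features


if __name__ == "__main__":
    print(pssm_bigram([[1, 2], [3, 4], [5, 6]]))  # [18, 22, 26, 32]
```

The fixed output length is what makes this kind of feature easy to fuse with others: every protein maps to the same number of columns even though sequence lengths differ.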
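The two methods share the same overall shape: fuse per-protein feature blocks, project them with a Fisher discriminant, then classify. The sketch below is one way to express that shape with scikit-learn, under stated assumptions: fusion is taken to be simple column-wise concatenation, and `LinearDiscriminantAnalysis` (classical Fisher LDA, which keeps at most n_classes - 1 discriminant axes) stands in for the thesis's LFDA step, whose exact variant may differ. `RandomForestClassifier` plays the RF-fold role and `BaggingClassifier` (default decision-tree base learners) the BAG-fold role. The helper names and hyperparameters (`n_estimators`, `seed`) are illustrative, not values reported in the thesis, and the data in the demo is synthetic.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.pipeline import make_pipeline


def fuse_features(*blocks: np.ndarray) -> np.ndarray:
    """Serial fusion: concatenate feature blocks column-wise (rows = proteins)."""
    if len({block.shape[0] for block in blocks}) != 1:
        raise ValueError("every feature block must have one row per protein")
    return np.hstack(blocks)


def build_fold_classifier(kind: str = "rf", seed: int = 0):
    """Fisher LDA projection followed by an RF or Bagging classifier (illustrative settings)."""
    # Stand-in for LFDA: projects onto at most (n_classes - 1) discriminant axes.
    reducer = LinearDiscriminantAnalysis()
    if kind == "rf":
        classifier = RandomForestClassifier(n_estimators=200, random_state=seed)
    elif kind == "bagging":
        # Default base learner is a decision tree, each fit on a bootstrap sample.
        classifier = BaggingClassifier(n_estimators=50, random_state=seed)
    else:
        raise ValueError(f"unknown classifier kind: {kind!r}")
    return make_pipeline(reducer, classifier)


if __name__ == "__main__":
    # Synthetic stand-in data: 3 classes, two feature blocks (not the DD/TG/RDD data).
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1, 2], 30)
    block_a = rng.normal(size=(90, 5)) + labels[:, None] * 3.0
    block_b = rng.normal(size=(90, 4))
    features = fuse_features(block_a, block_b)
    for kind in ("rf", "bagging"):
        model = build_fold_classifier(kind).fit(features, labels)
        print(kind, model.score(features, labels))
```

Placing the reducer inside the pipeline means the discriminant axes are learned only from the training data (for example DD) and then applied unchanged to a separate test set such as TG or RDD, which avoids leaking test labels into the feature selection step.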
Keywords/Search Tags:ensemble learning, protein folding, multi feature fusion, linear fisher discriminant analysis