
Research On Essential Protein Prediction And Proteoforms Characterization Algorithms

Posted on: 2021-02-14; Degree: Master; Type: Thesis
Country: China; Candidate: Y S Sun; Full Text: PDF
GTID: 2370330611959893; Subject: Electronic and communication engineering
Abstract/Summary:
Proteins are the dominant executors of life processes. Essential proteins, which are vital for maintaining cellular life, play an important role in biological research and drug design. Compared with genetic variations, changes in the molecular structure and state of a protein (i.e., proteoforms) are more directly related to pathological changes in disease. Characterizing proteoforms involves identifying and locating primary structure alterations (PSAs) in proteoforms, which is of practical importance for the advancement of medicine. This thesis focuses on two important research directions in proteomics. The main innovations are as follows:

With the generation of large amounts of biological data related to essential proteins, an increasing number of computational methods for predicting essential proteins have been proposed. Unlike methods that adopt a single machine learning method or an ensemble machine learning method, this thesis designs a prediction framework named XGBFEMF for identifying essential proteins. The framework includes a SUB-EXPAND-SHRINK method, which constructs composite features from the original features and selects a better feature subset for essential protein prediction, and a model fusion method for obtaining a more effective prediction model. We carry out experiments on yeast data to assess the performance of XGBFEMF with receiver operating characteristic (ROC) analysis, accuracy analysis, and top analysis, and we run experiments on E. coli data to validate its performance. The test results show that the XGBFEMF framework effectively improves many key performance indicators.

With the development of mass spectrometry technology, the characterization of proteoforms based on top-down mass spectrometry has become possible. In high-throughput proteomics analysis, identifying proteoforms requires aligning millions of spectra against tens of thousands of protein sequences, which makes spectral alignment-based algorithms extremely slow. Filtration algorithms are therefore essential in proteome-level analysis. This thesis proposes a filtering algorithm called ETASF, which combines the speed advantage of the error tolerance method with the sequence tag method and the accuracy advantage of the ASF method. We performed experiments using the histone H3.1 dataset and the breast cancer subtype (WHIM2-P32) dataset. The experimental results show that the ETASF algorithm improves identification performance and significantly reduces algorithmic complexity.
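
To make the idea of filtration concrete, the following is a minimal sketch of a generic sequence-tag filter, not the ETASF algorithm itself, whose details are not given in this abstract. It assumes that a spectrum is reduced to a list of fragment masses, that a tag is read from consecutive mass differences matching amino acid residue masses within a tolerance, and that a protein survives the filter if it contains at least one tag (read forward or reversed). The function names `extract_tags` and `filter_candidates`, the tolerance, the tag length, and the toy sequences are all hypothetical and chosen only for illustration.

```python
# Monoisotopic residue masses in Da (illustrative subset of the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "R": 156.10111,
}


def extract_tags(peaks, tag_len=3, tol=0.01):
    """Return all residue strings of length tag_len readable from the peaks.

    Two peaks are linked when their mass difference matches a residue mass
    within tol; a tag is a chain of tag_len such links.
    """
    masses = sorted(peaks)
    edges = {i: [] for i in range(len(masses))}
    for i in range(len(masses)):
        for j in range(i + 1, len(masses)):
            diff = masses[j] - masses[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, aa))

    tags = set()

    def walk(node, prefix):
        # Depth-first walk over linked peaks, collecting complete tags.
        if len(prefix) == tag_len:
            tags.add(prefix)
            return
        for nxt, aa in edges[node]:
            walk(nxt, prefix + aa)

    for start in edges:
        walk(start, "")
    return tags


def filter_candidates(tags, proteins, min_hits=1):
    """Keep proteins matching at least min_hits tags, best matches first.

    A tag also counts when its reverse occurs, since fragment ladders from
    the other terminus read the sequence backwards.
    """
    scored = []
    for name, seq in proteins.items():
        hits = sum(1 for t in tags if t in seq or t[::-1] in seq)
        if hits >= min_hits:
            scored.append((hits, name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


if __name__ == "__main__":
    # Toy spectrum: a ladder spelling G-A-S-P on top of an arbitrary offset.
    peaks = [100.0]
    for aa in "GASP":
        peaks.append(peaks[-1] + RESIDUE_MASS[aa])

    proteins = {"P1": "MKGASPLT", "P2": "MKTTTLLV", "P3": "AAPSAGKK"}
    tags = extract_tags(peaks)
    print(sorted(tags))
    print(filter_candidates(tags, proteins))
```

Only the proteins that pass such a filter would then be handed to the slow spectral alignment step, which is where the speedup comes from. A real filter would also have to tolerate mass shifts caused by primary structure alterations, which this sketch does not attempt.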
Keywords/Search Tags: top-down mass spectrometry technology, proteoform, essential protein