
Research On Detection Method Of Colorectal Cancer Based On Mining Technology

Posted on: 2021-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: K Yan
GTID: 2404330620968372
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Colorectal cancer (CRC) is the third most common type of cancer in China, so its timely detection and prevention are necessary. This study proposes a technical workflow for detecting CRC and discovering cancer biomarkers based on proteomics and machine learning. The work has four parts: imputation of missing values in CRC proteomics data, data normalization and preprocessing, CRC biomarker screening, and cancer prediction. First, missing-value imputation methods were evaluated. Simulated MS data with missing values were used to test five imputation methods (Single, KNN, K-means, Linear, Multiple), and the data after linear imputation had the highest similarity to the simulated data. Second, normalization and preprocessing methods for MS data were evaluated. The IRS method effectively eliminates batch effects but cannot eliminate noise or global intensity bias. The Quantile and CONSTANd methods handle the problems caused by global intensity bias, but both introduce a certain batch effect. We therefore proposed EPNS, a normalization method that uses endogenous proteins, which effectively reduces the Median CV of the raw data. For biomarker screening, a two-step procedure was proposed that combines an LR + LASSO + MCCV method with a Random Forest + greedy method, and five biomarkers related to the occurrence of CRC were selected: NUP205, GTPBP4, CNN2, GNL3, and S100A11. Finally, the generalization performance of four classification algorithms was compared: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Back Propagation Neural Network (BP). Both the RF and BP models achieved an AUC above 0.99 on the validation set and also performed robustly on the independent test set, where the RF model produced one false negative and the BP model separated positive and negative samples completely. In summary, CRC detection and biomarker discovery based on proteomics and machine learning achieve good recognition performance, and this research provides a new route for cancer detection technology.
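The abstract uses the Median CV as its measure of how well a normalization works. The sketch below shows one way that quantity could be computed, and how a per-sample scaling changes it. It is not the EPNS method, whose details the abstract does not give. The helper names (`median_cv`, `scale_by_reference`), the toy intensities, and the choice to scale each sample by the mean of a hypothetical set of reference proteins are all assumptions made for illustration.

```python
from statistics import mean, median, stdev


def median_cv(intensities):
    """Median coefficient of variation (stdev / mean) across proteins.

    `intensities` maps a protein name to its intensities across replicate
    samples. Proteins with fewer than two values or a non-positive mean
    are skipped.
    """
    cvs = []
    for values in intensities.values():
        m = mean(values)
        if len(values) > 1 and m > 0:
            cvs.append(stdev(values) / m)
    return median(cvs)


def scale_by_reference(intensities, reference_proteins):
    """Divide every sample by the mean intensity of its reference proteins.

    This is an illustrative stand-in for reference-based normalization,
    not the EPNS procedure from the thesis.
    """
    n_samples = len(next(iter(intensities.values())))
    factors = [
        mean(intensities[p][j] for p in reference_proteins)
        for j in range(n_samples)
    ]
    return {
        protein: [v / f for v, f in zip(values, factors)]
        for protein, values in intensities.items()
    }


# Toy data (invented): three replicate samples, the second of which is
# globally about twice as intense as the others.
raw = {
    "REF1": [100.0, 200.0, 100.0],
    "REF2": [50.0, 100.0, 50.0],
    "P1": [10.0, 21.0, 11.0],
    "P2": [30.0, 58.0, 29.0],
}

normalized = scale_by_reference(raw, ["REF1", "REF2"])
cv_before = median_cv(raw)
cv_after = median_cv(normalized)
print(f"Median CV before: {cv_before:.3f}, after: {cv_after:.3f}")
```

In this toy case the global intensity shift dominates the raw CVs, so removing it lowers the Median CV. Real data would also need missing values handled before the CV is computed.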
Keywords/Search Tags: colorectal cancer detection, proteomics, imputation and preprocessing, biomarkers, classification model