
Screening Of Characteristic Genes And Prediction In Breast Cancer

Posted on: 2021-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
GTID: 2404330623470053
Subject: Applied Statistics
Abstract/Summary:
Breast cancer is the most common malignant tumor in women of childbearing age worldwide. In recent years, the incidence and mortality of breast cancer among women in China have been rising by 3% per year, seriously threatening women's lives and health. The cure rate for early-stage breast cancer in China is relatively high, at 80% to 90% or more. However, early-stage breast cancer often has no obvious symptoms and is easily overlooked. Many patients are therefore already at a middle or advanced stage when diagnosed, when treatment is difficult and the survival rate is relatively low. A scientific and effective prediction method for breast cancer diagnosis is therefore valuable: finding the disease as early as possible and starting appropriate treatment promptly can effectively improve survival and reduce patients' suffering. The main cause of cancer is mutation of cancer driver genes, so screening for breast cancer driver genes is important for studying the pathogenesis of breast cancer, finding effective treatment plans and developing new anticancer drugs. In this thesis, breast cancer gene expression data obtained by high-throughput sequencing are used to screen characteristic genes and to predict breast cancer effectively.

The breast cancer gene expression data were downloaded from The Cancer Genome Atlas (TCGA) database. After preprocessing, the data set contained 113 samples and 19,754 genes. Differentially expressed genes were selected by fold change with the edgeR package in R, using a 2-fold change threshold and p < 0.05. This yielded 1,997 up-regulated and 12,487 down-regulated genes. A chi-square test was then applied to the down-regulated genes, and 61 genes passed the chosen threshold. Finally, recursive feature elimination selected 30 differentially expressed genes: FIGF, CD300LG, HEPACAM, PLIN4, GPD1, CA4, HSD17B13, TSLP, LPL, CD36, BTNL9, SCARA5, LYVE1, CHRDL1, CLEC3B, ANGPTL7, RDH5, NPR1, HLF, RBP4, ITGA7, ITIH5, BMX, ADAMTS5, SAMD5, TGFBR3, SLC19A3, C1QTNF9, ASPA, SVEP1.

An AdaBoost model was built on these 30 genes to predict whether a sample is breast cancer. It achieved a precision of 99.7743%, a recall of 99.7743%, a Matthews correlation coefficient (MCC) of 0.974487 and an AUC of 0.997743. The area under the ROC curve is close to 1, so the selected characteristic genes discriminate very well in breast cancer screening and classification, which indicates that the screening method is reliable. To check the predictive performance of the AdaBoost model, decision tree, neural network and logistic regression models were also used for prediction. The ROC curves of all models were compared, together with their precision, recall, AUC and MCC. The AdaBoost ensemble learner gave the best predictive performance for breast cancer. Finally, the thesis summarizes the research approach, its shortcomings and directions for further research.

Conclusion: Combining fold-change screening, the chi-square test and recursive feature elimination effectively reduced the dimensionality of the variables. Among the 30 characteristic differentially expressed genes, FIGF and TSLP have been reported in the literature to be associated with breast cancer formation, which indicates that the screening works well. The AdaBoost model also predicted markedly better than the individual learners. Doctors can use the prediction results together with mammography (molybdenum-target X-ray), color ultrasound and other clinical examinations to identify patients with early-stage breast cancer more accurately.
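The first screening step keeps genes whose expression changes at least 2-fold with p < 0.05 and splits them into up- and down-regulated sets. The sketch below illustrates only that thresholding. It assumes the differential-expression statistics have already been computed: a log2 fold change and a p-value per gene, as edgeR reports them. It also reads the 2-fold threshold as |log2FC| ≥ 1. The `DEResult` record, the gene names and the numbers are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class DEResult:
    """One row of a differential-expression table (hypothetical layout)."""
    gene: str
    log_fc: float   # log2 fold change, tumor vs. normal
    p_value: float

def split_by_fold_change(rows, fold=2.0, alpha=0.05):
    """Return (up, down) genes with |log2FC| >= log2(fold) and p < alpha."""
    cutoff = math.log2(fold)  # a 2-fold change corresponds to a log2 cutoff of 1
    up, down = [], []
    for r in rows:
        if r.p_value >= alpha:
            continue  # not significant
        if r.log_fc >= cutoff:
            up.append(r.gene)
        elif r.log_fc <= -cutoff:
            down.append(r.gene)
    return up, down

# Hypothetical toy table, not data from the thesis.
table = [
    DEResult("GENE_A", 2.3, 0.001),
    DEResult("GENE_B", -1.8, 0.01),
    DEResult("GENE_C", -0.4, 0.001),  # change too small
    DEResult("GENE_D", -3.0, 0.20),   # not significant
]
up, down = split_by_fold_change(table)
print(up, down)  # ['GENE_A'] ['GENE_B']
```

Raising `fold` tightens the filter symmetrically for both directions, which is why the up- and down-regulated counts are produced by the same call.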
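The chi-square step ranks candidate genes by how strongly their expression depends on the class label. The abstract does not say how the test was set up. The minimal sketch below assumes a scikit-learn-style scorer, which treats non-negative expression values as counts and compares each class's summed expression with the sum expected if expression were independent of the label. With two classes there is one degree of freedom, so the p-value is `erfc(sqrt(chi2 / 2))`. The function name, the toy data and the 0.05 cut-off are hypothetical.

```python
import math

def chi2_scores(X, y):
    """Chi-square statistic and p-value per feature for labels y in {0, 1}.

    X is a list of samples, each a list of non-negative expression values.
    Summed expression is treated as a count (assumption, scikit-learn style).
    """
    n = len(y)
    p1 = sum(y) / n                  # share of samples labelled 1
    class_prob = [1 - p1, p1]
    results = []
    for j in range(len(X[0])):
        observed = [0.0, 0.0]        # summed expression of feature j per class
        for row, label in zip(X, y):
            observed[label] += row[j]
        total = observed[0] + observed[1]
        stat = 0.0
        for c in (0, 1):
            expected = class_prob[c] * total  # expected share under independence
            if expected > 0:
                stat += (observed[c] - expected) ** 2 / expected
        p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 df
        results.append((stat, p_value))
    return results

# Hypothetical toy data: feature 0 tracks the label, feature 1 is flat.
X = [[10, 5], [12, 5], [1, 5], [0, 5]]
y = [1, 1, 0, 0]
scores = chi2_scores(X, y)
keep = [j for j, (stat, p) in enumerate(scores) if p < 0.05]
print(keep)  # [0]
```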
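Recursive feature elimination repeatedly refits a model and discards the least important feature until the target number remains. The abstract does not state which estimator drove the RFE step. Purely for illustration, the sketch below assumes a discrete AdaBoost built from decision stumps, the same model family as the final classifier. It measures a feature's importance as the summed vote weight `alpha` of the stumps that split on it. Labels are +1/-1. All function names and the toy data are hypothetical, and this is not the thesis's own implementation.

```python
import math

def fit_stump(X, y, w, features):
    """Best single-feature threshold rule under sample weights w (labels +1/-1)."""
    best = None  # (weighted error, feature, threshold, polarity)
    for j in features:
        values = sorted(set(row[j] for row in X))
        # candidate thresholds: midpoints between consecutive distinct values
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
        for t in thresholds:
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] > t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best

def stump_predict(row, j, t, pol):
    return pol if row[j] > t else -pol

def adaboost(X, y, features, n_rounds=10):
    """Discrete AdaBoost with decision stumps restricted to `features`."""
    n = len(X)
    w = [1.0 / n] * n
    model = []  # (alpha, feature, threshold, polarity) per round
    for _ in range(n_rounds):
        err, j, t, pol = fit_stump(X, y, w, features)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, pol))
        # raise weights of misclassified samples, lower the rest, renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def decision_score(model, row):
    """Weighted vote of all stumps; a positive score means class +1."""
    return sum(alpha * stump_predict(row, j, t, pol) for alpha, j, t, pol in model)

def rfe(X, y, n_keep, n_rounds=10):
    """Refit AdaBoost, drop the feature with the least total stump weight, repeat."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        model = adaboost(X, y, features, n_rounds)
        importance = {j: 0.0 for j in features}
        for alpha, j, _, _ in model:
            importance[j] += alpha
        weakest = min(features, key=lambda j: importance[j])
        features.remove(weakest)
    return features

# Hypothetical toy data: column 0 separates the classes, column 1 is noise,
# column 2 is only partly informative.
X = [[5.0, 1.0, 3.0], [6.0, 3.0, 2.0], [7.0, 2.0, 3.0],
     [1.0, 2.0, 1.0], [2.0, 1.0, 2.0], [0.5, 3.0, 1.0]]
y = [1, 1, 1, -1, -1, -1]
print(rfe(X, y, n_keep=1))  # [0]
```

Refitting after every removal is what distinguishes RFE from a one-shot ranking. Importances can shift once correlated features are gone, although this toy data set is too small to show that effect.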
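The abstract compares models by precision, recall, the Matthews correlation coefficient and AUC. The minimal sketch below shows how these are computed for a binary tumor/normal prediction, with 1 marking breast cancer. The helper names and the toy labels and scores are hypothetical. AUC is computed with the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive sample scores above a randomly chosen negative one. This equals the area under the ROC curve.

```python
import math

def confusion(y_true, y_pred):
    """Counts (tp, fp, tn, fn) for binary labels where 1 = breast cancer."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def precision_recall_mcc(y_true, y_pred):
    tp, fp, tn, fn = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc

def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy predictions, not results from the thesis.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.4, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold the scores at 0.5
print(precision_recall_mcc(y_true, y_pred), roc_auc(y_true, scores))
# (0.75, 0.75, 0.5) 0.875
```

Precision, recall and MCC depend on the chosen threshold. AUC summarizes the ranking over all thresholds, which is why both kinds of measure are usually reported together.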
Keywords/Search Tags: TCGA, Gene Screening, Breast Cancer, Adaboost