Cancer Marker Recognition And Classification Of Tumor Progression Based On Data Analysis

Posted on: 2021-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wu
GTID: 2404330611464187
Subject: Applied Mathematics
Abstract/Summary:
There are many kinds of cancer, and they seriously threaten patients' lives and health. The causes of cancer are complex and uncertain, and cure rates remain unsatisfactory. Early diagnosis and early treatment can improve the survival rate of cancer patients, but for many reasons a large share of patients are already at a middle or late stage when they are diagnosed, and their 5-year survival rate is very low. With the rapid development of machine learning and the continued deepening of bioinformatics research, using gene expression data to identify diagnostic markers of cancer and to classify tumor progression stage has become a focus of attention, with positive and far-reaching significance for early diagnosis. The purpose of this study is to analyze microarray (chip) data, screen in a high-throughput manner for genes that are specifically expressed in cancer, identify diagnostic markers for early-stage non-small cell lung cancer, and study effective methods for classifying tumor progression stage so as to improve the accuracy of cancer staging. With these two aims in view, and drawing on a large body of relevant literature, this thesis carried out the following work.

(1) We performed a series of bioinformatics analyses on a gene expression data set of 76 early-stage non-small cell lung cancer samples downloaded from the GEO database, comprising 40 adenocarcinoma samples, 16 squamous cell carcinoma samples, and 20 normal samples. To identify specific diagnostic markers, we compared each of the two subtypes with the normal samples to determine their gene expression characteristics. An unsupervised multidimensional scaling plot showed that the samples clustered well by disease group. Based on this grouping, statistical inference by linear model fitting and the empirical Bayes method identified 486 important genes associated with the disease. We carried out gene function and pathway analyses to verify our results and to explain the pathogenic factors and processes. We generated a protein-protein interaction network from the interactions among the selected genes and found that the top thirteen hub genes were strongly associated with lung cancer or with other cancers. Finally, we combined these results with TCGA clinical follow-up data to analyze the clinical prognostic value of the core genes. The results indicate that contrasting gene expression between the different subtypes and normal samples provides important information for the detection of non-small cell lung cancer and helps in exploring the pathogenesis of the disease.

(2) We performed a classification analysis of early-stage non-small cell lung cancer on 32 stage I samples and 24 stage II samples. Unsupervised multidimensional scaling analysis did not separate the samples well by tumor stage, and there was no clear boundary between the two groups. To study effective classification methods for early-stage non-small cell lung cancer and to improve the accuracy of tumor stage classification, we proposed a machine learning procedure that combines different feature selection algorithms with classification models. To obtain reliable feature sets, features were extracted from the training set only. In the first stage, the intersection of the feature rankings produced by five filter methods was used for initial screening. In the second stage, the Lasso method was used for fine screening, which yielded a final set of 20 feature variables. These 20 variables were then used for classification. The performance of different classification models was evaluated by 10-fold cross-validation and on a test set, and with these 20 variables the average accuracy exceeded 95%. To make the feature selection and classification method more convincing, we also applied it to another data set, GSE2990. The classification model built on the features selected by this method achieved higher accuracy than a classification model built on the same number of randomly selected genes. The results show that data analysis based on gene expression profiles can be used to classify tumor progression stage, providing important information for early detection and contributing to the exploration of disease pathogenesis.
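
The first screening stage described in part (2), keeping only the genes that several filter rankings agree on, can be illustrated with a minimal Python sketch. The abstract does not name the five filter methods, so the three scores used here (absolute mean difference, Welch t-statistic, signal-to-noise ratio) are stand-ins chosen for illustration, and the gene names, expression values, and the cutoff `k` are hypothetical toy inputs rather than data from the thesis. The Lasso fine-screening stage is not shown.

```python
from statistics import mean, stdev


def mean_diff(a, b):
    """Absolute difference between the two class means."""
    return abs(mean(a) - mean(b))


def welch_t(a, b):
    """Absolute Welch t-statistic (unequal variances)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def signal_to_noise(a, b):
    """Absolute signal-to-noise ratio: |mean gap| / (sd_a + sd_b)."""
    spread = stdev(a) + stdev(b)
    return abs(mean(a) - mean(b)) / spread if spread > 0 else 0.0


def top_k(expr, labels, score, k):
    """Return the set of the k gene names ranked highest by `score`."""
    ranked = sorted(
        expr,
        key=lambda g: score(
            [v for v, y in zip(expr[g], labels) if y == 0],
            [v for v, y in zip(expr[g], labels) if y == 1],
        ),
        reverse=True,
    )
    return set(ranked[:k])


def filter_intersection(expr, labels, scores, k):
    """Keep only genes that every filter ranks within its top k."""
    return set.intersection(*(top_k(expr, labels, s, k) for s in scores))


# Hypothetical toy data: 4 genes x 6 samples; label 0 = stage I, 1 = stage II.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "gene_a": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],  # large gap, low noise
    "gene_b": [2.0, 2.1, 1.9, 2.5, 2.6, 2.4],  # small gap, low noise
    "gene_c": [0.0, 4.0, 2.0, 3.0, 7.0, 5.0],  # large gap, high noise
    "gene_d": [5.0, 5.2, 4.8, 5.1, 4.9, 5.0],  # no separation
}
selected = filter_intersection(
    expr, labels, [mean_diff, welch_t, signal_to_noise], k=2
)
print(sorted(selected))
```

In this toy case only `gene_a` is in the top two under all three scores, so it alone survives the intersection. In a real analysis the scores would be computed on the training split only, in line with the abstract's statement that features were extracted from the training set.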
Keywords/Search Tags: Gene expression data, non-small cell lung cancer, diagnostic marker, tumor progression