Research On Prediction Model And Metastatic Critical Stage Of Non-small Cell Lung Cancer

Posted on: 2020-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Pan
GTID: 2370330578963894
Subject: Applied Mathematics
Abstract/Summary:
Lung cancer is one of the most common malignant tumors in the world, and non-small cell lung cancer (NSCLC) accounts for 85% of all lung cancer cases. More than 75% of patients are diagnosed with advanced lung cancer because the early stages are asymptomatic and effective screening modalities are lacking. Therefore, novel biomarkers for diagnosis, prognosis and drug response are urgently needed. This thesis takes the biological data of NSCLC as its research object: it designs a clustering algorithm for the protein sequences encoded by NSCLC-related genes, constructs a prediction model for NSCLC diagnosis, and detects the early warning signal and critical stage of NSCLC metastasis. The main work is as follows:

(1) Proteins are the executors of gene function, so a clustering algorithm is studied for the protein sequences encoded by the NSCLC-related genes discovered so far. First, the protein sequences are converted into numerical sequences based on two physicochemical properties of the amino acids. A discrete Fourier transform (DFT) is then applied to the numerical sequences to obtain the power spectrum of each original protein sequence. Next, the power spectra, which differ in length, are evenly scaled to a common length. Finally, the Euclidean distance between the rescaled power spectra is used as the measure of similarity. The clustering result for the protein sequences encoded by 62 NSCLC-related genes shows that the sequences group according to gene function. Different kinds of genes with the same function can be identified by the clustering algorithm, and the unknown functions of NSCLC-related genes can be predicted from the clustering result. Through this study of NSCLC-related gene functions, a preliminary understanding of the molecular mechanism underlying the occurrence and development of NSCLC is obtained.

(2) To obtain new biomarkers of NSCLC and establish an effective prediction model for NSCLC diagnosis, the transcription profiles GSE19188 and GSE40791, downloaded from the Gene Expression Omnibus (GEO) database, are studied. First, differential analysis of the gene expression data yields 805 differentially expressed genes (DEGs). The DEGs are then used to construct a protein-protein interaction (PPI) network, in which 123 key genes are significantly enriched in 11 cellular pathways. Cancer samples and normal samples can be clearly distinguished based on the differential scores of these 11 key pathways. Finally, the prediction model for NSCLC diagnosis is established by combining 18 crosstalk genes in significantly related pathways with a support vector machine (SVM). Testing shows that the classification accuracy of the model reaches 97%, indicating that these 18 genes are robust and sensitive as diagnostic predictive genes for NSCLC.

(3) To further understand the molecular mechanism of NSCLC metastasis, a new method is proposed for detecting early warning signals of tumor metastasis using single time-point sample dynamic network biomarkers (tDNB). The tDNB module is determined from differential correlation information between normal samples and cancer samples at different time points, and it can both predict disease and warn of changes in disease status. The method is applied to an NSCLC metastasis data set. Stage IIA is accurately identified as the critical stage of NSCLC metastasis from the single time-point samples, and the tDNB module of stage IIA is the dominant module of NSCLC metastasis. Functional enrichment analysis shows that the genes contained in the dominant module are associated with cancer cell proliferation and tumor metastasis at stage IIB. In this module, 173 genes have already been confirmed to be associated with lung cancer. These results indicate that tDNB can determine the critical state of the disease with reasonable accuracy. In addition, the NSCLC-related genes that have not yet been confirmed merit follow-up research.
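The sequence-comparison steps in part (1) can be illustrated with a minimal sketch. It is not the thesis implementation. The abstract does not name the two physicochemical properties, so the Kyte-Doolittle hydropathy index is assumed here as a single stand-in property. Plain linear interpolation is assumed for the "even scaling" step, and all function names are hypothetical.

```python
import cmath
import math

# ASSUMPTION: Kyte-Doolittle hydropathy index, used only as an illustrative
# property; the abstract does not say which two properties the thesis uses.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_numeric(seq, scale=HYDROPATHY):
    """Map each amino acid letter to its property value."""
    return [scale[aa] for aa in seq.upper()]


def power_spectrum(x):
    """Return |X[k]|^2 for k = 0..n-1 using a naive O(n^2) DFT."""
    n = len(x)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def rescale(values, m):
    """Resample values onto m evenly spaced points by linear interpolation.

    ASSUMPTION: a simple stand-in for the even-scaling step in the abstract.
    """
    n = len(values)
    if n == 1 or m == 1:
        return [values[0]] * m
    out = []
    for j in range(m):
        pos = j * (n - 1) / (m - 1)   # fractional index into values
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def spectrum_distance(seq_a, seq_b):
    """Euclidean distance between the two length-matched power spectra."""
    ps_a = power_spectrum(to_numeric(seq_a))
    ps_b = power_spectrum(to_numeric(seq_b))
    m = max(len(ps_a), len(ps_b))
    ps_a, ps_b = rescale(ps_a, m), rescale(ps_b, m)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(ps_a, ps_b)))
```

Calling `spectrum_distance` on every pair of sequences gives a distance matrix that could be passed to any standard hierarchical clustering routine. A production version would more likely use a fast Fourier transform than the naive DFT shown here.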
Keywords/Search Tags: non-small cell lung cancer (NSCLC), protein sequences encoded by genes, crosstalk genes, single time-point sample dynamic network biomarkers (tDNB)