
Application Of Combined Epigenetics Markers In The Early Diagnosis Of Lung Cancer Based On Data Mining Techniques

Posted on: 2014-11-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Feng
GTID: 1224330431495690
Subject: Internal medicine
Abstract/Summary:
At present, lung cancer is a leading cause of cancer death and a major threat to public health. The key to reducing lung cancer mortality therefore lies in early prevention. The occurrence of lung cancer is a complex biological process involving multiple factors, multiple phases and changes in the expression of multiple genes. Epigenetic DNA methylation and telomere damage are early molecular events in the development of lung cancer.

DNA methylation is the main form of epigenetic modification. It regulates gene expression mainly through methylation of cytosine in CpG sequences, without changing the DNA sequence. Methylation of CpG islands in the promoter of a tumor-suppressor gene inactivates the gene at the transcriptional level, leads to loss of the related protein, and thereby indirectly promotes tumorigenesis.

Under the catalysis of DNA methyltransferases (DNMTs), S-adenosylmethionine serves as the methyl donor for DNA methylation, and the methyl group is transferred to the fifth carbon atom of the cytosine in a CpG dinucleotide. Up-regulated DNMT expression precedes abnormal methylation patterns; it is considered an early molecular change characteristic of tumor cells, and it may participate in tumor development through DNA hypermethylation.

Like abnormal DNA methylation, histone modification plays an important role in the regulation of gene expression. Among histone modifications, acetylation has received the most attention. Histone acetylation and deacetylation are catalyzed by histone acetyltransferases (HATs) and histone deacetylases (HDACs), respectively. HDACs have been shown to block key gene pathways that may control the occurrence of lung cancer, and HDAC activation may promote tumor formation.

Because tumorigenesis is extremely complex, individual early molecular markers have low specificity for the diagnosis of lung cancer. Combined detection is now often used to improve sensitivity and specificity, but it raises many problems concerning the number of parameters and the interactions between tumor markers. Traditional statistical methods generally require normally distributed data; real data rarely meet this condition, so these methods cannot be applied to them. Data mining has unique advantages for the multi-parameter problems that arise with large numbers of variables. Combining data mining with early molecular biomarkers and clinical parameters makes it possible to develop an appropriate intelligent classification model, which should improve the accuracy of early lung cancer diagnosis.

Most previous studies focused on surgically resected tumor tissue from patients with lung cancer; few examined peripheral blood. This study measures serum DNA methyltransferases and histone deacetylase 1 (HDAC1), as well as tumor-suppressor gene methylation levels and relative telomere length in peripheral blood DNA. DNA methyltransferases, HDAC1 and DNA methylation are evaluated as early molecular events in lung cancer. The value of joint detection for the diagnosis of lung cancer and the relationships among the markers are explored. A suitable prediction model is developed with data mining technology, and its accuracy for early warning or diagnosis of lung cancer and the significance of combined detection are examined, with the aim of achieving early diagnosis of lung cancer and screening of high-risk populations. This would provide a valuable tool for the diagnosis of lung cancer.

Objective

(1) The protein expression of the DNA methyltransferases DNMT1, DNMT3a and DNMT3b and of the histone deacetylase HDAC1 is measured in a lung cancer group, a benign lung disease group and a normal control group.
The methylation levels of the tumor-suppressor genes FHIT, RASSF1A and MGMT and relative telomere length are also determined.

(2) Intelligent models are established with data mining techniques, including decision tree, artificial neural network and support vector machine (SVM), and compared with traditional logistic regression analysis. Specific and sensitive biomarkers are screened out and combined to develop an early molecular diagnosis model of lung cancer.

Materials and methods

1. Selection of subjects: From September 2012 to June 2013, 136 patients with lung cancer were recruited from the Department of Oncology and Respiratory Medicine, the First Affiliated Hospital of Zhengzhou University. In addition, 140 patients with benign lung diseases and 145 healthy individuals who visited the hospital for physical examinations were recruited from the Sixth People's Hospital of Zhengzhou. All study subjects provided written consent, and the research protocol was approved by the Institutional Review Board of the hospital. Epidemiological data were collected by professional investigators and doctors, and blood samples were taken.

2. Detection of DNA methyltransferases and histone deacetylase: Serum protein levels of DNMT1, DNMT3a, DNMT3b and HDAC1 were measured by enzyme-linked immunosorbent assay (ELISA).

3. Real-time fluorescent quantitative methylation-specific PCR: The methylation levels of the FHIT, RASSF1A and MGMT genes and relative telomere length in peripheral blood DNA were determined by real-time fluorescent quantitative PCR.

4. Statistical analysis: Analyses were run in SPSS 12.0.
Statistical methods were selected according to the type of data distribution and the comparison being made between groups. The χ² test, t test, F test, unconditional logistic regression and other methods were used to analyze the protein expression levels of DNA methyltransferases and HDAC1, the methylation levels of the FHIT, RASSF1A and MGMT genes, and relative telomere length in peripheral blood. The relationships of DNA methyltransferases, HDAC1, FHIT, RASSF1A and MGMT gene methylation and relative telomere length with early lung cancer were explored. Indexes that proved effective in the discriminant model could serve as tumor markers for the early diagnosis of lung cancer. The significance level was α = 0.05.

5. Development of intelligent models: Using SPSS 12.0 Clementine software, logistic regression analysis, decision tree, artificial neural network and support vector machine (SVM) were used to develop the models. The variables DNMT1, DNMT3a, DNMT3b, HDAC1, MGMT, RASSF1A, FHIT, gender, age and smoking history were entered into the models. The samples were randomly divided into a training set and a prediction set at a ratio of 3:1; the training set was used to develop the models and the prediction set to evaluate the trained models. Model predictions were assessed with the evaluation indexes of diagnostic tests.

Results

1. Serum protein levels of DNMT1, DNMT3a, DNMT3b and HDAC1 were higher in the lung cancer group than in the benign disease group and the control group, and the differences were statistically significant (P < 0.05). The protein expression of DNMT1, DNMT3a, DNMT3b and HDAC1 was not correlated with histological type or clinical stage (P > 0.05).

2. The methylation levels of MGMT, RASSF1A and FHIT in peripheral blood DNA were higher in the lung cancer group than in the control group and the benign lung disease group, and the differences were statistically significant (P < 0.05). In single-factor analysis, MGMT methylation in the lung cancer group was related to gender, age and histological type (P < 0.05); RASSF1A methylation was related to age and clinical stage (P < 0.05); and FHIT methylation was related to age and histological type (P < 0.05). The methylation levels of the three genes were divided into four fractions and two fractions, respectively. The risk of lung cancer rose with increasing methylation (P for trend < 0.05).

3. Telomere length in peripheral blood DNA was significantly shorter in the lung cancer group than in the benign lung disease group and the normal group (P < 0.001). Multiple linear regression showed that gender, age and smoking history were associated with telomere length (P < 0.001), and that telomere length shortened with increasing age (P < 0.001).

4. With logistic regression, the sensitivity, specificity, accuracy, positive predictive value, negative predictive value and AUC for early diagnosis of lung cancer were 68.0%, 88.6%, 70.9%, 60.7%, 95.1% and 0.923, respectively. For the 101 cases with clinical stage Ⅰ+Ⅱ disease, the predictive accuracy was 89.11%. With the decision tree, the sensitivity, specificity, accuracy, positive predictive value, negative predictive value and AUC were 77.8%, 95.1%, 81.2%, 75.0%, 95.1% and 0.946, respectively. For the 101 cases with clinical stage Ⅰ+Ⅱ disease, the predictive accuracy was 99.01%. With the neural network, the sensitivity, specificity, accuracy, positive predictive value, negative predictive value and AUC were 59.1%, 78.0%, 60.5%, 46.4%, 95.1% and 0.877, respectively. For the 101 cases with clinical stage Ⅰ+Ⅱ disease, the predictive accuracy was 88.12%.
With the support vector machine (SVM), the sensitivity, specificity, accuracy, positive predictive value, negative predictive value and AUC for early diagnosis of lung cancer were 54.5%, 87.5%, 62.6%, 64.3%, 85.4% and 0.851, respectively. For the 101 cases with clinical stage Ⅰ+Ⅱ disease, the predictive accuracy was 92.08%. For the prediction of the 101 cases with clinical stage Ⅰ+Ⅱ disease, the AUC of the decision tree exceeded that of the SVM, and both were significantly higher than those of logistic regression and the neural network.

Conclusion:

1. High serum protein expression of the DNA methyltransferases DNMT1, DNMT3a and DNMT3b and of the histone deacetylase HDAC1 was associated with lung cancer. Abnormal DNMT1, DNMT3a and DNMT3b and elevated HDAC1 in the peripheral blood of patients with lung cancer may be an early effect of lung cancer. They were not correlated with the histology or clinical stage of lung cancer.

2. The promoter methylation levels of MGMT, RASSF1A and FHIT in peripheral blood DNA were associated with lung cancer. MGMT and FHIT promoter methylation was associated with the histological type of lung cancer, and RASSF1A promoter methylation was related to clinical stage. Shortening of relative telomere length in peripheral blood may increase the risk of developing lung cancer.

3. Among the data mining techniques, the decision tree was superior to logistic regression, support vector machine and neural network for the early diagnosis of lung cancer. It can be used as an optimized method for early diagnosis of lung cancer.
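The 3:1 random split and the diagnostic-test indexes named in the methods can be illustrated with a minimal Python sketch. It is not the Clementine workflow used in the study: the function names `split_3_to_1` and `diagnostic_indexes`, the fixed random seed, and the example labels are all hypothetical and chosen only for illustration.

```python
import random


def split_3_to_1(samples, seed=0):
    """Randomly split samples into a training and a prediction set at 3:1.

    Hypothetical helper; the seed only makes the illustration repeatable.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 3) // 4  # first three quarters go to training
    return shuffled[:cut], shuffled[cut:]


def diagnostic_indexes(y_true, y_pred):
    """Compute diagnostic-test indexes from binary labels.

    y_true, y_pred: sequences of 1 (lung cancer) or 0 (no lung cancer).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ratio(tp, tp + fp),  # positive predictive value
        "npv": ratio(tn, tn + fn),  # negative predictive value
    }


if __name__ == "__main__":
    # Made-up labels for ten prediction-set subjects, not study data.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    for name, value in diagnostic_indexes(truth, predicted).items():
        print(f"{name}: {value:.3f}")
```

AUC is left out on purpose, because it requires the model's continuous scores and not just its binary predictions.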
Keywords/Search Tags: DNA methyltransferase, HDAC1, Tumor-suppressor gene methylation, Telomere, Lung cancer, Early diagnosis, Data mining