The Application Of Machine Learning Algorithms In The Pattern Analysis Of Non-small Cell Lung Cancer Tumor Progression

Posted on: 2017-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhang
Full Text: PDF
GTID: 2334330512480443
Subject: Chemical Process Equipment
Abstract/Summary:
With the development of genome-wide sequencing technology, cancer-related data has grown explosively. Extracting hidden patterns and useful biological information from these massive datasets has become an important problem in cancer bioinformatics, and advanced machine learning algorithms have therefore recently been applied in cancer research. Non-small cell lung cancer (NSCLC), the most common type of lung cancer, remains a prominent and difficult problem in cancer research because of its leading mortality rate. A limited understanding of the mechanism of tumor progression, together with the scarcity of related research, is the main reason for the high mortality rate of NSCLC.

This thesis studies the application of machine learning algorithms to NSCLC tumor progression pattern analysis and signature gene identification. The aim is to identify, from the whole genome, signature genes for tumor progression in the two major NSCLC subtypes, lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD), and to provide a deeper understanding of the tumor progression mechanism. This lays a foundation for the development of personalized treatments.

For tumor progression pattern recognition and signature gene identification, pattern recognition methods were applied to jointly analyze genome-wide mRNA gene expression (GE) values, methylation (ME) values, and copy number variation (CNV) data. The data are ultra-high-dimensional with small sample sizes, noisy, and strongly correlated across variables. A new iterative multiple-variable selection strategy, which considers each gene's statistical importance, biological correlation, relation to cancer, and contribution to the pattern analysis, was applied to identify signature genes.

For LUSC, GE, ME, and CNV data from TCGA were used to build tumor progression models, and the performance of the three resulting signature gene sets in the GE, ME, and CNV classification models was compared and analyzed. In addition, pathway analysis and genetic network analysis using KEGG (Kyoto Encyclopedia of Genes and Genomes) and IPA (Ingenuity Pathway Analysis) indicated that the three gene sets are closely related to one another and directly related to the progression of LUSC.

For adenocarcinoma tumor progression, two independent GE datasets of lung, breast, and colon adenocarcinomas were used as the training and validation datasets, respectively. Different signature gene sets were identified in the three adenocarcinoma classification models. Using only gene expression values, the training samples were classified very well, but the validation samples could not be classified at all. The results across different adenocarcinoma types showed that the methods are not tumor-type specific. In addition, IPA and functional analysis supported the plausibility of the selected signature genes.
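
The abstract does not describe the selection procedure in detail. As a rough illustration only, the following Python sketch shows one simple way to combine a per-gene statistical score with a correlation filter on high-dimensional, small-sample data. It is not the thesis's method: the function names, the Welch t-statistic score, the 0.95 correlation cutoff, and the synthetic data are all assumptions made for this example, and the biological and cancer-relation criteria mentioned above are not modeled.

```python
import math
import random


def welch_t(a, b):
    """Absolute Welch t-statistic between two groups of values."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    denom = math.sqrt(var_a / len(a) + var_b / len(b))
    return abs(mean_a - mean_b) / denom if denom > 0 else 0.0


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def select_signature_genes(X, y, n_keep=2, max_corr=0.95):
    """Hypothetical helper: rank genes by t-statistic, then greedily keep
    the best-ranked genes that are not near-duplicates of genes already kept.

    X is a list of samples (each a list of gene values); y holds 0/1 labels,
    e.g. early versus late tumor stage.
    """
    n_genes = len(X[0])
    columns = [[row[g] for row in X] for g in range(n_genes)]

    def score(g):
        early = [v for v, lab in zip(columns[g], y) if lab == 0]
        late = [v for v, lab in zip(columns[g], y) if lab == 1]
        return welch_t(early, late)

    ranked = sorted(range(n_genes), key=score, reverse=True)
    selected = []
    for g in ranked:
        # Skip a gene that is almost perfectly correlated with a kept gene.
        if all(abs(pearson(columns[g], columns[s])) < max_corr for s in selected):
            selected.append(g)
        if len(selected) == n_keep:
            break
    return selected


# Synthetic data (an assumption for illustration): 30 samples, 50 genes.
# Genes 0 and 2 differ between the two classes; gene 1 is a noisy copy of gene 0.
rng = random.Random(0)
labels = [0] * 15 + [1] * 15
samples = []
for lab in labels:
    row = [rng.gauss(0, 1) for _ in range(50)]
    row[0] += 3 * lab
    row[1] = row[0] + rng.gauss(0, 0.01)
    row[2] += 3 * lab
    samples.append(row)

selected = select_signature_genes(samples, labels, n_keep=2)
print(selected)
```

The correlation filter is there because, with far more genes than samples, the top of a purely univariate ranking tends to be filled with redundant copies of the same signal. A real pipeline would also need cross-validation on held-out samples, which this sketch omits.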
Keywords/Search Tags: Machine learning algorithms, Pattern recognition, Tumor progression, Lung squamous cell carcinoma, Adenocarcinoma