Microarray Data Mining And Bioinformatics Analysis Based On Tuberculosis Gene Chip Data

Posted on: 2019-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: L He
Full Text: PDF
GTID: 2370330566979105
Subject: Applied Mathematics
Abstract/Summary:
In this thesis, we search for molecular markers that can be used for the diagnosis and treatment of tuberculosis. First, we analyzed the gene expression profiles of active tuberculosis and latent tuberculosis and used the random forest algorithm to build a predictive model that identifies susceptibility genes associated with the status of tuberculosis infection. Second, we performed a comparative bioinformatics analysis of the genes of peripheral blood mononuclear cells from healthy people and pulmonary tuberculosis patients, and identified potential biomarkers that distinguish tuberculosis patients from healthy controls.

The first chapter introduces the background of Mycobacterium tuberculosis, the pathogenic mechanism of tuberculosis, the research progress at home and abroad on molecular markers for the diagnosis and identification of tuberculosis, and the theoretical knowledge needed in this thesis.

In the second chapter, the gene expression profiles of active tuberculosis and latent tuberculosis are analyzed and compared. First, the differentially expressed genes were screened with a variance filter. Second, a random forest algorithm was used to build a model that predicts the genes associated with the status of TB infection, and the top-ranked genes were selected. Previous literature and biological analysis indicate that these genes are closely related to the state of tuberculosis infection. In a comparison of prediction methods, we found that the random forest model is simpler and faster and fits the data better.

In the third chapter, peripheral blood mononuclear cells from patients with tuberculosis and from healthy controls are compared. GO functional enrichment analysis and KEGG pathway analysis were then performed on the differentially expressed genes that were screened. In addition, by constructing protein interaction networks and performing module analysis, we found potential genes related to tuberculosis. Finally, the reliability of the markers was verified by constructing a classification model.

The last chapter briefly reviews the conclusions above and emphasizes that the data mining and integration carried out in this thesis can be an effective tool for studying the diagnostic markers of tuberculosis and the mechanisms of its occurrence and development. We also analyze some shortcomings of this work and point out open questions and future work.
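The variance-filter screening step mentioned for the second chapter can be illustrated with a minimal sketch. The abstract does not give the threshold or the software the author used, so everything here is an assumption: the function name `variance_filter`, the `top_k` cutoff, and the toy gene names and expression values are all hypothetical and only show the general idea of keeping the genes whose expression varies most across samples.

```python
from statistics import pvariance


def variance_filter(expression, top_k):
    """Return the names of the top_k genes with the highest variance.

    expression: dict mapping a gene name to a list of expression values,
    one value per sample. (Hypothetical input layout, not from the thesis.)
    """
    if top_k < 0:
        raise ValueError("top_k must be non-negative")
    # Population variance of each gene across all samples.
    variances = {gene: pvariance(values) for gene, values in expression.items()}
    # Sort by descending variance; break ties by gene name for reproducibility.
    ranked = sorted(variances, key=lambda g: (-variances[g], g))
    return ranked[:top_k]


# Toy data invented for illustration only; these are not real measurements.
toy_expression = {
    "GENE_A": [5.0, 5.1, 4.9, 5.0],  # nearly constant, so it is filtered out
    "GENE_B": [2.0, 8.0, 1.5, 9.0],  # highly variable
    "GENE_C": [3.0, 3.5, 6.0, 6.5],  # moderately variable
}

selected = variance_filter(toy_expression, top_k=2)
print(selected)
```

In a pipeline like the one described, the genes that survive such a filter would then be passed to the random forest model, whose feature importance scores give the gene ranking.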
Keywords/Search Tags:Tuberculosis, Random forest, GO enrichment, KEGG pathway, Biomarkers