Data Mining For Special Cancer Biomarker Based On DMarker System

Posted on: 2012-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
GTID: 2178330335450976
Subject: Computer Science and Technology
Abstract/Summary:
Cancer is an abnormal growth of cells caused by multiple changes in gene expression. In 2010 the fatality rate of cancer exceeded that of heart disease, making cancer a leading cause of death worldwide. It is estimated that in 2011 nearly 12 million people worldwide will suffer from some type of cancer and 0.8 million of them will die. Since "early detection, early treatment" has become the golden rule for saving patients' lives, better biomarkers are urgently needed to improve the effectiveness of cancer diagnosis and clinical confirmation. During malignant transformation, genetic alterations in tumor cells can disrupt autocrine and paracrine signaling networks, leading to the over-expression of some classes of proteins, such as growth factors, cytokines and hormones, that may be secreted outside the cancerous cells. These secreted proteins may enter blood, urine or other body fluids through various complex secretion pathways and can potentially be used as marker proteins for blood or urine tests. Recent genomic studies on various cancer specimens have identified numerous genes that are consistently over-expressed, and some of these genes encode secreted proteins. Our study aims to find combinations of blood-secreted biomarkers using bioinformatics methods. As high-throughput genomic and proteomic technologies have become a research hot spot, a great deal of attention has been devoted to cancer biomarkers and related studies. Microarrays allow researchers to monitor the expression of tens of thousands of genes across a wide range of cellular responses, phenotypes and conditions. Selecting a small subset of discriminative genes from these thousands of genes is therefore an important step toward classifying diseases and phenotypes precisely.
In our research, the lung cancer dataset is taken from the DMarker system, a database and query system developed by our team. We chose GDS2771 (lung cancer), whose samples fall into two categories: normal (90 samples) and disease (97 samples). Our study applies the following data analysis process to discover combination biomarkers. First, differential expression analysis: we applied a t-test to the microarray data, which consists of 12236 genes, and detected 2577 differentially expressed genes. Second, we filtered the 2577 genes by the blood-secreted attribute: 1995 blood-secreted genes were predicted by Cui's method, and the overlap between these 1995 genes and the 2577 differentially expressed genes is 303 genes. Third, a gene ranking step is needed to reduce the computational complexity; several ranking criteria are available, such as p-value, AUC and S score. After ranking, we passed the top 30 genes to the next step. Last, we combined the filtered and ranked biomarkers: we treat each gene as a piece of evidence for diagnosis and combine the biomarkers using the Dempster-Shafer combination rule. We ranked the combination results by basic probability and tested them through the DMarker system. Finally, we obtained several outstanding combination biomarkers. Our method can be applied not only to this cancer dataset but also to other disease datasets. Other potential biomarker combinations, and biological experiments to confirm them, remain to be examined in the near future. Nevertheless, the results obtained so far are well suited to provide appropriate explanations and clinical evidence for biologists and medical doctors.
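The combination step can be illustrated with a minimal sketch of Dempster's rule over the two-class frame {normal, disease}. The abstract does not state how each gene's basic probability assignment is derived from its expression values, so the mass values below are invented purely for illustration, and the names `combine`, `gene_a` and `gene_b` are hypothetical rather than part of the DMarker system.

```python
from itertools import product

# Frame of discernment for a two-class diagnosis (labels follow the dataset's two categories).
NORMAL = frozenset({"normal"})
DISEASE = frozenset({"disease"})
THETA = NORMAL | DISEASE  # the whole frame: mass not committed to either class


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of hypotheses to its
    basic probability; the values of each dict are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: mass goes to the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Contradictory evidence: accumulate the conflict mass K.
            conflict += wa * wb
    if 1.0 - conflict <= 1e-12:
        raise ValueError("total conflict: the two sources cannot be combined")
    # Normalise by 1 - K so the combined masses sum to 1 again.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical evidence from two genes (illustrative values, not thesis results).
gene_a = {DISEASE: 0.6, THETA: 0.4}
gene_b = {DISEASE: 0.5, NORMAL: 0.2, THETA: 0.3}

pair = combine(gene_a, gene_b)
for hypothesis, mass in sorted(pair.items(), key=lambda kv: -kv[1]):
    print(sorted(hypothesis), round(mass, 4))
```

Because Dempster's rule is commutative and associative, a combination of more than two genes can be obtained by folding `combine` over their mass functions, for example with `functools.reduce`.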
Keywords/Search Tags: Microarray, DMarker system, blood-secreted protein prediction, combination biomarker, Dempster-Shafer theory of evidence