
Breast Cancer Related Genes Screened By Distillation Algorithm Based On AP Cluster Analysis

Posted on: 2015-04-03 | Degree: Master | Type: Thesis
Country: China | Candidate: Z M Liu
GTID: 2284330464966628 | Subject: Computer application technology
Abstract/Summary:
High-throughput sequencing provides a new way to explore the relationship between human gene expression and breast cancer by capturing gene information precisely and comprehensively. It also poses an enormous challenge: how to select cancer-related genes from nearly 30,000 known human genes. Traditional methods, which rely on the expression difference of single genes, cannot handle interactions among genes. Moreover, the statistical significance of noise features produced during sequencing is usually higher than that of normal gene features, so some noise features that are difficult to interpret biologically are selected as cancer-related genes. To overcome these shortcomings, this thesis proposes a novel distillation algorithm based on AP (Affinity Propagation) cluster analysis to screen breast cancer-related genes. The major achievements are as follows:

1. An original distillation algorithm based on AP clustering is proposed to screen breast cancer-related genes. The algorithm first clusters all genes into several gene sets, then screens cancer-related genes within each gene set, and finally combines all screened genes. The whole process is iterated until the number of screened genes reaches a preset value. By analogy with physical distillation, the algorithm consists of an AP clustering process, a screening process, and a condensation process. The AP clustering process groups genes with similar functions or with interactions into the same gene set. The screening process selects cancer-related genes by adjusting the relevant parameters; the statistical significance of the screened genes may not be high, but there may be interactions among them. The condensation process combines the screened genes for the next iteration.

2. Using the distillation algorithm, this thesis screens 473 breast cancer-related genes from 20,141 human genes and clusters these 473 genes into 9 gene sets. Because the known breast cancer-related genes are mainly clustered into the eighth gene set, this thesis presents a biological interpretation of the relationship between the 77 genes in this set and breast cancer, and lists 66 new genes in this set that deserve more attention in subsequent studies. In addition, it analyzes the methylation state of the 77 genes and identifies 4 bases whose methylation state is strongly correlated with the corresponding genes.

3. Gene set enrichment analysis is performed on the 9 clustered gene sets to verify the technical effectiveness and biological rationality of the approach. This thesis also explores the influence of parameter settings on the algorithm and recommends a standard for parameter optimization.

Compared with traditional single-gene analysis, the distillation algorithm can detect interactions among genes and screen breast cancer-related genes that have a better biological interpretation. Because screening is carried out independently within each gene set, the screening processes for different sets can run in parallel for efficiency. In addition, by adjusting its parameters, the distillation algorithm can screen better cancer-related gene sets and be applied flexibly in a variety of situations.
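
The cluster, screen, and condense loop described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the names `distill`, `effect_score`, and `chunk_cluster`, the `keep_fraction` and `target` parameters, and the mean-difference score are all hypothetical, and `chunk_cluster` is a trivial placeholder standing in for AP clustering.

```python
from statistics import mean
from typing import Callable, Dict, List

# gene name -> expression values, one value per sample (assumed layout)
Expression = Dict[str, List[float]]


def effect_score(values: List[float], labels: List[int]) -> float:
    """Hypothetical per-gene score: absolute difference of class means.

    Assumes labels use 1 for tumor and 0 for normal, with both present.
    """
    tumor = [v for v, y in zip(values, labels) if y == 1]
    normal = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(tumor) - mean(normal))


def chunk_cluster(expr: Expression, size: int = 4) -> List[List[str]]:
    """Placeholder clustering (NOT Affinity Propagation): fixed-size chunks."""
    names = list(expr)
    return [names[i:i + size] for i in range(0, len(names), size)]


def distill(
    expr: Expression,
    labels: List[int],
    cluster: Callable[[Expression], List[List[str]]],
    keep_fraction: float = 0.5,
    target: int = 10,
    max_rounds: int = 20,
) -> List[str]:
    """Iterate cluster -> screen -> condense until at most `target` genes remain."""
    genes = list(expr)
    for _ in range(max_rounds):
        if len(genes) <= target:
            break
        # Clustering step: group the surviving genes into gene sets.
        gene_sets = cluster({g: expr[g] for g in genes})
        kept: List[str] = []
        # Screening step: rank genes within each set independently,
        # so a gene only competes against members of its own set.
        for gene_set in gene_sets:
            ranked = sorted(
                gene_set,
                key=lambda g: effect_score(expr[g], labels),
                reverse=True,
            )
            n_keep = max(1, int(len(ranked) * keep_fraction))
            kept.extend(ranked[:n_keep])
        if len(kept) == len(genes):
            break  # no gene was dropped; stop rather than loop forever
        # Condensation step: pool the screened genes for the next round.
        genes = kept
    return genes
```

Because the clustering step is passed in as a callable, a real AP implementation (for example, a wrapper around scikit-learn's `AffinityPropagation`) could be substituted for `chunk_cluster` without changing the loop. The per-set screening in the inner loop is also the part that could be run in parallel across gene sets.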
Keywords/Search Tags: Affinity Propagation, Gene Set Enrichment Analysis, Distillation Algorithm, Breast Cancer