Association Rules Mining And Its Applications In Microarray Gene Expression Data

Posted on: 2009-10-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Peng
Full Text: PDF
GTID: 1118360272961529
Subject: Epidemiology and Health Statistics
Abstract/Summary:
The completion of the human genome draft (HGD) marks the entry of modern life science research into the post-genomic era: the research focus has shifted from structural genomics to functional genomics, and strong interest has arisen in elucidating the interactions between genes. The DNA microarray, a high-throughput method, can routinely measure the expression levels of hundreds of thousands of genes simultaneously, which makes it a powerful tool for finding relations among genes. Because microarray experiments produce such large volumes of data, data mining has become an important means of extracting useful information from them.
To address the problem of association rule mining in microarray gene expression data, this dissertation studied three aspects: the mining of inter-transaction association rules from time series microarray data, the absence of gene expression status information in traditional association rules, and the clustering of association rules. The main contributions of this dissertation are summarized as follows:
(1) The study of the mining of inter-transaction association rules from time series microarray data
Because they ignore the temporal information in time series microarray data, traditional association rules reflect only the relations among genes at the same time point and fail to capture dynamic relations. We therefore proposed mining inter-transaction association rules from such data and introduced these rules in detail. Several biological information databases, such as Gene Ontology (GO), iHOP (Information Hyperlinked over Proteins), and DAVID (The Database for Annotation, Visualization and Integrated Discovery), were used to help interpret the inter-transaction association rules.
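To make the idea of a rule that spans time points concrete, the following is a minimal Python sketch of the general sliding-window approach to inter-transaction mining. It is not the dissertation's own algorithm: the function names, the `max_span` parameter, the encoding of an item as `(gene, offset)`, and the toy data are all assumptions made for illustration.

```python
def sliding_window_transactions(series, max_span):
    """Turn per-time-point item sets into extended transactions.

    Each extended item is (item, offset), where offset is the distance
    in time points from the first time point of the window.
    """
    windows = []
    for start in range(len(series)):
        window = set()
        for offset in range(max_span + 1):
            t = start + offset
            if t < len(series):
                window.update((item, offset) for item in series[t])
        windows.append(window)
    return windows


def rule_stats(windows, antecedent, consequent):
    """Support and confidence of antecedent -> consequent over the windows."""
    n_ante = sum(1 for w in windows if antecedent <= w)
    n_both = sum(1 for w in windows if (antecedent | consequent) <= w)
    support = n_both / len(windows)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical toy data: genes flagged as over-expressed at 5 time points.
series = [{"g1"}, {"g2"}, {"g1", "g3"}, {"g2"}, {"g3"}]
windows = sliding_window_transactions(series, max_span=1)

# Rule: "g1 expressed at time t" -> "g2 expressed at time t + 1".
support, confidence = rule_stats(windows, {("g1", 0)}, {("g2", 1)})
print(support, confidence)
```

A traditional (intra-transaction) rule corresponds to `max_span=0`, where every item has offset 0 and no relation across time points can be expressed.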
Results show that the rules efficiently extract hidden information from time series microarray data, and that the rules describing the behavior of genes over time agree with the biological background. The inter-transaction association rule can therefore serve as a new approach to predicting gene function.
(2) The study of the absence of gene expression status information in traditional association rules
After analyzing in depth the absence of gene expression status information in traditional association rules, we proposed a new type of association rule, the differential expression association rule (DEAR), and introduced its definition and related concepts. To mine DEAR efficiently, we proposed the differential expression association rules matrix algorithm (DEARM algorithm) and described it in detail. Experimental results indicate that DEAR outperforms traditional association rules at extracting gene expression patterns and controlling redundant rules. As a new type of association rule, DEAR enriches association rule mining techniques and will help researchers reveal hidden interactions among genes from microarray data.
(3) The study of the clustering of association rules
A large number of association rules are usually discovered from microarray data, which makes them difficult to analyze and use. To tackle this problem, we proposed clustering the association rules. In this dissertation, we proposed a new similarity metric for clustering association rules efficiently, which measures the similarity of both the structure and the contents of two rules. It thus overcomes the drawback of traditional similarity metrics, which focus only on contents.
By closely analyzing the sub-clusters of association rules together with the Gene Ontology (GO) annotation database, we found that the genes making up the association rules in the same sub-cluster have similar or related biological backgrounds, which indicates the value of clustering association rules. Clustering is therefore an important visual technique in association rule mining for finding hidden, interesting patterns.
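The abstract does not define the proposed similarity metric, so the following Python sketch only illustrates why a measure that considers rule structure can differ from one based on contents alone. The combination of Jaccard indices, the `weight` parameter, and the representation of a rule as an (antecedent, consequent) pair are assumptions for illustration, not the dissertation's metric.

```python
def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rule_similarity(rule_a, rule_b, weight=0.5):
    """Blend content similarity with structure similarity (hypothetical).

    content:   overlap of all genes in the two rules, ignoring sides.
    structure: overlap computed separately for antecedents and consequents.
    """
    ante_a, cons_a = rule_a
    ante_b, cons_b = rule_b
    content = jaccard(ante_a | cons_a, ante_b | cons_b)
    structure = 0.5 * (jaccard(ante_a, ante_b) + jaccard(cons_a, cons_b))
    return weight * content + (1 - weight) * structure


# Hypothetical rules: r1 and r2 contain the same genes with sides swapped.
r1 = (frozenset({"g1", "g2"}), frozenset({"g3"}))  # g1, g2 -> g3
r2 = (frozenset({"g3"}), frozenset({"g1", "g2"}))  # g3 -> g1, g2
r3 = (frozenset({"g1", "g2"}), frozenset({"g3"}))  # same as r1

print(rule_similarity(r1, r2))  # content-only similarity would be 1.0
print(rule_similarity(r1, r3))
```

A pairwise similarity of this kind could then be converted to a distance (for example, one minus the similarity) and passed to any standard clustering procedure to group the rules.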
Keywords/Search Tags:Association Rules Mining, Inter-Transaction Association Rules, Differential Expression Association Rules, Gene Chips, Microarray, Gene Expression Data, Gene Interactions, Clustering Analysis, Similarity, Data Mining, Apriori Algorithm