Mining Top-K Frequent Closed Itemsets From Gene Expression Data

Posted on: 2011-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: L C Zhao
Full Text: PDF
GTID: 2178330338989578
Subject: Computer Science and Technology
Abstract/Summary:
Gene expression data contain many hidden features as well as information about gene regulation and gene networks. Data mining can help biologists find valuable information in such data quickly, and frequent closed pattern mining is one of the important methods for analyzing expression data. In recent years, frequent closed itemset mining has therefore become an active research topic. When mining frequent itemsets or frequent closed itemsets, however, it is often difficult to choose the minimum support threshold. Top-k frequent closed itemset mining was recently proposed to address this difficulty.

Because top-k frequent closed itemset mining usually involves large amounts of data and high computational complexity, it remains a major challenge. Existing frequent closed itemset mining algorithms generate too many redundant patterns with either low support or short pattern length, which makes them inefficient.

In this thesis, we propose a novel and efficient algorithm for mining top-k frequent closed itemsets. Our main contributions are as follows.

First, we adopt the FP-tree structure with best-first search to avoid generating redundant patterns with short pattern length and low support, which significantly improves the efficiency of the algorithm.

Second, we employ a hash function for closedness checking, so that all nodes with the same support can be checked in one round, which greatly reduces the time complexity of the algorithm.

Third, because nodes with the same support can be mined independently, the algorithm is well suited to parallelization and can therefore be applied to large and dense datasets.

Finally, we experiment with real biological data and synthetic data. The experimental results show that the proposed algorithm outperforms existing algorithms, especially on large and dense datasets.
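To make the notion of top-k frequent closed itemsets concrete, the following is a minimal brute-force sketch in Python. It is not the FP-tree, best-first algorithm of the thesis: it enumerates every itemset, maps each one to its closure (the intersection of all transactions containing it), and keeps the k closed itemsets with the highest support. The function names, the toy gene labels, and the tie-breaking rule (longer itemsets first, then alphabetical order) are assumptions made for illustration only.

```python
from itertools import combinations


def closure(itemset, transactions):
    """Return the intersection of all transactions that contain `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    return frozenset.intersection(*covering) if covering else itemset


def top_k_closed(transactions, k):
    """Brute-force top-k frequent closed itemsets (exponential; toy data only)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    closed = {}  # closed itemset -> support
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support == 0:
                continue
            # An itemset and its closure always have the same support.
            closed[closure(candidate, transactions)] = support
    # Assumed tie-breaking: higher support, then longer itemset, then alphabetical.
    ranked = sorted(
        closed.items(),
        key=lambda kv: (-kv[1], -len(kv[0]), sorted(kv[0])),
    )
    return ranked[:k]


# Hypothetical toy data: each transaction is the set of genes expressed in one sample.
samples = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g1", "g3"},
    {"g1"},
]

for itemset, support in top_k_closed(samples, 3):
    print(sorted(itemset), support)
```

On this toy input, {g2} is frequent but not closed, because every sample containing g2 also contains g1; its closure {g1, g2} is reported instead. Asking for the top k itemsets removes the need to guess a minimum support threshold in advance, which is the motivation stated in the abstract.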
Keywords/Search Tags: top-k, frequent closed itemsets, data mining, gene expression data