Mining Top-K Frequent Closed Itemsets From Gene Expression Data

Posted on:2011-11-30

Degree:Master

Type:Thesis

Country:China

Candidate:L C Zhao

Full Text:PDF

GTID:2178330338989578

Subject:Computer Science and Technology

Abstract/Summary:

Gene expression data contain many hidden features as well as information about gene regulation and gene networks. Data mining can help biologists find valuable information quickly, and frequent closed pattern mining is one of the important methods for analyzing expression data. Frequent closed itemset mining has therefore become an active research topic in recent years. When mining frequent itemsets or frequent closed itemsets, it is often difficult to choose the minimum support threshold, which is why mining top-k frequent closed itemsets was recently proposed.

Because mining top-k frequent closed itemsets usually involves large amounts of data and high computational complexity, it remains a great challenge. Existing frequent closed itemset mining algorithms generate too many redundant patterns with either low support or short pattern length, which makes them rather inefficient.

In this thesis, we propose a novel, efficient algorithm for mining top-k frequent closed itemsets. Our main contributions are as follows:

First, we adopt the FP-tree structure with best-first search so as to avoid generating redundant patterns with short length and low support, thereby significantly improving the efficiency of the algorithm.

Second, we employ a hash function for closedness checking, so that all nodes with the same support can be checked in one round, which greatly decreases the time complexity of the algorithm.

Third, because the mining of nodes with the same support is independent, the algorithm is well suited to parallelization. Hence, it can be employed to mine large and dense datasets.

Finally, we experiment with real biological data and synthetic data. The experimental results show that the proposed algorithm outperforms existing algorithms, especially on large and dense datasets.
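To make the problem statement concrete, the following is a minimal brute-force sketch of what "top-k frequent closed itemsets" means on a toy dataset. It is not the FP-tree, best-first, or hash-based method of the thesis, and it scales exponentially with the number of items. The function names (`closure`, `top_k_closed`), the `min_len` parameter, and the gene labels are hypothetical and chosen only for illustration. An itemset is treated as closed when it equals the intersection of all transactions that contain it, so no superset has the same support.

```python
from itertools import combinations


def closure(itemset, transactions):
    """Return (closure, support) of `itemset`.

    The closure is the intersection of all transactions containing the
    itemset; the support is the number of such transactions.
    """
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None, 0
    return frozenset.intersection(*covering), len(covering)


def top_k_closed(transactions, k, min_len=1):
    """Brute-force illustration: the k closed itemsets with highest support.

    No minimum support threshold is needed; only k (and an optional
    minimum pattern length, an assumption of this sketch) is supplied.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    closed = {}  # closed itemset -> support
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            c, support = closure(frozenset(combo), transactions)
            if c is not None and len(c) >= min_len:
                closed[c] = support
    # Rank by descending support; break ties by item names for determinism.
    ranked = sorted(closed.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    return ranked[:k]


# Toy data (hypothetical): each transaction is the set of genes
# considered "expressed" in one sample.
samples = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g1", "g2", "g4"},
    {"g3", "g4"},
]

for itemset, support in top_k_closed(samples, k=3):
    print(sorted(itemset), support)
```

Here `{g1}` alone is not reported because every sample containing `g1` also contains `g2`, so its closure is `{g1, g2}` with the same support. Raising `min_len` filters out short patterns, which mirrors the abstract's concern about patterns with small length.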

Keywords/Search Tags:

top-k, frequent closed itemsets, data mining, gene expression data

Related items

1	Mining Top-K Frequent Closed Itemsets From Gene Expression Data
2	Mining Top-k Frequent Closed Itemsets From Gene Expression Data
3	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application
4	Research On Frequent Closed Itemsets Mining Algorithms
5	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
6	Algorithms Of Frequent Closed Itemsets Mining And Their Applications
7	Research On Large Data Streams Mining Technology Applied In Network Automation Management
8	FP-Tree Based Mining Frequent Itemsets Over Data Streams
9	Research On Algorithm For Mining Top-k Frequent Closed Itemsets Over Data Stream
10	Association Rules Algorithm And Its Applications In Medical Data Mining