Research Of Mining Frequent Closed Itemsets From Gene Expression Datasets

Posted on: 2011-05-29 | Degree: Master | Type: Thesis
Country: China | Candidate: J J Shi | Full Text: PDF
GTID: 2178330338978111 | Subject: Computer application technology
Abstract/Summary:
Gene expression datasets contain a wealth of hidden biological information. However, because the data are high-dimensional and large in volume, high-performance methods are needed to extract this information. Association analysis is simple in form and produces results that are easy to interpret, so it has gradually become an important method for analyzing gene expression data. Mining frequent closed itemsets is both a focus and a difficulty of association analysis.

In this paper, algorithms for mining frequent closed itemsets in gene expression data are studied in depth, and an improved algorithm is proposed to address the shortcomings of current algorithms. Existing approaches also require a minimum support threshold to be given before frequent closed itemsets can be mined. To address this, the paper presents a model for mining top-k frequent closed itemsets in gene expression datasets and designs a corresponding algorithm. The main research contents and contributions are as follows:

(1) Existing algorithms for mining frequent itemsets and frequent closed itemsets are analyzed in depth, and their advantages and disadvantages are compared in terms of how they are used and which data structures they rely on.

(2) In the row enumeration space, a bottom-up search strategy for mining frequent closed patterns cannot make full use of the minimum support threshold to prune the search space, which results in long runtimes and high memory overhead. The TP+close algorithm, based on a top-down search strategy, addresses this problem. However, it determines whether a frequent itemset is closed by scanning the set of frequent closed itemsets found so far, and on dense datasets this scan time seriously degrades performance. This paper proposes an improved tree structure, TTP+tree, and on top of it develops a top-down algorithm, TTP+close, for mining frequent closed itemsets in gene expression data. TTP+close checks whether an itemset is closed with a trace-based method and thus avoids scanning the set of frequent closed itemsets.

(3) Most previous algorithms for mining frequent closed itemsets in gene expression data require a minimum support threshold to be specified. In practice, however, it is difficult for users to provide an appropriate threshold. An alternative mining task is therefore proposed: mining the top-k frequent closed itemsets. An algorithm for this task, TBtop, is designed. It uses a top-down, breadth-first search strategy to mine the top-k frequent closed itemsets of length no less than a given value min_l, and it prunes the search space effectively.
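
To make the two mining tasks concrete, the following is a minimal brute-force sketch of row enumeration on a toy dataset. It is not TTP+close or TBtop: it uses no tree structure, no trace-based closedness check, and no pruning beyond visiting larger row sets first. The dataset `ROWS` and the function names are hypothetical, chosen only for illustration. The sketch relies on one standard fact: the intersection of any group of rows is a closed itemset, and every closed itemset with support at least `min_support` is the intersection of some group of at least `min_support` rows.

```python
from itertools import combinations

# Hypothetical toy data: each row is one sample, each item a gene that is
# "expressed" in that sample after discretization (an assumption).
ROWS = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g2", "g3", "g4"},
    {"g2", "g3"},
]


def closed_itemsets_by_row_enumeration(rows, min_support):
    """Return {closed itemset: support} for support >= min_support.

    Naive illustration: intersects every group of at least min_support
    rows, visiting larger groups first (top-down in the row space).
    Exponential in the number of rows, so only suitable for tiny inputs.
    """
    rows = [frozenset(r) for r in rows]
    n = len(rows)
    closed = {}
    for size in range(n, max(min_support, 1) - 1, -1):
        for row_ids in combinations(range(n), size):
            itemset = frozenset.intersection(*(rows[r] for r in row_ids))
            if not itemset:
                continue
            # Support counts every row containing the itemset, which may
            # be more rows than the group that produced it.
            closed[itemset] = sum(1 for row in rows if itemset <= row)
    return closed


def top_k_closed(rows, k, min_l):
    """Return the k closed itemsets of length >= min_l with highest support.

    No support threshold is supplied by the caller; ties are broken by
    the sorted item names so the result is deterministic.
    """
    closed = closed_itemsets_by_row_enumeration(rows, 1)
    candidates = [(s, i) for i, s in closed.items() if len(i) >= min_l]
    candidates.sort(key=lambda p: (-p[0], sorted(p[1])))
    return candidates[:k]


if __name__ == "__main__":
    for itemset, support in closed_itemsets_by_row_enumeration(ROWS, 2).items():
        print(sorted(itemset), support)
    for support, itemset in top_k_closed(ROWS, k=2, min_l=2):
        print(sorted(itemset), support)
```

On this toy data, a minimum support of 2 yields five closed itemsets, while the top-k variant with k = 2 and min_l = 2 returns {g1, g2} and {g2, g3}, each with support 3, without the user having to choose a threshold. Row enumeration suits gene expression data because such datasets typically have few rows (samples) and very many columns (genes).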
Keywords/Search Tags: Gene expression data, Association rules, Frequent closed itemsets, Top-k frequent closed itemsets, Top-down, Breadth-first