Font Size: a A A

Research On Frequent Itemsets Algorithm Based On Projection Array And Fp-tree

Posted on:2011-09-03Degree:MasterType:Thesis
Country:ChinaCandidate:H Y CaoFull Text:PDF
GTID:2198330338491110Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Frequent itemsets mining is a crucial problem in the field of data mining. But there are three main difficult problems when mining frequent itemsets from large dense database. First, the efficiency of algorithms is not very high; Second, large numbers of frequent itemsets will be generated; Third, few algorithms refer to the reasonable constraint method, so they can't mine interesting patterns efficienctly. To resolve these problems, this paper has mainly focused on the research of algorithms for mining frequent itemsets. These algorithms can be widely used in customer's purchasing behavior pattern forecast, sequence analysis and software security analysis and so on.Firstly, a new frequent itemsets mining algorithm MFIPA is proposed based on projection array. Making use of data horizontally and vertically, those itemsets that co_occurence with single frequent items are found by computing intersection, and projection array PArray is constructed. Then all of the frequent itemsets can be mined by connecting the single frequent item with every nonempty subsets of its projection and using depth-first search strategy.Secondly, to decrease the number of frequent itemsets, a novel algorithm called FCIL-Mine for mining frequent closed itemsets is proposed in this paper. Firstly, a new data structure frequent closed itemsets lattice FCIL is proposed based on PArray, which is used for maintaining the essential information of frequent closed itemsets. Then, all of the frequent closed itemsets can be found by using hash table check and subsume-check pruning strategy.Lastly, a novel algorithm LWFI-Mine for mining weighted frequent itemsets with length decreasing support constraint is presented, which can mine interesting patterns with users'requirement efficienctly. Firstly, a data structure wighted FP-tree is constructed by scanning database. Then the notion of the weighted smallest valid extension (WSVE) property is proposed, based on WSVE, transaction pruning, node pruning and path pruning are applied to reduce the search space of FP-tree. Finally, all of frequent itemsets with constraint can be mined.Our algorithm is written in C++, sparse synthetic dataset T40I10D100K and dense real dataset Connect are adopted for mining frequent itemsets in the experiments.
Keywords/Search Tags:frequent itemsets, frequent closed itemsets, projection array, length decreasing support constraint, weight constraint, depth-first search
PDF Full Text Request
Related items