Font Size: a A A

Research On Mining Frequent Itemsets Algorithm Based On Bittable

Posted on:2013-02-05Degree:MasterType:Thesis
Country:ChinaCandidate:S S FengFull Text:PDF
GTID:2218330362962841Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Frequent Itemsets Mining (FIM) is one of the most crucial and essential problems inthe field of data mining, there have already been many researches for it at present.However, three main bottlenecks exist when frequent itemsets are mined from large densedatabase. First, the speed of processing data is not high. Second, large number of space isoccupied in the mining process. Third, the frequent itemsets mined is not overall and someusers interesting itemsets can not be mined efficiently. To solve the above problems, threealgorithms based on BitTable are proposed to mine users interesting frequent itemsets,respectively. The research results can be widely used in the market analysis, customer'spurchasing behavior forecast and stock analysis and so on.Firstly, in order to improve the speed of processing data, this paper proposes a newalgorithm based on BitTable, BTFIM, to mine frequent itemsets. The transaction databaseis compressed into BitTable, and the support of every single item is counted. Then thecandidate itemset tree, CITree, is built level by level. The linked-frequent-itemset isdetermined, meanwhile, the frequency of the non-linked-frequent-itemset is tested todelete the infrequent itemsets, while each level is constructed. Finally, the CITree istraversed in breath-first search stragegy to record all frequent itemsets.Secondly, to reduce the memory usage in the mining process, a novel algorithm basedon Index Array, Index-FCI, is proposed in this paper for mining frequent closed itemsets.The transaction database is scanned once and compressed into BitTable. Then the IndexArray corresponding to the database is constructed through the BitTable. Afterwards, thegreat frequent itemset, GFI, and frequent itemsets are found from the Index Array. Finally,all of the frequent closed itemsets can be found by using the hash checking strategy.Lastly, to mine user interested itemsets, an algorithm based on BitTable and IndexArray, Index-BTFITI, is proposed to mine frequent inter-transaction itemsets. First, thedatabase is compressed into BitTable. Second, the representative item and thecorresponding subsume index in Index Array can be calculated using the horizontalBitTable. Final, the element of Index Array is classified to find all frequent inter-transaction itemsets.Our algorithms are implemented with C++language. Experimental results show thatthe algorithms proposed in this paper are more effective in resolving their respectiveproblems than the current ones, and the desired goals are realized.
Keywords/Search Tags:frequent itemset, frequent closed itemset, frequent inter-transaction itemset, BitTable, Index Array
PDF Full Text Request
Related items