Font Size: a A A

Research On Frequent Itemsetalgorithm Based On Bit-sequence

Posted on:2013-01-13Degree:MasterType:Thesis
Country:ChinaCandidate:J YiFull Text:PDF
GTID:2218330362462979Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Frequent itemsets mining is a crucial problem in the field of data mining. But inreal-word applications, the database is very large, the traditional method of frequentitemsets mining cannot satisfied the mining requirements, so in order to improve themining efficiency of these algorithms, we should overcome the following issues: First, thememory space taken by the database is too large; Second, large numbers of candidateitemsets will be generated; Third, the database needs to be rescanned when newtransactions are inserted into a database. To resolve these problems, this paper has mainlyfocused on the research of algorithms for mining frequent itemsets. These algorithms canbe widely used in the areas of frequent patterns mining.Firstly, this paper researches the search method and data representation method of thefrequent pattern, and some updating problems in the incremental mining, and some relatedalgorithms are also given. The breadth and depth first search are included in the searchmethod, the vertical and horizontal data representation are included in the datarepresentation way, and the updating problems of the incremental mining also includethree kinds of cases.Secondly, to decrease the number of candidate itemsets based on bit-sequence, anovel algorithm called FIM-BS for mining frequent itemset is proposed in this paper. Firstwe adopt a vertical data format—bit-sequence to compress the database to save thememory space and define Ilink-array and FNC-tree to store the related information of thefrequent itemset. Then this algorithm uses the top-down traversal strategy in miningprocess and uses the Apriori anti-monotone property to prune FNC-tree to get all thefrequent itemsets quickly.Lastly, a novel incremental algorithm based on bit-sequence for mining frequentpattern IFPM-BS is presented, when new transactions added into the original database, thealgorithm can effectively mine the new frequent patterns. First the concept of the pre-largeitemset is cited, while an item is not large in the original database but large in the newtransactions, we can use the safety number to avoide scanning the original database. And then the FCS-tree is constructed according to the bit-sequence to store the relatedinformation and the original FCS-tree is updated by the existing frequent items. Finally wecan get the mining result by FP-growth algorithm, which can effectively reduce the timeconsumption and obviously improve the mining efficiency.Our algorithm is written in C++, sparse synthetic dataset T10I4D100K and dense realdatasets Mushroom and BMS-POS are adopted for mining frequent itemsets in theexperiments.
Keywords/Search Tags:frequent pattern, incremental mining, depth-first search, vertical data format, bit-sequence, pre-large itemset
PDF Full Text Request
Related items