Research On Frequent Itemsetalgorithm Based On Bit-sequence

Posted on:2013-01-13

Degree:Master

Type:Thesis

Country:China

Candidate:J Yi

Full Text:PDF

GTID:2218330362462979

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

Frequent itemsets mining is a crucial problem in the field of data mining. But inreal-word applications, the database is very large, the traditional method of frequentitemsets mining cannot satisfied the mining requirements, so in order to improve themining efficiency of these algorithms, we should overcome the following issues: First, thememory space taken by the database is too large; Second, large numbers of candidateitemsets will be generated; Third, the database needs to be rescanned when newtransactions are inserted into a database. To resolve these problems, this paper has mainlyfocused on the research of algorithms for mining frequent itemsets. These algorithms canbe widely used in the areas of frequent patterns mining.Firstly, this paper researches the search method and data representation method of thefrequent pattern, and some updating problems in the incremental mining, and some relatedalgorithms are also given. The breadth and depth first search are included in the searchmethod, the vertical and horizontal data representation are included in the datarepresentation way, and the updating problems of the incremental mining also includethree kinds of cases.Secondly, to decrease the number of candidate itemsets based on bit-sequence, anovel algorithm called FIM-BS for mining frequent itemset is proposed in this paper. Firstwe adopt a vertical data format—bit-sequence to compress the database to save thememory space and define Ilink-array and FNC-tree to store the related information of thefrequent itemset. Then this algorithm uses the top-down traversal strategy in miningprocess and uses the Apriori anti-monotone property to prune FNC-tree to get all thefrequent itemsets quickly.Lastly, a novel incremental algorithm based on bit-sequence for mining frequentpattern IFPM-BS is presented, when new transactions added into the original database, thealgorithm can effectively mine the new frequent patterns. First the concept of the pre-largeitemset is cited, while an item is not large in the original database but large in the newtransactions, we can use the safety number to avoide scanning the original database. And then the FCS-tree is constructed according to the bit-sequence to store the relatedinformation and the original FCS-tree is updated by the existing frequent items. Finally wecan get the mining result by FP-growth algorithm, which can effectively reduce the timeconsumption and obviously improve the mining efficiency.Our algorithm is written in C++, sparse synthetic dataset T10I4D100K and dense realdatasets Mushroom and BMS-POS are adopted for mining frequent itemsets in theexperiments.

Keywords/Search Tags:

frequent pattern, incremental mining, depth-first search, vertical data format, bit-sequence, pre-large itemset

PDF Full Text Request

Related items

1	Multi-Relational Frequent Pattern Mining Algorithm And Its Application Research
2	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
3	Research On Intersection Cache Based On Frequent Itemset Mining In Search Engines
4	Research On Frequent Itemsets Mining Algorithm Based On Vertical Format
5	Research And Application On Association Rule Mining
6	The Research And Relization Of Mining Frequent Patterns On Business Data Straems
7	Research Of Frequent Itemsets Mining Algorithms Based On Vertical Data Format
8	New algorithms for frequent sequential pattern and itemset data mining in certain and uncertain databases
9	The Research On Models And Algorithms Of Closed Sequence Mining
10	The Research On IDS Based On Mining Max Frequent Itemset Using Big Step Pruning Strategy