Research On Succinct Frequent Pattern Mining Algorithm Based On Positional Information

Posted on:2012-12-17

Degree:Master

Type:Thesis

Country:China

Candidate:A G Zhang

Full Text:PDF

GTID:2178330338491084

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

In recent years, data stream is an emerging hot research direction, and it carries a variety of application in some areas. Different from data in traditional static databases, data stream is unbounded, high-speed and dynamic. Frequent itemset mining is one of the important research areas in data stream mining. It has been paid more attention by lots of domestic and foreign scholars, and plenty of effective algorithms have been proposed.Firstly, according to the characteristics of data stream and data stream mining, the paper mainly introduces the key issues in data stream mining. The paper also analyzes and summarizes some classical data stream mining algorithms.Secondly, on the basis of the above work, the paper proposes a bit-vector based algorithm of mining maximal frequent itemsets called MMFI-BV and an updating algorithm of mining maximal frequent itemsets called UMMFI-BV. Initially, MMFI-BV transformed every item of each transaction into bit-vector. Then all frequent itemsets were obtained by the bitwise AND operation of the bit-vectors. Finally, a FI-BV-Tree was built based on the bit-vectors, and the maximal frequent itemsets could be found by checking the linking information of the nodes. Besides, an updating algorithm of mining maximal frequent itemsets called UMMFI-BV was proposed. As a new transaction coming, the maximal frequent itemsets could be obtained by implementing bit-shift operation on the bit-vectors in the new sliding window. The algorithm needs not to mine the maximal frequent itemsets repeatedly, and improves the mining efficiency.What is more, in order to reduce the cost of discovering the sequential pattern in the database, the paper presents a new method called PMLS with pattern matching based on the positional information for finding the longest sequential pattern. The method only scans the database once to construct the positional information table for all the frequent items, and then searches the positional information table with pattern matching to find the longest sequence patterns gradually. At the same time, the method makes use of the stack structure to store the matched items so as to save the storage space.Lastly, we implement the above two algorithms with C++. The experimental results show that they are effective in solving their problems, the algorithms proposed in this paper are more efficient than the current ones, and the anticipated results are realized.

Keywords/Search Tags:

Data stream, maximal frequent itemsets, sliding window, bit-vector, longest sequential pattern, positional information, pattern matching

PDF Full Text Request

Related items

1	Research On Frequent Patterns Mining Algorithm Based Sliding Window In Data Streams
2	Research On Frequent Pattern Mining Algorithm Oriented To Data Stream
3	FP-Tree Based Mining Frequent Itemsets Over Data Streams
4	Research On Algorithm For Mining Maximal Frequent Itemsets Over Data Streams
5	Research On Frequent Pattern Mining Algorithm Of Data Stream Based On Sliding Window
6	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
7	Research On Algorithms For Mining Maximal Frequent Itemsets
8	Research On Frequent Itemsets Mining Algorithm In Data Stream
9	The Research And Implementation Of Mining Frequent Itemsets Algorithm Over Streaming Data
10	Research On Optimization Of Data Stream Frequent Itemsets Mining Algorithm Based On Sliding Window