Font Size: a A A

Research On Succinct Frequent Pattern Mining Algorithm Based On Positional Information

Posted on:2012-12-17Degree:MasterType:Thesis
Country:ChinaCandidate:A G ZhangFull Text:PDF
GTID:2178330338491084Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In recent years, data stream is an emerging hot research direction, and it carries a variety of application in some areas. Different from data in traditional static databases, data stream is unbounded, high-speed and dynamic. Frequent itemset mining is one of the important research areas in data stream mining. It has been paid more attention by lots of domestic and foreign scholars, and plenty of effective algorithms have been proposed.Firstly, according to the characteristics of data stream and data stream mining, the paper mainly introduces the key issues in data stream mining. The paper also analyzes and summarizes some classical data stream mining algorithms.Secondly, on the basis of the above work, the paper proposes a bit-vector based algorithm of mining maximal frequent itemsets called MMFI-BV and an updating algorithm of mining maximal frequent itemsets called UMMFI-BV. Initially, MMFI-BV transformed every item of each transaction into bit-vector. Then all frequent itemsets were obtained by the bitwise AND operation of the bit-vectors. Finally, a FI-BV-Tree was built based on the bit-vectors, and the maximal frequent itemsets could be found by checking the linking information of the nodes. Besides, an updating algorithm of mining maximal frequent itemsets called UMMFI-BV was proposed. As a new transaction coming, the maximal frequent itemsets could be obtained by implementing bit-shift operation on the bit-vectors in the new sliding window. The algorithm needs not to mine the maximal frequent itemsets repeatedly, and improves the mining efficiency.What is more, in order to reduce the cost of discovering the sequential pattern in the database, the paper presents a new method called PMLS with pattern matching based on the positional information for finding the longest sequential pattern. The method only scans the database once to construct the positional information table for all the frequent items, and then searches the positional information table with pattern matching to find the longest sequence patterns gradually. At the same time, the method makes use of the stack structure to store the matched items so as to save the storage space.Lastly, we implement the above two algorithms with C++. The experimental results show that they are effective in solving their problems, the algorithms proposed in this paper are more efficient than the current ones, and the anticipated results are realized.
Keywords/Search Tags:Data stream, maximal frequent itemsets, sliding window, bit-vector, longest sequential pattern, positional information, pattern matching
PDF Full Text Request
Related items