Font Size: a A A

Research On Mining And Querying Frequent Patterns Based On Simplified Frequent Pattern Tree

Posted on:2012-01-11Degree:MasterType:Thesis
Country:ChinaCandidate:X F HuFull Text:PDF
GTID:2218330338951655Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Frequent pattern mining is an important work in data mining and a critical step in association rules mining, it can also be used in many data mining tasks such as classification, clustering, prediction etc. Currently, the association rules mining results are exported interactively online, how to query a specified frequent pattern from a large number of frequent patterns becomes an important factor which influences the overall performance of online query system. In this paper, we research efficient algorithm for frequent pattern mining and querying.Currently, the results of most frequent pattern mining algorithms are based on the form of itemsets which result in not only a large number of common prefix itemsets redundant storage but also the patterns matching(itemsets querying) more time-consuming. In this paper we propose a new simplified representation for completed frequent patterns on the basis of research for the basic concepts and knowledge of data mining and association rules. Considering the size of storage, SFP-tree is much smaller than completed frequent patterns and it can be expanded into frequent patterns easily. Pattern matching can be achieved efficiently through the SFP-tree. The algorithm for producing the SFP-tree is proposed. We visualize the SFP-tree rely on tree control in MFC, we also propose the algorithm of querying frequent patterns which based on SFP-tree.The main research contents in this paper are as follows:1. A frequent patterns mining algorithm, which use a structure of simplified frequent pattern tree(SFP-tree) to represent the mining results is proposed. MSFT algorithm uses an improved depth-first search strategy to generate frequent itemsets, meanwhile using bit-vector approach to store support sets of frequent itemsets. SFP-tree's logical storage structure and the organization of the tree nodes are also introduced. Experiments show that MSFT is better than other algorithms.2. A frequent patterns querying algorithm based on simplified frequent patterns tree is proposed. Firstly the way how to expand the SFP-tree into completed frequent patterns is proposed, we need deal with an important kind of node in SFP-tree which called single children path node. We regard their child nodes as their sibling nodes at the same time. Secondly we proposed a new algorithm based on SFP-tree for frequent pattern querying. The algorithm scans the SFP-tree's child nodes in order from its root node, once a node contained an item in a specified frequent itemset is found, searching downward along the node found. The search process can also expand the SFP-tree into a frequent patterns tree.
Keywords/Search Tags:data mining, frequent patterns mining, simplified frequent patterns tree, patterns query
PDF Full Text Request
Related items