
Research On Closed Frequent Subtree Mining

Posted on: 2017-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Tang
GTID: 2348330512955963
Subject: Computer software and theory
Abstract/Summary:
The rapid development of Internet technology has brought us into the era of big data, and big data applications are steadily changing daily life. As Internet technology spreads, the volume of semi-structured data on the Internet has grown rapidly, and processing it has become an active research topic. With the continued development of frequent pattern mining algorithms, tree-structure mining has become an efficient way to handle semi-structured data. Mining frequent subtrees can effectively extract the information hidden in semi-structured data, so it is widely used in medicine, the Internet, communications, bioinformatics, and web mining. This thesis studies closed frequent subtree mining and proposes a closed frequent subtree mining algorithm, PCTM. PCTM uses a pattern-growth strategy: it finds frequent subtrees in order of decreasing frequency while gradually compressing the tree structures. In each round, the algorithm compresses the edges whose frequency matches the current frequency and mines frequent subtrees in the compressed structures. As the frequency declines, each tree is compressed further; once all edges of a tree have been processed, the whole tree is compressed into a single node. When every tree in the dataset has been compressed, the algorithm terminates and outputs the set of frequent subtrees. PCTM works top-down: it starts from the edge with the maximum support and keeps processing the edges with the largest remaining support.
Because of the tree compression model, each iteration can handle multiple edges rather than a single node, so the algorithm processes many nodes quickly, which improves its efficiency. After obtaining a compressed subtree, the algorithm must determine whether the substructure contains a frequent subtree. Using a prefix-matching method, a node in the compressed structure is taken as the root from which a frequent subtree is generated. While each new frequent subtree is being constructed, nodes in the infrequent substructure are taken as root nodes to search for further possible frequent subtrees. The algorithm therefore mines frequent subtrees on smaller datasets, which ensures that it can quickly mine all possible frequent subtrees. Finally, experiments on synthetic and real datasets demonstrate the feasibility and efficiency of PCTM.
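
The round-by-round compression described in the abstract can be illustrated with a small sketch. This is not the PCTM implementation from the thesis; it is a simplified illustration under several assumptions that the abstract does not state: each tree is given as a list of (parent_label, child_label) edges, labels are unique within a tree, the support of an edge is the number of trees containing it, and a hypothetical min_support parameter limits which rounds are run. The function names edge_support and compress_rounds are likewise made up for this example. Frequent subtree generation and the closedness check are left out.

```python
from collections import defaultdict


def edge_support(trees):
    """Count, for each labeled edge, how many trees contain it (assumed support definition)."""
    support = defaultdict(int)
    for edges in trees:
        for edge in set(edges):
            support[edge] += 1
    return dict(support)


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def compress_rounds(trees, min_support):
    """Process edges in order of decreasing support, merging their endpoints.

    Returns a list of (support_level, groups_per_tree) pairs, where
    groups_per_tree lists, for every tree, the node groups that have been
    compressed together so far. min_support is a hypothetical cutoff.
    """
    support = edge_support(trees)
    levels = sorted({s for s in support.values() if s >= min_support}, reverse=True)

    # One union-find structure per tree; every node starts in its own group.
    parents = []
    for edges in trees:
        nodes = {n for edge in edges for n in edge}
        parents.append({n: n for n in nodes})

    rounds = []
    for level in levels:
        # Compress every edge whose support equals the current level.
        for parent, edges in zip(parents, trees):
            for p, c in edges:
                if support[(p, c)] == level:
                    parent[find(parent, c)] = find(parent, p)

        # Snapshot the compressed groups (size > 1) in each tree.
        groups_per_tree = []
        for parent in parents:
            groups = defaultdict(set)
            for n in parent:
                groups[find(parent, n)].add(n)
            groups_per_tree.append(
                sorted(sorted(g) for g in groups.values() if len(g) > 1)
            )
        rounds.append((level, groups_per_tree))
    return rounds


if __name__ == "__main__":
    trees = [
        [("a", "b"), ("a", "c"), ("b", "d")],
        [("a", "b"), ("a", "c")],
        [("a", "b"), ("c", "d")],
    ]
    for level, groups in compress_rounds(trees, min_support=2):
        print(level, groups)
```

In this toy dataset the edge (a, b) has the highest support, so it is compressed first in every tree; the edge (a, c) follows in the next round. A single round handles all edges at that support level across all trees, which is the property the abstract credits for the efficiency gain.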
Keywords/Search Tags: Data Mining, Frequent Subtree, Frequent Pattern Mining