
Research on Mining Maximal Frequent Subtrees Quickly and Efficiently

Posted on: 2013-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: X K Wu
GTID: 2248330371983556
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, the volume of data, most of which exists in semi-structured form, is growing quickly, and tree-shaped data is a commonly used and important part of it. Mining frequent subtrees plays a pivotal role in areas such as Web log analysis, XML document analysis, semi-structured data analysis, biometric information analysis, and chemical compound structure analysis. Frequent subtree mining has therefore become an important research topic in China and abroad, attracts close attention from researchers, and has produced considerable advances.
This paper proposes an improved algorithm, MFPTM, which is based on fusion compression and the FP-tree principle, to mine frequent subtrees quickly and efficiently and then obtain maximal frequent subtrees through maximization processing. The algorithm works in three stages: first, fusion compression retains only the subtrees that consist of frequent nodes; second, MP trees are constructed according to the FP-tree principle and frequent subtrees are mined from them; finally, maximization processing yields the maximal frequent subtrees. While mining frequent subtrees, MFPTM aims to reduce the search space of candidate patterns and to avoid the main drawback of Apriori-based frequent pattern mining, namely the generation of a large number of candidate patterns. The experimental results show that MFPTM outperforms the PathJoin algorithm in terms of both the number of frequent subtrees and execution time.
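The abstract states only that fusion compression keeps subtrees made up of frequent nodes. The following is a minimal sketch of that filtering idea, under assumptions that are not taken from the thesis: a label's support is the number of trees in the database that contain it, and each tree is simply cut at its infrequent nodes so that the maximal connected pieces of frequent nodes remain. The fusion compression tree and pruning process described in Chapter 3 may work differently, and the names `Node`, `frequent_labels`, `frequent_components` and `show` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled, ordered tree node (hypothetical representation)."""
    label: str
    children: list[Node] = field(default_factory=list)


def tree_labels(tree: Node) -> set[str]:
    """Distinct labels occurring anywhere in one tree."""
    found = {tree.label}
    for child in tree.children:
        found |= tree_labels(child)
    return found


def frequent_labels(forest: list[Node], min_support: int) -> set[str]:
    """Assumed support measure: number of trees containing the label at least once."""
    counts: Counter[str] = Counter()
    for tree in forest:
        counts.update(tree_labels(tree))
    return {label for label, c in counts.items() if c >= min_support}


def frequent_components(tree: Node, frequent: set[str]) -> list[Node]:
    """Cut one tree at its infrequent nodes (assumed pruning rule) and return
    the maximal connected pieces that consist of frequent nodes only."""
    pieces: list[Node] = []

    def keep(node: Node) -> Node | None:
        # Process children first so their frequent parts are already built.
        kept = []
        for child in node.children:
            sub = keep(child)
            if sub is not None:
                kept.append(sub)
        if node.label in frequent:
            return Node(node.label, kept)
        # Infrequent node: drop it; each frequent child subtree becomes its own piece.
        pieces.extend(kept)
        return None

    root = keep(tree)
    if root is not None:
        pieces.append(root)
    return pieces


def show(node: Node) -> str:
    """Bracket notation such as a(b,c), for printing."""
    if not node.children:
        return node.label
    return f"{node.label}({','.join(show(c) for c in node.children)})"


forest = [
    Node("a", [Node("b"), Node("x", [Node("c")])]),
    Node("a", [Node("b"), Node("c")]),
]
frequent = frequent_labels(forest, min_support=2)  # the labels a, b and c
for tree in forest:
    print([show(p) for p in frequent_components(tree, frequent)])
# ['c', 'a(b)']
# ['a(b,c)']
```

Cutting the tree, rather than splicing out an infrequent node and reattaching its children to its parent, keeps only parent-child edges that already existed; a miner for embedded subtrees might prefer the splicing variant instead.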
Overall, MFPTM improves the efficiency of mining frequent subtrees. This paper mainly discusses the principles and techniques of mining maximal frequent subtrees quickly and efficiently, together with their applications. It is organized as follows:
Chapter 1 systematically introduces the background and significance of mining frequent subtrees and the current state of research in China and abroad.
Chapter 2 gives a basic introduction to data mining, the definition of trees for tree-shaped data, the different types of subtrees, and the representation of branches. It also introduces applications of frequent subtree mining.
Chapter 3 introduces data preprocessing methods and their effects, and describes data preparation and the main aim of fusion compression. It then puts forward the fusion compression technique for tree-shaped data and introduces the fusion compression process, the definition of the fusion compression tree, and the idea behind the pruning process.
Chapter 4 explains the FP-tree principle and its performance, introduces the basic steps of constructing an MP tree based on the FP-tree principle and its applications, and presents the relevant definitions and the MP tree generation algorithm.
Chapter 5 reviews different algorithms for mining frequent subtrees and identifies their shortcomings. It then introduces the improved MFPTM algorithm, based on fusion compression and the FP-tree principle, which both reduces the search space of candidate patterns and avoids the large number of candidate patterns generated by Apriori-based frequent pattern mining. The chapter also gives the relevant definitions and a description of the algorithm.
Chapter 6 summarizes the paper, discusses the shortcomings that remain to be addressed, and outlines future work.
Keywords/Search Tags: data mining, fusion compression, FP-tree, frequent nodes, frequent subtrees