Font Size: a A A

Embedded And Export Of Frequent Subtree Mining Algorithm

Posted on:2010-01-17Degree:MasterType:Thesis
Country:ChinaCandidate:P C KongFull Text:PDF
GTID:2208360278476231Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Tree-structure Data, with characteristics of expressing things distinctly and perfectly, is increasingly applied in many important areas such as computer networks, Web mining, bioinformatics, XML document mining, and so on. To improve the method on constructing project database, decrease the size of project database and increase node-counting effectively, algorithms of frequent embedded and induced subtrees mining are researched based on the technique of right-most expanding for the ordered trees. The main research works are as follows:(1) A DIFTM algorithm of frequent embedded subtree mining based on discrete interval is presented. First, by making use of the right-most path expanding, frequent k-subtrees are directly expanded to frequent (k+1)-subtrees without generating candidate subtree. Second, constructing the right-most path project database eliminates some redundant projections by making use of the discrete interval, so that the size of the project database is effectively reduced. Furthermore, discrete interval is firstly searched for, and frequent nodes are counted on them, so that scaning some of projection repeatedly is avoided. In the end, experimental results on CSLOGS data confirm that the DIFTM algorithm is correct and efficient.(2) An EFITM algorithm of frequent induced subtree mining is proposed based on encoding. Firstly, width-first encoding is used to express the initial database and make redundant nodes in the right-most path project no exist, which greatly diminishs the encoding size of every single projection in the project database. Secondly, intervals with encoding are used to denote the project database of the nodes on the right-most path of subtrees, and all the redundant projections are eliminated and the size of the project database is decreased. Finally, theory analysis and experiment results show the correctness and the validity of the EFITM algorithm.
Keywords/Search Tags:Data mining, Frequent embedded subtree, Frequent induced subtree, Discrete interval, Project database, Redundant projection, Encoding
PDF Full Text Request
Related items