Font Size: a A A

Research On Frequent Subtree Mining

Posted on:2008-05-13Degree:MasterType:Thesis
Country:ChinaCandidate:W LiFull Text:PDF
GTID:2178360212495302Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Along with the fast development of Internet and memory technology, the massive data with complicated structured has appeared, such as the biology data, the Web data and the XML data and so on, most of these data may be expressed by the tree or the graph structure. How to extract useful information from such data has became a hot spot in the research of data mining domain. This paper analyzes the current situation of the sequence and the subtree mining method, and researches for the problem of soluting subtree mining by the sequence mining method.Firstly, an algorithm named PETreeMiner is proposed, which is based on projection and encodeing method and used to mining frequent embedded subtrees. The method of prefix projected growth, which is used in mining sequential patterns without candidate generation, is used in the algorithm. The attribute of node's scope is added in the preorder sequence of tree. Each node is encoded during the projection; so that the encoded frequent sub-sequence can be directly corresponded to a frequent subtree. The algorithm is proved to be correctness by theorm and be efficiency by the experimental result.Secondly, an algorithm named IncPETM is proposed, which is used to incrementally mining frequent embedded subtrees. When the database has been changed slightly, mining from scratch will make the previous result useless; also bring about the waste of time and space. In view of such a situation, the existing mining results can be used to mine the new patterns of the updated database, so can avoid mining the new database from scrach. The algorithm is proved to be correctness by theorm and be efficiency by the experimental result.Finally, an algorithm named PEITM is proposed, which is used to mining frequent induced subtrees. The algorithm is proved to be correctness by theorm. The algorithm is compared with a projection based algorithm, named PFTM, byan example. Through the comparison, Similarities and differences between the two algorithms are discovered, and the efficiency similarities are also found out, but the PEITM algorithm has a better scalability compared to the PFTM algorithm.
Keywords/Search Tags:Data Mining, Sequence Mining, Tree Mining, Frequent Subtree, Incremental Mining
PDF Full Text Request
Related items