Research On Frequent Subtree Mining

Posted on:2008-05-13

Degree:Master

Type:Thesis

Country:China

Candidate:W Li

Full Text:PDF

GTID:2178360212495302

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

Along with the fast development of Internet and memory technology, the massive data with complicated structured has appeared, such as the biology data, the Web data and the XML data and so on, most of these data may be expressed by the tree or the graph structure. How to extract useful information from such data has became a hot spot in the research of data mining domain. This paper analyzes the current situation of the sequence and the subtree mining method, and researches for the problem of soluting subtree mining by the sequence mining method.Firstly, an algorithm named PETreeMiner is proposed, which is based on projection and encodeing method and used to mining frequent embedded subtrees. The method of prefix projected growth, which is used in mining sequential patterns without candidate generation, is used in the algorithm. The attribute of node's scope is added in the preorder sequence of tree. Each node is encoded during the projection; so that the encoded frequent sub-sequence can be directly corresponded to a frequent subtree. The algorithm is proved to be correctness by theorm and be efficiency by the experimental result.Secondly, an algorithm named IncPETM is proposed, which is used to incrementally mining frequent embedded subtrees. When the database has been changed slightly, mining from scratch will make the previous result useless; also bring about the waste of time and space. In view of such a situation, the existing mining results can be used to mine the new patterns of the updated database, so can avoid mining the new database from scrach. The algorithm is proved to be correctness by theorm and be efficiency by the experimental result.Finally, an algorithm named PEITM is proposed, which is used to mining frequent induced subtrees. The algorithm is proved to be correctness by theorm. The algorithm is compared with a projection based algorithm, named PFTM, byan example. Through the comparison, Similarities and differences between the two algorithms are discovered, and the efficiency similarities are also found out, but the PEITM algorithm has a better scalability compared to the PFTM algorithm.

Keywords/Search Tags:

Data Mining, Sequence Mining, Tree Mining, Frequent Subtree, Incremental Mining

PDF Full Text Request

Related items

1	Study On Frequent Subtree Mining And Its Application In XML Mining
2	Study On Frequent Subtree Sequence Mining
3	The Research On Frequent Subtrees Mining And Corresponding Techniques
4	Research And Application Of Data Mining Algorithm Based On Graph Pattern
5	Research On Frequent-subtree Mining Algorithm Based On The Sequence Coding
6	Frequent Subtree Mining Application In Xml Mining
7	Embedded And Export Of Frequent Subtree Mining Algorithm
8	Research On Frequent Pattern Mining In XML
9	Research On Closed Frequent Subtree Mining
10	Research Of Data Mining Technique Based On Complex Structure