
The Research On Frequent Subtrees Mining And Corresponding Techniques

Posted on: 2010-04-15  Degree: Master  Type: Thesis
Country: China  Candidate: X Guo  Full Text: PDF
GTID: 2178360275496325  Subject: Computer application technology
Abstract/Summary:
With the development of computer and information technology, a great amount of data has accumulated in daily work and in scientific research. Extracting useful information from these data is a major challenge for today's researchers in information science, and data mining emerged in response.

In recent years, data mining and its applications have entered many disciplines and produced results in diverse fields, including artificial intelligence and machine learning, databases, pattern recognition, bioinformatics, and neural computing. Frequent pattern mining is a basic problem of data mining and covers the mining of transactions, sequences, trees, and graphs. Its algorithms are widely used in other data mining tasks, such as association analysis, periodicity analysis, maximal and closed pattern mining, querying, classification, and indexing.

Much of the data accumulated in practice is structured, in the form of trees or graphs. Such structures express hierarchies well and can model almost any pattern of links, so tree-based data mining applies widely in areas such as Web mining, spatial data mining, protein structure mining in bioinformatics, and drug design and function prediction. Finding efficient algorithms for frequent subtree mining has therefore become a new hotspot in the data mining field, and research on such algorithms has both theoretical significance and practical value.

The main body of this thesis includes:

(1) An algorithm, ITMSV (induced subtrees mining based on subtree vector), is presented to discover frequent induced subtrees quickly by taking full advantage of the features of the subtree vector in combination with a hash table. By constructing a multi-layered data structure, the algorithm lessens the time spent on isomorphism checks during mining and needs to scan the database only once, which reduces the number of scans and improves efficiency.

(2) An algorithm, UTMiner (unordered trees miner), is presented that can quickly discover frequent unordered trees in a large forest. A standardization method is proposed that quickly converts unordered subtrees into standard subtrees, so that an ordered-tree algorithm can be used to mine all standard subtrees.

(3) A tree clustering and classification algorithm based on the least closed tree is proposed, which effectively handles the large amounts of data found in practical applications. The basic method takes least closed trees as the candidate clustering and classification features and uses a dynamic threshold in similarity clustering to make tree clustering faster and more accurate. In addition, the proposed concept of a tree classification rule grade is used in the tree classification algorithm so that unknown tree structures can be predicted promptly.
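To illustrate the idea in item (2) of converting unordered subtrees into a standard form, the following is a minimal sketch of a common canonicalization technique: encode each child recursively, then sort the sibling encodings so that sibling order no longer matters. It is not the thesis's UTMiner or ITMSV; the tree representation (a `(label, children)` pair), the function names `canonical` and `bottom_up_support`, and the assumption that labels contain no parentheses are all hypothetical choices made for this example. The second function uses a hash table to count support for bottom-up subtrees (a node together with all of its descendants), which is a simpler class than the induced subtrees treated in the thesis.

```python
from collections import Counter

# Assumed representation: a tree is a (label, children) pair,
# where children is a list of trees and labels contain no parentheses.

def canonical(tree):
    """Return a string encoding that is the same for every sibling ordering."""
    label, children = tree
    # Encode each child first, then sort the encodings so that
    # two trees differing only in sibling order get the same string.
    encoded = sorted(canonical(child) for child in children)
    return "(" + label + "".join(encoded) + ")"

def bottom_up_support(forest):
    """Count, per canonical bottom-up subtree, how many trees contain it."""
    support = Counter()  # hash table keyed by canonical encoding
    for tree in forest:
        seen = set()  # each tree contributes at most 1 to a pattern's support
        stack = [tree]
        while stack:
            node = stack.pop()
            seen.add(canonical(node))
            stack.extend(node[1])
        support.update(seen)
    return support

# Two trees that differ only in the order of the root's children.
t1 = ("a", [("b", []), ("c", [("d", [])])])
t2 = ("a", [("c", [("d", [])]), ("b", [])])
t3 = ("c", [("d", [])])

support = bottom_up_support([t1, t2, t3])
min_support = 3
frequent = sorted(p for p, n in support.items() if n >= min_support)
```

Because isomorphic unordered trees map to the same string, an isomorphism check reduces to a string comparison or a hash-table lookup. This sketch recomputes encodings at every node for clarity; a practical miner would cache them.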
Keywords/Search Tags: data mining, frequent subtree, closed tree pattern, induced subtree, isomorphic subtree, unordered trees, free tree, tree cluster, tree classification, web log, frequent subgraph, subgraph isomorphism