
Research Of Data Mining Technique Based On Complex Structure

Posted on: 2006-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
GTID: 2178360182977265
Subject: Computer application technology
Abstract/Summary:
Structured mining is a new branch of data mining. It extracts non-obvious knowledge, relations, or meaningful patterns from structured data such as trees, graphs, molecules, and XML documents. That is, working on a structured database and drawing on statistics, artificial intelligence, and neural network techniques, it extracts valid, novel, interesting, implicit, previously unknown, usable, and understandable knowledge from large databases, so that knowledge can be acquired automatically. Structured mining plays an important role in XML document mining, web page traffic mining, analysis of molecular evolution, packet routing, bioinformatics, biological computing, communication systems, image databases, and city management.

However, as we found in our previous study, because structured databases are large, the number of frequent subtrees usually grows exponentially with the tree size. This is especially the case when the transactions in the database are strongly correlated. The phenomenon has two effects: first, there are too many frequent subtrees for users to manage and use; second, an algorithm that discovers all frequent subtrees cannot handle large frequent subtrees.

The research in this dissertation covers classical data mining and structured mining. The main contents are as follows.

First, we investigate the concepts and principles of data mining, data pre-processing technologies, and the tasks, objects, methods, tools, processes, and problems of data mining. We place particular emphasis on the Apriori and FP-growth algorithms and compare the performance of the two.

Second, we study the concepts of structured and unstructured data, the current state and problems of tree structure mining, and the theory of the FreeTreeMiner algorithm. We also study the canonical form and pre-processing of free trees, the concepts and properties of closed and maximal trees, and the pruning, growing, and mining of tree structures.

Finally, we design and implement a general-purpose tree structure mining prototype and carry out system testing and analysis. In doing so, we combine classical data mining methods with structured mining technologies and improve the canonical form and the pre-processing, pruning, growing, and mining techniques. Analysis of the prototype system demonstrates the correctness and validity of the algorithm.
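
The idea of a canonical form for a free tree (an unrooted, labeled tree) can be illustrated with a short sketch. This is not the canonical form defined in the thesis or used by FreeTreeMiner, which the abstract does not specify. It is a generic textbook construction, assumed here for illustration only: root the tree at its center, encode each subtree as a parenthesized string with child encodings sorted, and take the smaller encoding when the tree has two centers. The function names `tree_centers`, `encode`, and `canonical_form` are hypothetical, and vertex labels are assumed to contain no parentheses.

```python
def tree_centers(adj):
    """Return the one or two center vertices of a free tree by peeling leaves."""
    remaining = len(adj)
    if remaining <= 2:
        return list(adj)
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    leaves = [v for v, d in degree.items() if d == 1]
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for neighbor in adj[leaf]:
                degree[neighbor] -= 1
                if degree[neighbor] == 1:
                    next_leaves.append(neighbor)
        leaves = next_leaves
    return leaves


def encode(adj, labels, vertex, parent=None):
    """Encode the subtree below `vertex`; sorting makes child order irrelevant."""
    child_codes = sorted(
        encode(adj, labels, child, vertex)
        for child in adj[vertex]
        if child != parent
    )
    return "(" + str(labels[vertex]) + "".join(child_codes) + ")"


def canonical_form(edges, labels):
    """Return a string that is identical for isomorphic labeled free trees."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # A tree has one or two centers; with two, keep the smaller encoding.
    return min(encode(adj, labels, center) for center in tree_centers(adj))


# The same labeled path A - B - C written with two different vertex numberings.
t1 = canonical_form([(0, 1), (1, 2)], {0: "A", 1: "B", 2: "C"})
t2 = canonical_form([(5, 7), (7, 9)], {9: "A", 7: "B", 5: "C"})
print(t1, t1 == t2)  # (B(A)(C)) True
```

Because isomorphic trees map to the same string, a miner can count the support of a candidate subtree by comparing strings rather than testing tree isomorphism directly. The recursive `encode` is adequate for small examples; very deep trees would need an iterative version.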
Keywords/Search Tags: Structure Mining, Free-tree, Closed Tree, Maximal Tree, Frequent Subtree, Tree Structure Mining