
Frequent Subtree Mining Application In XML Mining

Posted on: 2010-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Yan
GTID: 2198330338982204
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet in recent years, the volume and variety of web data have grown considerably. How to analyze and use these data and extract useful information for users has become an active research issue. As one of the important research directions in data mining, frequent subtree mining has a significant impact on fields such as XML mining, bioinformatics, web log analysis, molecular drug design, and the prediction of molecular drug function, and it is attracting growing attention from researchers. XML, the standard for data description and exchange on the Internet, is structured, extensible, open, universal, and flexible. Moreover, an XML document has a tree structure, so frequent subtree mining techniques can be applied to XML mining to tackle the difficulty of mining XML data with a complex hierarchical structure. This thesis studies the application of frequent subtree mining algorithms to XML mining, proposes a novel frequent subtree mining algorithm, and discusses the frequent pattern mining process that uses tree patterns to describe XML data.
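Because an XML document is a labeled tree, even the simplest subtree patterns can be mined directly from parsed documents. The sketch below is an illustration, not the algorithm proposed in the thesis: it assumes each element is a node labeled by its tag, restricts patterns to two-node induced subtrees (parent-child tag pairs), and counts document support, i.e. the number of documents containing a pattern. The function names `parent_child_pairs` and `frequent_edges` and the toy documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def parent_child_pairs(xml_text):
    """Return the set of (parent tag, child tag) edges in one XML document.

    Each edge is a two-node induced subtree of the document tree.
    """
    root = ET.fromstring(xml_text)
    pairs = set()
    for parent in root.iter():      # the root and all of its descendants
        for child in parent:        # direct children only (induced edges)
            pairs.add((parent.tag, child.tag))
    return pairs


def frequent_edges(docs, min_support):
    """Keep edge patterns whose document support is at least min_support."""
    counts = Counter()
    for doc in docs:
        # A set is passed, so each document counts a pattern at most once.
        counts.update(parent_child_pairs(doc))
    return {edge: n for edge, n in counts.items() if n >= min_support}


# Hypothetical toy collection of three small XML documents.
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><year/></book>",
    "<book><author><name/></author></book>",
]
print(sorted(frequent_edges(docs, min_support=2).items()))
# [(('author', 'name'), 2), (('book', 'author'), 2), (('book', 'title'), 2)]
```

The pair (`book`, `year`) occurs in only one document and is therefore pruned. Real frequent subtree miners grow such small patterns into larger induced or embedded subtrees instead of stopping at single edges.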
The main research work includes the following:
(1) An introduction to the theory of frequent subtree mining and XML mining. This covers the fundamentals of frequent subtree mining and XML data mining; the origin, definition, basic structure, and characteristics of the XML language; the common algorithms and processing procedures of frequent subtree mining; and the concepts of frequent induced subtree mining and frequent embedded subtree mining.
(2) The concepts of an uncertain tree set, certain-tree probability, and uncertain expected support are introduced, and an uncertain tree mining algorithm is proposed. The algorithm exploits the fast lookup of hash tables to reduce the time complexity of the tree isomorphism checks needed to compute expected support, and it searches the pattern space level by level so that uncertain trees are mined quickly and accurately. The algorithm efficiently addresses the uncertainty of trees in practical applications.
(3) A study of the application of uncertain tree mining to XML. Frequent pattern mining and clustering methods for XML documents are demonstrated, together with a method for measuring the similarity of XML documents. XML documents are converted into uncertain tree patterns, and the uncertain tree mining algorithm is applied to mine information from them.
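The role of the hash table in item (2) can be illustrated with a small sketch. The abstract does not give the thesis's uncertainty model, so this example assumes that each tree in the uncertain tree set carries an independent existence probability. Under that assumption, linearity of expectation makes a pattern's expected support the sum of the probabilities of the trees that contain it. Isomorphic subtrees are detected by mapping each unordered subtree to a canonical string and using that string as a hash-table key, which replaces pairwise isomorphism tests with constant-time dictionary lookups. For brevity, candidate patterns are limited to bottom-up subtrees (a node plus all of its descendants), and no level-wise search is performed. All names and the tuple tree representation are hypothetical.

```python
from collections import defaultdict

# Hypothetical representation: a tree is (label, [children]); the uncertain
# tree set is a list of (tree, existence probability) pairs (an assumption).


def canonical(tree):
    """Canonical string of an unordered labeled tree.

    Children's encodings are sorted, so isomorphic trees yield the same key.
    Assumes labels contain no '(', ')' or ','.
    """
    label, children = tree
    return label + "(" + ",".join(sorted(canonical(c) for c in children)) + ")"


def bottom_up_subtrees(tree):
    """Yield every bottom-up subtree: a node together with all its descendants."""
    yield tree
    for child in tree[1]:
        yield from bottom_up_subtrees(child)


def expected_support(uncertain_db):
    """Map each canonical pattern to its expected support."""
    containing = defaultdict(set)  # hash table: canonical form -> ids of trees containing it
    for tid, (tree, _) in enumerate(uncertain_db):
        for sub in bottom_up_subtrees(tree):
            containing[canonical(sub)].add(tid)
    probs = [p for _, p in uncertain_db]
    # Linearity of expectation: E[support] = sum of P(tree exists) over containing trees.
    return {key: sum(probs[t] for t in tids) for key, tids in containing.items()}


def frequent_patterns(uncertain_db, min_esup):
    """Keep patterns whose expected support reaches the threshold min_esup."""
    return {k: v for k, v in expected_support(uncertain_db).items() if v >= min_esup}


# Hypothetical uncertain tree set; the first two trees are isomorphic as unordered trees.
db = [
    (("a", [("b", []), ("c", [])]), 0.75),
    (("a", [("c", []), ("b", [])]), 0.5),
    (("a", [("b", [])]), 0.25),
]
print(frequent_patterns(db, 1.0))
# {'a(b(),c())': 1.25, 'b()': 1.5, 'c()': 1.25}
```

The pattern `a(b())` has an expected support of only 0.25 and is pruned. The first two trees list their children in different orders, yet they share one hash-table entry, which is the effect the hash-based isomorphism check is meant to achieve.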
Keywords/Search Tags: data mining, frequent sub-tree, frequent sub-tree mining, XML data mining, uncertain tree