Research On Frequent Pattern Mining In XML

Posted on: 2007-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2178360182986293
Subject: Computer application technology
Abstract/Summary:
XML is a self-describing, data-oriented markup metalanguage. Because it is extensible and flexible, it can describe differently structured data on websites and can combine data from different sources, so it has gradually become a standard for representing and exchanging information. Moreover, because XML data is self-describing, it can be managed without a separate internal description, which is convenient for organizations, software developers, websites, and end users.

With the wide use of XML, it is increasingly important to extract valuable information from it, and especially to mine potential rules and patterns in XML. Mining frequent patterns from XML has therefore become an important research area.

The thesis introduces the concepts and current research status of data mining, semi-structured data mining, and XML, and proposes an XML-oriented tree-like object model named TOM. It then studies the problem of discovering frequent patterns in XML and proposes an algorithm for XML named XMLMINER. Finally, it proposes a pruning method to improve the algorithm.

The major contributions of the thesis are as follows:

1. Semi-structured data models and the data content of XML are analyzed. To address the limitations of existing semi-structured data models in describing XML data, a tree-like object model named TOM is proposed and used as the data model for mining frequent patterns in XML.
2. An algorithm named XMLMINER is proposed for mining frequent patterns in XML. The key steps of the algorithm are generating candidate subtrees and counting their frequency. The prefix equivalence class technique used in TreeMiner is improved to generate candidate subtrees, and occurrence lists are used to count the frequency of candidate subtrees.
3. A pruning method is proposed to improve the algorithm. It allows some not-yet-discovered frequent patterns to be obtained directly from frequent patterns that have already been discovered, which reduces the number of candidate subtrees and the time spent counting their frequency, and thereby improves the efficiency of the algorithm.
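The abstract does not give the details of TOM or XMLMINER, so the following Python sketch is only an illustration of the general setting, not the thesis's method. It assumes that an XML document is viewed as a labeled ordered tree whose labels are element tags (attributes and text are ignored), writes the tree as a preorder label sequence with -1 marking each return to the parent (similar in spirit to the string encoding used by TreeMiner), and counts the simplest possible patterns, single parent-child edges, by the number of documents that contain them. The function names `encode`, `edge_patterns`, and `frequent_edges` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def encode(elem):
    """Preorder label sequence; -1 marks a return from a child to its parent."""
    out = [elem.tag]
    for child in elem:  # iteration follows document order
        out.extend(encode(child))
        out.append(-1)
    return out


def edge_patterns(root):
    """Set of (parent tag, child tag) pairs that occur in one tree."""
    pairs = set()
    for parent in root.iter():
        for child in parent:
            pairs.add((parent.tag, child.tag))
    return pairs


def frequent_edges(xml_docs, min_support):
    """Edges contained in at least min_support documents (assumed support measure)."""
    counts = Counter()
    for doc in xml_docs:
        # Each document contributes at most 1 to the support of an edge.
        counts.update(edge_patterns(ET.fromstring(doc)))
    return {edge: n for edge, n in counts.items() if n >= min_support}


# Hypothetical sample data, not taken from the thesis.
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><price/></book>",
    "<article><title/><author><name/></author></article>",
]

print(encode(ET.fromstring(docs[0])))
print(frequent_edges(docs, min_support=2))
```

A full frequent-subtree miner would grow larger candidate subtrees from such small frequent patterns and count their occurrences, which is where candidate generation and pruning strategies like those summarized above would come in.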
Keywords/Search Tags: Web mining, XML, semi-structured data model, labeled ordered tree, frequent subtree