Research On Frequent Pattern Mining In XML

Posted on:2007-12-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

GTID:2178360182986293

Subject:Computer application technology

Abstract/Summary:

XML is a self-describing, data-oriented markup metalanguage. Because it is extensible and flexible, it can describe differently structured data on websites and combine data from different sources, so it has gradually become a standard for representing and exchanging information. Moreover, because XML data is self-describing, it can be managed without an internal description, which is convenient for organizations, software developers, websites, and end users.

With the wide use of XML, it is increasingly important to extract valuable information from it, especially to mine latent rules and patterns. Mining frequent patterns from XML has therefore become an important research area.

The thesis introduces the concepts and current research status of data mining, semi-structured data mining, and XML, and proposes an XML-oriented tree-like object model named TOM. It then studies the problem of discovering frequent patterns in XML and proposes an algorithm for XML named XMLMINER. Finally, it proposes a pruning method to improve the algorithm.

The major contributions of the thesis are as follows:

1. Semi-structured data models and the data content of XML are analyzed. To address the limitations of semi-structured data models in describing XML data, a tree-like object model named TOM is proposed and used as the data model for mining frequent patterns in XML.
2. An algorithm named XMLMINER is proposed for mining frequent patterns in XML. The key steps of the algorithm are generating candidate subtrees and counting their frequency. The prefix equivalence class technique used in TreeMiner is improved to generate candidate subtrees, and occurrence lists are used to count the frequency of candidate subtrees.
3. A pruning method is proposed to improve the algorithm. It allows some not-yet-discovered frequent patterns to be derived directly from already discovered frequent patterns, which reduces the number of candidate subtrees and the time spent counting their frequency, thereby improving the efficiency of the algorithm.
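
As a rough illustration of the ideas named in the abstract (this is not the thesis's XMLMINER or TOM, whose details are not given here), the following Python sketch parses XML documents into labeled ordered trees, builds the preorder string encoding used in the TreeMiner literature, and counts the support of the simplest possible patterns: single parent-child label pairs. The function names, the sample documents, and the choice of per-document support are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def encode(elem):
    """Preorder label sequence; -1 marks a step back up to the parent.

    This mirrors the string encoding commonly used for labeled ordered
    trees in the TreeMiner literature (assumed here, not taken from TOM).
    """
    out = [elem.tag]
    for child in elem:
        out.extend(encode(child))
        out.append(-1)
    return out


def edge_patterns(root):
    """Distinct (parent_label, child_label) pairs occurring in one tree."""
    pairs = set()
    for parent in root.iter():
        for child in parent:
            pairs.add((parent.tag, child.tag))
    return pairs


def frequent_edges(documents, min_support):
    """Edge patterns that occur in at least min_support documents."""
    counts = Counter()
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        # Each document contributes at most 1 to a pattern's support.
        counts.update(edge_patterns(root))
    return {p: c for p, c in counts.items() if c >= min_support}


# Hypothetical sample documents, invented for this sketch.
docs = [
    "<book><title/><author/></book>",
    "<book><title/><price/></book>",
    "<article><title/><author/></article>",
]

encoding = encode(ET.fromstring(docs[0]))
frequent = frequent_edges(docs, min_support=2)
print(encoding)
print(frequent)
```

With a minimum support of 2, only the pair (book, title) is kept, because it is the only edge that appears in two documents. A full frequent-subtree miner would extend such size-2 patterns into larger candidate subtrees and count them with occurrence lists, which this sketch does not attempt.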

Keywords/Search Tags:

Web mining, XML, semi-structured data model, labeled ordered tree, frequent subtree

Related items

1	The Research On Frequent Subtrees Mining And Corresponding Techniques
2	Research On The Application Of A Frequent Sub-tree Algorithm In Web-log Mining
3	Research On Related Technology Of Frequent Pattern Mining For Semi-structured Data
4	Structure Of Data Mining And Processing Problems
5	Research On The Data Model And The Approaches To Data Mining In The Semi-structured Data
6	Study On Frequent Subtree Mining And Its Application In XML Mining
7	Research On Embedded Frequent Subtree Mining
8	Research On Cultural Calculation Of Semi-structured Data
9	Research On Frequent Subtree Mining
10	Embedded And Export Of Frequent Subtree Mining Algorithm