Mining frequent structural patterns from XML datasets

Posted on:2013-07-13

Degree:M.S.

Type:Thesis

University:King Fahd University of Petroleum and Minerals (Saudi Arabia)

Candidate:Ali, Mohammed Mohsin

Full Text:PDF

GTID:2458390008965497

Subject:Computer Science

Abstract/Summary:

Due to its flexibility and its capability to represent various kinds of data, XML has become a de facto standard for data exchange over the Internet. Recently, the use of XML has been increasing at a tremendous pace. With the ever-increasing amount of data available in XML format, the ability to mine valuable information from it has become increasingly important. However, mining useful information from XML is difficult because of its hierarchical tree structure. In this thesis, we propose a new and efficient algorithm for mining frequent structures from XML documents. Unlike general trees, XML trees have many repeated substructures. The proposed algorithm exploits these repeated substructures in three steps. First, it clusters the input XML dataset by structure; second, it encodes the XML dataset objects in order to minimize storage space and to avoid string manipulation; and third, it applies the Apriori algorithm to the clustered and encoded XML dataset to find the frequently repeated substructures. The experimental results show that the proposed algorithm significantly outperforms Apriori-based algorithms.
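
The three steps named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the abstract does not specify the clustering criterion, the encoding scheme, or the form of the mined substructures. The sketch assumes that documents with identical sets of root-to-node tag paths fall into the same cluster, that each path is encoded as a small integer, and that a "substructure" is simply a set of such paths. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import combinations


def root_to_node_paths(xml_text):
    """Return the set of distinct root-to-node tag paths of one document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + (child.tag,)))
    return frozenset(paths)


def cluster_by_structure(docs):
    """Step 1 (assumed criterion): group documents with identical path sets.

    Each cluster is stored once, together with the number of documents in it.
    """
    clusters = defaultdict(int)
    for doc in docs:
        clusters[root_to_node_paths(doc)] += 1
    return clusters


def encode(clusters):
    """Step 2 (assumed scheme): map each tag path to an integer id."""
    ids = {}
    encoded = []
    for paths, count in clusters.items():
        for path in sorted(paths):
            ids.setdefault(path, len(ids))
        encoded.append((frozenset(ids[p] for p in paths), count))
    return ids, encoded


def apriori(encoded, min_support):
    """Step 3: level-wise Apriori; each cluster counts with its document count."""
    def support(itemset):
        return sum(count for items, count in encoded if itemset <= items)

    all_items = {i for items, _ in encoded for i in items}
    level = {frozenset([i]) for i in all_items}
    level = {s for s in level if support(s) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/></book>",
    "<book><title/><price/></book>",
]
clusters = cluster_by_structure(docs)
ids, encoded = encode(clusters)
frequent = apriori(encoded, min_support=2)
```

In this toy input, the two identical documents collapse into one cluster, so support counting scans two encoded transactions instead of three documents, and all comparisons are made on integers instead of tag strings. That is the kind of saving the abstract attributes to clustering and encoding, although the thesis's actual data structures may differ.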

Keywords/Search Tags:

XML dataset, Mining frequent, Repeated substructures, Proposed algorithm

Related items

1	Research On Optimization Algorithm For Dataset Covering Problem
2	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
3	Research And Application On Frequent Substructure Mining Algorithms
4	Study On Frequent Subtree Mining And Its Application In XML Mining
5	The Research Of Frequent Itemsets Mining Algorithm Over Data Streams
6	Research And Application Of Frequent Pattern Mining Algorithm Based On Tissue-like P System
7	Research And Application Of Data Mining Algorithm Based On Graph Pattern
8	Mining Algorithm Based On Frequent Sub-graph Of The Multi-layer Index Structure
9	Research And Application Of Frequent Subgraph Mining Algorithm
10	Research On Fast Frequent Itemsets Mining Algorithm And Their Applications