Mining frequent structural patterns from XML datasets

Posted on: 2013-07-13
Degree: M.S
Type: Thesis
University: King Fahd University of Petroleum and Minerals (Saudi Arabia)
Candidate: Ali, Mohammed Mohsin
Full Text: PDF
GTID: 2458390008965497
Subject: Computer Science
Abstract/Summary:
Due to its flexibility and its capability for representing various kinds of data, XML has become a de facto standard for data exchange over the Internet, and its use has been increasing at a tremendous pace. With the ever-increasing amount of data available in XML format, the ability to mine valuable information from it has become increasingly important. However, mining useful information from XML is difficult because of its hierarchical tree structure. In this thesis we propose a new and efficient algorithm for mining frequent structures from XML documents. Unlike general trees, XML trees contain many repeated substructures, and the proposed algorithm exploits this repetition in three steps. First, it clusters the input XML dataset by structure; second, it encodes the XML dataset objects in order to minimize storage space and avoid string manipulation; and third, it applies the Apriori algorithm to the clustered and encoded XML dataset to find the frequently repeated substructures. The experimental results show that the proposed algorithm significantly outperforms Apriori-based algorithms.
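The three steps named in the abstract (cluster by structure, encode, apply Apriori) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, whose clustering and encoding schemes are not described here. It makes simplifying assumptions: a document's "structure" is taken to be its set of (parent tag, child tag) pairs, each pair is encoded as a small integer, documents with identical encoded sets form one cluster, and Apriori is run over the weighted clusters. The names `edge_set` and `frequent_edge_sets` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations


def edge_set(xml_text, codes):
    """Return a document's (parent, child) tag pairs as a frozenset of integer codes.

    Assumption: this edge-set view stands in for the thesis's (unspecified) encoding.
    """
    root = ET.fromstring(xml_text)
    edges = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node:
            # Each distinct pair gets one integer code, so later steps compare
            # integers instead of manipulating tag strings.
            edges.add(codes.setdefault((node.tag, child.tag), len(codes)))
            stack.append(child)
    return frozenset(edges)


def frequent_edge_sets(docs, min_support):
    """Map each frequent set of (parent, child) pairs to its document count."""
    codes = {}
    # Cluster by structure: identical edge sets collapse into one weighted entry.
    clusters = Counter(edge_set(doc, codes) for doc in docs)

    def support(itemset):
        # A cluster of n identical documents contributes n in a single check.
        return sum(n for edges, n in clusters.items() if itemset <= edges)

    items = sorted({code for edges in clusters for code in edges})
    level = [s for s in (frozenset([c]) for c in items) if support(s) >= min_support]
    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        k += 1
        frequent = set(level)
        # Apriori join: merge frequent (k-1)-sets into k-set candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(c) >= min_support]

    # Decode the integer codes back into tag pairs for readability.
    decode = {code: pair for pair, code in codes.items()}
    return {frozenset(decode[c] for c in s): n for s, n in result.items()}
```

Calling `frequent_edge_sets(docs, min_support)` with a list of XML strings returns the sets of tag pairs that occur in at least `min_support` documents. Because identical structures are merged before counting, each support check scans the distinct clusters rather than every document, which is the kind of saving the abstract attributes to repeated substructures.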
Keywords/Search Tags: XML dataset, Mining frequent, Repeated substructures, Proposed algorithm