
Research On XML Cluster Storage & Selectivity Estimation Of Path Expression

Posted on: 2009-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: J J Li
GTID: 2178360245482245
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, a large amount of Web data has appeared, most of it represented as XML documents. How to store XML documents and retrieve useful information from large collections of them has become an important research hotspot in the database field. This thesis studies XML storage and query optimization, focusing on an XML cluster storage model and on selectivity estimation of path expressions. It begins with a survey of XML techniques, analyzing their current status and recent breakthroughs, and then carries out research on XML cluster storage and query optimization. In XML cluster storage, the DOM (Document Object Model) cannot efficiently reduce the number of disk I/O operations during XML query processing. To address this, the thesis proposes the X-cluster storage model, which, based on the concept of node division, aggregates the nodes that are most similar in both structure and value, and provides different storage models and methods for different node value types. This solves two problems of previous cluster models: node structure and node values are stored separately, and clustering is often quite inaccurate. In query optimization, the thesis studies selectivity estimation of path expressions and analyzes histogram-based estimation methods. To address their low efficiency and limited accuracy, it introduces the statistical information model of the X-cluster synopsis into path-expression selectivity estimation and proposes the CHPM method. CHPM reduces the size of the CT (cost tree) by computing the nodes whose selectivity is 100% and skipping the nodes and paths not involved in any histogram, which improves the efficiency of path-expression selectivity estimation.
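The abstract does not give CHPM's algorithm, so the following is only a generic sketch of histogram-based selectivity estimation over a path synopsis, not the thesis's method. It assumes a hypothetical synopsis (`path_counts`) recording how many elements each root-to-node label path reaches, plus equi-width value histograms (`EquiWidthHistogram`) keyed by path; all names and data are illustrative. The estimate for `item[price < 50]` multiplies the number of `item` elements by a structural factor (the share of items that have a `price` child) and by the histogram's estimate of the fraction of prices below 50. When the structural factor is exactly 1.0 the step filters nothing, which loosely mirrors the abstract's "nodes whose selectivity is 100%".

```python
from dataclasses import dataclass


@dataclass
class EquiWidthHistogram:
    lo: float
    width: float
    counts: list  # number of values falling into each bucket

    @classmethod
    def build(cls, values, num_buckets):
        lo, hi = min(values), max(values)
        width = (hi - lo) / num_buckets or 1.0  # guard against all-equal values
        counts = [0] * num_buckets
        for v in values:
            counts[min(int((v - lo) / width), num_buckets - 1)] += 1
        return cls(lo, width, counts)

    def fraction_below(self, x):
        """Estimated fraction of values < x, assuming uniformity inside each bucket."""
        pos = (x - self.lo) / self.width
        if pos <= 0:
            return 0.0
        full = min(int(pos), len(self.counts))
        est = sum(self.counts[:full])
        if full < len(self.counts):
            est += self.counts[full] * (pos - full)  # partial bucket
        return est / sum(self.counts)


# Hypothetical structural synopsis: number of elements reached by each label path.
path_counts = {
    ("site", "regions", "item"): 1010,
    ("site", "regions", "item", "price"): 1010,
}
# Hypothetical value histogram for .../item/price: values 0..100, ten of each.
prices = [float(i % 101) for i in range(1010)]
histograms = {("site", "regions", "item", "price"): EquiWidthHistogram.build(prices, 10)}


def estimate(path, predicates):
    """Estimate |path[child < bound and ...]| from the synopsis and value histograms."""
    card = float(path_counts.get(path, 0))
    for child, bound in predicates:
        child_path = path + (child,)
        # Structural factor: share of context nodes with this child (capped at 1).
        # When it is exactly 1.0 the step filters nothing structurally.
        ratio = min(path_counts.get(child_path, 0) / card, 1.0) if card else 0.0
        card *= ratio * histograms[child_path].fraction_below(bound)
    return card


print(estimate(("site", "regions", "item"), [("price", 50.0)]))  # roughly 500 items
```

Because the toy prices are spread evenly, the uniform-within-bucket assumption puts the estimate close to the true count of 500; skewed value distributions are where this assumption breaks down.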
In addition, to prevent high-frequency data in the intermediate-result histogram from degrading the accuracy of subsequent histograms, the histograms are compressed, which makes the data within them more evenly distributed and thus reduces the error of path-expression selectivity estimation. Experimental results indicate that the X-cluster synopsis, together with the selectivity estimation method for XML path expressions with value predicates built on that synopsis and on compressed-histogram technology, yields a lower relative estimation error. It is an effective and applicable method for selectivity estimation of both simple path expressions with a single predicate and complex path expressions with multiple predicates.
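One standard way to keep high-frequency values from distorting bucket averages is the compressed histogram: values whose frequency exceeds a threshold are stored exactly in singleton buckets, and only the remaining values are spread over equi-depth buckets. The abstract does not say which compression scheme the thesis uses, so the sketch below (hypothetical function names, with "high-frequency" assumed to mean more than one bucket's share of the data) illustrates the general idea rather than the author's implementation.

```python
from collections import Counter


def build_compressed_histogram(values, num_buckets):
    """Exact counts for frequent values plus equi-depth buckets for the rest."""
    total = len(values)
    freq = Counter(values)
    threshold = total / num_buckets  # heavier than one bucket's share => singleton
    singletons = {v: c for v, c in freq.items() if c > threshold}
    rest = sorted(v for v in values if v not in singletons)
    buckets = []  # (low, high, count, distinct) for the remaining values
    if rest:
        k = max(num_buckets - len(singletons), 1)
        depth = -(-len(rest) // k)  # ceiling division: target values per bucket
        i = 0
        while i < len(rest):
            j = min(i + depth, len(rest))
            while j < len(rest) and rest[j] == rest[j - 1]:
                j += 1  # never split one value across two buckets
            chunk = rest[i:j]
            buckets.append((chunk[0], chunk[-1], len(chunk), len(set(chunk))))
            i = j
    return total, singletons, buckets


def estimate_equality(hist, x):
    """Estimated selectivity of the predicate value == x."""
    total, singletons, buckets = hist
    if x in singletons:
        return singletons[x] / total  # exact for high-frequency values
    for low, high, count, distinct in buckets:
        if low <= x <= high:
            return count / distinct / total  # uniform frequency inside the bucket
    return 0.0


# Skewed toy data: 505 of 1000 values are 7, every other value in 0..99 appears 5 times.
values = [7] * 500 + [v for v in range(100) for _ in range(5)]
hist = build_compressed_histogram(values, num_buckets=5)
print(estimate_equality(hist, 7), estimate_equality(hist, 3))
```

Without the singleton for 7, a plain equi-depth histogram would fold that spike into an ordinary bucket and inflate the estimates for its neighbors; here 7 gets an exact count and the value 3 is estimated at its true frequency of 5 in 1000.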
Keywords/Search Tags: XML database, XML cluster storage, selectivity estimation, histogram, stability