
A performance study of XML query optimization techniques

Posted on: 2011-02-20
Degree: Ph.D.
Type: Dissertation
University: University of Cincinnati
Candidate: Richardson, Bartley Douglas
GTID: 1448390002952223
Subject: Computer Science
Abstract/Summary:
As computers and technology become more commonplace and essential to everyday life, institutions in government, education, and the private sector capture, store, and analyze ever more data. As this volume grows, so does the need for efficient methodologies and tools to store, retrieve, and transform it. A common format for storing such schemaless, semi-structured data is the Extensible Markup Language (XML), in which case an XML document can be viewed as a database. With so much data stored in a common format, one problem is how to query XML documents efficiently. While relational database management systems include built-in query optimizers, no comparable framework exists for XML databases. A multitude of document shapes, query shapes, index structures, and query techniques exist for XML databases, but the implications of these choices and their effects on query processing have not been investigated within a common framework. This dissertation identifies a set of representative query techniques, document structures, and query styles for XML databases and provides a common framework for classifying them. We identify two broad classes of query techniques, native XML and non-native XML, and develop a cost-based model for each technique that models query performance from an execution standpoint. We also develop our own query technique, RDBQuery, as an extension and major enhancement of a previously existing non-native XML query technique that leverages a relational database management system to process XML queries efficiently. To evaluate relative query performance, we compare the techniques across parameters that affect their performance, including query shape and document shape and size, and present the results in a series of graphs.
These graphs and their underlying cost models are used to present an optimization framework for XML queries, which provides the essential foundation for developing an integrated cost-based XML query optimizer.
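The abstract describes non-native techniques that hand XML queries to a relational database management system, but it does not say how RDBQuery stores documents. As a hedged illustration only, the sketch below uses a generic interval (region) encoding, one well-known way to shred XML into a relational table, with SQLite standing in for the RDBMS. The table layout, the sample document, and the function names `shred` and `descendants` are hypothetical and are not taken from the dissertation.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for illustration.
SAMPLE = (
    "<library>"
    "<book><title>A</title><chapter><title>A1</title></chapter></book>"
    "<book><title>B</title></book>"
    "</library>"
)


def shred(xml_text: str) -> sqlite3.Connection:
    """Shred an XML document into one relational row per element.

    Each element gets an interval [start_pos, end_pos] from a single counter
    incremented on entry and on exit, so node d is a descendant of node a
    exactly when a.start_pos < d.start_pos and d.end_pos < a.end_pos.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (tag TEXT, start_pos INTEGER, end_pos INTEGER,"
        " level INTEGER, text TEXT)"
    )
    counter = 0

    def visit(elem: ET.Element, level: int) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child, level + 1)
        end = counter
        counter += 1
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (elem.tag, start, end, level, (elem.text or "").strip()),
        )

    visit(ET.fromstring(xml_text), 0)
    # Index on tag so the structural join can find candidate nodes by name.
    conn.execute("CREATE INDEX node_tag ON node(tag)")
    return conn


def descendants(conn, anc_tag, desc_tag, child_only=False):
    """Evaluate //anc_tag//desc_tag (or //anc_tag/desc_tag) as a SQL self-join."""
    sql = (
        "SELECT DISTINCT d.start_pos, d.text FROM node a JOIN node d"
        " ON a.start_pos < d.start_pos AND d.end_pos < a.end_pos"
        " WHERE a.tag = ? AND d.tag = ?"
    )
    if child_only:
        # Child axis: the match must sit exactly one level below the ancestor.
        sql += " AND d.level = a.level + 1"
    sql += " ORDER BY d.start_pos"  # return results in document order
    return [text for _, text in conn.execute(sql, (anc_tag, desc_tag))]


if __name__ == "__main__":
    db = shred(SAMPLE)
    print(descendants(db, "book", "title"))                   # ['A', 'A1', 'B']
    print(descendants(db, "book", "title", child_only=True))  # ['A', 'B']
```

The path step becomes an ordinary self-join on the interval columns, so the relational engine's own optimizer decides join order and index use. Whether RDBQuery relies on this particular encoding is not stated in the abstract.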
Keywords/Search Tags: XML, Query, Techniques, Performance, Framework, Common