Efficient and scalable XML data processing using relational database systems

Posted on: 2005-12-11
Degree: Ph.D
Type: Thesis
University: The University of Wisconsin - Madison
Candidate: Tian, Feng
GTID: 2458390008983046
Subject: Computer Science
Abstract/Summary:
The Extensible Markup Language (XML) can serve as a standard format for storing semi-structured data sets. One approach to implementing an efficient and scalable XML database is to use a relational database system to store and query XML data. This approach is attractive because relational database systems are a mature technology with proven reliability and scalability. However, it also has several disadvantages. Storing and accessing XML data through an SQL interface (and therefore through the whole relational database call stack) incurs overhead that is not necessary for XML processing. The relational schema mapped from an XML schema may make it inefficient to navigate between XML elements along parent-child or sibling relationships. Also, some XQuery features are hard to translate into SQL, or translate into SQL that is complex and inefficient.

This thesis addresses these issues for an XML database implemented using a relational database system. We first compare the performance of storing XML data in a relational database against several other XML storage strategies that use a file system or an object manager. Our results clearly indicate that when the XML schema information is available, using a relational database to store XML data is indeed a viable approach. We then identify a number of XQuery features that are either hard to translate into SQL or translate into SQL that is complex and inefficient. We propose an extension to the relational database system that facilitates efficient XML query processing within the existing relational database execution framework. The extension can provide an order of magnitude performance improvement for queries such as long path expressions. In the third part of the thesis, we study how to implement an XML publish/subscribe system using a relational database. Our experiments demonstrate that the system has very good performance and scalability, handling millions of subscriptions with moderate amounts of physical memory.
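To make the path-expression problem concrete, here is a minimal sketch of one common way to store XML in a relational database: a generic "edge" table with one row per element, where each step of a path expression becomes a self-join. This is not the mapping or the extension used in the thesis; the table layout, the function names (shred, path_to_sql), and the sample document are all assumptions made for illustration, and SQLite stands in for a full relational system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
DOC = """
<bib>
  <book><title>A</title><author><last>Smith</last></author></book>
  <book><title>B</title><author><last>Jones</last></author></book>
</bib>
"""

def shred(conn, xml_text):
    """Store each element as one row: (id, parent_id, tag, pos, text)."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "tag TEXT, pos INTEGER, text TEXT)"
    )
    next_id = 0

    def walk(elem, parent_id, pos):
        nonlocal next_id
        next_id += 1
        my_id = next_id  # ids are assigned in document (preorder) order
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (my_id, parent_id, elem.tag, pos, (elem.text or "").strip()),
        )
        for i, child in enumerate(elem):
            walk(child, my_id, i)

    walk(ET.fromstring(xml_text), None, 0)

def path_to_sql(path):
    """Translate a child-axis path such as /a/b/c into one self-join per step."""
    steps = path.strip("/").split("/")
    tables = [f"node n{i}" for i in range(len(steps))]
    conds = ["n0.parent_id IS NULL"]  # the first step must match the root
    params = []
    for i, tag in enumerate(steps):
        conds.append(f"n{i}.tag = ?")
        params.append(tag)
        if i > 0:
            conds.append(f"n{i}.parent_id = n{i - 1}.id")
    last = len(steps) - 1
    sql = (
        f"SELECT n{last}.text FROM {', '.join(tables)} "
        f"WHERE {' AND '.join(conds)} ORDER BY n{last}.id"
    )
    return sql, params

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    shred(conn, DOC)
    sql, params = path_to_sql("/bib/book/author/last")
    print(sql)
    print([row[0] for row in conn.execute(sql, params)])
```

In this sketch a path with k steps references the node table k times, which gives a rough sense of why long path expressions can translate into SQL that is both complex and expensive to evaluate.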
Keywords/Search Tags: XML data, Relational database, Efficient and scalable XML, Translate into SQL, Resulting SQL, XML schema, Processing, Approach