
Research On Storing Complex XML Documents Into Relational Database Based On XML Schema

Posted on: 2012-01-06; Degree: Master; Type: Thesis
Country: China; Candidate: Q L Ceng
GTID: 2218330338973217; Subject: Computer application technology
Abstract/Summary:
XML has become an increasingly important data carrier and plays a vital role in data exchange in particular. The large volume of XML data being produced has prompted research into a variety of XML data storage problems. Because relational database technology is mature, most current studies store XML documents in relational databases. The most significant approaches based on an intermediate mapping layer are P_Schema (Physical XML Schema), B_Schema (Basic XML Schema) and C_Schema (Complex XML Schema), but all three still have shortcomings. P_Schema defines multi-valued elements vaguely: it does not make clear whether a multi-valued element refers to multiple occurrences of a value or to multiple occurrences of an element structure. B_Schema places no semantic constraints on XML and does not save schema information, which makes it hard to use the mapping schema to reconstruct and restore XML documents. Although C_Schema inherits and extends P_Schema and B_Schema, it makes the mapping method too complicated, especially for recursive complex elements, multiple namespaces, repeated structures, future-oriented elements and attributes. Moreover, it takes into account neither the effect of semantic constraints on the information in XML documents nor XML document data newly added to the data tables. This paper therefore improves on C_Schema and X-RESTORE, proposing the C_Schema++ mapping method and an algorithm that translates XPath queries into SQL. The work covers the following three aspects.

(1) C_Schema++, built on C_Schema, re-analyzes the semantics of XML Schema. It redefines the complex information found in complex XML documents, such as namespace information, repeated structures, recursive complex elements, future-oriented elements and attributes; restates the rules for extracting and mapping this information; and abandons the layer-based method of storing recursive complex elements, which makes the method much simpler and clearer. The biggest improvement over C_Schema is that C_Schema++ saves XML Schema information in an improved, dynamic element table that grows as XML documents are added. Because the element table is accessed very frequently, it is held in memory as an array, which greatly improves storage and query efficiency. In addition, the paper re-analyzes primary keys, foreign keys, unique constraints and similar information, and saves this XML semantic information to achieve lossless mapping.

(2) Once XML documents are stored in a relational database under C_Schema++, XML queries must be translated into relational queries. XML queries are semi-structured whereas relational queries are structured, so a middleware layer built on the relational query system is needed; the best known is the X-RESTORE query middleware. X-RESTORE implements XQuery on top of XPath, so it mainly serves users who know XQuery well. To make querying more convenient, this paper improves X-RESTORE and proposes an algorithm that converts XPath queries into SQL in two steps. First, the user's XPath query is converted into an XPathExpression, a graph that corresponds to the C_Schema++ mapping of the complex XML documents. Second, this graph is converted into SQL statements: its edges record information about the document's nodes, and translating the node and edge information into the clauses of an SQL statement carries out the XPath query.

(3) To confirm the advantages of C_Schema++, the paper compares the storage and query costs of C_Schema and C_Schema++ and shows that both costs are much lower under C_Schema++. Because the cost of storing an XML document consists mainly of the cost of query operations, the paper follows existing methods and estimates cost as the cost of finding nodes in the DOM tree, defined as the cost of querying the nodes and their corresponding data.
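The dynamic element table described in item (1) can be pictured with a minimal Python sketch. It assumes, purely for illustration, that each entry records an element's absolute path, its parent entry and a target table name; the names `ElementTable`, `ElementEntry` and `register_document` are hypothetical and are not taken from the thesis. Entries live in a plain list (the in-memory array) with a dictionary index for fast lookup, and a new entry is appended only when a stored document introduces an element path not seen before.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ElementEntry:
    """One row of the hypothetical in-memory element table."""
    elem_id: int               # index of this entry in the array
    path: str                  # absolute element path, e.g. "/order/item"
    parent_id: Optional[int]   # elem_id of the parent element, None for the root
    table_name: str            # relational table the element's data maps to


class ElementTable:
    """Array-backed element table that grows as new documents are stored."""

    def __init__(self) -> None:
        self.entries: List[ElementEntry] = []   # the in-memory array
        self._index: Dict[str, int] = {}         # path -> position in entries

    def lookup(self, path: str) -> Optional[ElementEntry]:
        pos = self._index.get(path)
        return None if pos is None else self.entries[pos]

    def register(self, path: str, table_name: str) -> ElementEntry:
        """Return the entry for path, appending a new one if it is unseen."""
        found = self.lookup(path)
        if found is not None:
            return found
        parent = self.lookup(path.rsplit("/", 1)[0])  # "" for the root gives None
        entry = ElementEntry(
            len(self.entries),
            path,
            parent.elem_id if parent is not None else None,
            table_name,
        )
        self.entries.append(entry)
        self._index[path] = entry.elem_id
        return entry


def register_document(table: ElementTable, xml_text: str) -> None:
    """Walk a document top-down so parents are registered before children.
    Assumes un-namespaced tags (a '{uri}' prefix would itself contain '/')."""
    def walk(elem: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{elem.tag}"
        table.register(path, table_name=elem.tag)  # hypothetical: one table per tag
        for child in elem:
            walk(child, path)
    walk(ET.fromstring(xml_text), "")
```

Registering `<order><item><name/></item><item/></order>` and then `<order><note/></order>` leaves four entries: the second document appends only the unseen path `/order/note`, which mirrors the table growing with the stored documents.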
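Item (1) states that C_Schema++ no longer stores recursive complex elements layer by layer, but this abstract does not describe the replacement. One generic alternative, shown here only as an illustration and not necessarily the thesis's method, is an adjacency list: a single self-referencing table whose `parent_id` column points at the enclosing element, so nesting depth does not require extra tables, and a recursive common table expression in SQLite retrieves a whole subtree. The table `part_node`, the `name` attribute and both functions are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional


def store_recursive(conn: sqlite3.Connection, xml_text: str) -> None:
    """Store every element in one self-referencing table, however deep it nests."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS part_node ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES part_node(id),"  # NULL for the document root
        " tag TEXT NOT NULL,"
        " name TEXT)"
    )

    def insert(elem: ET.Element, parent_id: Optional[int]) -> None:
        cur = conn.execute(
            "INSERT INTO part_node (parent_id, tag, name) VALUES (?, ?, ?)",
            (parent_id, elem.tag, elem.get("name")),
        )
        for child in elem:  # depth-first, so ids follow document order
            insert(child, cur.lastrowid)

    insert(ET.fromstring(xml_text), None)
    conn.commit()


def subtree(conn: sqlite3.Connection, root_name: str) -> list:
    """Return (name, depth) for the element called root_name and all descendants."""
    return conn.execute(
        """
        WITH RECURSIVE sub(id, name, depth) AS (
            SELECT id, name, 0 FROM part_node WHERE name = ?
            UNION ALL
            SELECT p.id, p.name, sub.depth + 1
            FROM part_node AS p JOIN sub ON p.parent_id = sub.id
        )
        SELECT name, depth FROM sub ORDER BY id
        """,
        (root_name,),
    ).fetchall()
```

The trade-off is that subtree reads need a recursive query rather than a fixed set of joins, in exchange for a schema that does not change with recursion depth.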
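Item (1) also mentions preserving primary keys, foreign keys and unique constraints. XML Schema expresses these with `xs:key`, `xs:keyref` and `xs:unique`, each built from an `xs:selector` and one or more `xs:field` elements. The sketch below derives SQL constraint clauses from them under simplifying assumptions stated in its docstring (notably that `xs:key`, which requires its fields to be present and unique, is mapped to `PRIMARY KEY`). `constraint_clauses` is a hypothetical helper, not the thesis's actual rule set.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

XS = "{http://www.w3.org/2001/XMLSchema}"


def constraint_clauses(xsd_text: str) -> Dict[str, List[str]]:
    """Map xs:key / xs:unique / xs:keyref to SQL constraint clauses, per table.

    Assumptions: each selector xpath is a single child-element name, used as
    the table name; each field xpath is '@attr', used as the column name; at
    most one xs:key per table, mapped to PRIMARY KEY."""
    root = ET.fromstring(xsd_text)
    keys: Dict[str, Tuple[str, List[str]]] = {}  # key name -> (table, columns)
    clauses: Dict[str, List[str]] = {}

    def target(c: ET.Element) -> Tuple[str, List[str]]:
        table = c.find(f"{XS}selector").get("xpath")
        cols = [f.get("xpath").lstrip("@") for f in c.findall(f"{XS}field")]
        return table, cols

    for c in root.iter(f"{XS}key"):
        table, cols = target(c)
        keys[c.get("name")] = (table, cols)
        clauses.setdefault(table, []).append(f"PRIMARY KEY ({', '.join(cols)})")
    for c in root.iter(f"{XS}unique"):
        table, cols = target(c)
        clauses.setdefault(table, []).append(f"UNIQUE ({', '.join(cols)})")
    for c in root.iter(f"{XS}keyref"):  # after all keys, so 'refer' always resolves
        table, cols = target(c)
        ref_table, ref_cols = keys[c.get("refer").split(":")[-1]]  # drop QName prefix
        clauses.setdefault(table, []).append(
            f"FOREIGN KEY ({', '.join(cols)}) "
            f"REFERENCES {ref_table} ({', '.join(ref_cols)})"
        )
    return clauses
```

For a schema where `book` elements are keyed by `@isbn` and `loan` elements refer to them through an `xs:keyref`, the `loan` table receives a `FOREIGN KEY (isbn) REFERENCES book (isbn)` clause, which is the kind of semantic information the abstract says must survive the mapping.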
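The two-step XPath-to-SQL translation in item (2) can be sketched for a deliberately small XPath subset: absolute paths on the child axis, with at most one `[@attr='value']` predicate per step. Step one parses the path into a chain of step nodes linked by parent-child edges; step two emits one table alias per node and one join condition per edge. The generic `node`/`attr` tables are an assumption made to keep the example self-contained. They are not the schema-specific tables that C_Schema++ generates, and the XPathExpression graph in the thesis is presumably richer than this simple chain.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# One location step: a tag name, optionally followed by [@attr='value'].
STEP = re.compile(r"^(\w+)(?:\[@(\w+)='([^']*)'\])?$")
Step = Tuple[str, Optional[str], Optional[str]]


def parse_xpath(xpath: str) -> List[Step]:
    """Step 1: turn an absolute child-axis path into a chain of step nodes.
    Consecutive steps are linked by parent-child edges."""
    if not xpath.startswith("/") or "//" in xpath:
        raise ValueError("only absolute child-axis paths are supported")
    steps = []
    for part in xpath.strip("/").split("/"):
        match = STEP.match(part)
        if match is None:
            raise ValueError(f"unsupported step: {part!r}")
        steps.append(match.groups())
    return steps


def to_sql(steps: List[Step]) -> Tuple[str, list]:
    """Step 2: one node alias per step node, one join condition per edge."""
    tables, conds, params = [], ["n0.parent_id IS NULL"], []
    for i, (tag, attr, value) in enumerate(steps):
        tables.append(f"node n{i}")
        conds.append(f"n{i}.tag = ?")
        params.append(tag)
        if i > 0:
            conds.append(f"n{i}.parent_id = n{i - 1}.id")  # the parent-child edge
        if attr is not None:
            tables.append(f"attr a{i}")
            conds += [f"a{i}.node_id = n{i}.id", f"a{i}.name = ?", f"a{i}.value = ?"]
            params += [attr, value]
    last = len(steps) - 1
    sql = (f"SELECT n{last}.content FROM {', '.join(tables)} "
           f"WHERE {' AND '.join(conds)} ORDER BY n{last}.id")
    return sql, params


def load(conn: sqlite3.Connection, xml_text: str) -> None:
    """Shred a document into the generic node/attr tables assumed above."""
    conn.executescript(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER,"
        " tag TEXT, content TEXT);"
        "CREATE TABLE attr (node_id INTEGER, name TEXT, value TEXT);"
    )

    def insert(elem: ET.Element, parent_id: Optional[int]) -> None:
        text = (elem.text or "").strip() or None
        cur = conn.execute(
            "INSERT INTO node (parent_id, tag, content) VALUES (?, ?, ?)",
            (parent_id, elem.tag, text))
        for name, value in elem.attrib.items():
            conn.execute("INSERT INTO attr VALUES (?, ?, ?)",
                         (cur.lastrowid, name, value))
        for child in elem:
            insert(child, cur.lastrowid)

    insert(ET.fromstring(xml_text), None)
```

For example, `to_sql(parse_xpath("/library/book[@year='2011']/title"))` produces a three-way self-join of `node` plus one join to `attr`, with the parameters `["library", "book", "year", "2011", "title"]` in placeholder order.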
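The cost model in item (3) measures the work of finding nodes in a DOM tree. As a rough sketch, assuming a unit cost for every DOM node inspected (the exact weighting the thesis uses is not given in this abstract), the hypothetical `find_cost` function below follows a child-axis path from the root and counts each node it examines; the cost of reading the matched nodes' data would be an additional term.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def find_cost(root: ET.Element, path: str) -> Tuple[List[ET.Element], int]:
    """Follow an absolute child-axis path such as '/library/book/title' from the
    DOM root and return (matching nodes, cost), charging one unit for every
    DOM node inspected along the way (a hypothetical unit-cost model)."""
    tags = path.strip("/").split("/")
    cost = 1                      # inspecting the root itself
    if root.tag != tags[0]:
        return [], cost
    frontier = [root]
    for tag in tags[1:]:
        next_frontier = []
        for node in frontier:
            for child in node:    # every child of a frontier node is inspected
                cost += 1
                if child.tag == tag:
                    next_frontier.append(child)
        frontier = next_frontier
    return frontier, cost
```

For a `<library>` holding two `book` elements (with three children between them) and one `journal`, the path `/library/book/title` costs 7 units: the root, its three children, and the three children of the two `book` elements.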
Keywords/Search Tags: XML Schema, relational database, complex XML document, C_Schema, C_Schema++