XML is fast emerging as the de facto standard for data representation and exchange on the World Wide Web. However, XML is much more than a bridge between the World Wide Web and databases. XML data is semistructured, and relational databases are poorly suited to managing it because of the inherent limitations of the relational data model. Moreover, a large amount of XML data is stored as files on the World Wide Web. These files contain a great deal of information, so a query tool for XML documents is needed to retrieve information from them. Research on XML query can benefit from previous research on databases and XML. The query techniques of traditional databases are mature enough to be migrated to XML query, while those of semistructured databases can be applied to XML query directly. Most importantly, the W3C has defined a schema language, a query language, and a formal semantics for XML, which makes it possible to query XML documents much as relational databases are queried with SQL. Because of the inherent characteristics of XML and the differences between it and traditional data models, many problems in XML query remain unsolved. In this article, we discuss most of the techniques of XML query and describe how they are implemented in XDQuery, an XML document query system that we developed. XDQuery adopts XQuery as its query language. XQuery is too large for us to implement in full, so, guided by an analysis of its use cases, we implement a core subset of XQuery and extend it with update capabilities. Based on the formal semantics of XQuery and with reference to the Lore system, we develop the logical and physical operators of XDQuery. XML data is freely structured, and path expressions are of great importance in XML query. We therefore develop a path index that is based on the DataGuide of Lore but is better suited to XML data and XML query. 
Furthermore, the indices of XDQuery can be stored as XML files and transferred together with the source XML documents. XDQuery adopts a cost-based optimization strategy and uses additional heuristics to further prune the search space. XDQuery is a memory-based query system, and its cost model is based on the number of reference operations, which differs considerably from the cost models of traditional databases. Most of the techniques discussed in this article are implemented in XDQuery, and detailed performance results are reported.
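
To make the idea of a path index concrete, the following is a minimal sketch and not XDQuery's actual index. It assumes a tree-structured document and simply maps every distinct root-to-element label path to the elements it reaches, which is the role a DataGuide-style summary plays for path expressions. The function name `build_path_index` and the sample document are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(xml_text):
    """Map each distinct root-to-element label path to the elements it reaches.

    Illustrative only: a simplified, DataGuide-like summary for a
    tree-structured document, not the index described in the article.
    """
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    # Iterative preorder traversal; children are pushed in reverse so that
    # elements are recorded in document order.
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        index[path].append(node)
        for child in reversed(list(node)):
            stack.append((child, path + "/" + child.tag))
    return dict(index)


# Hypothetical sample document.
SAMPLE = """<bib>
  <book><title>A</title><author>X</author></book>
  <book><title>B</title></book>
  <article><title>C</title></article>
</bib>"""

index = build_path_index(SAMPLE)

# A simple path expression becomes a single dictionary lookup
# instead of a traversal of the whole document.
titles = [element.text for element in index["/bib/book/title"]]
```

With such a structure, evaluating a simple path expression like `/bib/book/title` does not require walking the document, and the set of keys doubles as a summary of which paths exist at all. Both properties are what make a path index useful for loosely structured data.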