
Research On Pattern-Based XML Indexing

Posted on: 2011-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: F Q Xiao
Full Text: PDF
GTID: 2178360305450707
Subject: Computer software and theory
Abstract/Summary:
With the development of Internet applications, especially electronic commerce and Web services, data exchange between companies and between individuals has become more and more frequent, and a widely accepted standard for data exchange is needed. XML (Extensible Markup Language) emerged to meet this need. It is a specification for defining semantic markup, and it has become a widely used standard for representing and exchanging data over the Internet. Storing and querying XML data is therefore increasingly important.

Several query languages have been proposed for retrieving XML and semi-structured data, among them Lorel, XPath and XQuery. These languages share one feature: they query an XML document by means of path expressions. To improve the efficiency of path queries, researchers have studied indexes for XML documents. Information from the XML pattern (DTD or XML Schema) can strongly influence how an XML index is constructed and how much path-query efficiency improves, but the indexing methods proposed so far make little use of it.

At present, DTD and XML Schema are the most frequently used XML patterns. In this thesis, we propose two indexing methods, one based on DTD and one based on XML Schema. The main idea is to build both an XML pattern index and an XML document index, and to establish the mapping between the XML pattern and the XML documents. The pattern index is used to locate a specified element or attribute in the pattern. The document index is used to find the required elements or attributes in the XML documents and return the results.

A query is executed in two steps. First, we check the ancestor-descendant relationships given in the regular path expression against the pattern index. If there is no matching path in the pattern tree, there can be no matching path in the XML documents either, so we return the result "No such element" directly without searching the XML documents. Otherwise, if matches are found in the pattern tree, we search the XML documents to obtain the final result.

The proposed index structure makes effective use of the XML pattern to reduce the amount of data that has to be searched, and it uses an optimized numbering scheme to reduce the number of structural join operations. Together, these can greatly improve the efficiency of path queries.

Most recent research on XML indexing has neglected index maintenance. In this thesis, we use an extended Dewey coding method to encode XML documents. This method keeps the encoding consistent under updates: the codes of existing nodes do not change when a new node is inserted into the document or a node is deleted from it.

The main work and achievements of this thesis are:
1. Propose new indexing methods based on DTD (CDBXI) and XML Schema (SBXI), and describe their structure and numbering scheme.
2. Propose the query algorithms and the steps of query operations in an XML database that adopts the proposed methods.
3. Discuss the maintenance of the proposed indexing methods, and provide maintenance algorithms for adding and deleting documents.
4. Show through data analysis that the proposed index structure is efficient, both for query operations and for maintenance.
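The two-step query evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's CDBXI or SBXI structure: the pattern is assumed to be a simple map from element name to allowed child names, the document index is assumed to map element names to plain Dewey labels (tuples of integers), and the names `PATTERN`, `DOC_INDEX`, `pattern_has_descendant`, `is_ancestor` and `query` are all hypothetical. Plain Dewey labels are used in place of the extended Dewey coding, whose details the abstract does not give.

```python
from collections import defaultdict

# Hypothetical pattern (as a DTD or XML Schema might define it):
# element name -> names of elements allowed as direct children.
PATTERN = {
    "library": ["book"],
    "book": ["title", "author"],
    "author": ["name"],
    "title": [],
    "name": [],
}


def pattern_has_descendant(pattern, ancestor, descendant):
    """Step 1: is `descendant` reachable below `ancestor` in the pattern?"""
    stack = list(pattern.get(ancestor, []))
    seen = set()  # guards against recursive element definitions
    while stack:
        node = stack.pop()
        if node == descendant:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(pattern.get(node, []))
    return False


def is_ancestor(a, b):
    """With Dewey labels, a is a proper ancestor of b iff a is a proper prefix of b."""
    return len(a) < len(b) and b[:len(a)] == a


def build_document_index(nodes):
    """Group Dewey labels by element name; `nodes` yields (name, label) pairs."""
    index = defaultdict(list)
    for name, label in nodes:
        index[name].append(label)
    return index


def query(pattern, doc_index, ancestor, descendant):
    """Evaluate ancestor//descendant; return None for "No such element"."""
    # Step 1: consult only the pattern index.
    if not pattern_has_descendant(pattern, ancestor, descendant):
        return None  # the documents are never searched
    # Step 2: search the document index, comparing labels by prefix.
    ancestors = doc_index.get(ancestor, [])
    return [
        d for d in doc_index.get(descendant, [])
        if any(is_ancestor(a, d) for a in ancestors)
    ]


# A small example document, already labelled with Dewey codes.
DOC_INDEX = build_document_index([
    ("library", (1,)),
    ("book", (1, 1)),
    ("title", (1, 1, 1)),
    ("author", (1, 1, 2)),
    ("name", (1, 1, 2, 1)),
    ("book", (1, 2)),
    ("title", (1, 2, 1)),
])
```

With this data, `query(PATTERN, DOC_INDEX, "book", "name")` reaches the document search and returns the label `(1, 1, 2, 1)`, while `query(PATTERN, DOC_INDEX, "title", "name")` is rejected at the pattern step because the pattern allows no `name` below `title`. The prefix test in `is_ancestor` is what lets a label comparison stand in for a tree traversal when checking ancestor-descendant relationships.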
Keywords/Search Tags: XML Index, DTD, XML Schema, Path Query