Double Index-Based XML Data Query Optimization

Posted on: 2011-03-27  Degree: Master  Type: Thesis
Country: China  Candidate: R Jiang  Full Text: PDF
GTID: 2178360305984865  Subject: Computer application technology
Abstract/Summary:
XML (Extensible Markup Language) is a flexible data format that is fast emerging as the dominant standard for representing and exchanging information on the Internet because of its expressiveness, self-description, and flexibility. With large amounts of data now held in XML format, managing XML documents in an efficient, systematic, and scientific way has become a major challenge in database research. In this paper, we first analyze the mapping between XML Schema and relational schema and propose a schema-based XML storage model. We then discuss a number of key techniques for XML indexing and querying and, based on this research, propose a Double Index (DI) method. Finally, a query algorithm is presented. The main contributions are:

(1) Although storing XML documents with the traditional model is simple, that model suits only the traditional method of traversing XML documents from top to bottom or from bottom to top, and its query efficiency is low. In this paper, we propose a schema-based XML storage model. Using the XML Schema that often exists in practical applications, a set of conversion rules generates a storage model based on a relational database. Compared with the conventional method, the model has the following merits: it splits the traditional large table into smaller, interconnected tables, and it supports traversing XML documents starting from any level. When the document is large and has many nodes, queries against the storage model do not have to traverse the document node by node, which improves efficiency. In addition, the model provides a stable reference for building indexes.

(2) Based on the storage model, this paper proposes a new index structure, the DI structural index. Existing path indexes tend to target queries on absolute path expressions. For a relative path expression, finding the results that match the expression may require traversing the whole index, which is costly. Applying inverted file indexing and Chinese word segmentation techniques, the DI method builds both an absolute index model and a relative index model, and so can efficiently support diverse queries. The absolute index model reduces the number of comparisons by shortening path expressions. The relative index model completes path expressions by means of a parent-child index table and answers the original queries using a small index structure. This method avoids the drawback of always traversing the tree from the root, saves storage space, and improves query efficiency.

(3) Based on DI, this paper also proposes a corresponding query algorithm. Three different queries were tested on both the Fabric index and the DI index, and the simulation results are given. The experimental results show that the method works well.
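
The abstract does not give the actual table layouts or the query algorithm, so the following is only a minimal Python sketch of the general idea of pairing an absolute path index with a parent-child (relative) index. The toy document, the function names, and the dictionary layouts are all assumptions made for illustration; the inverted-file and Chinese word segmentation steps and the path-shortening step mentioned above are not modeled.

```python
from collections import defaultdict

# Hypothetical toy document as (node_id, parent_id, tag) triples.
NODES = [
    (1, None, "library"),
    (2, 1, "book"),
    (3, 2, "title"),
    (4, 2, "author"),
    (5, 1, "book"),
    (6, 5, "title"),
]


def build_double_index(nodes):
    """Build two illustrative indexes over the node list.

    absolute: full root-to-node path -> list of node ids
    relative: (parent tag, child tag) -> list of (parent id, child id)
    """
    tag_of = {nid: tag for nid, _, tag in nodes}
    parent_of = {nid: pid for nid, pid, _ in nodes}

    absolute = defaultdict(list)
    relative = defaultdict(list)

    for nid, pid, tag in nodes:
        # Walk up to the root once, at build time, to get the full path.
        steps = []
        cur = nid
        while cur is not None:
            steps.append(tag_of[cur])
            cur = parent_of[cur]
        absolute["/" + "/".join(reversed(steps))].append(nid)

        # Record the parent-child pair for relative-path lookups.
        if pid is not None:
            relative[(tag_of[pid], tag)].append((pid, nid))

    return absolute, relative


def query_absolute(absolute, path):
    """Answer an absolute path such as '/library/book/title' by one lookup."""
    return absolute.get(path, [])


def query_relative(relative, steps):
    """Answer a relative child-axis path such as ['book', 'title'].

    Each adjacent pair of tags is looked up in the parent-child table and
    joined on node id, so the tree is never traversed from the root.
    """
    if len(steps) < 2:
        raise ValueError("a relative path needs at least two steps")
    frontier = {child for _, child in relative.get((steps[0], steps[1]), [])}
    for parent_tag, child_tag in zip(steps[1:], steps[2:]):
        pairs = relative.get((parent_tag, child_tag), [])
        frontier = {child for parent, child in pairs if parent in frontier}
    return sorted(frontier)


absolute, relative = build_double_index(NODES)
print(query_absolute(absolute, "/library/book/title"))
print(query_relative(relative, ["book", "title"]))
```

In this sketch an absolute query costs a single dictionary lookup, while a relative query touches only the small parent-child entries for the tags it names, which is the kind of saving the abstract attributes to the relative index.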
Keywords/Search Tags:double index, inverted index, query optimization