Double Index-Based XML Data Query Optimization

Posted on:2011-03-27

Degree:Master

Type:Thesis

Country:China

Candidate:R Jiang

Full Text:PDF

GTID:2178360305984865

Subject:Computer application technology

Abstract/Summary:


XML (Extensible Markup Language) is a flexible data format that is fast becoming the dominant standard for representing and exchanging information on the Internet, thanks to its expressiveness, self-describing nature, and flexibility. With the large and growing amount of data in XML format, managing XML documents efficiently and systematically has become a major challenge in database research.

In this paper, we first analyze the mapping between XML Schema and relational schema, and then propose a schema-based XML storage model. We then discuss a number of key techniques for XML indexing and querying. Building on this research, we propose a Double Index (DI) method and, finally, present a query algorithm. The main contents are as follows:

(1) Storing XML documents with the traditional model is simple, but that model is suited only to traversing XML documents in top-down or bottom-up order, and its query efficiency is low. In this paper, we propose a schema-based XML storage model. Using the XML Schema that often exists in practical applications, a set of conversion rules generates a storage model on top of a relational database. Compared with the conventional method, the model has the following merits: it splits the traditional large table into smaller, interconnected tables, and it supports traversing an XML document starting from any level. When the document is large and has many nodes, queries against the storage model do not have to traverse the document node by node, which improves efficiency. In addition, the model provides a reliable reference for indexing.

(2) Based on the storage model, this paper proposes a new index structure, the DI structural index. Existing path indexes tend to target queries over absolute path expressions. For a relative path expression, finding the results that match the expression may require traversing the whole index, which is costly. Applying inverted-file indexing and Chinese word segmentation techniques, the DI method builds two index models, an absolute one and a relative one, and can therefore support diverse queries efficiently. The absolute index model reduces the number of comparisons by shortening the path expressions. The relative index model completes path expressions by means of a parent-child index table and answers the original query from this small index structure. The method avoids always traversing the tree from the root, saves storage space, and improves query efficiency.

(3) Based on DI, this paper also proposes a corresponding query algorithm. Three different types of query were tested against both the Fabric index and the DI index, and the simulation results are reported. The experimental results show that the method works well.
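The two-index idea in (2) can be illustrated with a minimal Python sketch. The abstract does not give the actual DI table layout, so everything here is an assumption made for illustration: the function names `build_indexes`, `query_absolute`, and `query_relative`, the use of document-order integers as node ids, and the choice to model the absolute index as an inverted list from full path to node ids and the relative index as a tag list plus a parent-child table. Chinese word segmentation and the path-shortening step are not modeled.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def build_indexes(xml_text):
    """Build two hypothetical indexes over one XML document.

    absolute  : full root-to-node path -> node ids (inverted list)
    by_tag    : element tag -> node ids (inverted list)
    parent_of : node id -> parent node id (parent-child table)
    tag_of    : node id -> element tag
    """
    root = ET.fromstring(xml_text)
    absolute = defaultdict(list)
    by_tag = defaultdict(list)
    parent_of = {}
    tag_of = {}
    next_id = 0
    # Explicit stack; children are pushed in reverse so that ids
    # are assigned in document (preorder) order.
    stack = [(root, None, "")]
    while stack:
        elem, parent_id, prefix = stack.pop()
        node_id = next_id
        next_id += 1
        path = prefix + "/" + elem.tag
        absolute[path].append(node_id)
        by_tag[elem.tag].append(node_id)
        parent_of[node_id] = parent_id
        tag_of[node_id] = elem.tag
        for child in reversed(list(elem)):
            stack.append((child, node_id, path))
    return absolute, by_tag, parent_of, tag_of


def query_absolute(absolute, path):
    """Answer an absolute path such as /lib/book/title with one lookup."""
    return list(absolute.get(path, []))


def query_relative(by_tag, parent_of, tag_of, expr):
    """Answer a relative child-step path such as //book/title.

    Starts from the inverted list of the last tag and checks the
    preceding steps by walking up the parent-child table, so the
    document tree is never traversed from the root.
    """
    steps = expr.lstrip("/").split("/")
    hits = []
    for node_id in by_tag.get(steps[-1], []):
        current = node_id
        matched = True
        for tag in reversed(steps[:-1]):
            current = parent_of[current]
            if current is None or tag_of[current] != tag:
                matched = False
                break
        if matched:
            hits.append(node_id)
    return hits
```

In this sketch an absolute query costs a single dictionary lookup, while a relative query touches only the nodes carrying the final tag and their ancestors, which is the kind of saving the abstract attributes to the relative index. Only child steps are handled; descendant steps inside the expression, predicates, and attributes are left out.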

Keywords/Search Tags:

double index, inverted index, query optimization


Related items

1	Based On The Index Technology Of XML Query Optimization Research
2	Space- And Time-efficient Compression And Intersection Algorithms For Inverted Index
3	Research On Top-k Subgraph Query Algorithm Based On Double Index
4	Index Compression And Query Processing In Search Engines
5	Research On The Key Techniques For XML Index And Query
6	Compressed Index And Query Optimization On Large-Scale Genomic Data
7	Research On Key Technologies Of Full-text Index Compression In Cloud Environment
8	Understanding Of Web-based Document Inverted Row Of Full-text Index Research And Realization
9	Research On SSD-based Inverted Index Construction And Maintenance Strategies
10	Research On The Index Technology Of Semi-structured Data