Font Size: a A A

Research And Implementation Of Index In The Native XML Database

Posted on:2011-10-27Degree:MasterType:Thesis
Country:ChinaCandidate:Z F TanFull Text:PDF
GTID:2178360305454759Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
XML(eXtensible Markup Language) data is a common form of expressing semi-structured data. With the development of Internet and electronic commerce, the scope of its application is constantly expanding and it gradually become the mainstream of the current form of data.In banks, securities companies and other financial companies, XML is used in a variety of platforms to exchange data. In libraries, archives and other enterprises, XML is used to store data, and data retrieval; In addition, search engines, automatic intelligent translation, software configuration management and other fields, XML also has a wide range of applications. Because of its rich structure and semantics, good scalability and easy to use, XML becomes the standard of data description and data exchange. It provides a broad application prospects for the study of semi-structured data and promoted the development of the study of semi-structured data.It is conceivable that in the near future XML data is likely to meet or exceed the size of data in the relationship database and becomes the new mainstream data form following the relational database. Extensive use of XML makes efficient XML data management become a pressing need. Efficient large-scale XML data query is the most important. The traditional relational database can not provide good support for the XML data. Proceeding from the nature of XML data, Native XML database can effectively store and manage XML data. The existing index techniques that support XML document structure query are based on code of the node in the XML document tree. By the code we can determine the structural relationship between two nodes. These indexes are often emphasized the query efficiency, ignoring the cost of update. This article designs two XML index(DUXI and ADUXI) supporting dynamic update .It effectively reduces the cost of index updates.DUXI index consists of three parts: the data graph, pattern charts and node mapping table. The XML data graph saves the data structure information and the text information. It uses tree model of XML to store and index data. Each node in the data graph has only one node corresponding in the XML data. The nodes in the data graph record the address of the brother nodes,children nodes and parent nodes.The nodes that have the same name are linked by a list. In order to reduce the cost of updating index, the data graph records the offset address of the node to its parent. When dealing with updates, only the brother nodes of the updated node and the brother nodes of the parent node of the updated node need to be modified. The pattern graph records the model information of the XML data. It is a high degree of abstraction of the XML data. Although the XML data is a flexible semi-structured data, it still has a certain pattern. XML document is formed by a large number of repeat these patterns. As the size of pattern graph is usually much smaller than the size of the XML data, we can quickly matching query path using the pattern graph. In order to speed up the judgement of the structural relationship between nodes, we assigned to each node in the pattern graph a pair of beginning and end range encoding. Usually there are few nodes in the pattern graph and we rarely update the pattern graph. Therefore, we ignore the cost of updateing the nodes in the pattern graph. There are two sources of pattern graph:①Generated from the XML DTD or XML Schema;②Extracted the pattern information from the XML data. In order to speed up the extraction of pattern information, we record all the paths to the Hash table to improve the path to determine speed.Node mapping table is an index entry for the rapid positioning all nodes in the pattern grahp with the same node name. It also maps the node name for string to digital code. It uses B + tree and makes the string value of nodes as a key. The storage format of the leaf nodes in each index block is (number, local). Number is the digital code of the node name. In the pattern graph we record the digital code rather than the string value of the name, so that we can effectively reduce the storage space.It also convert the compare of string to the compare of digtal code. Local records an address, pointing to node chain in the pattern graph.The nodes in the chain have the same node name with the name of the leaf nodes in the B+tree.In dealing with complex query path, the path is decomposed into a number of simple paths. The simple paths are handled by using the pattern graph, and then the results of the simple path are join to produce the final result. We convert the structural join to equal join by the structural information and address information record in the data graph. We design Hash-Merge-Structural-Join algorithm and the time complexity of the algorithm is O (m + n).It wil keep high good performance in dealing with a large number of node jion.In this paper, the XML keyword queries have been studied. The keyword queries is easy to use without having to learn the complex XML structurral query language and be familiar with the tructural information of the XML documents. It has a broad application prospects. Because the XML data is Semi-structured, the XML keyword query is different from the traditional text keyword query. The result of this query should be the kid tree which contains all the keywords, rather than the entire document. This study is based on SLCA (Minimum Minimum Common Ancestor) semantics. We propose a new XML keyword query processing algorithms. By the subscript path of the node in the data graph,we calculate the common prefix of the subscript and can get the LCA nodes, determine the association between nodes, and filtered out the non-SLCA nodes.Finally, we design and implet a Native XML database prototype system. This system support the XML keywords query and the XML structural query with the index proposed in this paper. We use the DBLP data as the XML-test data. We tested the XML structural queries and the XML keywords query. We alse test the time consumed by createing the index, the storage space of the index and the cost of updating the index. The reaults of the experience show that the index and the struactural join algorithm haveood performance.
Keywords/Search Tags:XML, Native Database, Index, SLCA, Structured Query, Model, Hash, dynamic update
PDF Full Text Request
Related items