The Establishing And Update On The Index In XML-Database

Posted on: 2012-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Fang
Full Text: PDF
GTID: 2178330332499768
Subject: Software engineering
Abstract/Summary:
Database technology emerged in the 1960s and 1970s to solve the problem of storing and managing rapidly growing volumes of data. Research in the field focuses on storing, using, and managing data efficiently.

In recent years, with the arrival of the information age, database technology and computer network technology have become closely linked, and they are now two of the main research areas in modern computing. Database technology is used not only for processing many kinds of transactions but also in artificial intelligence, expert systems, and many other related areas. Research on using database technology to manage Internet information has shown that the information in web pages does not follow a single fixed schema and that its data types also vary. A practical way is therefore needed to represent data from different sources in a uniform format.

Against this background, the W3C published the XML 1.0 standard in February 1998. It provides a new data model for managing network data with semi-structured characteristics, so that large amounts of heterogeneous information can be stored in XML files. Unlike HTML, XML focuses on storing data rather than on how the data is displayed. As a data storage tool, semi-structured XML is widely used in many applications and quickly became the common language for data exchange, even though applications can also communicate in other formats. As the number of XML documents has grown, researchers have turned to XML databases in order to manage XML files better.

At present, there are three main solutions for XML data storage: text files, relational databases, and native XML databases. In the text approach, the XML document itself is the data file, so the data is stored in the same form in which it is understood and the relationships among the data are preserved.
This approach is usually implemented by accessing the XML files through DOM, SAX, or similar interfaces. In the relational approach, XML documents are stored as structured data and managed by a relational database through an XML mapping layer. Relational tables are first designed from the XML documents, and the content of the documents is then decomposed and loaded into those tables. During this process, the logical hierarchy of the document may be lost. The last solution is the native XML database, which is designed specifically for XML documents. Like a traditional relational database, it supports transaction processing, security, multi-user access, and other functions. The difference is that its internal structure is not a set of relational tables but the tree structure of the XML documents.

Determining the hierarchical relationship between nodes quickly is very important and is also the central problem of structural queries. As the volume of data grows, querying with conventional XPath and XQuery becomes slow, and efficient queries are difficult to achieve. Because traditional encoding schemes give weak support for structural queries, the focus of this thesis is how to design an encoding that supports both value queries and path queries.

Based on an analysis of different encoding schemes for the XML document tree, the thesis proposes a new combined encoding built on suffix coding and establishes an index that supports both value queries and path queries. Taking advantage of the binary tree structure, we build the path index by combining it with the encoding derived from the structure of the XML document tree. For queries on the content of XML documents, we build the value index with conventional Lucene indexing.
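The idea of pairing a node encoding with a path index and a value index can be illustrated with a minimal sketch. It does not reproduce the thesis's suffix-based encoding or its Lucene index: it swaps in a simple Dewey-style prefix label (a tuple of child positions) and a plain in-memory inverted index, and the sample document and the names `build_indexes` and `is_ancestor` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, used only for illustration.
SAMPLE = """<library>
  <book><title>XML Indexing</title><author>Ann</author></book>
  <book><title>Database Systems</title><author>Bob</author></book>
</library>"""


def build_indexes(xml_text):
    """Label every element and build a path index and a value index.

    Labels are Dewey-style tuples: the root is (1,), and each child
    appends its 1-based position among its siblings.
    """
    root = ET.fromstring(xml_text)
    path_index = defaultdict(list)   # "/library/book/title" -> [labels]
    value_index = defaultdict(set)   # lowercased token -> {labels}

    def visit(node, label, parent_path):
        path = parent_path + "/" + node.tag
        path_index[path].append(label)
        # Index the element's own text, token by token.
        for token in (node.text or "").lower().split():
            value_index[token].add(label)
        for position, child in enumerate(node, start=1):
            visit(child, label + (position,), path)

    visit(root, (1,), "")
    return path_index, value_index


def is_ancestor(a, b):
    """True when label a is a proper prefix of label b."""
    return len(a) < len(b) and b[:len(a)] == a
```

With labels of this kind, an ancestor-descendant test is a tuple prefix comparison and needs no tree traversal, which is the property a structural index relies on. A path query becomes a dictionary lookup in `path_index`, and a keyword lookup goes through `value_index`.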
For keyword queries, the keywords are passed to the query as string parameters.

For structural queries, we consider both simple structural queries and complex queries that include keywords. For complex queries, we use JavaCC for parsing and verify the validity of the query with regular expressions. We then break the complex query down into multiple simple queries and join their results.

We analyze the query efficiency of this design using experimental data, collecting query times and other measurements with and without the index. In addition, we analyze the index size and query efficiency when the value index and the path index are built on the XML data set.

This thesis improves the encoding of the XML document tree, the index structure, and query processing, so that value queries and path queries are better supported and the cost of index updates is reduced to some extent.
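The decomposition of a complex query into simple queries whose results are then joined might look like the following sketch. It is not the thesis's JavaCC grammar: the query form (a path followed by optional `[keyword]` filters), the regular expression, the toy indexes, and the names `decompose` and `run_query` are all assumptions made for illustration.

```python
import re

# Hypothetical query form: a path followed by optional [keyword] filters,
# for example "/library/book[xml][indexing]".
QUERY_RE = re.compile(r"^(?P<path>(?:/\w+)+)(?P<filters>(?:\[\w+\])*)$")

# Toy indexes with Dewey-style tuple labels (illustrative data only).
PATH_INDEX = {
    "/library": [(1,)],
    "/library/book": [(1, 1), (1, 2)],
    "/library/book/title": [(1, 1, 1), (1, 2, 1)],
}
VALUE_INDEX = {
    "xml": {(1, 1, 1)},
    "indexing": {(1, 1, 1)},
    "database": {(1, 2, 1)},
}


def decompose(query):
    """Validate the query, then split it into one path query and
    a list of keyword queries."""
    match = QUERY_RE.match(query)
    if match is None:
        raise ValueError(f"malformed query: {query!r}")
    keywords = re.findall(r"\[(\w+)\]", match.group("filters"))
    return match.group("path"), [k.lower() for k in keywords]


def contains(node, label):
    """True when label belongs to node itself or to one of its descendants."""
    return label[:len(node)] == node


def run_query(query):
    """Run each simple query against its index and join the results."""
    path, keywords = decompose(query)
    results = PATH_INDEX.get(path, [])
    for keyword in keywords:
        hits = VALUE_INDEX.get(keyword, set())
        # Keep only nodes whose subtree contains a hit for this keyword.
        results = [n for n in results if any(contains(n, h) for h in hits)]
    return results
```

Here `run_query("/library/book[xml]")` first resolves the path against the path index, then keeps only the `book` nodes whose subtree contains the keyword. The join step is a prefix comparison on labels, so no document traversal is needed.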
Keywords/Search Tags: XML database, index, suffix code, structure index, Lucene technology