Research Of XML Data Encoding And Storage Management

Posted on: 2011-01-05  Degree: Doctor  Type: Dissertation
Country: China  Candidate: C Y Wang  Full Text: PDF
GTID: 1118330332972483  Subject: Computer application technology
Abstract/Summary:
Over the past ten years, XML has developed rapidly and come into wide use. Owing to its flexible, semi-structured nature, XML has become a new data format adopted across many areas. Large repositories of XML documents have emerged on the Web, and semi-structured data management has become an important branch of modern database technology. XML data management should be based on the XML data model and should take the essential characteristics of XML data into account when exploring efficient storage schemes that support its tree structure; designing native XML data management solutions is therefore the key problem. This thesis studies native XML data management. On the one hand, we designed and developed a fully independent native XML database centered on the XML data model, whose physical storage scheme fully supports XML index structures and efficient XML query processing. On the other hand, to support XML data in traditional relational databases, the thesis designs and implements a native XML storage scheme that integrates seamlessly with relational databases. This scheme reuses the relational storage manager to bridge the gap between the logical XML data model and the relational model, which provides the most natural way to store and index XML documents. Building on an analysis of recent research, the thesis studies native XML encoding, storage schemes, index structures, XML data updates, and document similarity measures. The main contributions and innovations are as follows:
· The thesis presents BSC, a novel XML tree numbering scheme that exploits the properties of binary fractional numbers and avoids relabeling any existing node under every kind of XML update. A theoretical analysis of label size is given, and the experimental results show that BSC performs much better than existing dynamic numbering schemes for both static numbering and XML data updates.
· The thesis first introduces XN-Store, a new, independent native XML data storage scheme. XN-Store records XML nodes in a paged file so as to preserve the original logical XML data model. It supports fast publishing of and access to XML nodes, and fully supports XML index structures and efficient XML query processing. The experimental results show that XN-Store is a high-performance storage scheme for native XML databases.
· Building on the XN-Store storage scheme, the thesis presents XN-Store+, an efficient update strategy. XN-Store+ adds forward link records so that the virtual address of a moved record remains unchanged, which guarantees the correctness of the various index structures that rely on record addresses. It also adds relocation records that hold the actual data of moved records in order to maintain document order. The strategy effectively solves the XML data update problem.
· The thesis presents NXRel, a model-based mapping XML storage scheme seamlessly integrated with relational databases. Its central idea is to use an effective XML node numbering scheme to support the XML data model on top of a relational table. First, the relational table of XML nodes is a non-destructive mapping of the XML data model. Second, the storage scheme maximizes reuse of the storage mechanisms already present in the relational database. Experiments on a variety of data sets show that NXRel is a high-performance XML storage scheme.
· Based on the bidirectional path constraints model (BPCM), the thesis proposes a method for evaluating the similarity of XML documents. BPCM accurately describes the structural characteristics of XML documents. Two methods, based on path sets and on N-grams respectively, are proposed to compute path similarity, and document similarity is finally evaluated under a variety of weightings. When the method is applied to XML document clustering, the experiments show that it improves clustering precision and recall.
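The abstract does not give BSC's exact encoding. As a rough illustration of the property it relies on (between any two distinct finite binary fractions there is always another one), the Python sketch below labels siblings with dyadic fractions in (0, 1) and gives an inserted node the midpoint of its neighbours, so no existing label ever changes. The functions `initial_labels`, `label_between` and `to_binary` are hypothetical, and midpoint insertion is a stand-in chosen for clarity, not the thesis's BSC algorithm.

```python
from fractions import Fraction
from typing import List, Optional

# Hypothetical illustration, not the BSC encoding from the thesis.
# Labels are dyadic fractions strictly inside (0, 1); 0 and 1 act only as
# virtual bounds for inserting before the first or after the last sibling.
LOWER, UPPER = Fraction(0), Fraction(1)


def initial_labels(n: int) -> List[Fraction]:
    """Evenly spaced labels i / 2**k for n siblings, with 2**k > n."""
    k = n.bit_length()
    return [Fraction(i, 2 ** k) for i in range(1, n + 1)]


def label_between(left: Optional[Fraction], right: Optional[Fraction]) -> Fraction:
    """Label for a node inserted between two siblings (None = no sibling on that side)."""
    lo = LOWER if left is None else left
    hi = UPPER if right is None else right
    if not lo < hi:
        raise ValueError("left label must be smaller than right label")
    # The midpoint of two dyadic fractions is again dyadic, so it has a
    # finite binary expansion and no existing label has to change.
    return (lo + hi) / 2


def to_binary(x: Fraction) -> str:
    """Finite binary expansion '0.b1b2...' of a dyadic fraction in (0, 1)."""
    bits = []
    while x:
        x *= 2
        if x >= 1:
            bits.append("1")
            x -= 1
        else:
            bits.append("0")
    return "0." + "".join(bits)


if __name__ == "__main__":
    a, b, c = initial_labels(3)                      # 1/4, 1/2, 3/4
    x = label_between(a, b)                          # 3/8, inserted between a and b
    print([to_binary(v) for v in (a, x, b, c)])      # ['0.01', '0.011', '0.1', '0.11']
```

Because the stored bit strings never end in 0, comparing them lexicographically gives the same order as comparing the numbers. Repeated insertion into the same gap adds one bit per insertion, which is the kind of label-size growth a theoretical analysis of such a scheme has to bound.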
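The abstract describes XN-Store+ only in terms of forward link records and relocation records. The Python sketch below models just the forward-link idea under assumed, simplified conditions: a toy slotted page file in which a record that outgrows its page is moved elsewhere and a small forward record is left at its original (page, slot) address, so any index entry holding that address stays valid. `RecordFile`, `Forward`, `PAGE_CAPACITY` and `FORWARD_SIZE` are hypothetical names; relocation records and document-order maintenance are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Hypothetical toy model of forward link records; not XN-Store+'s actual layout.
PAGE_CAPACITY = 64   # assumed page size in bytes (tiny for the demo)
FORWARD_SIZE = 8     # assumed on-page size of a forward link record

Address = Tuple[int, int]  # (page number, slot number)


@dataclass
class Forward:
    target: Address  # where the record's bytes currently live


Record = Union[bytes, Forward]


def _size(rec: Record) -> int:
    return FORWARD_SIZE if isinstance(rec, Forward) else len(rec)


@dataclass
class Page:
    slots: List[Record] = field(default_factory=list)

    def used(self) -> int:
        return sum(_size(rec) for rec in self.slots)


class RecordFile:
    """Toy paged file whose record addresses never change after insertion."""

    def __init__(self) -> None:
        self.pages: List[Page] = [Page()]

    @staticmethod
    def _check(data: bytes) -> None:
        # Records are at least FORWARD_SIZE bytes, so a forward link always
        # fits into the slot a moved record leaves behind.
        if not FORWARD_SIZE <= len(data) <= PAGE_CAPACITY:
            raise ValueError("record size out of range")

    def _place(self, data: bytes) -> Address:
        for pno, page in enumerate(self.pages):  # first fit
            if page.used() + len(data) <= PAGE_CAPACITY:
                page.slots.append(data)
                return (pno, len(page.slots) - 1)
        self.pages.append(Page([data]))
        return (len(self.pages) - 1, 0)

    def insert(self, data: bytes) -> Address:
        self._check(data)
        return self._place(data)

    def read(self, addr: Address) -> bytes:
        pno, slot = addr
        rec = self.pages[pno].slots[slot]
        if isinstance(rec, Forward):  # one hop: forward targets always hold bytes
            tpno, tslot = rec.target
            rec = self.pages[tpno].slots[tslot]
        return rec

    def update(self, addr: Address, data: bytes) -> None:
        self._check(data)
        pno, slot = addr
        home = self.pages[pno]
        old = home.slots[slot]
        if isinstance(old, Forward):
            tpno, tslot = old.target
            # Free the moved copy with an empty tombstone; slot numbers stay stable.
            self.pages[tpno].slots[tslot] = b""
        if home.used() - _size(old) + len(data) <= PAGE_CAPACITY:
            home.slots[slot] = data  # fits at its original address
        else:
            home.slots[slot] = Forward(self._place(data))  # move, leave a forward link


if __name__ == "__main__":
    rf = RecordFile()
    a = rf.insert(b"A" * 30)
    rf.insert(b"B" * 30)
    rf.update(a, b"A" * 40)          # no longer fits on page 0
    print(a, rf.pages[0].slots[0])   # (0, 0) Forward(target=(1, 0))
    print(len(rf.read(a)))           # 40
```

Keeping the forward chain to a single hop (the home slot always points directly at the bytes) bounds the cost of a lookup through an address-based index to at most one extra page access in this model.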
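The abstract does not say which node numbering NXRel uses. Assuming a simple region (start, end) encoding as a stand-in, the sketch below uses Python's built-in `sqlite3` module to store each element as one row of a node table and answers an ancestor/descendant step with a self-join on interval containment. The table `node`, its columns, and the functions `shred` and `descendants` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical region-encoding sketch; not NXRel's actual schema.


def shred(conn: sqlite3.Connection, xml_text: str) -> None:
    """Store every element as one row (start_pos, end_pos, level, tag, text)."""
    conn.execute(
        "CREATE TABLE node (start_pos INTEGER PRIMARY KEY, end_pos INTEGER,"
        " level INTEGER, tag TEXT, text TEXT)"
    )
    rows = []
    counter = 0

    def visit(elem: ET.Element, level: int) -> None:
        nonlocal counter
        counter += 1
        start = counter  # number assigned on entering the element
        for child in elem:
            visit(child, level + 1)
        counter += 1     # number assigned on leaving the element
        rows.append((start, counter, level, elem.tag, (elem.text or "").strip()))

    visit(ET.fromstring(xml_text), 0)
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", rows)


def descendants(conn: sqlite3.Connection, ancestor_tag: str, tag: str) -> list:
    """Text of every `tag` element nested under an `ancestor_tag` element, in document order."""
    # a is an ancestor of d exactly when a's interval strictly contains d's interval.
    query = (
        "SELECT DISTINCT d.start_pos, d.text FROM node a JOIN node d"
        " ON d.start_pos > a.start_pos AND d.end_pos < a.end_pos"
        " WHERE a.tag = ? AND d.tag = ? ORDER BY d.start_pos"
    )
    return [text for _, text in conn.execute(query, (ancestor_tag, tag))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    shred(conn, "<lib><book><title>XML</title></book><book><title>DB</title></book></lib>")
    print(descendants(conn, "lib", "title"))  # ['XML', 'DB']
```

Ordering by `start_pos` recovers document order directly from the table. Attributes, mixed content and updates are left out of this sketch; a region encoding with consecutive integers would also need relabeling on insertion, which is exactly the problem dynamic numbering schemes are designed to avoid.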
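BPCM itself is not defined in this abstract, so the sketch below does not implement it. It only illustrates the general shape of the two path-similarity measures the abstract names: a comparison of whole root-to-leaf tag paths and a comparison of tag N-grams along those paths, mixed by a weight. The use of Jaccard similarity, the bigram default `n=2`, the `weight` parameter and all function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple

# Hypothetical path-based similarity sketch; not the thesis's BPCM method.
Path = Tuple[str, ...]


def root_to_leaf_paths(xml_text: str) -> Set[Path]:
    """Distinct root-to-leaf tag paths of a document."""
    paths: Set[Path] = set()

    def walk(elem: ET.Element, prefix: Path) -> None:
        path = prefix + (elem.tag,)
        children = list(elem)
        if not children:
            paths.add(path)
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return paths


def jaccard(a: set, b: set) -> float:
    return 1.0 if not a and not b else len(a & b) / len(a | b)


def path_ngrams(paths: Set[Path], n: int) -> Set[Path]:
    """All windows of n consecutive tags; paths shorter than n are kept whole."""
    grams: Set[Path] = set()
    for p in paths:
        if len(p) < n:
            grams.add(p)
        else:
            grams.update(p[i:i + n] for i in range(len(p) - n + 1))
    return grams


def document_similarity(doc1: str, doc2: str, weight: float = 0.5, n: int = 2) -> float:
    """Weighted mix of whole-path and N-gram path similarity (weight in [0, 1])."""
    p1, p2 = root_to_leaf_paths(doc1), root_to_leaf_paths(doc2)
    path_sim = jaccard(p1, p2)
    ngram_sim = jaccard(path_ngrams(p1, n), path_ngrams(p2, n))
    return weight * path_sim + (1 - weight) * ngram_sim
```

For `<book><title/><author/></book>` versus the same fragment wrapped in `<library>`, no whole path matches (path similarity 0), but two of the three distinct bigrams are shared (N-gram similarity 2/3), so the default weighting yields 1/3. This is the kind of partial structural overlap that an N-gram measure captures and a whole-path measure misses.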
Keywords/Search Tags: node numbering, XML data encoding, XML storage scheme, XML data update, structural similarity