Font Size: a A A

Research On Structural Query Techniques For XML Data Management

Posted on:2009-06-11Degree:DoctorType:Dissertation
Country:ChinaCandidate:F ShaoFull Text:PDF
GTID:1118360242983023Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of the Internet, XML has been the de-fasto standard for data representation and exchange.More and more Web data appears in the form of XML document. How to efficiently manage XML data is important research topic in database area. The intrinsitc of XML data is semi-structure, thus the management of XML storing, querying and updating is more complex than that of structure data. It is inefficient to solve the problem of XML data management by traditional database technology. Accordingly, we need to research and develop novel technicals of XML data management in terms of XML data charateristics.The research work in this thesis revolves around XML structural query processing on XML data management. XML structural query is a special kind of XML data query. Its query condition is XML structure restriction which is expressed in terms of path expression. XML structural query is the basis of XML data query, and it is hard core of many previous XML query languages, e.g. XQuery, XPath, etc. Hence, it is very important to provide highly efficient XML structural query processing. In the thesis, we classify and process structural queries acoording to path expression, therefore XML structural query processing has performed well.First, we present a novel multi-class XML structural query processing architecture named MCXArch, which has two important components: query execution model named MCXEng,query optimazition model named MCXOpt. In the MCXEng, we introduce some query execution operators. In the MCXOpt, we introduce some optimaztion rules for structural query. Then, within the MCXArch, we discuss four crucial XML structural query technicals: XML linear path matching, XML branch path matching, the acceleration of XML structural query, the estimation of XML containment join.In the research of XML linear path matching, we propose two novel matching algorithms: Integer difference algorithm and reduced navigation algorithm. The integer difference algorithm is used to process XML simple linear path matching, and the reduced navigation algorithm is mainly used to process XML complex linear path matching. These two matching algorithms improve query matching performance by some reduction methods.In the research of XML branch path matching, we propose two heuristic matching algorithms: Heur-PC algorithm and Heur-Unnested algorithm. The Heur-PC algorithm is used to process simple branch path matching, and the Heur-Unnested algorithm is used to process self-unnest branch path matching. The processing time of these two heuristic algorithms is less than that of previous twig join algorithms.In the research of XML structural query acceleration, we present a novel speedup method named bitmap filtration. By employing the prefix/suffix tag bitmaps, the bitmap filtration method can speedup many path matching algorithms, e.g. join-based algorithms, navigation-based algorithms, etc. We firstly introduce the theory of bitmap filtration speedup, and then integrate this speedup method into path matching algorithms.In the research of XML containment join estimation, we present a novel estimation method named weighed haar wavelet method. In the preprocess stage of the weighted haar wavelet method, we employ haar wavelet to compress the statistics and construct the wavelet synopsis. In the estimation stage, we obtain estimated value by the wavelet coefficient reconstruction. We also introduce the probabilistic weight model to simulate the workload adjustment of XML containment join queries. The weighed haar wavelet method has smaller mean relative error than previous containment join methods under the same space budget.
Keywords/Search Tags:XML, Structural Query, Path Matching
PDF Full Text Request
Related items