Font Size: a A A

Study And Improvement Of XML Keyword Query Based On SLCA

Posted on:2010-07-16Degree:MasterType:Thesis
Country:ChinaCandidate:D D GaoFull Text:PDF
GTID:2178360278473032Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
XML, stands for Extensible Markup Language, is a standard of semantic markup. It defines the data type aimed at easily recognized by both computers and users. With the fast development of network applications, XML has been widely applied to the Internet intelligent information retrieval system, digital libraries, data integration, Web Service and so on, which makes XML become a primary data form. So how to find the useful information from XML data has been a hot research area.According to different features of XML Query request, we can divide the XML Query strategy into two categories: XML Structural Query and XML Keyword Query. XML Structural Query requests users to master the XML structure and exquisite query language. It is a big challenge to users. However, XML Keyword Query is much more flexible. Users can easily use it by only providing keyword information instead of any query language or document structures. So this strategy is widely used and valuable to study.For XML Keyword Query, the basic issue is to find the smallest XML fragments which contain all of the keywords. The definition of the smallest XML fragments and the efficiency of computing them will directly affect the efficiency and accuracy of XML Keyword Query.At present, SLCA is the best definition for the smallest XML fragments. And there are also many good solutions for computing SLCA. In this paper, we first analyze the disadvantage of previous algorithms for SLCA and then propose two more efficient algorithms for improvement. But according to the experiment, we found that there some nodes in the answers. So we improve the notion of SLCA and propose Heterogeneous SLCA (HSLCA) which brings label's information in. HSLCA could improve the accuracy of XML Keyword Query obviously.In the efficiency enhancement of algorithm for SLCA, this paper analyzes the classical algorithms and makes the innovations below:1. We propose the USSSA algorithm which bases on the union set. It uses the level coding to do the computation against to the disadvantage of more space complexity and more space storage of LISA IL The space complexity and the space storage of USSSA are more advanced than LISA II in the same time scenario. 2. We propose the TREE_Set algorithm which bases on the idea of overlapping tree. It computes the SLCA in the tree mode against to the poor efficiency of LISA II when there are less keywords. Meanwhile, this algorithm is more intuitional and easy to understand and realize.In the accuracy enhancement of algorithm for the XML Keyword Query, we analyze the disadvantages of SLCA and make the innovations below.1. We improve SLCA and propose HSLCA as a new notion for the smallest fragment. For HSLCA, label is introduced to determine the correlation between keywords, and to improve the accuracy of XML Keyword Query.2. The HLISA algorithm is innovated which is a solution of HSLCA with the enhancement of LISA II algorithm. HLISA theoretically have the same superior performance with LISA II since they have the same time complexity.This paper makes an in-depth study on SLCA. Through it, we improve the efficiency and accurate of XML Keyword Query which is very significant for users and applications.
Keywords/Search Tags:XML Keyword Search, The Smallest Fragment, SLCA, LISA, HSLCA
PDF Full Text Request
Related items