Font Size: a A A

Research On Slca-Based Keyword Search Over XML Documents

Posted on:2012-09-12Degree:MasterType:Thesis
Country:ChinaCandidate:S Y YangFull Text:PDF
GTID:2218330338962703Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
XML (Extensible Markup Language) is created by w3c based-on the standard generalized markup language, which is used to define semantic label. In fact, it has become a criterion to describe data in Web services, e-commerce, digital libraries, and many other network-related applications.There are many XML data query algorithms in order to extract information the users desire from the mass of the XML data, making XML data query become a hot topic in the field of XML data management.Generally, XML data query algorithms are divided into two categories according to the difference of describing query modes, namely XML structures and XML keyword search queries. The former is mostly used traditional regular expressions description method, partial to the traditional structured query, which can express the intent of the user's query clearly; the latter brings ideas and methods commonly used in query into the field of information retrieval, which allows users query only with the key words.Based on exact query condition, XML structure query algorithm can output desired results. However, the algorithm requires users to know more about query not only the language facilitated in structured query algorithm but also the structure of the query XML document tree. The above requirements are impractical for most users. XML keyword query can avoid the problem well, so from the user point of view, XML keyword query can be widely used.The most critical problem in XML keyword query is how to find out the most compact fragments containing all keywords, namely SLCA (Smallest Lowest Common Ancestors) problem. There are many solutions, including Stack, ILE, SE, LISA and LISA II and so on. ILE and SE perform more efficient than Stack which are suitable for query with frequent I/O operations based on massive XML data, since they only need to read the XML data one time in order. Compared to ILE and SE, LISA and LISAⅡhave shown a better performance in the light XML queries both in theoretical analysis and experiment.However, LISA requires frequently scan nodes, and it also introduces a set cross operation which cost a lot of CPU cycles. Although LISA II tries to avoid unnecessary scanning, it has its own unique code, which not only brings into the code mapping, but also greatly weaken the versatility of the algorithm. Even if the two algorithms are executed only in memory, the above shortcomings influence the query speed significantly.Based on SLCA, the paper made improvements on existing algorithms. The main contribution of this article:First, the analysis of the shortcomings on the SLCA algorithm and propose a new improved algorithm for LISAⅡalgorithm time complexity is high and waste storage issues SLCA-based encoding algorithm level UBS algorithm, the experimental data shows the algorithm effectiveness.Second, through the introduction of PE to determine a meaningful thought, given the effective common ancestor, which is ESLCA the definition that labels the content and structure from the elements of similarity is equivalent to two aspects that may determine the existence of XML keyword search invalid results and results for ESLCA exists but in return the result is empty define XML keyword search result sets, the last paper presented by EV-Index model is equivalent to the value of the index based query algorithm BVA, BVA in the query results show the efficiency of and queries have greatly improved the quality.
Keywords/Search Tags:XML keyword query, Dewey, SLCA, hierarchical encoding, ESCLA
PDF Full Text Request
Related items