Context-Based XML LCA Keyword Query Technology

Posted on: 2012-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: J H Zhu
GTID: 2208330434972943
Subject: Computer software and theory
Abstract/Summary:
As XML (eXtensible Markup Language) has become the leading standard for representing and exchanging information on the Web, the demand for retrieving XML data keeps growing. Because XML has many distinctive features, retrieving it effectively and efficiently raises many new challenges and opportunities. Structured query languages such as XQuery and XPath can express complex semantics and therefore return the desired results precisely. In many cases, however, keyword search is easier for users to accept, because structured query languages require the user to know the schema of the XML document and to formulate queries in a complicated grammar.

Most existing XML keyword search approaches are based on some variant of the lowest common ancestor (LCA) concept. For each query, they retrieve only the nodes in the subtree rooted at the LCA node and treat all other nodes as irrelevant. In practice, however, the structure of the XML tree is hidden from the user, and keyword queries are usually too short to carry enough information. As a result, the subtree rooted at the LCA node often does not contain all the relevant information, which leaves users dissatisfied with the results. Improving the poor effectiveness of many XML keyword search engines is therefore the motivation of this thesis.

The main contributions of this thesis are as follows. First, it summarizes existing work and identifies the problem that only information in the subtree rooted at the LCA node is retrieved; to address this problem, it proposes the concept of a context-based LCA node. Second, it proposes a result-expansion approach to define and obtain context information.
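The LCA-based retrieval described above can be illustrated with a small sketch. It assumes nodes are labeled with Dewey IDs (the child positions on the path from the root, e.g. `(0, 1, 0)`), a common labeling scheme that is not necessarily the one used in this thesis; `lca`, `is_proper_ancestor`, and `slca` are hypothetical helper names. It computes the smallest LCAs (SLCA, the baseline used in the experiments) by brute force: take the LCA of every combination of one match per keyword, then keep only those LCAs that have no other LCA below them.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

# A Dewey label: child positions on the path from the root, e.g. (0, 1, 0).
Dewey = Tuple[int, ...]


def lca(labels: Sequence[Dewey]) -> Dewey:
    """Lowest common ancestor = longest common prefix of the Dewey labels."""
    prefix: List[int] = []
    for parts in zip(*labels):
        if all(p == parts[0] for p in parts):
            prefix.append(parts[0])
        else:
            break
    return tuple(prefix)


def is_proper_ancestor(a: Dewey, b: Dewey) -> bool:
    """True if node a lies strictly above node b."""
    return len(a) < len(b) and b[: len(a)] == a


def slca(matches: Dict[str, List[Dewey]]) -> List[Dewey]:
    """Smallest LCAs: LCAs with no other LCA in their subtree (brute force)."""
    lists = list(matches.values())
    if not lists or any(not nodes for nodes in lists):
        return []  # some keyword has no match, so there is no result
    lcas: Set[Dewey] = {lca(combo) for combo in product(*lists)}
    return sorted(
        n for n in lcas if not any(is_proper_ancestor(n, m) for m in lcas)
    )


# Toy document (Dewey labels in parentheses):
# bib (0)
#   book (0,0): title (0,0,0) "XML keyword search", author (0,0,1) "Zhu"
#   book (0,1): title (0,1,0) "XML databases",      author (0,1,1) "Li"
matches = {
    "xml": [(0, 0, 0), (0, 1, 0)],
    "zhu": [(0, 0, 1)],
}
print(slca(matches))  # [(0, 0)]
```

For the query "XML Zhu", only the first `book` subtree, `(0, 0)`, is returned; relevant information outside that subtree is never considered, which is the limitation the context-based LCA node targets. The brute-force enumeration grows exponentially with the number of keywords; practical SLCA algorithms work on the sorted Dewey match lists and avoid enumerating all combinations.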
The result-expansion approach must solve two problems: deciding whether a result should be expanded, and deciding which information should be added. For the first, a decision rule that balances effectiveness and efficiency is derived from an analysis of query logs. For the second, an XML TF*IDF approach scores the candidate attributes, that is, the attributes not included in the LCA result, and the query expression is then expanded with this context information. In experiments against the SLCA approach, our approach performs much better in precision, recall, and F-measure, while the system response time remains acceptable. These experimental results confirm that the approach achieves its goal.
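The exact XML TF*IDF formula is not given in this abstract, so the following is only a sketch of a standard TF*IDF weighting applied to candidate attribute names, under these assumptions: a candidate attribute is an element or attribute name that occurs in the result's context (for instance, the parent subtree of the LCA node) but not in the LCA result itself; TF is how often the name occurs in that context; and IDF is computed over a corpus of same-type entity subtrees (e.g. all `book` elements). The names `score_candidate_attributes` and `top_k`, the IDF smoothing, and the top-k cut-off are illustrative choices, not the thesis's method.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def score_candidate_attributes(
    context_tags: Iterable[str],
    result_tags: Set[str],
    corpus: List[Set[str]],
) -> Dict[str, float]:
    """Hypothetical TF*IDF score for attribute names outside the LCA result."""
    # TF: occurrences in the context of names that the LCA result lacks.
    tf = Counter(tag for tag in context_tags if tag not in result_tags)
    n = len(corpus)
    scores: Dict[str, float] = {}
    for tag, freq in tf.items():
        # DF: number of same-type subtrees that contain this name.
        df = sum(1 for subtree in corpus if tag in subtree)
        idf = math.log((1 + n) / (1 + df))  # smoothed so df = 0 is safe
        scores[tag] = freq * idf
    return scores


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Names of the k highest-scoring candidates, used to expand the result."""
    return sorted(scores, key=lambda t: scores[t], reverse=True)[:k]


# Assumed toy data: the LCA result covers title and author only.
result_tags = {"title", "author"}
context_tags = ["title", "author", "publisher", "year", "year"]
corpus = [
    {"title", "author", "year"},
    {"title", "author", "year", "publisher"},
    {"title", "year"},
    {"title", "author"},
]
scores = score_candidate_attributes(context_tags, result_tags, corpus)
print(top_k(scores, 1))  # ['publisher']
```

Here `publisher` scores log(5/2) ≈ 0.92 and `year` scores 2 · log(5/4) ≈ 0.45, so `publisher` is chosen for expansion. With this smoothing, a name present in every same-type subtree gets an IDF of zero, so rarer, more distinguishing names rank higher; other weightings are equally plausible.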
Keywords/Search Tags: XML, Context, Keyword Search, LCA