
Research On Semantic-based Approximate Query In XML Documents

Posted on: 2011-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: D L Yan
Full Text: PDF
GTID: 2248330395957905
Subject: Computer application technology
Abstract/Summary:
XML (Extensible Markup Language) is increasingly used in Web applications and has become the standard for data interchange over the Internet. During query processing, users’ intents are often ambiguous and incomplete, so users cannot express their purposes accurately. In addition, XML data usually contain semantic information, including relationships among domain concepts and similarity information. Semantic information may play an important role in improving the performance of approximate queries over XML documents. Nevertheless, traditional approaches require a fully specified query and then return all the answers that satisfy its stated constraints. These answers are often unsatisfactory because they cannot reflect the users’ intentions regarding approximate semantic constraints. Therefore, it is important to discover the semantic knowledge and approximate relations in XML data in order to help users obtain the most relevant answers.

This thesis proposes an approach to approximately query XML data with the assistance of semantic information. It proposes algorithms to extract semantic information from XML documents. Following the order of importance of the query conditions, it rewrites the initial query based on the semantic information and then obtains all the approximate answers. The whole process is divided into three parts.

Firstly, effective algorithms are developed to extract semantic information, organized as ontologies and semantic trees, from an XML document. Ontologies provide a concise and unambiguous description of the concepts of a domain and their relationships, while semantic trees are used to compute the similarity of text-type property values.

Secondly, an algorithm to compute IDF scores of query conditions is introduced. From the IDF scores, the importance of each query condition can be calculated. Based on the ontologies and semantic trees extracted from the XML document, the approach rewrites the initial query conditions.
Specifically, according to the importance of each query condition, it proposes a set of query expansion rules based on the semantic information to expand the users’ query conditions to semantically equivalent or semantically approximate results.

Then this thesis proposes an algorithm to delete invalid elements from the initial query and to adjust the query when its structural relationships are wrong for heterogeneous XML documents. An algorithm to relax structural constraints is also presented.

Finally, we evaluate our approach against existing work. The experimental results show that our approach is more effective: compared with existing methods, it achieves a remarkable increase in both the recall and the precision of the returned answers.
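
The abstract does not give the thesis's exact IDF formula, so the following is only a minimal sketch of how IDF scores could rank query conditions by importance. It assumes that a query condition is a (tag, value) pair, that each child of the root element is one record, and that a smoothed IDF of the form log((N + 1) / (n + 1)) is acceptable. The sample data and the names idf_scores and rank_conditions are hypothetical and do not come from the thesis.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """
<cars>
  <car><make>Toyota</make><color>red</color></car>
  <car><make>Toyota</make><color>blue</color></car>
  <car><make>Honda</make><color>red</color></car>
  <car><make>Toyota</make><color>green</color></car>
</cars>
"""


def idf_scores(xml_text, conditions):
    """Return {condition: idf} for (tag, value) conditions.

    Assumption: every child of the root element counts as one record.
    """
    records = list(ET.fromstring(xml_text))
    total = len(records)
    scores = {}
    for tag, value in conditions:
        # Count the records whose first <tag> child has exactly this text.
        matches = sum(1 for record in records if record.findtext(tag) == value)
        # Smoothed IDF (an assumption): rarer conditions score higher,
        # and a condition that matches nothing does not divide by zero.
        scores[(tag, value)] = math.log((total + 1) / (matches + 1))
    return scores


def rank_conditions(scores):
    """Order conditions from highest IDF (most selective) to lowest."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    conditions = [("make", "Toyota"), ("color", "red"), ("make", "Honda")]
    scores = idf_scores(SAMPLE, conditions)
    for condition in rank_conditions(scores):
        print(condition, round(scores[condition], 3))
```

In this sketch a condition that few records satisfy receives a higher score, so it would be treated as more important. How the thesis maps these scores to the order in which conditions are expanded or relaxed is not stated in the abstract.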
Keywords/Search Tags: XML, semantic information, approximate query