
A Self-adaptive Cross-domain Query Strategy On The Deep Web

Posted on: 2012-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Li
Full Text: PDF
GTID: 2248330395958148
Subject: Computer software and theory
Abstract/Summary:
As the amount of information on the Web grows, so does the amount of information in the Deep Web. Accessing these Web databases as automatically as possible is the goal of current Deep Web data integration. Data sources in the Deep Web cover many different domains, and domain-oriented data integration techniques have matured, giving rise to many domain-oriented Deep Web data integration systems. In this thesis, we assume that all Deep Web data sources have been clustered by domain, and that each cluster, which integrates all data sources belonging to one domain, corresponds to a global query interface. With the growing number of Deep Web applications, cross-domain querying has become a pressing need. This thesis studies how to meet users' need for cross-domain queries.

To solve this problem, this thesis proposes a system for automatic cross-domain querying. It consists of two parts. (1) Find the correlation between different domains and construct a domain correlation model. In this step, we analyze the correlation of domains based on the attributes of the data sources' query interfaces and on the attribute values. We present an algorithm that calculates the correlation between two data sources from two domains and judges whether the two domains are correlated. (2) When a user issues a query in the global query interface of some domain, we construct a query tree based on the domain correlation graph. Furthermore, we present a cross-domain-query-oriented Query Path Evaluating Model (QPEM) that ranks and recommends the top-k query paths to meet all possible query intentions.

We use samples of Web databases as the basis for selecting databases. First, we use the samples to choose the Web databases that match the user's query. Second, to reduce query cost, queries are sent only to the real Web databases that were chosen. The content correlation of data sources is also computed from the samples.
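The sample-based selection step can be illustrated with a minimal sketch. The abstract does not say how a sample is matched against a query, so the keyword-overlap test, the `min_hits` threshold, and all names and data below are assumptions made for illustration only.

```python
def select_databases(query_terms, samples, min_hits=1):
    """Return database names whose local sample matches the query.

    samples maps a (hypothetical) database name to a list of sampled
    records, each a plain string. A record counts as a hit when it
    shares at least one word with the query (an assumed matching rule).
    """
    terms = {t.lower() for t in query_terms}
    selected = []
    for name, records in samples.items():
        hits = sum(1 for rec in records if terms & set(rec.lower().split()))
        if hits >= min_hits:
            selected.append((hits, name))
    # Most hits first; ties broken alphabetically for a stable order.
    selected.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in selected]


# Hypothetical samples drawn from three Web databases.
samples = {
    "books_a": ["deep web data integration", "query interface design"],
    "books_b": ["cooking for beginners"],
    "movies_a": ["deep sea documentary"],
}
chosen = select_databases(["deep", "web"], samples)
```

Only the databases in `chosen` would then receive the real query, which is where the saving in query cost comes from.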
QPEM ranks and recommends the top-k query paths based on the correlation between data sources, the quality and out-degree of the parent data source, and the quality and in-degree of the child data source.

In the experiments, we obtain high precision for data source correlation. In addition, we compare the user satisfaction of four standardization methods and evaluate the influence of query coverage on user satisfaction. The experimental results show the effectiveness of the method proposed in this thesis.
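To make the idea of ranking query paths concrete, here is a rough sketch that uses the five factors named for QPEM. The abstract does not give the scoring formula, so the way the factors are combined in `edge_score`, the product over edges, and the example graph are all assumptions, not the thesis's model.

```python
import heapq


def edge_score(u, v, corr, quality, out_deg, in_deg):
    # Assumed combination: correlation weighted by parent quality spread
    # over its outgoing edges and child quality spread over its incoming edges.
    return corr[(u, v)] * (quality[u] / out_deg[u]) * (quality[v] / in_deg[v])


def top_k_paths(start, corr, quality, k=3):
    """Enumerate simple paths from start and return the k best-scoring ones."""
    children, out_deg, in_deg = {}, {}, {}
    for u, v in corr:
        children.setdefault(u, []).append(v)
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1

    scored = []
    stack = [([start], 1.0)]
    while stack:
        path, score = stack.pop()
        for v in children.get(path[-1], []):
            if v in path:  # keep paths simple (no repeated source)
                continue
            s = score * edge_score(path[-1], v, corr, quality, out_deg, in_deg)
            new_path = path + [v]
            scored.append((s, new_path))
            stack.append((new_path, s))
    return heapq.nlargest(k, scored, key=lambda item: item[0])


# Hypothetical correlation graph over three domains.
corr = {("book", "movie"): 0.8, ("book", "music"): 0.4, ("movie", "music"): 0.5}
quality = {"book": 1.0, "movie": 0.9, "music": 0.8}
best = top_k_paths("book", corr, quality, k=2)
```

Multiplying edge scores penalizes long paths, which is one possible design choice; a different aggregation would change the ranking.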
Keywords/Search Tags: Deep Web, cross-domain, top-k, domain correlation