Keyword-Based Retrieval of XML Data Sources

With the rapid growth of data on the Internet, data are distributed across many data sources rather than stored in a single one. A user's keyword query must be delivered to each data source and processed there to obtain the result. To accelerate query evaluation, the key problem is how to select the data sources that are relevant to the keyword query. In this paper, we propose a keyword-search-based XML data source selection method. To make it easier to predict the relevance of a data source to a query, we propose the XDS (XML data source summary), which summarizes the relationship between keywords and the data source. The nodes in XML documents are organized hierarchically; we capture this structural feature together with the textual information of XML documents and integrate both into an evaluation formula, which is defined recursively and used to construct the XDS. The XDS stores numerical values representing the relevance between keyword pairs and the data source. Along with the XDS, we provide update algorithms and a threshold-based compaction strategy to improve runtime performance. Based on the XDS, we propose four selection methods that select the top-k data sources most relevant to a user's keyword query. We evaluate these selection methods, together with the K-Graph method, on the DBLP dataset and compare them with each other; the results show that the best-performing of our proposed methods is both efficient and effective.
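The abstract does not state the recursive evaluation formula or define the four selection methods, so the following is only a minimal sketch of the general idea, under loudly stated assumptions. It assumes that each data source's summary is a hypothetical dictionary mapping an unordered keyword pair to a precomputed relevance score, that compaction simply drops pairs scoring below a threshold, and that a source is ranked by summing the scores of all keyword pairs in the query. The names `compact`, `score_source`, `select_top_k` and the sample source names are illustrative inventions, not the authors' XDS or their selection methods.

```python
import heapq
from itertools import combinations

# Hypothetical per-source summary (NOT the paper's exact XDS):
# unordered keyword pair -> precomputed relevance score.
Summary = dict[frozenset[str], float]


def compact(summary: Summary, threshold: float) -> Summary:
    """Assumed compaction: keep only pairs whose score reaches the threshold."""
    return {pair: s for pair, s in summary.items() if s >= threshold}


def score_source(summary: Summary, keywords: list[str]) -> float:
    """Assumed scoring: sum the scores of every keyword pair in the query.

    Pairs missing from the summary contribute 0. A single-keyword query
    has no pairs and therefore scores 0 under this simplification.
    """
    pairs = (frozenset(p) for p in combinations(sorted(set(keywords)), 2))
    return sum(summary.get(p, 0.0) for p in pairs)


def select_top_k(summaries: dict[str, Summary], keywords: list[str], k: int) -> list[str]:
    """Return up to k source names with the highest positive scores."""
    scores = {name: score_source(s, keywords) for name, s in summaries.items()}
    # nlargest with a key behaves like a stable descending sort truncated to k.
    ranked = heapq.nlargest(k, scores, key=scores.__getitem__)
    # Sources with no matching keyword pairs are treated as irrelevant.
    return [name for name in ranked if scores[name] > 0]


if __name__ == "__main__":
    summaries = {
        "source_a": {frozenset({"xml", "keyword"}): 0.8, frozenset({"xml", "search"}): 0.5},
        "source_b": {frozenset({"xml", "keyword"}): 0.2},
        "source_c": {frozenset({"graph", "mining"}): 0.9},
    }
    print(select_top_k(summaries, ["xml", "keyword", "search"], 2))
    print(compact(summaries["source_a"], 0.6))
```

With these made-up scores, `source_a` (1.3) ranks ahead of `source_b` (0.2), and `source_c` is excluded because none of its pairs match the query. Summing pair scores is only one plausible aggregation; the paper's four methods may combine scores differently.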
