Results Diversification For XML Keyword Search Based On Semantics

Posted on: 2015-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Song
GTID: 2268330425488836
Subject: Computer Science and Technology

Abstract/Summary:
In recent years, XML has attracted increasing attention as a standard for describing and exchanging data on the Internet, and XML querying has become a popular research topic. For users without a professional background, traditional structured search methods are no longer practical. Keyword search can retrieve information without complicated query languages or knowledge of the database schema, so it has considerable room for development. However, keyword queries have limited expressiveness and are inherently ambiguous, so query processing can only return best-effort results. When the search space is large, a great many results may be returned at once, and organizing them in a reasonable and effective way is an urgent problem. This paper studies the problem of diversifying XML keyword search results. Diversification organizes the search results into categories according to some criterion, with the aim of making searching more convenient for users. Our main work can be summarized as follows:

1) We represent the XML data as a set of entities according to the different semantic objects they describe; each entity carries unique semantic information. We then cluster these entities semantically. By analyzing the typical characteristic information of the entities, we define a composite formula for calculating semantic similarity. After the semantic similarity has been computed between all pairs of entities, we cluster the entities using a clustering algorithm selected in advance.

2) We propose to diversify search results so that results belonging to entities with similar semantics are grouped into the same cluster. Most previous work on diversification focuses on the match patterns of results, whereas our method avoids this burdensome processing. We define, from a semantic point of view, how to find the central entity to which a search result belongs. Search results whose central entities lie in the same cluster are then put together to form one group. This method not only takes the search needs of different users fully into account by producing groups with clear boundaries, but also greatly reduces online processing time, because much of the heavy work is done offline before any keywords are entered.

The experimental results show that the semantic similarity formula reflects the semantic distance between different entities, and that our diversification method, based on the semantic category of the central entity, outperforms other result diversification methods in effectiveness, efficiency, and scalability.
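The two-phase pipeline described above (offline clustering of entities, then online grouping of results by the cluster of their central entity) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's actual method: the abstract does not give the features or weights of the composite similarity formula, the clustering algorithm, or the rule for locating a result's central entity. As hypothetical choices, each entity here is described by its tag name and its set of attribute names, similarity is a weighted sum of tag equality and attribute-set Jaccard similarity, clustering is threshold-based single linkage, and each result's central entity is given as a precomputed mapping. The names `composite_similarity`, `cluster_entities`, and `diversify`, the weights, and the threshold are all illustrative.

```python
from itertools import combinations

# Hypothetical entity model: an entity has an id, an XML tag name, and the set
# of its attribute/child element names. These features are assumptions; the
# thesis's actual characteristic information is not specified in the abstract.


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def composite_similarity(e1: dict, e2: dict, w_tag: float = 0.4, w_attr: float = 0.6) -> float:
    """Hypothetical composite similarity: weighted sum of feature similarities."""
    tag_sim = 1.0 if e1["tag"] == e2["tag"] else 0.0
    attr_sim = jaccard(e1["attrs"], e2["attrs"])
    return w_tag * tag_sim + w_attr * attr_sim


def cluster_entities(entities: list, threshold: float = 0.5) -> dict:
    """Offline step: threshold-based single-linkage clustering with union-find.

    Entities end up in the same cluster when a chain of pairs with
    similarity >= threshold connects them. Returns {entity_id: cluster_id}.
    """
    parent = {e["id"]: e["id"] for e in entities}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for e1, e2 in combinations(entities, 2):
        if composite_similarity(e1, e2) >= threshold:
            parent[find(e1["id"])] = find(e2["id"])

    return {e["id"]: find(e["id"]) for e in entities}


def diversify(results: list, central_entity_of: dict, cluster_of: dict) -> list:
    """Online step: group results whose central entities share a cluster."""
    groups = {}
    for r in results:
        cid = cluster_of[central_entity_of[r]]
        groups.setdefault(cid, []).append(r)
    return list(groups.values())


# Small illustrative run with made-up entities and results.
entities = [
    {"id": "b1", "tag": "book", "attrs": {"title", "author", "year"}},
    {"id": "b2", "tag": "book", "attrs": {"title", "author", "publisher"}},
    {"id": "m1", "tag": "movie", "attrs": {"title", "director", "year"}},
]
cluster_of = cluster_entities(entities)
central_entity_of = {"r1": "b1", "r2": "m1", "r3": "b2"}  # assumed precomputed
groups = diversify(["r1", "r2", "r3"], central_entity_of, cluster_of)
print(groups)  # [['r1', 'r3'], ['r2']]
```

Because `cluster_entities` runs before any query arrives, the online `diversify` step is a single pass over the results with dictionary lookups, which mirrors the point that most of the heavy work is moved offline. Single linkage is used here only for simplicity; any clustering algorithm that assigns a cluster id to each entity could be substituted.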
Keywords/Search Tags: XML, keyword search, entity, diversification