
A Query Expansion Algorithm Based On Overlapped Cluster

Posted on: 2015-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: N Liu
GTID: 2348330518970439
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Internet technology, search engines and the Web have become one of the main ways to obtain information, and they are now part of how people work and live. Resources on the network change rapidly, are updated quickly, and are widely and diversely distributed. A user query usually consists of only a few words, so it cannot express clearly and accurately the information the user is looking for. This makes information retrieval harder, and the retrieval results often fail to meet the user's needs. Feedback information and query expansion can effectively improve the retrieval results of a system, so feedback and query expansion techniques have become a focus of research in information retrieval. We study the feedback models used in query expansion and find that the top n retrieved documents contain a great deal of noise, which causes the new query to drift away from the information expressed by the original query. To overcome this shortcoming of the pseudo-relevance feedback model, we propose a query expansion algorithm based on overlapped clusters. Unlike KNN and other classic clustering algorithms, overlapped clustering allows clusters to overlap, so a document can belong to more than one cluster. The proposed algorithm exploits this property to identify dominant documents, and it uses the dominant documents to set the size of the feedback window automatically. Because each cluster represents a query topic and a dominant document appears in several clusters, a dominant document represents several query topics and is more relevant than other documents.
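The idea of picking dominant documents from overlapping clusters can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the overlapped clustering step has already produced the clusters as sets of document ids, and the function name `dominant_documents` and the `threshold` parameter (the minimum number of clusters a document must appear in) are hypothetical names chosen for this example.

```python
from collections import Counter


def dominant_documents(clusters, threshold=2):
    """Return ids of documents that appear in at least `threshold` clusters.

    `clusters` is an iterable of collections of document ids (assumed to be
    the output of some overlapped clustering step, which is not shown here).
    The result is ordered by cluster membership count, highest first, with
    ties broken by document id.
    """
    membership = Counter()
    for cluster in clusters:
        # Use set() so a document is counted once per cluster.
        for doc_id in set(cluster):
            membership[doc_id] += 1
    dominant = [d for d, count in membership.items() if count >= threshold]
    dominant.sort(key=lambda d: (-membership[d], d))
    return dominant


# Toy example: three overlapping clusters over five retrieved documents.
clusters = [
    {"d1", "d2", "d3"},
    {"d2", "d3", "d4"},
    {"d3", "d5"},
]
feedback_docs = dominant_documents(clusters, threshold=2)
print(feedback_docs)       # ['d3', 'd2']
print(len(feedback_docs))  # feedback window size follows from the data: 2
```

In this sketch the feedback window is simply the number of dominant documents found, so it is not fixed in advance; only the membership threshold has to be chosen.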
The proposed algorithm not only improves the quality of the expansion sources but also overcomes the pseudo-relevance feedback model's heavy reliance on the number of feedback documents. In addition, we use the Apriori algorithm instead of a traditional probabilistic model to mine expansion words from the feedback documents, which improves the quality of the expansion words. Finally, we run experiments to verify the performance of the proposed algorithm. The results show that it not only improves the retrieval performance of the system but is also more robust. We also analyze how the number of feedback documents, the number of expansion words, the threshold value for dominant documents, and a parameter of the new query expression affect the performance of the proposed algorithm.
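Mining expansion words with association rules can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's procedure: each feedback document is treated as a transaction of terms, only the first two Apriori passes are run (frequent single terms, then frequent pairs), and a term is kept when the rule "query term -> candidate term" meets a confidence threshold. The function name `expansion_terms` and the `min_support` and `min_confidence` defaults are hypothetical.

```python
from collections import Counter
from itertools import combinations


def expansion_terms(docs, query_terms, min_support=0.5, min_confidence=0.6):
    """Rank candidate expansion terms by rule confidence (query term -> term).

    `docs` is a list of token lists, one per feedback document.
    Only itemsets of size one and two are considered (a simplification).
    """
    transactions = [set(doc) for doc in docs]
    n = len(transactions)
    if n == 0:
        return []

    # Pass 1: keep terms whose document frequency meets the support threshold.
    item_counts = Counter(t for tx in transactions for t in tx)
    frequent = {t for t, c in item_counts.items() if c / n >= min_support}

    # Pass 2: count pairs built only from frequent terms (Apriori pruning).
    pair_counts = Counter()
    for tx in transactions:
        for pair in combinations(sorted(tx & frequent), 2):
            pair_counts[pair] += 1

    # Turn frequent pairs into rules whose left-hand side is a query term.
    scores = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        for q, t in ((a, b), (b, a)):
            if q in query_terms and t not in query_terms:
                confidence = count / item_counts[q]
                if confidence >= min_confidence:
                    scores[t] = max(scores.get(t, 0.0), confidence)
    return sorted(scores, key=lambda t: (-scores[t], t))


# Toy feedback documents, already tokenized.
docs = [
    ["query", "expansion", "feedback", "retrieval"],
    ["query", "expansion", "cluster"],
    ["query", "feedback", "retrieval"],
    ["cluster", "overlap"],
]
print(expansion_terms(docs, {"query"}))
# ['expansion', 'feedback', 'retrieval']
```

A full Apriori run would continue to larger itemsets; stopping at pairs keeps the example short and is enough to show how support and confidence filter candidate words.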
Keywords/Search Tags: information retrieval, query expansion, pseudo-relevance feedback, overlapped cluster, association rules