A Query Expansion Algorithm Based On Overlapped Cluster

Posted on:2015-09-19

Degree:Master

Type:Thesis

Country:China

Candidate:N Liu

GTID:2348330518970439

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of Internet technology, search engines and the Web have become one of the main ways to obtain information, and people have grown accustomed to them in their work and daily lives. Resources on the network change rapidly, are updated quickly, and are widely and diversely distributed. A user query usually consists of only a few words, so it cannot clearly and accurately express the information the user is looking for. This makes information retrieval harder, and the retrieval results often fail to meet the user's needs. Feedback information and query expansion can effectively improve a system's retrieval results, so feedback and query expansion techniques have become a research focus in the field of information retrieval. We study the feedback models used in query expansion and find that the top n retrieved documents contain a great deal of noise, which causes the new query to drift away from the information expressed by the original query. To overcome this shortcoming of the pseudo-relevance feedback model, we propose a query expansion algorithm based on overlapped clusters. Unlike KNN and other classic clustering algorithms, overlapped clustering allows clusters to overlap, so a document may belong to more than one cluster. The proposed algorithm exploits this property to identify dominant documents and uses them to set the size of the feedback window automatically. Because each cluster represents a query topic and a dominant document appears in several clusters, a dominant document can represent several query topics and is more relevant than other documents.
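The idea of dominant documents can be illustrated with a minimal sketch. It assumes the overlapped clusters have already been computed and are given as sets of document ids; the abstract does not say which overlapped clustering method is used, so that step is left out. The function name `dominant_documents` and the parameter `min_clusters` are hypothetical names chosen for illustration, not taken from the thesis.

```python
from collections import Counter


def dominant_documents(clusters, min_clusters=2):
    """Return ids of documents that belong to at least `min_clusters` clusters.

    `clusters` is an iterable of collections of document ids (assumed input).
    Results are ordered by cluster count (descending), then by id, so the
    output is deterministic.
    """
    counts = Counter()
    for cluster in clusters:
        # Count each document at most once per cluster.
        counts.update(set(cluster))
    dominant = [doc for doc, n in counts.items() if n >= min_clusters]
    return sorted(dominant, key=lambda doc: (-counts[doc], doc))


# Toy example: three overlapping clusters over five retrieved documents.
clusters = [{"d1", "d2", "d3"}, {"d2", "d4"}, {"d2", "d3", "d5"}]
feedback_docs = dominant_documents(clusters, min_clusters=2)
# The feedback window size follows from the data instead of a fixed n.
window_size = len(feedback_docs)
```

In this toy input, `d2` and `d3` are the only documents shared between clusters, so they would form the feedback set; raising `min_clusters` shrinks the window.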
The proposed algorithm not only improves the quality of the expansion sources but also overcomes the pseudo-relevance feedback model's heavy reliance on the number of feedback documents. In addition, we use the Apriori algorithm instead of a traditional probabilistic model to mine expansion words from the feedback documents, which improves the quality of the expansion words. Finally, we conduct experiments to verify the performance of the proposed algorithm. The results show that it not only improves the retrieval performance of the system but is also more robust. We also analyze how the number of feedback documents, the number of expansion words, the threshold for dominant documents, and a parameter of the new query expression affect the performance of the proposed algorithm.
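The association-rule step might look roughly like the sketch below. It is a simplified stand-in, not the thesis's procedure: instead of full Apriori over itemsets of any size, it scores only pair rules of the form "query term -> candidate term" by support and confidence. The function name `expansion_terms` and the thresholds `min_support` and `min_confidence` are hypothetical, and each feedback document is assumed to be given as a set of terms.

```python
def expansion_terms(docs, query_terms, min_support=0.5, min_confidence=0.6):
    """Rank candidate expansion terms using pair association rules.

    docs: list of sets of terms, one set per feedback document (assumed input).
    A rule q -> t is kept when
      support    = |docs containing q and t| / |docs|
      confidence = |docs containing q and t| / |docs containing q|
    both reach their thresholds. Terms are sorted by best confidence, then name.
    """
    n = len(docs)
    query = set(query_terms)
    best = {}
    for q in query:
        with_q = [d for d in docs if q in d]
        if not with_q:
            continue  # the query term never occurs, so no rule can be formed
        candidates = set().union(*with_q) - query
        for t in candidates:
            both = sum(1 for d in with_q if t in d)
            support = both / n
            confidence = both / len(with_q)
            if support >= min_support and confidence >= min_confidence:
                best[t] = max(best.get(t, 0.0), confidence)
    return sorted(best, key=lambda t: (-best[t], t))


# Toy feedback set of four documents.
docs = [
    {"query", "expansion", "feedback", "retrieval"},
    {"query", "expansion", "cluster"},
    {"query", "feedback", "retrieval"},
    {"cluster", "retrieval"},
]
terms = expansion_terms(docs, ["query"])
```

Here "cluster" co-occurs with "query" in only one of four documents, so it falls below the support threshold and is not proposed as an expansion term.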

Keywords/Search Tags:

information retrieval, query expansion, pseudo-relevance feedback, overlapped cluster, association rules

Related items

1	Research On Pseudo Relevance Feedback Query Expansion Technology Based On Latent Semantic Relation
2	Research On Pre-trained BERT Based Pseudo-relevance Feedback Method
3	Cross Language Information Retrieval Based On Topical Pseudo Relevance Feedback
4	Studies On Affinity Propagation Based Pseudo-Relevance Feedback And Document Expansion For Spoken Document Retrieval
5	Research And Application On Expansion Term Ranking Model For Query Understanding
6	Research On Pseudo Relevance Feedback Based On Document Similarity
7	Research On Retrieval Method Based On Positional Relationship In Document
8	Query Expansion Based On Supervised Learning
9	Research On Query Expansion Technique Of Retrieval System In Biomedical Field
10	Research Of XML Information Retrieval Based On Pseudo-relevance Feedback