Study On Positive And Negative Relevance Feedback Query Expansion Techniques

Posted on: 2013-06-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Y Wang
GTID: 1228330398496409
Subject: Computer application technology
Abstract/Summary:
Information acquisition occupies an important position in people's work, life, and other activities, and the methods of acquiring information are varied. With the rapid development of computer networks, mobile communications, and global informatization, accessing information through the Web and search engines has become an important method and a habitual behavior for users. The wide distribution, varied forms, open organization, loose management, rapid change, and fast transmission of information all make information retrieval more difficult. Users raise more and higher requirements, and returning results that satisfy their information needs precisely and quickly is a major challenge for information retrieval. A search engine needs strong, advanced information retrieval technology to meet these requirements.
Users usually cannot express their information needs accurately and clearly, often supplying just a few words, so retrieval results are often unsatisfactory. Using relevance feedback to modify the query model is a common and effective strategy for improving retrieval performance, and query expansion and feedback techniques have long been a focus of study in the information retrieval (IR) field. Most previous work on feedback emphasizes relevance feedback and pseudo relevance feedback. In recent years, some work has investigated negative relevance feedback in IR. However, within the language modeling framework, no study of a mixture model combining negative and positive feedback has so far appeared in SIGIR. This thesis studies the positive and negative feedback model, including automatic recognition of positive and negative feedback, dynamic adjustment of model parameters, multi-subject feedback, and so on.
The main contributions include:
(1) Positive and negative relevance feedback modeling framework: We propose a new framework that integrates positive feedback with negative feedback, based on the language model in IR. Relevance feedback, pseudo relevance feedback, and negative relevance feedback are all special cases of this model. Positive feedback enhances the query information and negative feedback suppresses query noise, which together improve retrieval performance. Our model exceeds the pseudo relevance feedback model and the relevance feedback model in average precision and in precision over the top 10 documents. Compared to pseudo relevance feedback, our model significantly reduces the number of harmed queries, which improves robustness.
The positive and negative feedback model combines the query, positive feedback, and negative feedback by linear interpolation, so its parameters are the three proportion coefficients of these components. As with any mixture retrieval model, its search results are sensitive to the parameters. For the positive and negative feedback model we present two simple, feasible, and effective algorithms to adjust the parameters dynamically: one calculates the parameters from the proportion of irrelevant documents in the top k documents, and the other learns the parameters from training documents. This further improves the retrieval performance of the positive and negative feedback model.
(2) Clustering to distinguish relevant and irrelevant documents: We analyze the distribution characteristics of relevant and irrelevant documents in the top k documents and find, through theoretical analysis and experiments, that a density clustering algorithm can identify isolated irrelevant documents well.
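The three-component linear interpolation described under contribution (1) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything specific here is an assumption: unigram models estimated by maximum likelihood, a negative component that enters with a subtractive weight followed by clipping and renormalization, and a heuristic that sets the negative weight from the fraction of irrelevant documents in the top k. The function names (`unigram_model`, `weights_from_irrelevant_ratio`, `interpolate`) and the toy documents are hypothetical.

```python
from collections import Counter


def unigram_model(docs):
    """Maximum-likelihood unigram model over a list of tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def weights_from_irrelevant_ratio(n_irrelevant, k, query_weight=0.5):
    """Assumed heuristic (not from the thesis): split the non-query mass
    between positive and negative feedback by the share of irrelevant docs."""
    ratio = n_irrelevant / k
    rest = 1.0 - query_weight
    return query_weight, rest * (1.0 - ratio), rest * ratio


def interpolate(query_lm, pos_lm, neg_lm, alpha, beta, gamma):
    """Assumed form: alpha*P(w|q) + beta*P(w|pos) - gamma*P(w|neg),
    clipped at zero and renormalized so the result sums to 1."""
    vocab = set(query_lm) | set(pos_lm) | set(neg_lm)
    raw = {
        w: alpha * query_lm.get(w, 0.0)
        + beta * pos_lm.get(w, 0.0)
        - gamma * neg_lm.get(w, 0.0)
        for w in vocab
    }
    clipped = {w: p for w, p in raw.items() if p > 0.0}
    norm = sum(clipped.values())
    return {w: p / norm for w, p in clipped.items()} if norm else {}


# Toy data: one query, two positive documents, one negative document.
query = [["jaguar", "speed"]]
positive = [["jaguar", "cat", "speed"], ["jaguar", "animal", "speed"]]
negative = [["jaguar", "car", "engine"]]

alpha, beta, gamma = weights_from_irrelevant_ratio(n_irrelevant=1, k=3)
expanded = interpolate(
    unigram_model(query), unigram_model(positive), unigram_model(negative),
    alpha, beta, gamma,
)
for word, prob in sorted(expanded.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{prob:.3f}")
```

In this sketch, terms that occur only in the negative document ("car", "engine") receive negative mass and are dropped, while setting `gamma` to 0 reduces the model to positive-only feedback, in the spirit of the special cases the abstract mentions.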
After modifying the density clustering algorithm DBSCAN, we can find the irrelevant documents in the top k documents with more than 72% precision and more than 32% recall, and the relevant documents in the top k documents with more than 54% precision and more than 87% recall. The top k documents are divided into two sets, a connected point set and an isolated point set, which serve respectively as positive and negative feedback in the positive and negative feedback model; the resulting retrieval performance far exceeds that of pseudo relevance feedback.
(3) Improving the pseudo relevance feedback model with multiple subject domains: We present a new model that uses multi-subject-domain information to improve pseudo relevance feedback. The new query is composed of the original query, the top k documents, and the top s from the multiple subject domains, which effectively improves the retrieval performance of the pseudo relevance feedback model. This method can be applied to personalized retrieval.
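The split of the top k documents into connected and isolated points in contribution (2) can be sketched with the standard DBSCAN noise rule. This is plain DBSCAN logic, not the modified algorithm of the thesis, whose changes the abstract does not describe. The cosine distance over bag-of-words vectors, the `eps` and `min_pts` values, and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def split_connected_isolated(docs, eps=0.6, min_pts=2):
    """Return (connected, isolated) index lists using DBSCAN's noise rule.

    A document is a core point if at least min_pts documents (itself
    included) lie within eps of it. It is connected if it is a core point
    or lies within eps of one, and isolated (noise) otherwise.
    """
    vecs = [Counter(doc) for doc in docs]
    n = len(vecs)
    neighbors = [
        [j for j in range(n) if cosine_distance(vecs[i], vecs[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    connected = [
        i for i in range(n) if i in core or core.intersection(neighbors[i])
    ]
    isolated = [i for i in range(n) if i not in connected]
    return connected, isolated


# Toy top-k list: three documents on one topic and one off-topic outlier.
top_k = [
    ["jaguar", "cat", "speed"],
    ["jaguar", "animal", "speed"],
    ["jaguar", "cat", "animal"],
    ["stock", "market", "price"],
]
connected, isolated = split_connected_isolated(top_k)
print("positive feedback candidates:", connected)
print("negative feedback candidates:", isolated)
```

Under these assumptions, the connected indices would feed the positive feedback component and the isolated indices the negative one. On real rankings, `eps` and `min_pts` would need tuning.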
Keywords/Search Tags: Information Retrieval, Relevance Feedback, Negative Feedback, Query Expansion, Clustering