The Research Of Machine Learning Techniques And External Web Resources For Relevance Feedback

Posted on: 2012-10-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Ye
Full Text: PDF
GTID: 1118330335454687
Subject: Computer software and theory
Abstract/Summary:
With the explosive growth of information on the Internet, there is an increasing need for information systems that help users find the resources they need. Information retrieval systems respond to this challenge of information overload. Their main application, the search engine, has achieved great success over the past decade. Extensive experiments have shown that relevance feedback is one of the most effective techniques for ad hoc information retrieval. In this dissertation, we explore the use of machine learning and Web mining techniques to further enhance relevance feedback methods. In particular, the main work of this dissertation can be summarized as follows:
(1) In most current relevance feedback models, the expansion terms are selected based on document-level statistics. However, a given feedback document, even if it is judged relevant by a human, may cover several different topics, and not all of these topics are useful for relevance feedback. We argue that it is more reasonable to conduct relevance feedback at a finer-grained level. Following this argument, a novel topic-based relevance feedback model is proposed in this dissertation, in which three different methods for approaching the query-related topic are explored.
(2) In traditional relevance feedback models, each feedback document is treated equally. In fact, the feedback documents differ in quality and therefore influence the relevance feedback process differently. To address this problem, we revisit Rocchio's algorithm by integrating this classical feedback method into the divergence from randomness (DFR) probabilistic framework for pseudo relevance feedback (PRF). This integration is denoted RocDFR in this dissertation.
In addition, we further improve RocDFR's robustness by proposing two quality-biased feedback methods, called QRocDFR and ReRocDFR.
(3) Most existing relevance feedback approaches assume that the most informative terms in the top-ranked documents from the first-pass retrieval can be viewed as the context of the query, and thus can be used to specify the information need. However, irrelevant documents may be used in PRF (especially for hard topics), which brings noise into the feedback process. The recent development of Web 2.0 technologies on the Internet provides an opportunity to enhance PRF, as more and more high-quality resources can be freely obtained.
(4) Most current PRF approaches estimate the importance of candidate expansion terms from their document-level statistics, and traditional query expansion models ignore context information. As a result, off-topic terms can also be selected, which may decrease retrieval performance. In this dissertation, we propose a context-based feedback framework built on a Bayesian network, in which multiple kinds of context information can be taken into account.
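To make the idea in item (2) concrete, the following is a minimal sketch of classical Rocchio-style pseudo relevance feedback with optional per-document weights. It is not the RocDFR, QRocDFR, or ReRocDFR method of the dissertation: term weights here are plain normalized term frequencies rather than DFR weights, and the function name `rocchio_expand`, the `doc_weights` argument, and the default values of `alpha`, `beta`, and `top_k` are assumptions chosen for illustration only.

```python
from collections import Counter


def rocchio_expand(query, feedback_docs, doc_weights=None,
                   alpha=1.0, beta=0.75, top_k=5):
    """Return an expanded query as a dict mapping term -> weight.

    query:         list of query terms
    feedback_docs: list of token lists (top-ranked documents from the
                   first-pass retrieval, assumed relevant)
    doc_weights:   hypothetical per-document quality scores; uniform if None
    """
    if doc_weights is None:
        doc_weights = [1.0] * len(feedback_docs)
    if len(doc_weights) != len(feedback_docs):
        raise ValueError("doc_weights must match feedback_docs in length")

    # Weighted centroid of the feedback documents. Each document contributes
    # its normalized term frequencies, scaled by its share of the total weight.
    total = sum(w for w in doc_weights if w > 0)
    centroid = Counter()
    for doc, w in zip(feedback_docs, doc_weights):
        if w <= 0 or not doc:
            continue  # skip empty documents and documents with no weight
        for term, count in Counter(doc).items():
            centroid[term] += (w / total) * count / len(doc)

    # Original query vector: normalized term frequency of each query term.
    expanded = {}
    for term, count in Counter(query).items():
        expanded[term] = alpha * count / len(query)

    # Add the top_k centroid terms (ties broken alphabetically).
    candidates = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    for term, weight in candidates[:top_k]:
        expanded[term] = expanded.get(term, 0.0) + beta * weight
    return expanded
```

With uniform weights every feedback document contributes equally, which is the traditional setting that item (2) criticizes. Passing lower `doc_weights` for documents believed to be of poor quality reduces their influence on the selected expansion terms, which is the general intuition behind quality-biased feedback.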
Keywords/Search Tags: Text Information Retrieval, Retrieval Model, Relevance Feedback, Machine Learning