A Comparative Study Of Explicit Search Result Diversification Algorithms

Posted on: 2017-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: T T Chen
GTID: 2308330503464109
Subject: Computer technology
Abstract/Summary:
In recent years, search result diversification has become a hot topic in the field of information retrieval. Many diversification methods have been proposed and studied. However, the performance and characteristics of each method remain unclear. It is therefore worthwhile to analyze and compare a group of result diversification methods on a unified platform, so that these techniques can be used more effectively.

On the other hand, data fusion technology in information retrieval has been used to obtain more effective results by combining multiple component search results. Previous research has demonstrated that some data fusion methods improve the relevance-based performance of search results, but little consideration has been given to the diversity of the results. In theory, because different search systems return different results, fusing them should also help result diversity. We therefore investigate how to use data fusion for this purpose. The main research work of this thesis is as follows:

(1) We analyze and compare three result diversification methods: IA-SELECT, xQuAD and PM2. Experiments are carried out with four groups of data sets from the TREC Web track, and four diversity-oriented evaluation metrics (Prec-IA, MAP-IA, ERR-IA and α-nDCG) are used to measure the performance of these diversification algorithms. With optimal parameter settings, xQuAD performs better than the other two methods, while IA-SELECT and PM2 are comparable in performance.

(2) A new diversification method, CombSumDiv, is proposed based on the traditional data fusion algorithm CombSum. It considers both the relevance and the diversity of documents at the same time. Compared with the three explicit result diversification algorithms IA-SELECT, xQuAD and PM2, CombSumDiv is slightly better than xQuAD and much better than IA-SELECT and PM2.

(3) By increasing and decreasing the number of subtopics, we test the stability of CombSumDiv, IA-SELECT, xQuAD and PM2 under different conditions. When all the subtopics are identified, the final results of the four methods are not affected by the addition of a number of falsely identified subtopics. When the subtopics are only partially identified, the results do not differ much from those obtained with all the subtopics identified. We therefore conclude that all four result diversification methods are stable and robust to interference.

In short, based on an analysis and comparison of existing diversification methods, this thesis addresses the problem of search result diversification in information retrieval from the new perspective of data fusion. We also test the stability of all the algorithms involved. Little research has been done on these problems. We expect that the results of this thesis can serve as a guide for using these result diversification methods effectively in the future.
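The abstract names CombSum and xQuAD without giving their formulas. As a rough illustration only, the sketch below follows the commonly published definitions: CombSum sums a document's normalized scores over the component result lists, and xQuAD greedily picks the document maximizing (1 - λ)·P(d|q) + λ·Σ_s P(s|q)·P(d|s)·Π_{d' selected}(1 - P(d'|s)). The function names, the min-max normalization and the toy data are assumptions made for this example. CombSumDiv itself is not reproduced, since the abstract does not specify how it combines relevance and diversity.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Scale one run's scores into [0, 1] (assumed normalization choice)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs):
    """CombSum: sum each document's normalized scores over all runs."""
    fused = defaultdict(float)
    for run in runs:
        for doc, s in min_max_normalize(run).items():
            fused[doc] += s
    # Highest fused score first; ties broken by document id.
    return sorted(fused, key=lambda d: (-fused[d], d))


def xquad(relevance, subtopic_weight, coverage, k, lam=0.5):
    """Greedy xQuAD re-ranking.

    relevance:       doc -> P(d|q)
    subtopic_weight: subtopic -> P(s|q)
    coverage:        subtopic -> {doc -> P(d|s)}
    """
    selected = []
    candidates = set(relevance)
    # For each subtopic, the product of (1 - P(d'|s)) over selected docs.
    not_covered = {s: 1.0 for s in subtopic_weight}

    def score(d):
        diversity = sum(
            subtopic_weight[s] * coverage[s].get(d, 0.0) * not_covered[s]
            for s in subtopic_weight
        )
        return (1 - lam) * relevance[d] + lam * diversity

    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        for s in subtopic_weight:
            not_covered[s] *= 1.0 - coverage[s].get(best, 0.0)
    return selected


# Toy data (hypothetical): two component runs and two subtopics.
run_a = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
run_b = {"d2": 0.9, "d3": 0.5, "d4": 0.1}
fused_ranking = comb_sum([run_a, run_b])

relevance = {"d1": 0.9, "d2": 0.8, "d3": 0.4}
subtopic_weight = {"s1": 0.5, "s2": 0.5}
coverage = {"s1": {"d1": 1.0, "d2": 1.0}, "s2": {"d3": 1.0}}
diverse_ranking = xquad(relevance, subtopic_weight, coverage, k=3, lam=0.5)
relevance_only = xquad(relevance, subtopic_weight, coverage, k=3, lam=0.0)
```

On the toy data, d2 is returned by both runs and rises to the top of the fused list. With λ = 0.5, xQuAD promotes d3 above d2 because d3 is the only document covering subtopic s2, while λ = 0 falls back to the pure relevance order.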
Keywords/Search Tags: search result diversification, data fusion, re-ranking, performance comparison, stability