| With the rapid development of the Internet and the fast growth of digital recording devices, people are flooded with ever-increasing amounts of information, and an effective and efficient information retrieval system is more crucial than ever. Web search engines have therefore become an important daily tool for finding desired information. A web search engine contains many sub-systems, among which the ranking system plays a significant role: it matches a web user's intent with a list of the most relevant web pages, which greatly reduces the time users need to find the information they want. For this reason, tremendous effort has been devoted to developing effective ranking methods, which mainly either explore relevance features of documents using content analysis techniques or investigate importance features of documents using link analysis techniques. Although many of these ranking algorithms have substantially improved information retrieval systems, they share two crucial limitations: either they fail to handle large numbers of features that are all useful for ranking, or it becomes intractable to tune a ranking model with many (hundreds or even thousands of) parameters when the model is built on a large number of features. Fortunately, over the past decades researchers have developed a new branch, referred to as learning to rank, which can handle these problems and learn ranking models with higher performance. Learning to rank is an interdisciplinary field of machine learning and information retrieval. It learns a ranking model from a given training data set, in which each training instance is represented as a feature vector and labeled with a human-assigned relevance level. With the model learned in the training procedure, the ranking system can make predictions on unseen instances. Most learning to rank algorithms can be roughly categorized into three groups: the pointwise, the pairwise, and the listwise approaches. 
Previous work has demonstrated that the listwise approach performs best on most public data collections. Therefore, this work focuses on the listwise approach and proposes a novel ranking method, referred to as DEARank, based on the data envelopment analysis (DEA) technique and the boosting technique. In this work, we present two DEA variant models, CCR-I and CCR-O, which are modified from the classical CCR model. Both models treat the documents to be ranked as decision making units (DMUs) in the context of DEA and are used to construct a pool of weak ranker candidates, which are the optimal weights obtained by solving these linear programming models. Each weak ranker represents a feature subset drawn from the complete input feature space and depicts one preference of the corresponding DMU. We then propose the DEARank algorithm, which employs the boosting technique to train a higher-performing ranking function from these weak candidates. We also conduct extensive experiments on the LETOR 3.0 and LETOR 4.0 collections (including HP2003, HP2004, NP2003, NP2004, TD2003, TD2004, OHSUMED, MQ2007, and MQ2008), with twelve well-known algorithms as baselines. The experimental results indicate that DEARank is a promising learning to rank algorithm and provides an important tool for web information retrieval systems. |
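
For context, the classical CCR model that CCR-I and CCR-O are described as modifying can be written, in its standard input-oriented multiplier form, as the linear program below. This is the textbook formulation only, not the modified models proposed in this work. The notation is assumed for illustration: n DMUs indexed by j, inputs x_{ij} (i = 1, ..., m), outputs y_{rj} (r = 1, ..., s), input weights v_i, output weights u_r, and o as the index of the DMU being evaluated. How document features are assigned to inputs and outputs is not stated in this abstract.

```latex
\begin{aligned}
\max_{u,\,v}\quad & \theta_o = \sum_{r=1}^{s} u_r\, y_{ro} \\
\text{s.t.}\quad  & \sum_{i=1}^{m} v_i\, x_{io} = 1, \\
                  & \sum_{r=1}^{s} u_r\, y_{rj} - \sum_{i=1}^{m} v_i\, x_{ij} \le 0,
                    \qquad j = 1, \dots, n, \\
                  & u_r \ge 0,\quad v_i \ge 0 .
\end{aligned}
```

Solving this program once for each DMU yields one optimal weight vector per document, which is consistent with the description above of weak ranker candidates as the optimal weights of the linear programs.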