
The Study On Learning To Rank For Information Retrieval Using The Clonal Selection Algorithm

Posted on: 2013-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: Q He
Full Text: PDF
GTID: 2248330374983428
Subject: Computer system architecture
Abstract/Summary:
Information retrieval deals with the representation, storage, organization of, and access to information items. One central problem of information retrieval (IR) is ranking, that is, determining which documents are relevant to the user's information need and which are not. Without loss of generality, we focus on document retrieval in this thesis. Given a query, an IR system ranks the documents according to their degrees of relevance with respect to the query. With the explosive growth of information on the Internet, ranking has become a research hotspot.

Many models and methods have been proposed to solve the ranking problem, e.g., the Boolean model, the vector space model, the probabilistic model, and the language model. These conventional IR models represent a document as a set of representative keywords and define a ranking function (or retrieval function) that assigns a relevance degree to a document with respect to a query. In general, these models are designed in an unsupervised manner, so the parameters of the underlying ranking functions, if any, are usually tuned empirically. Moreover, the evidence such models can exploit is limited and inflexible, since their ranking functions cannot be very sophisticated.

In recent years, a machine learning approach, namely learning to rank, has become widely used for the ranking problem in IR. Learning to rank aims to automatically learn a ranking function from a set of training data with relevance labels. One fundamental issue in learning to rank is the choice of the loss function to be optimized. Since ranking results in IR are generally evaluated with IR evaluation measures, these measures are the ideal loss functions to optimize: high accuracy in training then promises high performance in evaluation. However, doing so is usually difficult because of the requirements that conventional machine learning techniques place on loss functions; most optimization algorithms need smooth, or even convex, loss functions, whereas IR measures are rank-dependent and therefore non-continuous and non-differentiable. We believe there would be a great benefit in directly optimizing the IR measures themselves rather than surrogate loss functions.

In this thesis, we aim to develop a general learning algorithm that can directly optimize the performance measures used in IR. To this end, we employ the clonal selection algorithm and propose a learning to rank method called RankCSA that directly optimizes IR evaluation measures. RankCSA uses the clonal selection algorithm to learn an effective ranking function that combines various sources of evidence in IR. During learning, RankCSA represents the list of documents retrieved for a query as an antigen and a ranking function as an antibody, and a shape-space model for the ranking problem in IR is defined. Furthermore, since the evolutionary process is discrete, the affinity function is defined directly as an IR evaluation measure. The method evolves a repertoire of antibodies over a series of generations using the clonal selection principle and the affinity maturation process: individual antibodies are selected, cloned, replaced, or hyper-mutated according to their affinities. Eventually, the best antibody in the final repertoire is selected as the final ranking function.

To validate our approach, we perform experiments on the LETOR benchmark datasets, using four state-of-the-art learning methods, i.e., RankSVM, ListNet, AdaRank, and RankBoost, as baselines. The results on three datasets show that RankCSA achieves consistent improvements over the compared ranking algorithms in most cases.
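Because the affinity in RankCSA is an IR evaluation measure rather than a surrogate loss, it helps to see why such measures resist gradient-based optimization. The sketch below is not code from the thesis; it merely assumes NDCG@k as a representative measure and shows that the measure depends on the model's scores only through the sorted order of documents, which is what makes it non-continuous and non-differentiable in the model parameters.

    import math

    def dcg_at_k(relevances, k):
        """Discounted cumulative gain over the top-k positions."""
        return sum((2 ** rel - 1) / math.log2(i + 2)
                   for i, rel in enumerate(relevances[:k]))

    def ndcg_at_k(scores, labels, k=10):
        """NDCG@k of the ranking induced by the predicted scores."""
        # Sorting by score is the rank-dependent step: an infinitesimal change
        # in a score either leaves the value unchanged or jumps it discontinuously.
        ranked = [lab for _, lab in sorted(zip(scores, labels),
                                           key=lambda p: -p[0])]
        ideal = sorted(labels, reverse=True)
        idcg = dcg_at_k(ideal, k)
        return dcg_at_k(ranked, k) / idcg if idcg > 0 else 0.0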
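A population-based method such as the clonal selection algorithm sidesteps this problem because it only ever evaluates the measure and never differentiates it. The following sketch illustrates the kind of loop the abstract describes; it is an assumption-laden illustration, not RankCSA itself: the ranking function is taken to be a linear combination of query-document features (so an antibody is a weight vector), the affinity is mean NDCG@10 over the training queries (reusing ndcg_at_k from the sketch above), and the population sizes, mutation rates, and replacement rule are arbitrary choices.

    import random

    def affinity(weights, queries):
        """Affinity of an antibody = mean NDCG@10 over the training queries.
        Each query is a pair (feature_matrix, labels); a document's score is
        the dot product of its feature vector with the antibody's weights."""
        total = 0.0
        for features, labels in queries:
            scores = [sum(w * f for w, f in zip(weights, doc)) for doc in features]
            total += ndcg_at_k(scores, labels, k=10)
        return total / len(queries)

    def clonal_selection(queries, n_features, pop=30, n_select=10,
                         clones_per=5, generations=100):
        """Evolve a repertoire of antibodies (weight vectors) by clonal selection."""
        repertoire = [[random.uniform(-1, 1) for _ in range(n_features)]
                      for _ in range(pop)]
        for _ in range(generations):
            # Select the highest-affinity antibodies for cloning.
            ranked = sorted(repertoire, key=lambda ab: affinity(ab, queries),
                            reverse=True)
            selected = ranked[:n_select]
            clones = []
            for rank, ab in enumerate(selected):
                # Hyper-mutation: lower-ranked antibodies mutate more strongly.
                rate = 0.1 * (rank + 1) / n_select
                for _ in range(clones_per):
                    clones.append([w + random.gauss(0, rate) for w in ab])
            # Keep the best of parents and clones; replace the remainder with
            # random newcomers to maintain diversity in the repertoire.
            pool = sorted(selected + clones,
                          key=lambda ab: affinity(ab, queries), reverse=True)
            survivors = pool[:pop - 2]
            newcomers = [[random.uniform(-1, 1) for _ in range(n_features)]
                         for _ in range(pop - len(survivors))]
            repertoire = survivors + newcomers
        # The best antibody of the final repertoire serves as the ranking function.
        return max(repertoire, key=lambda ab: affinity(ab, queries))

In this toy form the loop is expensive, since affinities are recomputed repeatedly, but it makes the point of the abstract concrete: selection, cloning, hyper-mutation, and replacement require only the ability to evaluate the IR measure, never to differentiate it.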
Keywords/Search Tags: Clonal Selection Algorithm, Information Retrieval, Machine Learning, Learning to Rank, Ranking function, LETOR