Graph-Knowledge-Bases Based Collection Selection For Distributed Information Retrieval

Posted on: 2018-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: B L Han
Full Text: PDF
GTID: 2348330512483426
Subject: Computer technology
Abstract/Summary:
Collection selection aims to select a small number of information collections to search for a given query, which is critical to improving search engine efficiency. Most state-of-the-art collection selection methods use a central sample index, built from documents sampled from each collection, to represent the semantic information of each collection. However, these methods rely solely on 'morpho-syntactic' information. In this thesis, we propose a collection selection method that models each collection as a weighted sub-graph of a knowledge base. First, DBpedia is used as the graph knowledge base, and context- and structure-based measures are used to weight the semantic distance between any pair of entities extracted from the sampled documents of a collection. Second, the similarity between a query and a collection is calculated by aggregating the semantic distances between their entities, also taking entity frequency and collection size into account. Finally, collections are ranked by this similarity. To enrich the entities contained in a query, DBpedia-based query expansion is integrated. To overcome the limitations of traditional collection similarity metrics, we use a learning-to-rank algorithm, LambdaMART, to combine multiple collection similarity metrics into a trained collection ranking model. To evaluate the method, ReDDE, CRCS and DLCS were chosen as baselines, and extensive experiments were conducted on a large web page dataset. The experimental results demonstrate the effectiveness of the proposed method.
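The scoring and ranking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's actual formula: the abstract does not give the distance measures or the aggregation function, so this sketch assumes inverse shortest-path distance over a toy weighted entity graph, weighted by entity frequency and normalized by the number of sampled documents. The graph, the entity names, and the functions `shortest_distances`, `collection_score` and `rank_collections` are all hypothetical.

```python
import heapq
from math import inf

# Toy weighted entity graph (hypothetical stand-in for a DBpedia sub-graph).
# Edge weights play the role of semantic distances between entities.
GRAPH = {
    "Jaguar_Cars": {"Automobile": 1.0},
    "Automobile": {"Jaguar_Cars": 1.0, "Engine": 0.5},
    "Engine": {"Automobile": 0.5},
    "Jaguar": {"Felidae": 1.0},
    "Felidae": {"Jaguar": 1.0},
}

# Each collection is summarized by the entities found in its sampled
# documents, their frequencies, and the sample size (all values assumed).
COLLECTIONS = {
    "cars": {"entity_freq": {"Automobile": 4, "Engine": 2}, "sample_size": 10},
    "wildlife": {"entity_freq": {"Felidae": 5}, "sample_size": 10},
}


def shortest_distances(graph, source):
    """Dijkstra's algorithm: weighted distance from source to reachable nodes."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, inf):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, inf):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def collection_score(graph, query_entities, entity_freq, sample_size):
    """Aggregate frequency-weighted inverse distances (assumed aggregation)."""
    total = 0.0
    for q in query_entities:
        dist = shortest_distances(graph, q)
        for entity, freq in entity_freq.items():
            # Unreachable entities have infinite distance and contribute 0.
            total += freq / (1.0 + dist.get(entity, inf))
    return total / sample_size  # normalize by collection sample size


def rank_collections(graph, query_entities, collections):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = {
        name: collection_score(graph, query_entities,
                               c["entity_freq"], c["sample_size"])
        for name, c in collections.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_collections(GRAPH, ["Jaguar_Cars"], COLLECTIONS)
```

For the query entity `Jaguar_Cars`, the "cars" collection ranks above "wildlife", because none of the wildlife entities are reachable from the query entity in this toy graph. In a fuller setup, scores like this one could serve as one of several features fed to a learning-to-rank model.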
Keywords/Search Tags: distributed information retrieval, collection selection, knowledge bases, query expansion, learning to rank