
Research On Resource Selection For Retrieval Result Diversification In Federated Search

Posted on: 2019-07-17  Degree: Master  Type: Thesis
Country: China  Candidate: L Li  Full Text: PDF
GTID: 2428330566472837  Subject: Computer Science and Technology
Abstract/Summary:
In recent years, Internet technology has developed rapidly, and search engines have become one of the most commonly used network applications. Faced with massive amounts of online information and enormous search demand, traditional centralized information retrieval systems are no longer adequate. Federated search, which is built on a distributed architecture, is one of the search technologies that has attracted the most attention in academia and industry, and resource selection is one of its important tasks. Meanwhile, search result diversification has been a research hotspot in information retrieval in recent years. Its main purpose is to satisfy users' diverse search needs, and it is especially important for short and ambiguous queries. The main objective of this research is to achieve search result diversification in a federated retrieval system by using an effective resource selection algorithm to select a group of appropriate resources. The main work of this thesis is as follows:
(1) Based on the LDA topic model, a family of resource selection methods is proposed. The methods use ? correlation to filter the Centralized Sample Index and analyze the related sample documents with the LDA topic model. They adopt a greedy selection strategy and, by balancing relevance and diversity, select the group of resources that performs best for diversification. This type of method can be applied at both the document level and the resource level, yielding the D-LDA and R-LDA algorithms.
(2) Based on distributed word representations, the resource sample documents are analyzed from the perspective of text semantics. The methods take TF-IDF weights into account, make full use of the distributed characteristics of the terms, and model documents and resources in the semantic space. They are applicable at both the document level and the resource level, yielding the D-WE and R-WE algorithms.
(3) A federated search experimental environment for search result diversification was constructed from the ClueWeb12-B13 data set. A diversification evaluation method, SDC, adapted to the federated search environment, is proposed as a complement to the evaluation metrics for diversified resource selection used in the experiments.
(4) Experiments are performed to evaluate the four methods proposed in this thesis against some existing ones, considering both effectiveness and efficiency. The results show that all four algorithms are effective. Among them, the D-LDA algorithm, which is based on the document-level LDA topic model, performs best.
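The greedy strategy that balances relevance and diversity can be illustrated with a minimal sketch. The abstract does not give the thesis's scoring formula, so this example substitutes a maximal-marginal-relevance style trade-off as an assumption: each resource has a relevance score and a vector (for instance an LDA topic distribution, or a TF-IDF-weighted average of word embeddings), and each step picks the resource with the best balance of relevance against similarity to the resources already chosen. The names `greedy_select`, `cosine`, and `lam` are hypothetical and are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_select(relevance, vectors, k, lam=0.5):
    """Greedily pick up to k resources, trading relevance against redundancy.

    relevance: dict mapping resource id -> relevance score (assumed in [0, 1])
    vectors:   dict mapping resource id -> topic or embedding vector
    lam:       assumed trade-off weight; 1.0 means relevance only
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:

        def gain(r):
            # Redundancy = highest similarity to any already selected resource.
            redundancy = max(
                (cosine(vectors[r], vectors[s]) for s in selected), default=0.0
            )
            return lam * relevance[r] - (1 - lam) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (invented for illustration): A and B cover the same topic, C a different one.
toy_relevance = {"A": 0.9, "B": 0.85, "C": 0.6}
toy_vectors = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
print(greedy_select(toy_relevance, toy_vectors, k=2, lam=0.5))
```

With the toy data, the second pick skips the highly relevant but redundant resource B in favor of C, which covers a different topic. Setting `lam=1.0` reduces the procedure to a plain relevance ranking.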
Keywords/Search Tags:federated search, resource selection, search result diversification, LDA topic model, distributed word representation