Research On Learning Algorithms For Resource Selection And Results Merging In Distributed Information Retrieval

Posted on:2020-06-29

Degree:Master

Type:Thesis

Country:China

Candidate:T F Wu

GTID:2428330590960618

Subject:Computer Science and Technology

Abstract/Summary:

Information retrieval technology provides convenient services for everyday information search and information filtering. As the amount of information on the web grows, users increasingly expect search results that are relevant, diverse, and returned quickly. Distributed information retrieval forwards a query to the relevant distributed collections and merges their diverse results before returning them to the user. Distributed information retrieval operates in one of two environments: cooperative or uncooperative. In a cooperative environment, the broker can access all the information inside each resource. In an uncooperative environment, the broker usually obtains information about a resource through query-based sampling. Many factors affect the effectiveness of resource selection and results merging in distributed information retrieval. By learning from a combination of these factors, we can fit their varied characteristics and improve the effectiveness of both resource selection and results merging.

First, this dissertation proposes LTR_RS, a resource selection algorithm based on learning to rank. By analyzing the factors that affect resource selection effectiveness, we extract three kinds of features: term matching features, central sample index based features, and topical relevance features. LTR_RS improves resource selection performance by training a LambdaMART learning-to-rank model that optimizes the NDCG of the ranked list of resources. Experiments on the Sogou dataset SogouQCL show that LTR_RS significantly outperforms the baseline methods in terms of NDCG and precision.

In addition, for the case where there is not enough training data for resource selection and multi-factor features cannot be fully extracted in an uncooperative environment, this dissertation proposes VAE_RS, an unsupervised resource selection algorithm based on the variational autoencoder. VAE_RS uses a variational autoencoder, an unsupervised learning model, to model the documents in each resource, and then uses the latent variables of those documents to obtain a vector representation of each resource. The relevance of each resource is obtained by computing the similarity between the query vector and the resource vectors. Experiments on the TREC FedWeb dataset demonstrate the effectiveness of the algorithm.

In the results merging stage, this dissertation proposes a learning-to-merge framework that combines factors from documents, result lists, resources, and verticals. By analyzing the factors that affect results merging effectiveness, we extract multi-factor features within the framework and fit them with a LambdaMART model that optimizes the NDCG of the final merged result list. Experiments on the FedWeb dataset show that the multi-factor learning-to-merge algorithm outperforms other models, including the state-of-the-art deep learning model DeepMerge.
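Both LTR_RS and the learning-to-merge framework are trained with LambdaMART to optimize the NDCG of a ranked list (of resources in one case, of merged documents in the other). As a reference for what that target metric measures, here is a minimal sketch of NDCG@k. It assumes graded integer relevance labels and the common exponential-gain, log2-discount formulation; the abstract does not say which NDCG variant the thesis uses, and the function names are illustrative only.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k relevance labels.

    Assumed variant: gain = 2^rel - 1, discount = log2(rank + 1) with 1-based rank.
    """
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_labels: Sequence[float], k: int) -> float:
    """NDCG@k: DCG of the produced ranking divided by DCG of the ideal ranking.

    `ranked_labels` holds the true relevance label of each item (resource or
    document) in the order the model ranked them. Returns 0.0 when no item is relevant.
    """
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at_k([2, 0, 1], k=3)` scores a resource ranking whose top resource has label 2, second has label 0, and third has label 1; a perfectly ordered list scores 1.0. LambdaMART does not differentiate this metric directly; it scales pairwise gradients by the change in NDCG that swapping two items would cause.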
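VAE_RS represents each resource through the latent variables that a variational autoencoder assigns to its documents, and ranks resources by query-resource similarity. The abstract does not specify how document vectors are aggregated, how the query is embedded, or which similarity measure is used. The sketch below therefore assumes mean pooling of per-document latent vectors, a query vector that already lives in the same latent space, and cosine similarity. The VAE itself is omitted: the hypothetical `doc_latents` mapping stands in for the encoder output (for instance, the posterior means) of each resource's sampled documents.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Average the latent vectors of one resource's documents (assumed aggregation).

    Requires at least one document vector.
    """
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def rank_resources(query_vec: Vector,
                   doc_latents: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Build a vector per resource, score it against the query, sort descending."""
    resource_vecs = {name: mean_vector(docs) for name, docs in doc_latents.items()}
    scores = [(name, cosine(query_vec, vec)) for name, vec in resource_vecs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Because the method is unsupervised, nothing here needs relevance judgments. Training data is only required to fit the VAE on sampled documents, which suits the uncooperative, low-resource setting the abstract targets. The broker would then forward the query to the top-ranked resources.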

Keywords/Search Tags:

Distributed information retrieval, Resource selection, Results merging, Learning to rank

Related items

1	Improving Resource Selection And Result Merging In An Uncooperative Search Environment
2	Key Problems Research On Distributed Information Retrieval
3	Research On P2P Search Technology In Uncooperative Environments
4	Query Expansion And Cluster Based Distributed Information Retrieval
5	Research On Learning To Rank For Information Retrieval
6	Research On Personalized Meta Search Results Merging In Information Retrieval
7	Graph-Knowledge-Bases Based Collection Selection For Distributed Information Retrieval
8	Research And Implementation Of Distributed Search Diversity Based On Vertical
9	The Study On Learning To Rank For Information Retrieval Using The Clonal Selection Algorithm
10	Selective merging of retrieval results for metasearch environments