
Research on Learning Algorithms for Resource Selection and Results Merging in Distributed Information Retrieval

Posted on: 2020-06-29 | Degree: Master | Type: Thesis
Country: China | Candidate: T F Wu
GTID: 2428330590960618 | Subject: Computer Science and Technology
Abstract/Summary:
Information retrieval technology provides convenient services for everyday information search and information filtering. As the amount of information on the web grows, users increasingly expect to obtain diverse and relevant search results quickly. Distributed information retrieval forwards a query to the relevant distributed collections (resources) and merges the diverse results they return into a single list for the user. Distributed information retrieval operates in one of two environments: cooperative or uncooperative. In a cooperative environment, the broker can access all of the information inside each resource; in an uncooperative environment, the broker usually learns about each resource through query-based sampling. Many factors affect the effectiveness of resource selection and results merging, and learning from a combination of these factors lets a model fit their varied characteristics and improve the effectiveness of both stages.

First, this dissertation proposes LTR_RS, a resource selection algorithm based on learning to rank. After analyzing the factors that affect resource selection effectiveness, we extract three kinds of features: term matching features, central sample index (CSI) based features, and topical relevance features. LTR_RS trains a LambdaMART learning-to-rank model that optimizes the NDCG of the ranked list of resources, which improves resource selection performance. Experiments on the Sogou dataset SogouQCL show that LTR_RS significantly outperforms the baseline methods in NDCG and precision.

Second, for the case where there is not enough training data for resource selection and multi-factor features cannot be sufficiently extracted in an uncooperative environment, this dissertation proposes VAE_RS, an unsupervised resource selection algorithm based on the variational autoencoder. VAE_RS uses a variational autoencoder, an unsupervised learning model, to model the documents in each resource, and then uses the latent variables of those documents to obtain a vector representation of each resource. The relevance of each resource is obtained by computing the similarity between the query vector and the resource vector. Experiments on the TREC FedWeb dataset demonstrate the effectiveness of the algorithm.

Finally, for the results merging stage, this dissertation proposes a learning-to-merge framework that combines factors from documents, result lists, resources, and verticals. After analyzing the factors that affect results merging effectiveness, the framework extracts multi-factor features and fits them with a LambdaMART model that optimizes the NDCG of the final result list. Experiments on the FedWeb dataset show that the multi-factor learning-to-merge algorithm outperforms the other models compared, including the state-of-the-art deep learning model DeepMerge.
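Both LTR_RS and the learning-to-merge framework are trained to optimize NDCG. A minimal Python sketch of NDCG@k follows for reference; it assumes graded relevance labels, the common exponential gain 2^rel - 1, and a log2 position discount, since the abstract does not say which NDCG variant the thesis uses.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items, given in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)  # pos is 0-based, so rank 1 is discounted by log2(2) = 1
        for pos, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (label-sorted) ordering; 0.0 if nothing is relevant."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

For example, a resource ranking whose relevance labels read `[3, 2, 0, 1]` scores slightly below 1.0, because the label-1 resource sits below a label-0 one.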
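LTR_RS uses central sample index (CSI) based features among its inputs. One widely used CSI signal in the distributed IR literature is a ReDDE-style estimate: rank the sampled documents in the CSI for the query, then credit each resource for every top-ranked sample it owns, scaled by its estimated size over its sample size. The sketch below shows that estimate; it is an illustrative assumption, not necessarily one of the thesis's exact features, and the function and parameter names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def redde_scores(
    csi_ranking: List[str],          # sampled doc ids, ranked by a retrieval model over the CSI
    doc_resource: Dict[str, str],    # which resource each sampled doc came from
    resource_size: Dict[str, int],   # estimated number of docs in each resource
    sample_size: Dict[str, int],     # number of docs sampled from each resource
    top_n: int = 100,
) -> Dict[str, float]:
    """ReDDE-style relevance estimate per resource (hypothetical feature extractor)."""
    scores: Dict[str, float] = defaultdict(float)
    for doc_id in csi_ranking[:top_n]:
        res = doc_resource[doc_id]
        # Each sampled doc "stands in" for size/sample_size docs of its resource.
        scores[res] += resource_size[res] / sample_size[res]
    return dict(scores)
```

Resources with no sampled document in the top `top_n` simply receive no score; a feature extractor would typically fill those with 0.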
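LambdaMART, which both LTR_RS and the merging framework train, fits gradient-boosted regression trees to "lambda" gradients: for every pair of items with different labels, a pairwise logistic term weighted by how much NDCG would change if the two items swapped positions. The sketch below computes only those per-item lambdas for one query's candidate list, assuming the standard LambdaRank formulation with sigma = 1; the tree-boosting part is omitted and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def lambda_gradients(scores: Sequence[float], labels: Sequence[int], sigma: float = 1.0) -> List[float]:
    """LambdaRank lambdas for one query. Positive means the model should raise that item's score."""
    n = len(scores)
    # Current positions induced by the model scores (0 = top of the list).
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = [0] * n
    for pos, i in enumerate(order):
        rank[i] = pos
    # Ideal DCG, used to turn DCG differences into NDCG differences.
    ideal = sum((2 ** lab - 1) / math.log2(p + 2) for p, lab in enumerate(sorted(labels, reverse=True)))
    lambdas = [0.0] * n
    if ideal == 0:
        return lambdas  # no relevant items, so there is nothing to learn for this query
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only pairs where i should be ranked above j
            gain_diff = 2 ** labels[i] - 2 ** labels[j]
            disc_diff = 1 / math.log2(rank[i] + 2) - 1 / math.log2(rank[j] + 2)
            delta_ndcg = abs(gain_diff * disc_diff) / ideal  # |change in NDCG| if i and j swapped
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam = sigma * rho * delta_ndcg
            lambdas[i] += lam  # push the more relevant item up
            lambdas[j] -= lam  # push the less relevant item down
    return lambdas
```

A mis-ordered pair (higher score on the less relevant item) yields larger lambdas than the same pair already in the correct order, which is what steers the boosted trees toward better NDCG.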
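The VAE_RS scoring step (resource vectors built from document latent variables, ranked by similarity to the query vector) can be sketched as below. This assumes the VAE has already been trained and that the query and every document have been encoded into latent vectors, for example the encoder means. Representing a resource by the mean of its documents' latent vectors and using cosine similarity are illustrative assumptions, because the abstract does not name the aggregation or the similarity function, and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors (assumed resource aggregation)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_resources(query_vec: Vector, doc_latents: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Rank resources by similarity between the query latent vector and each resource vector."""
    resource_vecs = {r: mean_vector(docs) for r, docs in doc_latents.items() if docs}
    scored = [(r, cosine(query_vec, v)) for r, v in resource_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because this step needs no relevance labels, it matches the unsupervised setting the abstract describes; a broker would forward the query to the top few resources in the returned list.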
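The learning-to-merge framework combines document, result-list, resource, and vertical factors into one feature vector per returned document before a LambdaMART model scores them. The sketch below shows how such vectors might be assembled. The specific features (min-max normalized engine score, reciprocal rank, list length, the resource's selection score, and a query-vertical match flag) are illustrative assumptions rather than the thesis's feature set, and every name is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Result:
    doc_id: str
    resource: str
    rank: int     # 1-based rank in the resource's own result list
    score: float  # score reported by the resource, on that resource's own scale


def merge_features(
    results_by_resource: Dict[str, List[Result]],
    resource_scores: Dict[str, float],   # e.g. output of the resource selection stage
    resource_vertical: Dict[str, str],   # vertical (news, image, ...) of each resource
    query_vertical: str,                 # vertical the query is assumed to target
) -> Dict[Tuple[str, str], List[float]]:
    """Build one multi-factor feature vector per (resource, doc_id)."""
    features: Dict[Tuple[str, str], List[float]] = {}
    for resource, results in results_by_resource.items():
        if not results:
            continue
        lo = min(r.score for r in results)
        hi = max(r.score for r in results)
        span = hi - lo
        for r in results:
            features[(resource, r.doc_id)] = [
                (r.score - lo) / span if span > 0 else 1.0,  # document factor: score normalized within its list
                1.0 / r.rank,                                 # result-list factor: reciprocal rank
                float(len(results)),                          # result-list factor: list length
                resource_scores.get(resource, 0.0),           # resource factor
                1.0 if resource_vertical.get(resource) == query_vertical else 0.0,  # vertical factor
            ]
    return features
```

Keying by `(resource, doc_id)` keeps the same document returned by two resources as two separate candidates, so any duplicate handling stays an explicit, separate step.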
Keywords/Search Tags: Distributed information retrieval, Resource selection, Results merging, Learning to rank