
Research on a Pre-trained BERT-Based Pseudo-Relevance Feedback Method

Posted on: 2022-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
Full Text: PDF
GTID: 2518306347989609
Subject: Computer application technology

Abstract/Summary:
The semantic gap between a document and a query is a challenging problem in information retrieval, and pseudo-relevance feedback can alleviate it to some extent. Due to the complexity of natural language, however, traditional pseudo-relevance feedback methods struggle to accurately judge the semantic relevance between a candidate expansion word and a query, which inevitably introduces noise. The pre-trained model BERT is a major milestone in many Natural Language Processing tasks: Nogueira et al. proposed a BERT-based ranking model that outperformed the previous state-of-the-art models by 27% (relative) in MRR@10 on the MS MARCO passage retrieval task, and BERT captures semantic information better than traditional text modeling methods. This paper conducts a comprehensive and systematic study of the feasibility and effectiveness of applying BERT to pseudo-relevance feedback and proposes two novel BERT-based pseudo-relevance feedback approaches. The main work covers the following two aspects.

On the one hand, we study how to use BERT to select expansion words. We propose a method that utilizes BERT-based word embeddings to select and weight the expansion words. First, we use the traditional BM25 model to perform an initial retrieval and obtain the N top-ranked documents. Second, we rank the candidate expansion words by their semantic similarity to the query and select the final expansion words. Finally, we re-rank the documents using the newly formed query. We conduct extensive experiments on three TREC datasets: AP90, Disk4&5, and WT2G. The experimental results demonstrate that the BERT-based approach selects better expansion words than previous methods, thus improving retrieval performance.

On the other hand, we study how to use BERT to select expansion text chunks. Considering both the relevance between a document and a query and the relevance between documents and expansion items, we introduce a model named PEACE. First, we use the traditional BM25 model to perform an initial retrieval and obtain the N top-ranked documents. Second, we use BERT to obtain a representation of each document. Third, we choose text chunks from the candidate documents, and the BM25 score and the text-chunk similarity are combined with different weights. Finally, we use the combined scores to re-rank the candidate documents. We again conduct extensive experiments on the three TREC datasets AP90, Disk4&5, and WT2G. PEACE achieves substantial improvements in NDCG@10, NDCG@20, P@10, and P@20 compared to Rocchio, which demonstrates the effectiveness of the proposed model.
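To make the first approach concrete, below is a minimal sketch of embedding-based expansion-word selection, assuming a Hugging Face BERT encoder with mean pooling over the last hidden layer. The candidate words would come from the N top-ranked BM25 documents; the BM25 retrieval step is omitted, and all names here are illustrative rather than the thesis code.

```python
# Sketch: rank candidate expansion words by BERT-embedding similarity
# to the query. Assumes the `transformers` and `torch` libraries.
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden layer into a single text embedding."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # (768,)

def select_expansion_words(query: str, candidates: list[str], k: int = 10):
    """Return the top-k candidate words with similarity weights."""
    q_vec = embed(query)
    scored = []
    for word in candidates:
        sim = torch.cosine_similarity(q_vec, embed(word), dim=0).item()
        scored.append((word, sim))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]  # words and weights for the expanded query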
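And a sketch of the PEACE-style re-ranking step: the BM25 score of each candidate document is combined with the similarity of its best text chunk to the query. The linear interpolation with weight alpha is an assumption on my part; the abstract only states that the two scores are "calculated with different weights". Chunk and query vectors are assumed to come from an encoder such as the `embed` helper above.

```python
# Sketch: interpolate normalized BM25 scores with best-chunk similarity.
# The interpolation weight `alpha` is an assumed formulation, not taken
# from the thesis; `docs`, `bm25_scores`, and `doc_chunk_vecs` are
# parallel lists over the N candidate documents.
import torch

def chunk_score(query_vec: torch.Tensor, chunk_vecs: list[torch.Tensor]) -> float:
    """Similarity of the document's best text chunk to the query."""
    return max(
        torch.cosine_similarity(query_vec, c, dim=0).item() for c in chunk_vecs
    )

def rerank(docs, bm25_scores, query_vec, doc_chunk_vecs, alpha=0.6):
    """Re-rank candidates by a weighted sum of BM25 and chunk similarity."""
    # Normalize BM25 scores to [0, 1] so the two signals are comparable.
    lo, hi = min(bm25_scores), max(bm25_scores)
    norm = [(s - lo) / (hi - lo + 1e-9) for s in bm25_scores]
    combined = [
        alpha * b + (1 - alpha) * chunk_score(query_vec, chunks)
        for b, chunks in zip(norm, doc_chunk_vecs)
    ]
    order = sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
    return [docs[i] for i in order]
```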
Keywords/Search Tags:information retrieval, pseudo-relevance feedback, query expansion, word embedding, BERT