
Research on a Pre-trained BERT-Based Pseudo-Relevance Feedback Method

Posted on: 2022-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
Full Text: PDF
GTID: 2518306347989609
Subject: Computer application technology

Abstract/Summary:
The semantic gap between a document and a query is a challenging problem in information retrieval, and pseudo-relevance feedback can alleviate it to some extent. Due to the complexity of natural language, however, traditional pseudo-relevance feedback methods struggle to accurately judge the semantic relevance between a candidate expansion word and a query, which inevitably introduces noise. The pre-trained model BERT is a major milestone in many Natural Language Processing tasks: Nogueira et al. proposed a BERT-based ranking model that outperformed the previous state-of-the-art models by 27% (relative) in MRR@10 on the MS MARCO passage retrieval task, and BERT captures semantic information better than traditional text modeling methods. This paper conducts a comprehensive and systematic study of the feasibility and effectiveness of applying BERT to pseudo-relevance feedback and proposes two novel BERT-based pseudo-relevance feedback approaches. The main work covers the following two aspects.

On the one hand, we study how to use BERT to select expansion words. We propose a method that utilizes BERT-based word embeddings to select and weight the expansion words. First, we use the traditional BM25 model to perform an initial retrieval and obtain the N top-ranked documents. Second, we rank the candidate expansion words by their semantic similarity to the query and select the final expansion words. Finally, we re-rank the documents using the newly formed query. We conduct extensive experiments on three TREC datasets: AP90, Disk4&5, and WT2G. The experimental results demonstrate that the BERT-based approach selects better expansion words than previous methods, thus improving retrieval performance.

On the other hand, we study how to use BERT to select expansion text chunks. Considering both the relevance between a document and a query and the relevance between documents and expansion items, we introduce a model named PEACE. First, we use the traditional BM25 model to perform an initial retrieval and obtain the N top-ranked documents. Second, we use BERT to obtain a representation of each document. Third, we choose text chunks from the candidate documents, and the BM25 score and the text-chunk similarity are combined with different weights. Finally, we use the combined scores to re-rank the candidate documents. We again conduct extensive experiments on the three TREC datasets AP90, Disk4&5, and WT2G. PEACE achieves substantial improvements in NDCG@10, NDCG@20, P@10, and P@20 compared to Rocchio, which demonstrates the effectiveness of the proposed model.
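To make the first approach concrete, below is a minimal sketch of embedding-based expansion-word selection, assuming a Hugging Face BERT encoder with mean pooling over the last hidden layer. The candidate words would come from the N top-ranked BM25 documents; the BM25 retrieval step is omitted, and all names here are illustrative rather than the thesis code.

```python
# Sketch: rank candidate expansion words by BERT-embedding similarity
# to the query. Assumes the `transformers` and `torch` libraries.
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden layer into a single text embedding."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # (768,)

def select_expansion_words(query: str, candidates: list[str], k: int = 10):
    """Return the top-k candidate words with similarity weights."""
    q_vec = embed(query)
    scored = []
    for word in candidates:
        sim = torch.cosine_similarity(q_vec, embed(word), dim=0).item()
        scored.append((word, sim))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]  # words and weights for the expanded query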
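And a sketch of the PEACE-style re-ranking step: the BM25 score of each candidate document is combined with the similarity of its best text chunk to the query. The linear interpolation with weight alpha is an assumption on my part; the abstract only states that the two scores are "calculated with different weights". Chunk and query vectors are assumed to come from an encoder such as the `embed` helper above.

```python
# Sketch: interpolate normalized BM25 scores with best-chunk similarity.
# The interpolation weight `alpha` is an assumed formulation, not taken
# from the thesis; `docs`, `bm25_scores`, and `doc_chunk_vecs` are
# parallel lists over the N candidate documents.
import torch

def chunk_score(query_vec: torch.Tensor, chunk_vecs: list[torch.Tensor]) -> float:
    """Similarity of the document's best text chunk to the query."""
    return max(
        torch.cosine_similarity(query_vec, c, dim=0).item() for c in chunk_vecs
    )

def rerank(docs, bm25_scores, query_vec, doc_chunk_vecs, alpha=0.6):
    """Re-rank candidates by a weighted sum of BM25 and chunk similarity."""
    # Normalize BM25 scores to [0, 1] so the two signals are comparable.
    lo, hi = min(bm25_scores), max(bm25_scores)
    norm = [(s - lo) / (hi - lo + 1e-9) for s in bm25_scores]
    combined = [
        alpha * b + (1 - alpha) * chunk_score(query_vec, chunks)
        for b, chunks in zip(norm, doc_chunk_vecs)
    ]
    order = sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
    return [docs[i] for i in order]
```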
Keywords/Search Tags:information retrieval, pseudo-relevance feedback, query expansion, word embedding, BERT