Research On Contextual Text Retrieval Technology Based On BERT And Text Segmentation

Posted on: 2021-09-05 | Degree: Master | Type: Thesis
Country: China | Candidate: S J Gao
GTID: 2518306104993639 | Subject: Information and Communication Engineering
Abstract/Summary:
Text retrieval aims to find the subset of documents in a collection that is most relevant to a given query. A retrieval system condenses and integrates the meaning of the words and sentences in each document and matches it against the query. By exploiting a computer's ability to process massive amounts of data, it quickly narrows the scope of documents to be accessed and greatly improves the efficiency of screening and processing information. This thesis studies text retrieval models and finds that mainstream models construct representations of the input query and text and determine the relationship between them from the similarity of those representations. Such models often cannot handle long-distance dependencies and model semantics poorly, so their vector representations do not accurately capture the semantic information of the text. The main challenge at present is therefore to improve the vector representation of the original text and, through it, the effectiveness of the whole retrieval model. To address this challenge, this thesis designs a BERT-based Text Retrieval Model (BTRM) built on the BERT pre-trained model. BTRM concatenates the query and the text, then uses cascaded Encoder layers to model the inter-sentence relationship and predict the similarity between the two. In addition, to make better use of BERT and capture the contextual semantics of the text, we propose a BERT-based text segmentation network. It works around BERT's input length limit, further mines the contextual semantics of the text, and produces more semantically coherent text blocks, which makes the similarity prediction for retrieval more accurate. To validate the model, experiments were conducted on Robust04, a public dataset used in the TREC information retrieval conference, and the results were evaluated with the widely used nDCG metric. The thesis first compares the BERT-based text retrieval model with several earlier neural semantic models. Then, taking the BERT-based text retrieval model as the baseline, it measures the improvement from adding text segmentation and the influence of different text segmentation techniques on model performance. The experimental results show that the BERT-based text retrieval model outperforms the other models, with a 9.7% improvement in nDCG@20 over the neural retrieval model DRMM. They also show that adding a text segmentation network improves the retrieval results, and that the BERT-based text segmentation network improves retrieval more than the other text segmentation techniques, with a 4.7% improvement over the basic BTRM model. Finally, an example illustrates the effect of the retrieval model with the text segmentation network: its retrieval results contain more context information.
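For readers unfamiliar with the nDCG@20 metric used to report the results above, here is a minimal Python sketch of nDCG@k. The abstract does not say which gain formulation the thesis used. This sketch assumes linear gain (the graded relevance label itself) and a log2 rank discount. The function names `dcg_at_k` and `ndcg_at_k` are illustrative and are not taken from the thesis.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance labels.

    Assumption: linear gain (the label itself) with a log2(rank + 1) discount,
    where rank starts at 1 for the top result.
    """
    return sum(
        rel / math.log2(position + 2)  # position is 0-based, so rank + 1 = position + 2
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances, k, judged_relevances=None):
    """nDCG@k for one query.

    ranked_relevances: labels of the returned documents, in ranked order.
    judged_relevances: labels of all judged documents for the query; if omitted,
    the ideal ranking is built from the returned documents only.
    """
    pool = ranked_relevances if judged_relevances is None else judged_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents: define the score as 0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg
```

For example, `ndcg_at_k([2, 1, 0], k=20)` returns 1.0 because the ranking is already ideal, while `ndcg_at_k([0, 1], k=2)` returns about 0.63 because the only relevant document is placed second. A figure such as nDCG@20 on Robust04 would be the mean of this per-query score over all test queries.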
Keywords/Search Tags:Text retrieval technology, BERT pre-training model, Text segmentation, Context information