Contextual Awared Multi-layer Information Retrieval Method Based On BERT

Posted on: 2022-01-15 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L Luo | Full Text: PDF
GTID: 2518306554482714 | Subject: Computer technology
Abstract/Summary:
With the development of data production methods, large amounts of data are generated every moment by news recommendation platforms, e-commerce platforms, office automation systems, and similar sources. How to sift content that matches a user's information needs from this massive data has become a hot topic in the information retrieval (IR) field. Pre-trained language models have recently been applied successfully to IR. Because the BERT model can be trained on a large-scale corpus to obtain universal embedding representations of words, it provides richer information than the traditional bag-of-words model and has become a basic building block in information retrieval tasks.

Nevertheless, there are several limitations when applying BERT to the query-document matching task: 1) relevance assessments apply at the document level, yet the number of tokens in a document often exceeds BERT's maximum input length; 2) applying BERT to long documents consumes a great deal of memory and run time, owing to the computational cost of the interactions between tokens. This thesis explores a novel BERT-based multi-layer contextual passage architecture to overcome these limits. The main work includes the following two aspects.

First, passage-level summarization extraction. We use the Maximal Marginal Relevance (MMR) algorithm, based on the TF-IDF mechanism, to extract important sentences as the passage-level summarization, which ensures high relevance and reduces redundancy of information.

Second, BERT-based multi-layer contextual passage information retrieval. We take the passage-level summarization extracted in the first stage as contextual evidence and combine it with the document title and the original text to compose the multi-layer contextual passage architecture. We then use the sentence pair classification task to predict the relevance score between query and passage.

Experiments were conducted on two standard ad-hoc retrieval collections with different characteristics: the TREC 2004 Robust Track (Robust04) and ClueWeb09. The experimental results show that our method is generally better than the neural ranking baseline models. Compared with other passage-level retrieval models, our method achieves the best results on all metrics, which shows that using the contextual information of a passage can significantly improve the precision of the retrieval task. The experimental results verify the effectiveness of the BERT-based multi-layer contextual passage retrieval method.
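
The summarization step described above (MMR over TF-IDF sentence vectors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the names `tfidf_vectors`, `cosine`, and `mmr_select`, the smoothed IDF formula, the trade-off weight `lam`, and the choice to measure relevance against the passage centroid (rather than, say, the query) are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (a dict) per sentence.

    Each sentence is treated as its own "document" for the IDF statistics.
    """
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumption): terms found in every sentence keep weight 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: freq * idf[t] for t, freq in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if not u or not v:
        return 0.0
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


def mmr_select(sentences, k=2, lam=0.7):
    """Greedily pick up to k sentences by Maximal Marginal Relevance.

    Score = lam * relevance - (1 - lam) * max similarity to already-picked
    sentences. Relevance is similarity to the passage centroid (assumption).
    """
    vectors = tfidf_vectors(sentences)
    centroid = {}
    for vec in vectors:
        for term, weight in vec.items():
            centroid[term] = centroid.get(term, 0.0) + weight
    relevance = [cosine(vec, centroid) for vec in vectors]

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    passage = [
        "BERT ranks documents for a query.",
        "BERT ranks documents for a query.",
        "Long documents exceed the input limit.",
    ]
    print(mmr_select(passage, k=2, lam=0.5))
```

Lowering `lam` penalizes redundancy more strongly, so near-duplicate sentences are less likely to be selected together; raising it favors sentences that are most representative of the passage regardless of overlap.
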
Keywords/Search Tags:Relevance Matching, Text Summary Extraction, BERT, Neural Ranking Models