Research And Implementation Of Full-text Retrieval Combining Word Matching And Context Interaction

Posted on:2022-12-31

Degree:Master

Type:Thesis

Country:China

Candidate:Z Wu

Full Text:PDF

GTID:2518306761959519

Subject:Automation Technology

Abstract/Summary:

Information retrieval is a broad discipline that has attracted considerable attention in industry. In recent years, the rapid growth of the Internet and of online information resources has caused information overload, and people have become increasingly dependent on information retrieval. Technology companies in China and abroad, such as Baidu and Google, have developed their own full-text search engines. These engines lower the cost of finding useful information and have become essential tools for filtering and browsing it. The goal of a full-text search engine is to find what users want in a massive collection of information within a short time.

Full-text retrieval generally consists of two ranking stages: coarse initial ranking and re-ranking. A simple, high-recall ranking algorithm first filters relevant documents out of a large collection, and one or more re-ranking methods are then applied to improve retrieval accuracy. To improve accuracy further, many studies have applied deep neural network models to the re-ranking stage. Experiments show that these models perform better at re-ranking; pre-trained language models in particular achieve the current best results on various ad-hoc retrieval benchmarks. However, the computational complexity of a pre-trained language model is quadratic in the length of the input sequence, so in ad-hoc ranking tasks it is usually used only to predict the relevance of individual passages or sentences. Making pre-trained language models perform well on document-level data at limited computational cost is therefore the key to full-text retrieval.

To improve retrieval accuracy without compromising retrieval efficiency, this paper combines the traditional word-matching algorithm TF-IDF with the computational idea of the Vector Space Model and proposes an improved version of the contextualized late-interaction model ColBERT. Filters are introduced into ColBERT to select the more discriminative terms in the query, and the interaction computation is modified so that, on top of semantic matching, it strengthens relevance matching between query terms and passages. Passage retrieval experiments and analyses on three public datasets verify the effectiveness of this improvement.

To aggregate sequential signals across passages with full semantic understanding, this paper imitates the front-to-back way humans read and adds a Gated Recurrent Unit (GRU) as a feature aggregator on top of the passage retrieval model above. After the query interacts with each passage to produce an interaction feature representation, the aggregator combines the representations of all passages into an interaction representation of the whole document, from which the matching score between the query and the document is computed. Experimental results show that this method effectively aggregates the sequential signals between passages, enabling it to perform well in full-text retrieval.

To verify the practicality of this full-text retrieval model, this paper builds a high-accuracy full-text search engine on top of it, which independently encodes queries and documents into two sets of contextual embeddings and indexes the documents offline.
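The two-stage design described above (a cheap, high-recall first pass followed by a more expensive re-ranker) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the smoothed IDF formula is one common variant, `two_stage_search` and the toy `overlap` re-ranker are hypothetical names, and the re-ranker merely stands in for a neural model such as ColBERT.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Tokens = List[str]


def build_idf(docs: List[Tokens]) -> Dict[str, float]:
    """Smoothed IDF, log((N + 1) / (df + 1)) + 1 (one common variant, assumed here)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def tfidf_score(query: Tokens, doc: Tokens, idf: Dict[str, float]) -> float:
    """Sum of tf * idf over the query terms; absent terms contribute 0."""
    tf = Counter(doc)
    return sum(tf[t] * idf.get(t, 0.0) for t in query)


def two_stage_search(
    query: Tokens,
    docs: List[Tokens],
    rerank: Callable[[Tokens, Tokens], float],
    k: int = 100,
) -> List[int]:
    """Return document indices: TF-IDF top-k candidates, reordered by `rerank`."""
    idf = build_idf(docs)
    # Stage 1: cheap, high-recall lexical ranking over the whole collection.
    candidates = sorted(
        range(len(docs)),
        key=lambda i: tfidf_score(query, docs[i], idf),
        reverse=True,
    )[:k]
    # Stage 2: the expensive re-ranker only scores the k surviving candidates.
    return sorted(candidates, key=lambda i: rerank(query, docs[i]), reverse=True)


# Usage with a toy collection; `overlap` is a hypothetical stand-in re-ranker.
docs = [["neural", "ranking"], ["cooking", "pasta"], ["neural", "retrieval", "ranking", "ranking"]]


def overlap(q: Tokens, d: Tokens) -> float:
    return len(set(q) & set(d)) / len(d)


print(two_stage_search(["neural", "ranking"], docs, overlap, k=2))  # [0, 2]
```

Here TF-IDF ranks document 2 first (it repeats "ranking"), but the re-ranker prefers document 0, whose tokens are all relevant. The point of the split is cost: only `k` documents ever reach the expensive scorer.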
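ColBERT scores a passage by late interaction: each query token embedding is matched against every passage token embedding, the maximum similarity (MaxSim) is kept, and the per-token maxima are summed. The abstract does not give the exact form of the thesis's filter or modified interaction. The sketch below is therefore a hypothetical variant in the same spirit: query terms whose IDF falls below a threshold are dropped, and each remaining MaxSim term is weighted by its IDF, echoing TF-IDF weighting in the Vector Space Model. `filtered_maxsim` and `idf_threshold` are illustrative names, and the toy 2-D vectors stand in for real contextual embeddings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; ColBERT normalizes embeddings, so this equals a dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def filtered_maxsim(
    query_embs: List[Vector],
    query_idf: List[float],
    doc_embs: List[Vector],
    idf_threshold: float = 0.0,
) -> float:
    """IDF-filtered, IDF-weighted late-interaction score (hypothetical variant).

    Plain ColBERT sums, over query tokens, the max similarity to any passage
    token. Here tokens with IDF below `idf_threshold` are skipped, and each
    remaining MaxSim term is multiplied by that token's IDF.
    Assumes `doc_embs` is non-empty.
    """
    score = 0.0
    for emb, weight in zip(query_embs, query_idf):
        if weight < idf_threshold:
            continue  # filter: ignore weakly discriminative query terms
        score += weight * max(cosine(emb, d) for d in doc_embs)
    return score


query_embs = [[1.0, 0.0], [0.0, 1.0]]  # toy query token embeddings
query_idf = [2.0, 0.1]                 # the second term is common, so low IDF
passage_embs = [[1.0, 0.0], [0.6, 0.8]]
```

With all IDF weights set to 1 and the threshold at 0, the function reduces to the plain ColBERT MaxSim sum, so the variant can be compared directly against the unmodified model.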
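The front-to-back aggregation of passage-level interaction features can be sketched with a single GRU cell. This is a dependency-free illustration under stated assumptions, not the thesis's model: weights are random rather than trained, the gate convention h' = (1 - z) * h + z * n is one of the two common GRU formulations, and the linear read-out `w_out` that turns the final hidden state into a document score is a hypothetical choice. Each input vector stands for one passage's query-interaction feature.

```python
import math
import random
from typing import List

Vec = List[float]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(m: List[Vec], v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class GRUAggregator:
    """Reads passage features in order and returns a document-level state/score."""

    def __init__(self, in_dim: int, hid_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[Vec]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hid_dim = hid_dim
        # Weights for update gate "z", reset gate "r" and candidate state "n".
        self.W = {g: mat(hid_dim, in_dim) for g in "zrn"}   # input -> hidden
        self.U = {g: mat(hid_dim, hid_dim) for g in "zrn"}  # hidden -> hidden
        self.b = {g: [0.0] * hid_dim for g in "zrn"}
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hid_dim)]  # hypothetical read-out

    def step(self, x: Vec, h: Vec) -> Vec:
        def pre(g: str, hh: Vec) -> Vec:
            return [a + c + d for a, c, d in zip(_matvec(self.W[g], x), _matvec(self.U[g], hh), self.b[g])]

        z = [_sigmoid(v) for v in pre("z", h)]
        r = [_sigmoid(v) for v in pre("r", h)]
        n = [math.tanh(v) for v in pre("n", [ri * hi for ri, hi in zip(r, h)])]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

    def aggregate(self, passage_feats: List[Vec]) -> Vec:
        h = [0.0] * self.hid_dim
        for x in passage_feats:  # front to back, like a human reader
            h = self.step(x, h)
        return h

    def score(self, passage_feats: List[Vec]) -> float:
        return sum(w * v for w, v in zip(self.w_out, self.aggregate(passage_feats)))


agg = GRUAggregator(in_dim=3, hid_dim=4, seed=0)
feats = [[0.2, 0.1, 0.9], [0.8, 0.3, 0.1]]  # toy per-passage interaction features
```

Unlike mean or max pooling, the recurrent state depends on passage order, so the same passages in a different order generally yield a different document score. This is the sequential signal the aggregator is meant to capture.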

Keywords/Search Tags:

Information Retrieval, Passage Retrieval, Full-text Retrieval, Search Engine, Pretrained Language Model

Related items

1	Research And Implementation Of A Chinese Full-Text Information Retrieval Technology Based On Lucene Search Engine
2	Research On Information Retrieval Language Under The Conditions Of Network
3	Research On Full-text Information Retrieval Technology For WeChat Content
4	Research Of Search Engine Key Technique And Optimize Performance
5	The Applied Research Of Information Retrieval Technology In Oilfield Information Net
6	Passage Retrieval System Based On Language Model
7	Research And Implementation Of An Open High-Performance Platform Of Full-Text Retrieval
8	Research And Application Of Full-text Retrieval Technology Based On Lucene
9	Research On Information Retrieval Models Based On Statistical Language Model And Passage Feature
10	Research On The Unified Platform For Access And Full Text Retrieval In Open Access Journals