
Research On Open Domain Machine Reading Comprehension Technology For Information Retrieval

Posted on: 2023-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: K Su
Full Text: PDF
GTID: 2558306623997149
Subject: Engineering
Abstract/Summary:
Open-domain Machine Reading Comprehension (OPMRC) can greatly improve the efficiency of information acquisition because it returns answers directly in natural language instead of a list of related documents. It is considered a key next-generation human-computer interaction technology in the field of Information Retrieval (IR), following search engines. At present, OPMRC methods can be divided into the retrieval-reading two-stage method and the end-to-end method. The end-to-end method trains the retriever and the reader jointly, which simplifies the system structure, but its training mechanism requires a large amount of labeled data, and the high cost of that data limits its application. The retrieval-reading two-stage method consists of two parts: an answer-related document retriever and a reader. Because it is relatively flexible, and a state-of-the-art retriever and a state-of-the-art reader can each be selected to optimize the results, this paper adopts the retrieval-reading two-stage method. The main work of the paper is as follows:

(1) Proposed PWFT-BERT, a ranking method that integrates Learning to Rank with a pre-trained language model. In the retrieval stage, the better-performing current retrieval models often trade space for time: they need a large amount of memory to store the index in order to achieve fast retrieval. To address this problem, this paper considers training an efficient retriever at low cost. A ranking method that combines Learning to Rank with a pre-trained language model, PWFT-BERT, is proposed, together with a fast pseudo-negative sample generation algorithm for obtaining the training data. Applying PWFT-BERT to the list of documents recalled by retrieval algorithms such as TF-IDF or BM25 balances retrieval speed and precision without consuming a large amount of machine memory. Test results on the WSDM-DiggSci 2020 dataset verify the effectiveness of the proposed algorithm.

(2) Proposed KLMRC, a knowledge-enhanced machine reading comprehension model for long documents. In the reading stage, current MRC models have two problems: they cannot effectively use external knowledge, and they cannot effectively process long documents. To solve these problems, this paper designs a model, KLMRC, that can easily and quickly integrate external structured knowledge while processing long documents. To integrate external knowledge effectively, the model first retrieves relevant knowledge in the matching layer, and then uses a tree structure to splice triples onto the original text in the knowledge aggregation layer. Second, the model uses global attention and soft position encoding in the encoding layer to encode the text together with the external knowledge, and passes the word embeddings to the interaction layer, where the question and the knowledge-enhanced text exchange information. Finally, the output layer predicts the probability that each word is the start or end position of the answer, and extracts the answer accordingly. Experiments verify that KLMRC outperforms the baseline method on DuReader 2.0, a typical Chinese long-document reading comprehension dataset.

(3) Designed and implemented a Chinese open-domain QA system for massive document collections. Finally, this work applies the algorithms proposed for the above two stages in a Chinese open-domain question answering system. Unlike a search engine, which returns a list of documents, the system directly returns the answer and displays the source of the answer, which not only improves the efficiency of information retrieval but also ensures the credibility of the retrieval results.
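As an illustration of the answer-extraction step described under (2), the following is a minimal sketch of how a span can be chosen once start and end probabilities are available for each word. It is not the thesis's implementation: the function name `best_span`, the `max_len` cap, and the toy probabilities are assumptions made for this example, and the product-of-probabilities scoring rule is one common choice that the abstract does not confirm.

```python
from typing import List, Tuple


def best_span(p_start: List[float], p_end: List[float],
              max_len: int = 30) -> Tuple[int, int]:
    """Return the (start, end) pair with start <= end that maximizes
    p_start[start] * p_end[end], with at most max_len words in the span.

    Assumption: the scoring rule and the max_len cap are illustrative
    choices, not details taken from the thesis.
    """
    best, best_score = (0, 0), -1.0
    for i, ps in enumerate(p_start):
        # Only consider end positions at or after the start position.
        for j in range(i, min(i + max_len, len(p_end))):
            score = ps * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


# Toy example with made-up probabilities (hypothetical values).
tokens = ["the", "capital", "of", "France", "is", "Paris", "."]
p_start = [0.05, 0.05, 0.05, 0.10, 0.05, 0.65, 0.05]
p_end = [0.02, 0.03, 0.05, 0.10, 0.05, 0.70, 0.05]

start, end = best_span(p_start, p_end)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

Restricting the search to end positions at or after the start position keeps the extracted span well-formed, and the length cap keeps the search cheap on long documents.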
Keywords/Search Tags: Information Retrieval, Open Domain Question Answering, Machine Reading Comprehension, Retrieval Ranking, Knowledge Enhancement