Open-domain question answering is the task of answering general-domain questions posed in natural language. It is one of the core problems in information retrieval and natural language processing. Most existing studies divide the problem into several stages, including document retrieval, document ranking, and machine reading comprehension. Document retrieval aims to retrieve documents relevant to the question from a large text corpus. Document ranking re-ranks the documents returned by the retrieval stage. Machine reading comprehension extracts the final answer from the re-ranked documents. In recent years, self-attention based pretrained models have been widely used in open-domain question answering, but they bring high computational and memory costs. This paper applies hash learning to the different stages of open-domain question answering. Three main contributions are outlined below.

First, existing information retrieval methods mostly use the TF-IDF or BM25 algorithm. These algorithms rely on direct keyword matching and cannot capture semantic information. To address this problem, this paper proposes a hashing-based query expansion model (HQE), which rewrites the question and improves the efficiency of query expansion via hash learning. Experiments show that the HQE model obtains higher recall than existing approaches on multiple datasets.

Second, document ranking models that use pretrained self-attention networks as their encoders suffer from computational efficiency and memory cost issues. We propose a hashing-based passage re-ranking (HPR) model, which learns a binary matrix representation of each candidate document. During online prediction, the model stores these binary matrices in memory to avoid recomputation, which also reduces the memory cost. Experiments on three datasets show that HPR outperforms existing models and achieves state-of-the-art performance.

Third, existing reading comprehension models mostly use pretrained self-attention models to obtain contextual semantic representations of documents and questions, which likewise incur high computational and memory costs. Considering other candidate documents while predicting a document's answer can improve performance, but it greatly increases the memory cost. To tackle this problem, this paper presents a hashing-based multi-document reading comprehension model (HMRC), which predicts the answer over multiple iterations and learns binary representations of the candidate documents to reduce the memory cost. Experiments on three open-domain QA datasets show that our model achieves state-of-the-art performance.
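
To make the shared idea behind these contributions concrete, the sketch below illustrates one common form of hash learning for retrieval: dense encoder outputs are mapped to binary codes by a projection followed by a sign threshold, the codes for all candidate documents are cached once, and online scoring compares bits via Hamming distance. All names, dimensions, and the random projection here are illustrative placeholders, not the actual HQE, HPR, or HMRC architectures described in this paper.

```python
import numpy as np

# Hypothetical dimensions; the encoder and code length used by the
# paper's models are not specified here.
DENSE_DIM = 768   # e.g. a pretrained self-attention encoder's hidden size
CODE_BITS = 256   # length of the binary code

rng = np.random.default_rng(0)

# Stand-in for a learned hashing projection. In hash learning this matrix
# would be trained jointly with the encoder (e.g. via a tanh relaxation
# of the sign function); a random matrix is used purely for illustration.
W = rng.standard_normal((DENSE_DIM, CODE_BITS))

def to_binary_code(dense_vec: np.ndarray) -> np.ndarray:
    """Map a dense representation to a {0, 1} code via sign(W^T x)."""
    return (dense_vec @ W > 0).astype(np.uint8)

def hamming_distance(code_a: np.ndarray, code_b: np.ndarray) -> int:
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(code_a != code_b))

# Pre-compute and cache binary codes for all candidate documents once,
# so online prediction compares bits instead of re-encoding text.
doc_dense = rng.standard_normal((1000, DENSE_DIM))  # placeholder encoder outputs
doc_codes = np.stack([to_binary_code(d) for d in doc_dense])

# At query time: hash the question representation and rank documents by
# Hamming distance (smaller distance = more similar).
query_dense = rng.standard_normal(DENSE_DIM)
query_code = to_binary_code(query_dense)
ranking = np.argsort([hamming_distance(query_code, c) for c in doc_codes])
print("top-5 candidate documents:", ranking[:5])
```

The appeal of this setup is that the cached codes are bit arrays rather than floating-point matrices, so both the per-document storage and the per-comparison cost drop sharply compared with re-running a self-attention encoder for every candidate.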