Research On Key Technologies Of Single Data Source Open Domain Question Answering System

Posted on: 2021-06-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S J Zhong
Full Text: PDF
GTID: 1488306350978269
Subject: Investment
Abstract/Summary:
Automatic question answering plays an important role in computer science and is an important research direction in information retrieval and natural language processing. Unlike search engines, which simply retrieve and rank documents, automatic question answering systems return natural language answers that carry more semantic meaning. The main task of automatic question answering is to understand and automatically answer questions raised by users, and thereby to build a system that meets users' retrieval and reasoning needs. As a precise retrieval technology, open-domain question answering aims to give people a more natural and direct interface for accessing information. Open-domain Q&A does not restrict questions to a specific field: questions may come from any domain, and sometimes the form of the answer is not restricted either. Its most significant feature is that users' questions are not limited to a specific field or application. However, large-scale question answering systems rely on multiple data sources to answer users' questions, and so depend heavily on the redundancy between information sources. In such systems, machine understanding of the question usually assumes that a small piece of relevant text has already been identified and handed to a reading comprehension model. This approach may work well for structured text data, but it is unrealistic for an open-domain question answering system that deals with unstructured text. Answering questions from a single knowledge source forces the machine learning models to be very accurate so that answers can be located precisely. This challenge encourages in-depth research on machine reading capabilities. This dissertation takes the key technologies of a single-data-source open-domain question answering system as its research target. The research focuses on information retrieval, query expansion, and machine reading comprehension in open-domain question answering tasks. The main research contents are as follows.

(1) A neural information retrieval model with conceptual hierarchy is proposed. Deep learning technology cannot act directly on the data source for retrieval; instead, given retrieved data, it answers user questions through machine reading comprehension. Before machine reading comprehension, a basic data set that covers the possible answers must be built so that the subsequent reading tasks can be carried out on it. Therefore, this dissertation analyzes related work on information retrieval and finds that ad hoc retrieval technology is trending toward deep neural networks, with the main work being semantic analysis of queries and documents. On this basis, a neural information retrieval model with staggered-layer recognition ability is proposed, consisting of a feature graph component, an attention component, an aggregation component, and a distance loss function component. Experiments on the well-known datasets SQuAD, WikiQA, and TrecQA show that the model is competitive in ad hoc information retrieval.

(2) A neural pseudo-relevance feedback framework for query expansion is proposed. Query expansion is an effective way to improve retrieval performance. It adds new query terms to the query, and the expanded query is expected to return better search results than the initial query. Relevance feedback requires users to indicate which of the documents retrieved for the original query are relevant. It is one of the most reliable query expansion methods, but real relevance feedback is difficult to obtain in practice. Automatic query expansion that does not require user feedback is therefore particularly important. Pseudo-relevance feedback is an efficient strategy that can improve retrieval accuracy without user intervention. However, existing neural information retrieval models have no mechanism for handling expansion terms that differ from the original query terms. That is, a neural information retrieval model can only use the relevance feedback method designed for it and cannot use the pseudo-relevance feedback methods of other neural information retrieval models, so it is not easy to combine a new neural information retrieval model with existing pseudo-relevance feedback methods. Therefore, based on an analysis of related work on pseudo-relevance feedback, this dissertation proposes a neural pseudo-relevance feedback framework. The framework can use an unsupervised or supervised information retrieval system to provide the initial document ranking. Combined with different neural information retrieval models, it then computes the similarity score by extracting and merging document interactions, which makes it compatible with different neural information retrieval models and realizes query expansion. Experiments on SQuAD and WikiQA demonstrate the effectiveness of the framework.

(3) A multi-hop reading comprehension model with co-encoded questions and answers is proposed. Machine reading comprehension is a task that tests a machine's understanding of natural language. It can also be regarded as an extension of automatic question answering, and it plays an important role in natural language processing and automatic question answering. After analyzing related work on machine reading comprehension, the dissertation draws three conclusions. First, machine reading comprehension based on deep learning is superior at capturing context information, and its performance is clearly better than that of traditional rule-based methods. Second, multi-hop reading comprehension provides multiple documents as a resource, so more evidence is available for answer prediction and the model can give a good answer even when the question is complex. Third, multi-hop reading comprehension is one of the mainstream development directions for future open-domain question answering systems. On this basis, a multi-hop reading comprehension model with co-encoded questions and answers is proposed for the single-data-source open-domain question answering task. The model takes the question and the top-ranked documents returned by the information retrieval system as its initial inputs. The final answer is produced through input embedding, input encoding, multi-hop inference, and output decoding. An open-domain question answering experiment on Wikipedia demonstrates the effectiveness of the proposed model; single-hop reading comprehension performance, evaluated on SQuAD, shows that the model is competitive; and multi-hop reading comprehension performance, evaluated on WikiHop, confirms the performance of the proposed model.
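
The pseudo-relevance feedback idea described in item (2) can be illustrated with a minimal sketch. This is not the neural framework proposed in the dissertation, whose details the abstract does not give. It is a classical term-frequency variant, and the function names (`tokenize`, `score`, `rank`, `expand_query`), the scoring rule, and the toy documents are all assumptions made for illustration. The top-ranked documents from an initial retrieval pass are treated as relevant without asking the user, and their most frequent non-query terms are appended to the query.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def score(query_terms, doc_terms):
    """Term-overlap score: total occurrences of the query terms in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def rank(query_terms, docs):
    """Return document indices sorted by descending score (ties keep input order)."""
    scores = [score(query_terms, tokenize(d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


def expand_query(query, docs, top_k=2, n_terms=1):
    """Pseudo-relevance feedback: assume the top_k documents are relevant and
    append their n_terms most frequent terms that are not already in the query."""
    q = tokenize(query)
    feedback = Counter()
    for i in rank(q, docs)[:top_k]:
        feedback.update(t for t in tokenize(docs[i]) if t not in q)
    return q + [t for t, _ in feedback.most_common(n_terms)]


# Hypothetical toy corpus, for illustration only.
docs = [
    "python code interpreter runs python code bytecode interpreter",
    "python interpreter compiles code",
    "python snake habitat",
    "interpreter design and interpreter optimization",
]

initial = rank(tokenize("python code"), docs)
expanded = expand_query("python code", docs)
reranked = rank(expanded, docs)
```

In this toy corpus the last document shares no term with the original query, so the first pass ranks it last. After the expansion term is added, it moves above the unrelated "snake" document, which is the effect query expansion is meant to achieve.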
Keywords/Search Tags: Open-domain, Question answering system, Deep learning, Information retrieval, Reading comprehension