
Research On Key Technologies Of Open-Domain Question Answering

Posted on: 2020-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wang
Full Text: PDF
GTID: 2518306548995879
Subject: Computer Science and Technology
Abstract/Summary:
Open-domain question answering is a challenging task in Natural Language Processing (NLP). Given a question, a system must extract the answer span from a corpus such as Wikipedia through information retrieval and further text processing; the task has become a hot research topic in recent years. This thesis focuses on neural network models for factoid open-domain question answering and makes the following contributions.

First, the ranking model in an open-domain QA system computes a relevance score between the question and each candidate paragraph. Current ranking models have two problems: (1) most are built on word embeddings and lack sentence-level integration of semantic information; (2) the function that forms the paragraph representation cannot capture enough semantic information. To address these problems, this thesis proposes a Sentence-Based Semantic Matching Ranker (SBSMR) for open-domain question answering. The main ideas are: (1) replacing word embeddings with sentence embeddings, strengthening both information integration and information interaction at the sentence level; and (2) redesigning the aggregation function to model how differently individual sentences contribute to a paragraph's meaning (a minimal sketch of this idea follows the abstract). Experiments show that SBSMR alleviates these problems to some extent: on the public open-domain datasets Quasar-T and SearchQA, the SBSMR ranker improves recall by 11% and 17% respectively, and an open-domain QA pipeline built on SBSMR improves overall QA performance by 14.0% and 24.4%.

Second, current ranking models and QA systems require long prediction times and large memory, so we compress the sentence embeddings to reduce the memory cost while preserving performance. Existing compression methods have two problems: (1) neural network compression models suffer significant performance degradation when the dimension is reduced aggressively; (2) traditional dimensionality reduction methods ignore the integration of contextual semantic information, which loses information. To address these problems, this thesis proposes an end-to-end, two-level sentence embedding compression model: a neural network compression model performs the first-level compression and information integration, and PCA performs the second-level compression (a second sketch below illustrates this pipeline). Experiments show that the model relieves both problems: on Quasar-T, an SBSMR ranker using the compressed sentence embeddings retains more than 95% of its performance when the embedding dimension is reduced to 12.5% of the original; on SentEval, a benchmark for evaluating sentence-vector encoding quality, the two-level compression method also achieves satisfactory overall performance.
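To make the sentence-level ranking idea concrete, here is a minimal, hypothetical sketch in PyTorch. It is not the thesis implementation: the bilinear relevance scorer, the learned per-sentence importance weights, and all dimensions are illustrative assumptions. The point it shows is only that a paragraph's score is aggregated from sentence-question scores with learned, unequal weights rather than uniform max/mean pooling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceAggregationRanker(nn.Module):
    """Scores a candidate paragraph against a question at the sentence level."""
    def __init__(self, dim: int):
        super().__init__()
        self.relevance = nn.Bilinear(dim, dim, 1)  # question-sentence match score
        self.importance = nn.Linear(dim, 1)        # learned per-sentence weight

    def forward(self, q_emb: torch.Tensor, sent_embs: torch.Tensor) -> torch.Tensor:
        # q_emb: (dim,) question embedding; sent_embs: (n_sents, dim)
        q = q_emb.expand_as(sent_embs).contiguous()      # broadcast the question
        rel = self.relevance(q, sent_embs).squeeze(-1)   # (n_sents,) match scores
        alpha = F.softmax(self.importance(sent_embs).squeeze(-1), dim=0)
        return (alpha * rel).sum()                       # scalar paragraph score

# Usage: score each candidate paragraph and keep the top-k for the reader.
ranker = SentenceAggregationRanker(dim=768)
question = torch.randn(768)
paragraph_sentences = torch.randn(5, 768)  # e.g. from a sentence encoder
print(ranker(question, paragraph_sentences))
```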
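The two-level compression can likewise be sketched under stated assumptions. Below, a single linear encoder stands in for the first-level neural compression (in the thesis it is trained end-to-end, which this sketch omits), and scikit-learn's PCA provides the second level. The 768 -> 256 -> 96 dimensions are illustrative, chosen so that 96/768 matches the 12.5% ratio quoted above.

```python
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

class FirstLevelCompressor(nn.Module):
    """Stand-in for the neural first-level compression (trained end-to-end in the thesis)."""
    def __init__(self, in_dim: int = 768, mid_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, mid_dim), nn.Tanh())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)

def two_level_compress(sent_embs: torch.Tensor, final_dim: int = 96):
    # Level 1: neural compression with the (assumed pre-trained) encoder.
    encoder = FirstLevelCompressor()
    with torch.no_grad():
        mid = encoder(sent_embs).numpy()                    # (n, 256)
    # Level 2: PCA down to the final dimension (96/768 = 12.5%).
    return PCA(n_components=final_dim).fit_transform(mid)   # (n, 96)

compressed = two_level_compress(torch.randn(1000, 768))
print(compressed.shape)  # (1000, 96)
```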
Keywords/Search Tags:Open-domain QA, SBSMR, Aggregation Function, Sentence Embedding, Dimensionality Reduction, PCA