
Research And Implementation Of Question Answering System Based On Unstructured Text

Posted on: 2020-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
GTID: 2428330575957078
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of the mobile internet and big data, unstructured web pages and documents in various vertical fields have accumulated rapidly. As a high-level form of information retrieval, automatic question answering over unstructured text analyzes the user's real intention and extracts clean, accurate answers from retrieved documents, and it has gradually become a research hotspot. However, most published work still has several problems: 1) In the Q&A scenario, question length and document length are severely unbalanced, and the information retrieval module lacks fine-grained, semantic-level similarity matching, which makes it difficult to meet precise retrieval requirements; 2) In the Chinese context, mainstream machine reading comprehension models have not been fully verified, and there is room for improvement in performance; 3) Automatic question answering over large-scale unstructured text is still immature, and relatively few platforms exist for vertical fields.

This thesis focuses on the key technologies of document retrieval and answer extraction in automatic question answering systems based on unstructured text, optimizes the algorithms, and implements a system. The main research work includes:
(1) A semantic similarity matching model (Deep-HAN-Matching) based on a hierarchical attention mechanism is proposed to address the difficulty of semantic similarity matching caused by the length imbalance between query and document in question answering systems. By using attention to abstract and extract features layer by layer at the word level and the sentence level, the model substantially outperforms common baseline models on the open WikiQA dataset;
(2) A machine reading comprehension model (BiDAF-GCN-SelfAtt) based on a gated convolutional neural network and a self-attention mechanism is proposed to address the difficulty BiDAF has with context representation and with fusing interactive matching features when modeling long text. On DuReader, ROUGE-L and BLEU-4 improve by 2.8% and 5.2% respectively compared with the baseline model;
(3) The proposed algorithms are integrated into an automatic question answering system based on unstructured text in the field of clinical medicine. Experiments show that the two proposed models apply well to labeled clinical medical data sets. In addition, Top-1 accuracy on the test set drawn from the 2018 Clinical Medical Professional Examination is significantly higher than that of the baseline system.
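For readers unfamiliar with the ROUGE-L metric reported on DuReader, the following is a minimal sketch of the standard definition based on the longest common subsequence (LCS). It is not the thesis's evaluation code: the function names, the token-list input, and the weighting beta = 1.2 are assumptions chosen for illustration, and the official evaluation script may tokenize and aggregate differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score for one candidate/reference pair.

    beta weights recall against precision; 1.2 is an assumed value,
    not one taken from the thesis.
    """
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (recall + b2 * precision)


if __name__ == "__main__":
    # Hypothetical answers, tokenized per character for illustration.
    predicted = list("abcd")
    gold = list("acde")
    print(rouge_l(predicted, gold))
```

When precision and recall are equal, the score equals that shared value regardless of beta, which is a quick sanity check for an implementation.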
Keywords/Search Tags:unstructured text, automatic question answering, reading comprehension, information retrieval, attention mechanism