
Research And Implementation Of A Question-and-answer System Based On Unstructured Text In A Restricted Domain

Posted on: 2022-06-01 | Degree: Master | Type: Thesis
Country: China | Candidate: F Xu | Full Text: PDF
GTID: 2518306335986829 | Subject: Software engineering
Abstract/Summary:
Traditional search engines are an important channel for retrieving information. A user types in a question and receives a collection of web pages, then has to browse those pages to locate the answer, a process that is time-consuming and laborious. Question-and-answer systems based on Natural Language Processing (NLP) are an important improvement over traditional search engines because they return answers directly and save the user time. Compared with structured sources such as knowledge graphs, unstructured text data is more plentiful and easier to obtain, but the technology for question-and-answer systems over open unstructured text is not yet mature. Current research still faces two problems: (1) the serious mismatch in length between the question and the unstructured text documents, which makes answer extraction inefficient and inaccurate; (2) the shortcomings of the current mainstream baseline model in its document context encoding layer and its interactive feature fusion layer. This thesis therefore proposes optimized algorithms for answer document retrieval and answer extraction in a restricted-domain question-and-answer system based on unstructured text, and designs a question-and-answer system for the field of analytical chemistry that combines the proposed algorithms to verify the effect of both optimizations in a practical application scenario. The main contents of this thesis are as follows:

(1) When searching for answers, texts that are too long make it difficult to match an answer of suitable length. To address this problem, the thesis proposes a text classification algorithm based on the GloVe word vector model combined with an SVM. The longer texts are classified to select answer documents that are close to the true semantics, and the similarity of those documents is then calculated. On the public TREC-QA dataset, performance improves significantly compared with the unoptimized approach.

(2) To address the difficulties of the current mainstream baseline model in document context encoding and in the fusion of interaction matching features, the thesis proposes a bi-directional neural network based on Bi-LSTM to improve context encoding, adds an attention mechanism to the matching and fusion features to improve machine reading comprehension, and introduces BERT pre-trained vectors in the encoding input layer. In a test comparison against the baseline model on DuReader, an open Chinese machine reading comprehension dataset, the experimental results improve significantly.

(3) In the field of analytical chemistry, the thesis integrates the proposed algorithms to implement a question-and-answer system based on unstructured analytical chemistry text. The system verifies the effect of the two proposed algorithms in a practical application scenario, and the experimental results show an improvement.
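The similarity step in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the three-dimensional `TOY_VECTORS` table is a hypothetical stand-in for pre-trained GloVe embeddings, the function names `embed`, `cosine`, and `rank_documents` are invented for illustration, and the SVM classification stage that filters long texts beforehand is omitted. The sketch only shows how averaged word vectors and cosine similarity could rank candidate documents against a question.

```python
import math

# Hypothetical 3-dimensional vectors standing in for pre-trained GloVe embeddings.
TOY_VECTORS = {
    "titration": [0.9, 0.1, 0.0],
    "endpoint": [0.8, 0.2, 0.1],
    "indicator": [0.7, 0.3, 0.0],
    "spectrum": [0.1, 0.9, 0.2],
    "absorbance": [0.0, 0.8, 0.3],
    "weather": [0.0, 0.1, 0.9],
}
DIM = 3


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(question, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = ["indicator titration", "absorbance spectrum", "weather"]
    for score, doc in rank_documents("titration endpoint", docs):
        print(f"{score:.3f}  {doc}")
```

In a real pipeline the embedding table would be loaded from a pre-trained GloVe file, and only documents accepted by the classifier would reach the ranking step.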
Keywords/Search Tags: unstructured text, question and answer system, answer selection, answer extraction, machine reading comprehension