Research On Answer Extraction Technology In Intelligent Question Answering System

Posted on: 2021-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: Q Li
Full Text: PDF
GTID: 2428330611488268
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of internet technology, online Q&A communities are becoming a popular platform for sharing and acquiring information. Users can obtain the information they need from other users' answers, either by asking questions or by searching for similar questions. However, the quality of user-provided answers varies greatly, and extracting high-quality answers from many candidates, filtering semantic repetition, and generating answer clauses in the correct order is a challenging problem in community Q&A research. This work is motivated by an intelligent question answering system being built with a safety engineering research institute, which urgently needs question-answer pairs for the chemical industry to be constructed automatically. This thesis takes data from the Haichuan Chemical Forum, the largest and most authoritative forum in the chemical industry, as its research object and studies answer extraction technology in intelligent question answering systems from the following three aspects.

(1) The characteristics of users who participate in community Q&A provide rich auxiliary data for predicting answer quality. This thesis constructs a heterogeneous information network of user questions and answers in the Haichuan Chemical Forum, designs two metapaths, and extracts vector representations of community users with the Node2vec algorithm. Ablation experiments show that the constructed heterogeneous information network is effective for partitioning the Haichuan Chemical community.

(2) Selecting a small number of high-quality answers from the many answers to forum questions as the data set for answer extraction can improve the accuracy of the answer extraction algorithm. However, many answers in the Haichuan Chemical Forum have no points-based rating, so their quality cannot be judged directly from points. This thesis therefore builds a forum answer quality prediction algorithm for the chemical industry. First, the question-answer pairs are combined with a chemical-industry thesaurus, and text vectors for questions and answers are generated from TF-IDF-weighted word vectors. Then four kinds of information are fused (the text vector representations of the question and answer, static text features, the user vector representation, and static user features), and a factorization machine (FM) is used to train the answer quality prediction model. Experiments show that the resulting quality prediction algorithm outperforms prediction models built with LSTM and Wide & Deep on evaluation metrics such as MSE, EVS, and accuracy.

(3) Many questions in the Haichuan Chemical Forum are descriptive: a single candidate answer cannot cover all aspects of the question, and different candidate answer clauses may be semantically redundant. To address this, the thesis proposes a community forum answer extraction algorithm. First, multi-dimensional features and the FM algorithm are used to build a candidate answer clause selection model that screens out low-quality candidate clauses. Next, an improved clause quality evaluation formula is used to filter semantically redundant clauses, and the FM algorithm is used to build a context prediction model between pairs of clauses. Finally, a genetic algorithm searches for the optimal clause ordering. Experiments show that the answer extraction algorithm constructed in this thesis can better complete the answer extraction task in the Haichuan Chemical Forum.
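The factorization machine (FM) recurs in all three parts of the abstract: answer quality prediction, candidate clause selection, and clause context prediction. The following is a minimal sketch of how a standard second-order FM scores one fused feature vector. It is not the thesis's implementation: the function names, the tiny four-element feature vector, and all weight values are hypothetical, and the abstract does not say how the model was trained or how the features were encoded.

```python
from itertools import combinations


def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one feature vector.

    x  : list of n feature values
    w0 : global bias
    w  : list of n linear weights
    V  : n rows of k latent factors (V[i] is the factor vector of feature i)
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    # Pairwise term via the O(n*k) identity:
    # sum_{i<j} <V_i, V_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i (V_if x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model written as an explicit sum over feature pairs (for checking)."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )
    return w0 + linear + pairwise


# Hypothetical input: one value standing in for each of the four feature
# groups named in the abstract (text vector, static text feature,
# user vector, static user feature). Real inputs would be much longer.
x = [0.5, 1.0, 0.0, 2.0]
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.05]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4], [-0.2, 0.1]]
score = fm_predict(x, w0, w, V)
```

The pairwise term is what distinguishes an FM from a linear model: every pair of features interacts through the dot product of their latent vectors, which may be why it suits a setting where text features and user features are fused into one input.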
Keywords/Search Tags:answer extraction, answer quality evaluation, heterogeneous information network, user representation learning, sentence ordering