
Research On Similar Problem Recognition For Question Answering System

Posted on: 2020-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Fu
Full Text: PDF
GTID: 2428330596995453
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology, information of every kind is growing explosively. Question answering systems are no exception, and they have accumulated huge question databases. As artificial intelligence develops, big data brings more and more benefits to people's work and life, so processing this data to provide users with accurate and reliable information is extremely important. Modern search engines relieve people of some of the burden of sifting through disorganized information, but useful information is still mixed with a great deal of noise. Question answering systems play a major role in avoiding the interference of this noise and in improving the quality and efficiency of users' information searches. A question answering system is considered a higher-level retrieval system because it overcomes the search engine's difficulty in understanding the user's intent and avoids returning some erroneous results. It can usually give simple, accurate, and user-friendly answers to questions that users ask in natural language.

Question answering is an evolving research field that combines information retrieval, natural language processing, and deep learning techniques. Systems are divided into free-text architectures and question-answer pair architectures. This thesis studies question answering systems based on the question-answer pair architecture. Such a system analyzes the question submitted by the user, matches it against the question database in the system, and retrieves the stored question most similar to the user's question. The best candidate answer is then recommended to the user, which makes retrieving valid information more efficient. To understand the user's query intent better and to match the most similar question in the question answering system, it is therefore especially important to calculate the semantic similarity between question pairs. Text similarity is generally measured at the level of phrases, sentences, paragraphs, and documents. This thesis studies the semantic similarity of question pairs at the sentence level.

Inspired by the application of Convolutional Neural Networks in image recognition, this thesis constructs a deep learning model based on a Siamese CNN to generate a self-adaptive content information matrix, and proposes combining a self-adaptive affinity graph with a prior-knowledge affinity graph to form a two-channel affinity graph. A text affinity graph expresses the neighbor relationships among text samples. In this thesis, text is converted into vector form by word embedding, and a text similarity matrix is constructed to obtain the text affinity graph. Existing methods usually construct static affinity graphs. These methods rely on prior knowledge, and they have difficulty obtaining the optimal representation of sentence pairs. To address these shortcomings, this thesis proposes using a Siamese CNN to learn a better affinity graph that is updated dynamically. In experiments, the model reaches accuracies of 84.35% and 75.65% on the Quora and MSRP data sets, respectively, with F1 values of 79.98% and 82.97%, respectively. These results indicate that the deep learning model proposed in this thesis is feasible and effective for identifying and matching short-text question pairs.
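The abstract describes building a text affinity graph by converting text to vectors with word embeddings and constructing a similarity matrix. The following is a minimal sketch of that static construction only, not the thesis's Siamese CNN model. The toy embedding table, the averaging of word vectors, the cosine measure, and the k-nearest-neighbor sparsification are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

# Hypothetical toy word embeddings (assumption); a real system would load
# pretrained vectors instead.
TOY_EMBEDDINGS = {
    "how": np.array([1.0, 0.0, 0.0]),
    "do": np.array([0.9, 0.1, 0.0]),
    "can": np.array([0.8, 0.2, 0.0]),
    "learn": np.array([0.7, 0.3, 0.1]),
    "python": np.array([0.6, 0.4, 0.2]),
    "best": np.array([0.0, 0.2, 1.0]),
    "pizza": np.array([0.1, 0.0, 0.9]),
}


def sentence_vector(sentence, embeddings, dim=3):
    """Average the embeddings of known words; zero vector if none are known."""
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero
    unit = X / norms
    return unit @ unit.T


def knn_affinity_graph(S, k):
    """Keep each row's k largest off-diagonal similarities, then symmetrize."""
    n = S.shape[0]
    A = np.zeros_like(S)
    for i in range(n):
        scores = S[i].copy()
        scores[i] = -np.inf  # a sample is not its own neighbor
        neighbors = np.argsort(scores)[-k:]
        A[i, neighbors] = S[i, neighbors]
    return np.maximum(A, A.T)


sentences = [
    "how do learn python",
    "how can learn python",
    "best pizza",
]
X = np.stack([sentence_vector(s, TOY_EMBEDDINGS) for s in sentences])
S = cosine_similarity_matrix(X)
A = knn_affinity_graph(S, k=1)
print(np.round(A, 3))
```

A graph built this way is fixed once the embeddings are chosen, which is the limitation of static affinity graphs that the abstract points out. The thesis's approach instead learns the affinity graph with a Siamese CNN.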
Keywords/Search Tags:question answering system, natural language processing, word embedding, self-adaptive affinity graph, Siamese CNN