Research And Application Of Convolutional Neural Network In Question Classification

Posted on: 2019-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Ji
Full Text: PDF
GTID: 2438330563957655
Subject: Computer technology
Abstract/Summary:
A question answering system is a new generation of search engine: it returns an accurate answer to the user and so better meets the user's query requirements. Question classification plays a very important role in a question answering system, because its performance directly affects the accuracy of the later answer extraction stage and the performance of the whole system. Because questions are short, text categorization methods are usually applied directly to model the question sentence. Training a model directly on the data set in this way performs poorly when the number of question categories is large. The data determine the upper bound on the accuracy of a trained model, and optimizing the model can only approach that bound. This thesis therefore works on both the data and the model to improve accuracy. It studies how to increase the information carried by the samples when data are limited, and how to design a convolutional neural network model that classifies accurately while keeping its generalization ability. The main achievements are as follows:

1. On the data side, words at key positions in a question are expanded with synonyms, chosen as words that users are likely to use when asking. Several positions are expanded, and the Cartesian product of the substitutions is taken to generate the expanded questions. Traditional machine learning models are then trained on the expanded training samples. The experiments show that models trained on the expanded data set are considerably more accurate than models trained on the original data.

2. For feature extraction, a Skip-Gram distributed representation model is used to train a word embedding for each word. The embeddings of the words in a question are then stacked into a two-dimensional matrix, which serves as the distributed feature of the sample. A convolutional neural network is designed to classify questions from this matrix. To reduce model complexity, only one convolutional layer and one pooling layer are used. Accuracy improves over traditional machine learning methods, including support vector machines, random forests, and logistic regression.

3. Sentence structure information is introduced into the training of the convolutional neural network. The main features of the sentence are extracted from different segments by a piecewise pooling operation. Dropout is added to improve the generalization ability of the model and to prevent overfitting. The experimental results show that the piecewise pooling method increases the accuracy of the model when dropout is used. The final accuracy reaches 85.1% on a 57-class banking question data set.
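
The synonym expansion in the first achievement can be illustrated with a minimal Python sketch. The synonym table, the example question, and the function name `expand_question` are hypothetical and do not come from the thesis. The sketch only shows how substituting at several positions and taking the Cartesian product multiplies one sample into many, presumably with each variant keeping the label of the original question.

```python
from itertools import product

# Hypothetical synonym table: key word -> wordings a user might choose.
# Each list includes the original word so the original question is kept.
SYNONYMS = {
    "check": ["check", "query", "view"],
    "balance": ["balance", "remaining amount"],
}


def expand_question(tokens, synonyms):
    """Return every variant of a tokenized question obtained by substituting
    synonyms at each key position (Cartesian product over positions)."""
    # Words without an entry in the table contribute only themselves.
    options = [synonyms.get(tok, [tok]) for tok in tokens]
    return [" ".join(choice) for choice in product(*options)]


variants = expand_question(["how", "to", "check", "my", "balance"], SYNONYMS)
for v in variants:
    print(v)
```

With three options for one position and two for another, the single question becomes six training samples. The number of variants grows multiplicatively with the number of expanded positions, so in practice the table would likely be limited to a few key positions.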
Keywords/Search Tags: Question classification, Synonym expansion, Word embedding, Distributed feature, Piecewise pooling, Convolutional neural network (CNN), Dropout algorithm, Natural language processing (NLP)
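
The piecewise pooling named in the keywords can be illustrated with a small Python sketch. The abstract does not say how the sentence segments are chosen, so the `boundaries` argument, the function name `piecewise_max_pool`, and the sample activations are all assumptions made for illustration. The idea shown is only that each segment of a filter's output is max-pooled separately instead of taking one maximum over the whole sentence.

```python
def piecewise_max_pool(feature_map, boundaries):
    """Max-pool each segment of a 1-D feature map separately.

    feature_map: list of floats, one filter's activations along the sentence.
    boundaries:  sorted split indices (hypothetical; the source does not
                 specify how the segments are determined).
    """
    edges = [0] + list(boundaries) + [len(feature_map)]
    pooled = []
    for start, end in zip(edges, edges[1:]):
        if start >= end:
            raise ValueError("each segment must contain at least one value")
        pooled.append(max(feature_map[start:end]))
    return pooled


# Made-up activations of one convolution filter over six positions.
activations = [0.1, 0.7, 0.3, 0.9, 0.2, 0.4]
pooled = piecewise_max_pool(activations, boundaries=[2, 4])
print(pooled)            # one maximum per segment
print(max(activations))  # ordinary global max pooling, for comparison
```

Global max pooling keeps a single value per filter, while the piecewise version keeps one value per segment, so some information about where in the sentence a feature fired is preserved.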