
Question Classification Based On Deep Learning

Posted on: 2017-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: X P Zhou
GTID: 2308330503987195
Subject: Computer Science and Technology
Abstract/Summary:
Question classification (QC) plays a significant role in automated question answering (QA) systems. QC assigns a question to a class that represents its expected answer type, and the selected question type can then be used to filter candidate answers.

In recent years, QC has attracted growing research attention. Early work relied mainly on rule-based approaches. Researchers later turned to machine learning approaches, which produce better results than rule-based methods because the rules do not generalize. Support vector machines (SVM) and maximum entropy (ME) models are the most popular methods for question classification. However, current machine learning classifiers have shortcomings. First, traditional methods such as SVM and ME operate on fixed-length data, whereas sentence length varies, so information is lost when a variable-length sentence is converted into a fixed-length vector. Second, no work has considered domain information in the classifier, and the gap between different domains degrades classifier performance.

To address these problems, we propose a new DNN-based method for QC. First, we propose a deep neural network classification model based on feature fusion, which uses unigram word features, part-of-speech features, and term weight features. The input to the network is a sequence of word embeddings generated by fusing the word feature vectors, and the hidden layer consists of LSTM units in order to handle variable sentence length. A pooling layer extracts the sentence feature vector from the hidden layer outputs, and a softmax layer assigns it to a class. Experimental results show that using a variety of features improves the classifier: coarse and fine accuracy reach 94.0% and 88.2%, respectively.

Second, we explore domain adaptation in the DNN for QC.
In the domain adaptation model, the training data also includes an unlabeled corpus, and the model predicts a domain label for every sentence in the training set in order to reduce the domain information carried by the sentence feature vectors during training. Applying domain adaptation to the classifier increases coarse and fine accuracy by 0.4% and 1.2%, respectively.
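
The pipeline described in the abstract (fused per-word features, an LSTM hidden layer, pooling, a softmax classifier, and a domain predictor on the same sentence vector) can be illustrated with a minimal forward-pass sketch. It is not the thesis implementation. The abstract does not say how features are fused, which pooling is used, or how domain prediction reduces domain information, so the sketch assumes concatenation for fusion, max pooling over time, and a domain-adversarial objective (the encoder minimizes class loss minus a weighted domain loss). All names, dimensions, and the random weights are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(word_vec, pos_vec, term_weight):
    # Assumed fusion: concatenate word and POS vectors, append the scalar term weight.
    return list(word_vec) + list(pos_vec) + [term_weight]

def lstm_step(x, h, c, W, hidden):
    # W maps [x; h; 1] to the stacked input, forget, output, and candidate pre-activations.
    z = matvec(W, x + h + [1.0])
    i = [sigmoid(v) for v in z[0:hidden]]
    f = [sigmoid(v) for v in z[hidden:2 * hidden]]
    o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]
    g = [math.tanh(v) for v in z[3 * hidden:4 * hidden]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def sentence_vector(tokens, W_lstm, hidden):
    # Run the LSTM over a sentence of any length, then max-pool over time (assumed pooling).
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in tokens:
        h, c = lstm_step(x, h, c, W_lstm, hidden)
        outputs.append(h)
    return [max(column) for column in zip(*outputs)]

def head(sentence, W_out):
    # Linear layer with bias followed by softmax; used for both class and domain prediction.
    return softmax(matvec(W_out, sentence + [1.0]))

def cross_entropy(probs, label):
    return -math.log(probs[label])

def encoder_objective(class_loss, domain_loss, lam):
    # Assumed adversarial form: the encoder is rewarded when domains are hard to predict.
    return class_loss - lam * domain_loss

rng = random.Random(0)
WORD_DIM, POS_DIM, HIDDEN, NUM_CLASSES, NUM_DOMAINS = 8, 3, 5, 6, 2  # illustrative sizes
INPUT_DIM = WORD_DIM + POS_DIM + 1
W_lstm = rand_matrix(4 * HIDDEN, INPUT_DIM + HIDDEN + 1, rng)
W_class = rand_matrix(NUM_CLASSES, HIDDEN + 1, rng)
W_domain = rand_matrix(NUM_DOMAINS, HIDDEN + 1, rng)

def random_question(length):
    # Stand-in for real word embeddings, POS vectors, and term weights.
    return [fuse([rng.uniform(-1, 1) for _ in range(WORD_DIM)],
                 [rng.uniform(-1, 1) for _ in range(POS_DIM)],
                 rng.random()) for _ in range(length)]

for length in (3, 11):
    s = sentence_vector(random_question(length), W_lstm, HIDDEN)
    class_probs = head(s, W_class)
    domain_probs = head(s, W_domain)
    print(length, len(s), round(sum(class_probs), 6))
```

Questions of length 3 and 11 both yield a sentence vector of size `HIDDEN`, which is the point of placing the LSTM and pooling layers before the classifier. The domain head needs only a domain label, not a question-type label, so under this assumed setup sentences from the unlabeled corpus can contribute to the domain loss.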
Keywords/Search Tags:Question classification, LSTM, feature fusion, domain adaptation