Studies On Question Classification Technology In Chinese Question Answering System

Posted on: 2016-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Liang
GTID: 2308330470479889
Subject: Natural language processing
Abstract/Summary:
A question answering system is more intelligent than a traditional search engine. It does not require the user to enter keywords, it can understand a simple question posed in natural language, and it returns an exact answer rather than a list of relevant documents or web pages. A question answering system consists of three main modules: question understanding, information retrieval, and answer extraction. When the user enters a question, the system first identifies the user's intent through question classification and determines the conditions an answer must satisfy. It then retrieves relevant information from large-scale web data, and finally the answer extraction module selects the precise answer according to the constraints given by the question class. Question classification is therefore an important component of a question answering system: it not only narrows the range of candidate answers but also determines the answer extraction strategy, which improves the accuracy of the answers the system returns. Question classification involves word segmentation, stem extraction, stop-word removal, feature extraction, and multi-class classification. This thesis studies methods and techniques for question classification and ultimately realizes automatic question classification with machine learning.

Because the data are large-scale, highly correlated, and nonlinear, the key to improving the generalization ability of a question classifier is extracting the essential, intrinsic characteristics of the original data. The thesis proposes a novel feature selection method that combines random forests with support vector machines (SVMs). It improves on the current approach, which keeps all features extracted from the bag of words and the word sequence. The thesis also improves the SMO (sequential minimal optimization) algorithm used in a binary-tree SVM built on one-against-all partitioning.
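The preprocessing and feature extraction steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not name a segmenter, so forward maximum matching (a standard dictionary-based Chinese segmentation technique) is used here; `DICTIONARY` and `STOP_WORDS` are hypothetical toy lists; the stem-extraction step is omitted; and "word sequence" features are assumed to be adjacent-word bigrams.

```python
from collections import Counter

# Hypothetical toy lexicon and stop-word list; a real system would load a
# full dictionary and a curated stop-word list.
DICTIONARY = {"谁", "演唱", "了", "歌曲", "月亮代表我的心", "的", "是"}
STOP_WORDS = {"了", "的"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def segment(text: str) -> list[str]:
    """Forward maximum matching: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += length
                break
    return words


def extract_features(question: str) -> Counter:
    """Bag-of-words unigrams plus bigrams of words that are adjacent
    after stop-word removal (the assumed word-sequence features)."""
    tokens = [w for w in segment(question) if w not in STOP_WORDS]
    features = Counter(f"w={w}" for w in tokens)
    features.update(f"b={a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return features


if __name__ == "__main__":
    q = "谁演唱了歌曲月亮代表我的心"  # "Who sang the song 月亮代表我的心?"
    print(segment(q))
    print(extract_features(q))
```

Longest-match segmentation keeps the song title intact even though it contains the stop word 的, which is one reason dictionary coverage of entertainment names matters in this domain.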
Experiments show that these methods select classification features effectively, and the classification accuracy reaches 87.18%.

The main results of this thesis are as follows:
1) A training set for the entertainment domain is built, and the accuracy of different question classification methods is compared experimentally.
2) For question classification and feature selection, three feature selection methods of different depths are proposed. Experimental results show that the combination of random forest and SVM is the most effective.
3) Using the one-against-all binary-tree SVM classification method together with the feature selection strategy proposed in this thesis, automatic classification of entertainment-domain questions is realized.
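The one-against-all binary-tree SVM in result 3 can be sketched as follows. Each tree node separates one class from all classes ordered after it, and prediction walks down the tree until a node answers positively. This is an illustrative sketch only: the thesis's improved SMO solver is replaced here by Pegasos (stochastic sub-gradient descent on the hinge loss), the class order and the class names `person`, `time`, `location` are hypothetical, and the two-dimensional toy vectors stand in for real question features.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=300, seed=0):
    """Pegasos linear SVM (substituted for SMO). Labels are +1/-1.
    A constant 1.0 is appended to each input so the last weight acts
    as a (regularized) bias term."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient step for the L2 regularizer shrinks the weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss sub-gradient step only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def score(w, x):
    """Decision value of a node, using the same bias augmentation."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_binary_tree(X, labels, class_order, **svm_args):
    """Node k separates class_order[k] (+1) from every later class (-1),
    trained only on samples of the classes still remaining at that node."""
    nodes = []
    for k, cls in enumerate(class_order[:-1]):
        remaining = set(class_order[k:])
        idx = [i for i, lab in enumerate(labels) if lab in remaining]
        yk = [1 if labels[i] == cls else -1 for i in idx]
        nodes.append((cls, train_linear_svm([X[i] for i in idx], yk, **svm_args)))
    return nodes, class_order[-1]


def predict(tree, x):
    """Walk down the tree; the first positive node decides the class."""
    nodes, last_class = tree
    for cls, w in nodes:
        if score(w, x) > 0:
            return cls
    return last_class


# Hypothetical toy data: three well-separated clusters.
X = [[3.0, 0.0], [3.5, 0.5], [2.5, -0.5],
     [0.0, 3.0], [0.5, 3.5], [-0.5, 2.5],
     [-3.0, -3.0], [-3.5, -2.5], [-2.5, -3.5]]
labels = ["person"] * 3 + ["time"] * 3 + ["location"] * 3
tree = train_binary_tree(X, labels, ["person", "time", "location"])

if __name__ == "__main__":
    print(predict(tree, [3.0, 0.2]), predict(tree, [0.2, 3.0]))
```

With n classes the tree needs only n - 1 binary classifiers, and later nodes train on progressively smaller subsets, which is one motivation for this structure over a flat one-against-all scheme; the class order at the top of the tree affects accuracy and is left as an input here.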
Keywords/Search Tags: Machine Learning, Feature Selection, Random Forests, Support Vector Machine, Binary Tree