
Chinese Question Classification Based On Semi-Supervised Learning

Posted on: 2017-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2348330491450436
Subject: Computer Science and Technology
Abstract/Summary:
Question classification is an important component of a question answering system, and its accuracy directly affects the performance of the whole system. Most current question classification methods are based on supervised learning and achieve good results. However, supervised methods use only labeled samples and ignore the value of unlabeled samples. This thesis focuses on Chinese question classification based on semi-supervised learning: the Tri-training algorithm is adapted and modified to suit Chinese question classification. The main contents are as follows.

Firstly, in the original Tri-training algorithm, each training set is formed from the labeled samples by random sampling. This can leave the classes unevenly represented in a training set, which lowers classification accuracy. To improve the sampling step, the samples of each class are extracted separately and then used to form the three training sets. This keeps the training data balanced, preserves diversity among the classifiers, and further improves classification accuracy.

Secondly, in the original Tri-training algorithm, when the classifiers disagree, the result of the first classifier is taken as the default, which may reduce accuracy in such cases. A new voting algorithm, called often-excellent, is therefore proposed to avoid this one-sided reliance on the first classifier.

Finally, several question classification experiments are conducted with the improved Tri-training algorithm, using the Chinese question set provided by Harbin Institute of Technology together with questions collected manually from the Internet. Compared with the original Tri-training algorithm, the improved algorithm achieves better performance on Chinese question classification.
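
The two steps the abstract modifies can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not give the exact sampling procedure, so resampling with replacement inside each class is an assumption, and the function names `per_class_bootstrap` and `vote_with_first_fallback` are hypothetical. The second function shows only the baseline rule the abstract attributes to the original algorithm (fall back to the first classifier on disagreement). The proposed often-excellent rule is not specified in the abstract and is not sketched here.

```python
import random
from collections import Counter, defaultdict


def per_class_bootstrap(labeled, n_sets=3, seed=0):
    """Build n_sets training sets by resampling inside each class.

    labeled: list of (question, label) pairs.
    Assumption: each class keeps its original size in every set, so no
    class is dropped or over-drawn by chance, while the sets still differ.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for question, label in labeled:
        by_label[label].append((question, label))

    training_sets = []
    for _ in range(n_sets):
        sample = []
        for label in sorted(by_label):
            items = by_label[label]
            # Sample with replacement, restricted to this class only.
            sample.extend(rng.choices(items, k=len(items)))
        training_sets.append(sample)
    return training_sets


def vote_with_first_fallback(predictions):
    """Baseline rule: majority label, else the first classifier's label.

    predictions: the labels output by the three classifiers, in order.
    """
    label, count = Counter(predictions).most_common(1)[0]
    return label if count >= 2 else predictions[0]
```

With three classifiers and more than two question classes, all three predictions can differ. That is the case in which the baseline rule simply returns the first classifier's output, and it is the case the thesis's voting change targets.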
Keywords/Search Tags: Question classification, semi-supervised learning, Tri-training algorithm, sampling, voting mechanism