Chinese Question Classification Based on Semi-Supervised Learning

Posted on: 2011-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhao
GTID: 2208330332977720
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Question classification is an important component of a question answering system. It is the foundation on which the rest of the system operates, and its accuracy directly affects the performance of the whole system. Most current studies classify questions with supervised learning and have achieved fairly good results. In real applications, however, manually labeling a large number of samples is costly. This thesis therefore studies question classification based on semi-supervised learning and makes the following contributions.

First, it proposes a feature extraction method for question classification. Each question is represented by a feature vector whose feature items are high-frequency keywords selected from the corpus, domain vocabulary, and question words. The backbone words of a question are extracted by analyzing its syntactic dependencies, and the similarity between these words and each feature item is computed with a word semantic similarity method; these similarity values make up the question's feature vector.

Second, it proposes a question classification method based on semi-supervised learning. Building on the feature extraction above, the method uses the Co-forest co-training algorithm: classifiers trained on the labeled sample questions assign labels to the unlabeled sample questions, and newly labeled questions with high confidence are added to the labeled set, from which the classification model is built.
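The feature-vector construction in the first contribution can be illustrated with a minimal sketch. It is not the thesis's implementation and rests on several assumptions: the backbone words are taken as already extracted by dependency parsing; each dimension is set to the highest similarity between its feature item and any backbone word (an assumed aggregation); and the thesis's word semantic similarity method is replaced by a hypothetical toy lookup table (`TOY_SIMILARITY`, `toy_word_similarity`) whose scores are invented for illustration.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical similarity table standing in for a real word semantic
# similarity measure; the scores are illustrative, not from the thesis.
TOY_SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset({"门票", "票价"}): 0.9,  # "admission ticket" / "ticket price"
    frozenset({"多少", "几"}): 0.8,    # "how much" / "how many"
    frozenset({"丽江", "大理"}): 0.6,  # Lijiang / Dali, both Yunnan destinations
}


def toy_word_similarity(a: str, b: str) -> float:
    """1.0 for identical words, the table score for a listed pair, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


def question_feature_vector(
    backbone_words: List[str],
    feature_items: List[str],
    similarity: Callable[[str, str], float] = toy_word_similarity,
) -> List[float]:
    """One dimension per feature item: the highest similarity between that item
    and any backbone word of the question (assumed aggregation; 0.0 if none)."""
    return [
        max((similarity(w, item) for w in backbone_words), default=0.0)
        for item in feature_items
    ]


# Hypothetical feature items: a keyword, a question word, domain vocabulary.
features = ["票价", "几", "丽江", "酒店"]
# Hypothetical backbone words of a question such as 大理门票多少钱.
print(question_feature_vector(["大理", "门票", "多少"], features))
# [0.9, 0.8, 0.6, 0.0]
```

Taking the maximum means a question only needs one backbone word close to a feature item for that dimension to fire; a real system would plug its own similarity function into the `similarity` parameter.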
The method was evaluated on questions from the Yunnan tourism domain, using 5 coarse classes and 23 fine classes. Compared with a supervised learning method, it improved classification accuracy by 8.28% and 1.19% respectively, showing that the proposed method can make effective use of unlabeled samples to improve question classification accuracy. Finally, a prototype Chinese question classification system was designed and implemented on a corpus of questions from the Yunnan tourism domain, and its classifier was evaluated experimentally.
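The semi-supervised training loop of the second contribution can be illustrated with a simplified sketch in the spirit of Co-forest, not the thesis's code. Each learner starts from its own bootstrap sample of the labeled questions; in every round, the other learners vote on each unlabeled question, and questions whose majority-vote share reaches a threshold `theta` become pseudo-labeled training data for that learner; learners are then retrained, and the loop stops when no pseudo-labeled set changes. Assumptions: a hypothetical `CentroidLearner` (nearest centroid on a random feature subset) replaces the random decision trees Co-forest uses, the error-estimation and sample-weighting conditions of the published algorithm are omitted, the defaults for `n_learners`, `theta`, and `max_rounds` are hypothetical, and toy 2-D vectors stand in for question feature vectors.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class CentroidLearner:
    """Nearest-centroid classifier restricted to a random subset of features.
    Assumption: stands in for the random decision trees used by Co-forest."""

    def __init__(self, features: List[int]) -> None:
        self.features = features
        self.centroids: Dict[str, List[float]] = {}

    def fit(self, xs: List[Vector], ys: List[str]) -> "CentroidLearner":
        sums: Dict[str, List[float]] = {}
        counts: Counter = Counter()
        for x, y in zip(xs, ys):
            acc = sums.setdefault(y, [0.0] * len(self.features))
            for k, f in enumerate(self.features):
                acc[k] += x[f]
            counts[y] += 1
        self.centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}
        return self

    def predict(self, x: Vector) -> str:
        def sq_dist(c: List[float]) -> float:
            return sum((x[f] - c[k]) ** 2 for k, f in enumerate(self.features))
        # sorted() makes ties resolve deterministically to the smallest label
        return min(sorted(self.centroids), key=lambda y: sq_dist(self.centroids[y]))


def vote(learners: List[CentroidLearner], x: Vector) -> Tuple[str, float]:
    """Majority label of the ensemble and the fraction of learners backing it."""
    counts = Counter(h.predict(x) for h in learners)
    label, n = max(sorted(counts.items()), key=lambda kv: kv[1])
    return label, n / len(learners)


def co_forest_sketch(labeled_x: List[Vector], labeled_y: List[str],
                     unlabeled_x: List[Vector], n_learners: int = 6,
                     theta: float = 0.75, max_rounds: int = 10,
                     seed: int = 0) -> List[CentroidLearner]:
    rng = random.Random(seed)
    dim = len(labeled_x[0])
    boots, learners = [], []
    for _ in range(n_learners):
        # Each learner starts from its own bootstrap sample of the labeled data.
        idx = [rng.randrange(len(labeled_x)) for _ in labeled_x]
        bx = [labeled_x[j] for j in idx]
        by = [labeled_y[j] for j in idx]
        feats = sorted(rng.sample(range(dim), max(1, dim // 2)))
        boots.append((bx, by))
        learners.append(CentroidLearner(feats).fit(bx, by))

    previous = None
    for _ in range(max_rounds):
        new_sets = []
        for i in range(n_learners):
            # Companion ensemble: every learner except learner i.
            companions = learners[:i] + learners[i + 1:]
            px, py = [], []
            for x in unlabeled_x:
                label, confidence = vote(companions, x)
                if confidence >= theta:  # keep only high-confidence pseudo-labels
                    px.append(x)
                    py.append(label)
            new_sets.append((px, py))
        if new_sets == previous:  # no pseudo-labeled set changed: stop
            break
        previous = new_sets
        # Retrain each learner on its bootstrap sample plus its pseudo-labels.
        for i, (px, py) in enumerate(new_sets):
            bx, by = boots[i]
            learners[i] = CentroidLearner(learners[i].features).fit(bx + px, by + py)
    return learners


def predict(learners: List[CentroidLearner], x: Vector) -> str:
    return vote(learners, x)[0]


# Toy 2-D vectors standing in for question feature vectors (hypothetical data).
A = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2), (0.1, 0.1), (0.0, 0.2), (0.2, 0.0)]
B = [(0.9, 0.8), (0.8, 0.9), (1.0, 1.0), (0.9, 0.9), (0.8, 1.0), (1.0, 0.8)]
forest = co_forest_sketch(A + B, ["A"] * 6 + ["B"] * 6,
                          [(0.05, 0.15), (0.15, 0.05), (0.95, 0.85), (0.85, 0.95)])
print(predict(forest, (0.1, 0.1)), predict(forest, (0.9, 0.9)))  # A B
```

All pseudo-labeled sets are computed before any learner is retrained, so within a round the outcome does not depend on learner order. Excluding learner `i` from its own vote is the key co-training step: a learner never labels data for itself.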
Keywords/Search Tags: Question classification, word semantic similarity, labeled samples, unlabeled samples, semi-supervised learning, co-training, Co-forest algorithm