
Study On Feature Selection In Chinese Question Classification

Posted on: 2012-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: C Gao
GTID: 2178330335490693
Subject: Computer application technology
Abstract/Summary:
Traditional search engines require users to submit keywords as a query and then return a list of documents relevant to that query. Question Answering (QA) lets the user ask a question in natural language and directly returns a short, correct answer. Question classification (QC) plays an important role in the overall performance of a QA system because it provides semantic restrictions and constraints for answer extraction and selection. Almost all QA systems include a QC step in the question understanding stage.

At present, research on QC concentrates on two approaches. The first is rule-based: the question type is determined by hand-written rules. The second is statistical: features that best express the meaning of the whole question are chosen, a model is learned from a labeled real-world corpus, and that model is used to classify question types. Selecting good features to describe a Chinese question is therefore the key factor in QC. In general, the useful features in Chinese QC include bag-of-words, part of speech, named entities, cue words, and dependency relations.

The main contributions of this thesis are threefold.

First, it analyzes the methods and effects of basic feature selection in Chinese QC. Taking bag-of-words as the foundation feature, it analyzes the contribution of each of the other features. With a linear kernel function, the bag-of-words and cue-word features perform well.

Second, it experiments with binding and combining various features in Chinese QC. We propose a way of binding other features to the bag-of-words feature. The experiments show strong performance for the combined features (bag-of-words, WSD, relation, and parent) and for the bound features (bag-of-words bound with part of speech, named entity, and dependency relations).

Third, it proposes an algorithm, based on dependency relations, that extracts the subject, predicate, object, interrogative word, and interrogative-related word of a question. This quintuple is then used as a feature in QC experiments, which demonstrate its contribution to Chinese QC.

Finally, according to the contribution of each feature group to classification, a heuristic selection method chooses the quintuple, WSD, relation, word/parent, and bag-of-words features, reaching a classification accuracy of 84.376%. This is the best accuracy on the HIT-IR Question Set.
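The abstract does not spell out how features are bound or how the quintuple is extracted, so the following is only a minimal sketch of one plausible reading, not the thesis's actual algorithm. It assumes that "binding" means joining a word with another attribute into a single token such as word/POS or word/parent, that the subject and object are the root verb's children labeled SBV and VOB (labels in the style of the HIT LTP parser), and that the interrogative-related word is the parent of the interrogative word. The Token type, the INTERROGATIVES list, and both function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    word: str
    pos: str    # part-of-speech tag
    head: int   # index of the parent token, -1 for the root
    rel: str    # dependency relation to the parent


# Hypothetical, incomplete list of interrogative words (an assumption).
INTERROGATIVES = {"什么", "谁", "哪里", "哪", "多少", "几", "怎么", "为什么"}


def bind_features(tokens: list[Token]) -> list[str]:
    """Emit bag-of-words features plus word/POS and word/parent bound features."""
    feats = []
    for t in tokens:
        parent = tokens[t.head].word if t.head >= 0 else "ROOT"
        feats.append(t.word)                # plain bag-of-words feature
        feats.append(f"{t.word}/{t.pos}")   # word bound to its part of speech
        feats.append(f"{t.word}/{parent}")  # word bound to its parent word
    return feats


def extract_quintuple(tokens: list[Token]) -> dict[str, Optional[str]]:
    """Extract subject, predicate, object, interrogative, and related word."""
    # The predicate is taken to be the root of the dependency tree.
    root = next((i for i, t in enumerate(tokens) if t.head < 0), None)

    def child_of_root(rel: str) -> Optional[str]:
        # First child of the root carrying the given relation label, if any.
        return next(
            (t.word for t in tokens if t.head == root and t.rel == rel), None
        )

    # First token that appears in the interrogative word list.
    q_idx = next(
        (i for i, t in enumerate(tokens) if t.word in INTERROGATIVES), None
    )
    interrogative = tokens[q_idx].word if q_idx is not None else None
    related = None
    if q_idx is not None and tokens[q_idx].head >= 0:
        related = tokens[tokens[q_idx].head].word

    return {
        "subject": child_of_root("SBV"),
        "predicate": tokens[root].word if root is not None else None,
        "object": child_of_root("VOB"),
        "interrogative": interrogative,
        "related": related,
    }


# Hand-annotated parse of "中国的首都是哪里" (illustrative, not parser output).
question = [
    Token("中国", "ns", 2, "ATT"),
    Token("的", "u", 0, "RAD"),
    Token("首都", "n", 3, "SBV"),
    Token("是", "v", -1, "HED"),
    Token("哪里", "r", 3, "VOB"),
]

quintuple = extract_quintuple(question)
features = bind_features(question)
```

The resulting feature strings could then be fed to any sparse vectorizer and a linear-kernel SVM, which is the classifier setting the abstract reports on.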
Keywords/Search Tags:Chinese question classification, SVM classifier, Feature selection