Research On Question Classifying Of Chinese Question Answering System Based On Bayesian Classification

Posted on: 2011-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
GTID: 2178330332485472
Subject: Computer application technology
Abstract/Summary:
The first task in question analysis is classification. The process transforms each question into data features, designs a classifier, and takes the classifier's output as the class label. The major challenges addressed in this dissertation are feature extraction and classifier design. The dissertation aims to improve Bayesian classification and apply it to classifying Chinese questions. Based on the characteristics of Chinese questions, it proposes one feature extraction method and two improved models.

(1) Extracting the results of syntax analysis as question features
Syntax analysis is a form of shallow semantic analysis that compensates for the lack of lexical information in questions. The quality of the feature vectors is improved by extracting the main part of the sentence, the question word, and the subordinate components from the syntax analysis result, combining them with the time words in the sentence, and performing named entity recognition on unknown words.

(2) Incremental semi-naive Bayesian classification model based on a fuzzy feedback mechanism
To address the uncertainty in active selection strategies for incremental learning, this dissertation adds to the training set those samples whose posterior probability is approximately 1/n (where n is the number of classes), and proposes an incremental learning model based on a fuzzy feedback mechanism. It uses the semi-naive Bayesian model as the base model and the Dirichlet distribution of the parameters to estimate the expected value of the posterior probability parameter. The membership function is selected according to the unitary principle, and the threshold is determined heuristically.

(3) χ²-IDF weighted Bayesian model
The starting point for the weighted Bayesian model is the maximum a posteriori hypothesis. In different sentences, different features provide different amounts of information for question classification. Drawing on the properties of the chi-squared statistic, a χ²-IDF evaluation function is introduced into the feature weighting formula to calculate each feature's contribution to classification. The feature weights are computed with respect to the posterior probability, which gives a more reasonable interpretation of the maximum a posteriori hypothesis.

This dissertation uses the question set from the Natural Language Processing Lab at Harbin Institute of Technology to construct and test the above models, and provides analytical data, conclusions, and an assessment of the experiments. A comparison of the experimental results shows that the improved Bayesian models increase the precision of the classifier.

Finally, this dissertation gives a summary and points out directions for future research.
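As a rough illustration of two ideas mentioned in the abstract, the Python sketch below shows (a) picking samples whose class posteriors are all close to 1/n and (b) computing a chi-squared score for a feature and class. The abstract gives neither the membership function nor the exact χ²-IDF formula, so everything here is an assumption: the crisp `tolerance` stands in for the fuzzy membership function and heuristic threshold, the product of chi-squared and IDF in `chi2_idf_weight` is only one plausible combination, and all function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def is_near_uniform(posteriors: Sequence[float], tolerance: float = 0.05) -> bool:
    """Return True when every class posterior lies within `tolerance` of 1/n.

    Assumption: a crisp tolerance replaces the thesis's fuzzy membership function.
    """
    n = len(posteriors)
    if n == 0:
        return False
    return all(abs(p - 1.0 / n) <= tolerance for p in posteriors)


def select_uncertain(
    pool: Sequence[Tuple[str, Sequence[float]]], tolerance: float = 0.05
) -> List[str]:
    """Keep the questions whose posteriors are approximately uniform (about 1/n each)."""
    return [q for q, posteriors in pool if is_near_uniform(posteriors, tolerance)]


def chi_squared(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-squared statistic for a 2x2 feature/class contingency table.

    a: questions in the class that contain the feature
    b: questions outside the class that contain the feature
    c: questions in the class that lack the feature
    d: questions outside the class that lack the feature
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: the feature or the class never varies
    return n * (a * d - b * c) ** 2 / denominator


def chi2_idf_weight(a: int, b: int, c: int, d: int) -> float:
    """Hypothetical weight: chi-squared times IDF (an assumed form, not the thesis's)."""
    n = a + b + c + d
    containing = a + b  # questions that contain the feature
    if containing == 0:
        return 0.0
    return chi_squared(a, b, c, d) * math.log(n / containing)
```

With three classes, a question with posteriors `[0.34, 0.33, 0.33]` would be selected for the training set under this sketch, while one with `[0.9, 0.05, 0.05]` would not.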
Keywords/Search Tags: question answering system, Bayesian classification model, Chinese question classification, dependency syntax analysis, incremental learning, fuzzy set, feedback mechanism, weighted Bayesian classification model