
Research On Question Answering Oriented Question Classification And Answer Extraction

Posted on: 2014-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: A Zhang
GTID: 2268330425491833
Subject: Computer software and theory
Abstract/Summary:
With the development of artificial intelligence, information retrieval and natural language processing, Question Answering (QA) has made considerable progress. In particular, QA research has improved greatly since TREC began organizing QA evaluation tasks. Most evaluation tasks are English-oriented, and most QA data sets are in English. By comparison, Chinese QA evaluation is less common and Chinese data sets are rare, so research on Chinese QA lags behind. We use an online search engine for answer retrieval, and the main work of this thesis is question analysis and answer extraction.

For question analysis, we present a new method for building the stop-list that is based on word composition and question categories. The method composes n adjacent words into phrases and chooses stop-words from these phrases, taking class labels into account during selection. The process repeats with decreasing n until n reaches 0. This method performs well on our data set.

We then present a new category-based feature extraction method called TFC-ICF, which follows the same basic idea as TF-IDF. It considers both a word's ability to identify a category and the word's distribution across all categories, which improves the quality of the features. We use an SVM classifier; with TFC-ICF the precision reaches 80.45%, which we take as the baseline. To improve classification precision further, we propose three ideas: manually chosen features, keyword-extension-based features and semantic features, and we try different ways of using the features in the last two methods. When TFC-ICF is combined with these methods, the precision reaches 86.01%, 85.14% and 82.13%, respectively.

For answer extraction, we first discuss how to construct the candidate-sentence set using a sentence similarity method based on a support vector model.
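The candidate-sentence step can be illustrated with a minimal sketch. It assumes that questions and retrieved sentences are already word-segmented, and it represents each one as a bag-of-words term-frequency vector compared with cosine similarity. The thesis's exact similarity model may differ, and the names `select_candidates`, `top_k` and `min_score` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a_tokens, b_tokens):
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0  # empty input -> similarity 0


def select_candidates(question_tokens, sentences, top_k=3, min_score=0.1):
    """Keep the top_k sentences whose similarity to the question is >= min_score.

    top_k and min_score are hypothetical tuning parameters.
    Returns a list of (score, sentence_tokens), best first.
    """
    scored = [(cosine_similarity(question_tokens, s), s) for s in sentences]
    scored = [(score, s) for score, s in scored if score >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

For the question `["who", "invented", "the", "telephone"]`, a sentence such as `["bell", "invented", "the", "telephone", "in", "1876"]` shares three terms and outranks one that shares only "the". Filtering by a threshold before truncating keeps clearly unrelated sentences out of the candidate set even when few sentences are retrieved.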
Then, entity recognition is performed according to the question's category. Finally, we describe how to extract the answer list based on sentence similarity scores and the recognized entities. We achieve good results on the CLQA test set of NTCIR-5.

Our work achieves some results in question classification and answer extraction, but it also has weaknesses. For example, the quality of the question set could be better, and there is room for improvement in the performance of entity recognition and answer extraction. Our future work will concentrate on these areas.
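The final answer-extraction step, which combines sentence similarity scores with recognized entities, can be sketched as follows. This is one plausible scheme, not the thesis's confirmed method. It assumes each candidate sentence arrives with its similarity score and a list of `(entity_text, entity_type)` pairs from an entity recognizer, that each question category maps to one expected entity type (the `CATEGORY_TO_ENTITY_TYPE` table is hypothetical), and that an entity's score is the sum of the similarity scores of the sentences it appears in.

```python
from collections import defaultdict

# Hypothetical mapping from question category to the expected answer entity type.
CATEGORY_TO_ENTITY_TYPE = {"PERSON": "PER", "LOCATION": "LOC", "TIME": "TIME"}


def extract_answers(question_category, question_tokens, candidates, top_n=5):
    """Rank candidate answers.

    candidates: list of (similarity_score, [(entity_text, entity_type), ...]).
    An entity is kept only if its type matches the expected type and it does
    not already appear in the question; its score is the sum of the
    similarity scores of the sentences that contain it.
    """
    expected_type = CATEGORY_TO_ENTITY_TYPE.get(question_category)
    if expected_type is None:
        return []  # unknown category: no answer type to match against
    question_words = set(question_tokens)
    scores = defaultdict(float)
    for sim, entities in candidates:
        for text, etype in set(entities):  # count each entity once per sentence
            if etype == expected_type and text not in question_words:
                scores[text] += sim
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

With candidates `[(0.61, [("Bell", "PER"), ("1876", "TIME")]), (0.40, [("Edison", "PER")]), (0.35, [("Bell", "PER"), ("Boston", "LOC")])]` and a `PERSON` question, "Bell" (about 0.96) ranks above "Edison" (0.40). Summing rewards answers supported by several similar sentences; taking the maximum instead would be an equally reasonable variant.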
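The abstract does not state how stop-words are chosen from the composed phrases. The sketch below therefore uses a hypothetical, class-aware criterion to illustrate the loop structure only. An n-gram is treated as a stop phrase when it occurs at least `min_count` times and in at least `min_categories` question categories, since a phrase spread across categories carries little class information. Its occurrences are then removed before the loop continues with n - 1, stopping when n reaches 0. All names and thresholds are assumptions, and the input is assumed to be word-segmented.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """All contiguous n-word phrases in a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def remove_phrase(tokens, phrase):
    """Delete every non-overlapping occurrence of phrase from tokens."""
    n, out, i = len(phrase), [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + n]) == phrase:
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def build_stop_list(labeled_questions, max_n=3, min_categories=2, min_count=3):
    """labeled_questions: list of (tokens, category). Returns stop phrases as tuples.

    Hypothetical criterion: frequent n-grams that span several categories.
    """
    questions = [(list(tokens), cat) for tokens, cat in labeled_questions]
    stop_list = []
    for n in range(max_n, 0, -1):  # n = max_n, ..., 1; the loop ends at n = 0
        counts, cats = Counter(), defaultdict(set)
        for tokens, category in questions:
            for gram in ngrams(tokens, n):
                counts[gram] += 1
                cats[gram].add(category)
        found = [g for g, c in counts.items()
                 if c >= min_count and len(cats[g]) >= min_categories]
        stop_list.extend(found)
        for g in found:  # strip stop phrases before trying shorter phrases
            questions = [(remove_phrase(t, g), cat) for t, cat in questions]
    return stop_list
```

On questions such as "please tell me who invented telephone", "please tell me where is paris" and "please tell me when is easter", the trigram ("please", "tell", "me") is removed first. With `min_count=2`, "is" then qualifies because it appears under two categories, while "who" (frequent, but only in PERSON questions) is kept as a useful feature.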
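The TFC-ICF weighting from the question-analysis part can be sketched by analogy with TF-IDF, treating each question category as one "document". The exact formulas are not given in the abstract, so the following definitions are assumptions. TFC(t, c) is the relative frequency of term t among all terms in category c's questions. ICF(t) = log(|C| / cf(t)), where cf(t) is the number of categories containing t. The weight is TFC(t, c) · ICF(t), and `select_features` keeps the k terms with the highest weight in any category.

```python
import math
from collections import Counter, defaultdict


def tfc_icf(labeled_questions):
    """labeled_questions: list of (tokens, category).

    Returns {category: {term: TFC(t, c) * ICF(t)}} under the assumed definitions.
    """
    term_counts = defaultdict(Counter)  # category -> term frequencies
    for tokens, category in labeled_questions:
        term_counts[category].update(tokens)
    num_categories = len(term_counts)
    # cf(t): number of categories whose questions contain term t
    category_freq = Counter()
    for counts in term_counts.values():
        category_freq.update(counts.keys())
    weights = {}
    for category, counts in term_counts.items():
        total = sum(counts.values())
        weights[category] = {
            term: (count / total) * math.log(num_categories / category_freq[term])
            for term, count in counts.items()
        }
    return weights


def select_features(weights, k):
    """Keep the k terms with the highest weight in any single category."""
    best = defaultdict(float)
    for per_category in weights.values():
        for term, w in per_category.items():
            best[term] = max(best[term], w)
    return sorted(best, key=best.get, reverse=True)[:k]
```

A term that appears in every category gets ICF = log(1) = 0 and is never selected, which matches the idea that a useful feature must both be frequent within a category and discriminate between categories. The selected terms could then serve as the feature vocabulary for the SVM classifier.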
Keywords/Search Tags: Question Answering, Question Classification, Feature Selection, Answer Extraction, Entity Recognition