Research On Question Answering Oriented Question Classification And Answer Extraction

Posted on:2014-10-20

Degree:Master

Type:Thesis

Country:China

Candidate:A Zhang

GTID:2268330425491833

Subject:Computer software and theory

Abstract/Summary:

With the development of artificial intelligence, information retrieval, and natural language processing, question answering (QA) has made considerable progress. In particular, QA research has improved greatly since TREC began organizing a QA evaluation task. Most evaluation tasks target English, and most QA data sets are in English. Chinese QA evaluations are comparatively uncommon, and Chinese data sets are rare. For these reasons, research on Chinese QA lags behind. We use an online search engine for answer retrieval, and the main work of this thesis is question analysis and answer extraction.

For question analysis, we present a new method for building the stop-list, based on word composition and question categories. The method composes n adjacent words into phrases and chooses stop-words from those phrases, taking class labels into account during the process. It repeats until n decreases to 0. This method performs well on our data set.

We then present a new feature extraction method called TFC-ICF, which is based on question categories. TFC-ICF shares the basic idea of TF-IDF: it considers both a word's ability to identify a category and the distribution of that word across all categories, which can improve feature quality. An SVM classifier is used in this work. Precision reaches 80.45% with TFC-ICF, which we take as the baseline. To improve classification precision, we present three ideas: manually chosen features, keyword-extension-based features, and semantic-based features. For the last two, we try different ways of using the features. Combined with TFC-ICF, these methods achieve precision of 86.01%, 85.14%, and 82.13%, respectively.

For answer extraction, we first discuss how to construct the candidate-sentence set using a sentence similarity method based on the support vector model. Then, entity recognition is performed according to the question's category. Finally, we describe how to extract the answer list based on sentence similarity scores and entities. We achieve good results on the CLQA test set of NTCIR-5.

We have achieved some results on question classification and answer extraction, but weaknesses remain in our work. For example, the quality of the question set could be better, and there is room for improvement in the performance of entity recognition and answer extraction. Our future work will concentrate on these areas.
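
The abstract describes TFC-ICF only in outline: a TF-IDF-like weight that combines how strongly a word identifies a category with how the word is spread across all categories. The thesis's exact formula is not given here, so the following is only a minimal sketch of one plausible category-level analogue of TF-IDF. The function name `tfc_icf`, the input format, and the specific weighting (within-category frequency times the log of the inverse category frequency) are all assumptions for illustration, not the author's definition.

```python
import math
from collections import Counter, defaultdict


def tfc_icf(questions):
    """Hypothetical TFC-ICF-style weighting (an assumed formula, not the thesis's).

    questions: list of (tokens, category) pairs.
    Returns {category: {word: weight}}.
    """
    # Count how often each word occurs in the questions of each category.
    counts = defaultdict(Counter)
    for tokens, category in questions:
        counts[category].update(tokens)

    n_categories = len(counts)

    # For each word, count the number of categories it appears in.
    category_freq = Counter()
    for word_counts in counts.values():
        category_freq.update(word_counts.keys())

    weights = {}
    for category, word_counts in counts.items():
        total = sum(word_counts.values())
        weights[category] = {
            # Relative frequency inside the category, scaled down for
            # words that occur in many categories.
            word: (count / total) * math.log(n_categories / category_freq[word])
            for word, count in word_counts.items()
        }
    return weights


# Toy data, invented for illustration only.
toy_questions = [
    (["who", "is", "president"], "PERSON"),
    (["where", "is", "beijing"], "LOCATION"),
]
toy_weights = tfc_icf(toy_questions)
```

Under this assumed weighting, a word that occurs in every category (such as "is" in the toy data) receives a weight of zero, while a word confined to one category keeps a positive weight. The resulting per-category weights could then be used to select features for a classifier such as the SVM mentioned in the abstract.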

Keywords/Search Tags:

Question Answering, Question Classification, Feature Selection, Answer Extraction, Entity Recognition

Related items

1	Design And Implementation Of Chinese Question Answering System For Complicated Statements
2	Research On Short Question Classification Based On Automatic Question And Answering
3	Research On Question Answering Technology For Answering History Subject Question
4	The Study Of Recognition And Classification Of Entity For Question Answering System
5	Research And Application Of Key Technologies Of Community Question Answering
6	Question Understanding Based On Graph Matching In Question Answering Over Knowledge Base
7	Research On Question Understanding Method Of Knowledge Graph Question Answering System
8	Research On Question Feature Model Combining With Ontology In Chinese Question Classification
9	The Research On Feature Extraction And Question Classification Of English Sentence In Automatic Question Answering System
10	Question Classification Method And Its Application In Question Answering System