
The Research On Feature Extraction And Question Classification Of English Sentence In Automatic Question Answering System

Posted on: 2019-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: X K Yi
Full Text: PDF
GTID: 2348330542972027
Subject: Software engineering
Abstract/Summary:
The goal of an automatic question answering system is to correctly understand questions that users pose in natural language and to return answers efficiently and accurately. Question classification is the first step of a question answering system. Accurate question classification not only narrows the scope of the answer search but also improves the accuracy of answer retrieval. Although natural language processing and machine learning techniques have significantly improved question classification, its performance and accuracy still need further improvement.

Question features are the key factor affecting classification performance. This paper proposes an improved method for extracting semantic features of question words based on an information gain model. First, the semantic similarity of the words in question sentences is calculated using WordNet. Then the information gain of each word is calculated with the information gain model, and the importance of each word is evaluated according to its information gain. Finally, the words with high information gain are selected to form the semantic feature space of the question sentences. To obtain the lexical features, each question sentence is represented as a sequence of words, and a method for mining frequent patterns of question sentences based on sequential pattern mining is proposed. The frequent patterns constitute the lexical feature space of the question sentences.

Three classifiers are used to evaluate the proposed method on the widely used UIUC dataset. The experimental results show that SVM classifies better than the naive Bayes classifier and C4.5. With the support vector machine, accuracy is 96% on the large classes and 90% on the small classes, which is superior to existing question classification methods. The syntactic feature extraction method proposed in this paper effectively saves computational overhead because it does not need to parse the questions.
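
The information gain step described above can be illustrated with a minimal sketch. It uses the standard definition IG(w) = H(C) - [P(w) H(C|w) + P(not w) H(C|not w)] and ranks words by that score. It is not the thesis's implementation: the WordNet similarity step is omitted because the abstract does not specify how it enters the model, and the function names, toy questions, and toy labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(questions, labels, word):
    """Information gain of the binary feature 'question contains word'."""
    with_word = [lab for q, lab in zip(questions, labels) if word in q]
    without_word = [lab for q, lab in zip(questions, labels) if word not in q]
    total = len(labels)
    # Class entropy remaining after splitting on presence/absence of the word.
    conditional = (
        len(with_word) / total * entropy(with_word)
        + len(without_word) / total * entropy(without_word)
    )
    return entropy(labels) - conditional


def select_features(questions, labels, top_k):
    """Return the top_k words by information gain (ties broken alphabetically)."""
    vocabulary = sorted({w for q in questions for w in q})
    scored = [(information_gain(questions, labels, w), w) for w in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for _, w in scored[:top_k]]


# Hypothetical toy data: tokenized questions with illustrative class labels.
questions = [
    ["who", "wrote", "hamlet"],
    ["who", "is", "the", "president"],
    ["where", "is", "paris"],
    ["where", "was", "he", "born"],
]
labels = ["HUM", "HUM", "LOC", "LOC"]

selected = select_features(questions, labels, top_k=2)
```

In this toy set the question words "who" and "where" separate the two classes perfectly, so they receive the highest gain, while "is" appears equally in both classes and receives a gain of zero. A real feature space would be built from the full training set with a much larger `top_k`.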
Keywords/Search Tags:Question Answering System, Question Classification, Feature Extraction, Information Gain, Sequence Pattern Mining