
Research On Question Classification Based On Semantic Information

Posted on: 2015-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: K Yin
Full Text: PDF
GTID: 2268330428976089
Subject: Signal and Information Processing
Abstract/Summary:
As the volume of information grows rapidly, accessing what we need from it accurately and quickly has become an increasingly important problem. A question answering system, which understands a user's question posed in natural language, locates the relevant material precisely, and extracts the correct answer quickly, is one effective way to address it. Question classification is a core component of a question answering system: it provides semantic restrictions and constraints for answer selection and extraction.

Question classification resembles text classification to some degree, since both classify text according to its content. Compared with text classification, however, question classification faces additional difficulties, including:
1. The dimensionality of the feature vector space used to represent questions is too high, and the correlation between feature vectors is too weak.
2. Because questions are short, the feature vector space is too sparse.

To address these two problems, this thesis focuses on the semantics of feature words, constructs a knowledge base that provides semantic support, and proposes a semantics-based question classification method. The main contributions are as follows.

First, the thesis proposes a method for acquiring semantic similarity automatically from Baidu encyclopedia. It treats each Baidu encyclopedia entry and its related terms as nodes of a graph connected by link relations, and computes the semantic similarity of encyclopedia entries with the SimRank algorithm.

Second, based on the semantic similarity of encyclopedia entries, the thesis proposes a method for automatically extracting the instance-of relation of entries. It collects and organizes the open classification tree of Baidu encyclopedia as the concept hierarchy of the semantic knowledge base, clusters semantically similar entries, and then determines the concept (i.e., the Baidu encyclopedia classification) of each cluster. This yields the instance-of relations of entries and completes the construction of the semantic knowledge base.

Finally, the thesis proposes a question classification method based on semantic information. It generalizes each question by replacing its terms with their concepts, extracts features from the generalized question, and classifies the question with the SVM algorithm.

To verify the effectiveness and usability of the approach, experiments were run on a question dataset from Baidu Zhidao. They show that the semantic knowledge base provides semantic support, that the semantics-based question classification method can address the problems caused by the high dimensionality, weak correlation, and sparsity of the feature vector space, and that the method achieves better accuracy.
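The similarity step named in the abstract can be illustrated with a minimal sketch of the basic SimRank iteration. This is not the thesis's implementation: the toy link graph, the decay factor of 0.8, the iteration count, and the function name `simrank` are all assumptions made for illustration. Two nodes are scored as similar when the nodes linking to them are themselves similar.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Basic SimRank on a directed graph.

    in_neighbors maps every node to the list of nodes that link to it.
    Every node that appears in a list must also appear as a key.
    decay and iterations are illustrative defaults, not values from the thesis.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node with no incoming links has similarity 0 to any other node.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, scaled by decay.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: "fruit" links to "apple" and "pear"; "car" is isolated.
links = {"fruit": [], "apple": ["fruit"], "pear": ["fruit"], "car": []}
scores = simrank(links)
```

In this toy graph, "apple" and "pear" share their only in-neighbor and so receive a high score, while "car" has no incoming links and scores zero against every other node. A real encyclopedia link graph would be far larger, and this pairwise version, quadratic in the number of nodes, would likely need a sparse or approximate variant.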
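The generalization and feature extraction steps described in the abstract might look roughly like the sketch below. Everything concrete here is an assumption for illustration: the tiny `INSTANCE_OF` table stands in for the thesis's semantic knowledge base, the English questions and whitespace tokenization stand in for segmented Chinese questions, and a plain bag-of-words count stands in for whatever features the thesis actually uses. The SVM training step is omitted; the resulting vectors are the kind of input such a classifier would take.

```python
from collections import Counter

# Hypothetical instance-of knowledge base: term -> concept.
INSTANCE_OF = {
    "beijing": "CITY",
    "shanghai": "CITY",
    "yangtze": "RIVER",
}


def generalize(tokens, kb):
    """Replace each term that has a known concept with that concept.

    Terms not found in the knowledge base are kept, lowercased.
    """
    return [kb.get(tok.lower(), tok.lower()) for tok in tokens]


def bag_of_words(tokens, vocabulary):
    """Count vector over a fixed vocabulary; tokens outside it are ignored."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Two questions that differ only in the named entity.
q1 = generalize("Where is Beijing".split(), INSTANCE_OF)
q2 = generalize("Where is Shanghai".split(), INSTANCE_OF)

# After generalization both questions share a single three-word vocabulary.
vocabulary = sorted(set(q1) | set(q2))
v1 = bag_of_words(q1, vocabulary)
v2 = bag_of_words(q2, vocabulary)
```

Without generalization, the two questions would need separate dimensions for "beijing" and "shanghai" and would differ in both. After generalization they map to the same vector, which illustrates how replacing terms with concepts can reduce dimensionality and sparsity.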
Keywords/Search Tags:Semantic knowledge base rules, Baidu encyclopedia, Semantic Similarity, Instance-of, Question classification, SVM