Research on Chinese and Japanese Question Classification

Posted on: 2010-10-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Kang
Full Text: PDF
GTID: 2178360278966402
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Online Question Answering (QA) systems collect large archives of questions and their answers, and offer precise information in response to natural language queries on the Internet. As the preprocessing stage of a QA system, Question Classification (QC) is currently an active research area. It tells the QA system what the user is looking for, which helps in retrieving candidate answers and guides answer selection. Most research addresses English QC, while QC for other languages is still not well studied.

This study focuses on two major QC problems: bridging the language gap between Chinese and Japanese, and comparing language features for machine-learning classification in QC.

Across real corpora, Chinese and Japanese question sentences differ in how words are formed, phrases are built, and sentences are composed. In this study, lexical, syntactic, and semantic features are studied separately. We employ Support Vector Machines (SVM) for classification, applying a linear kernel function to the lexical feature and the Subset Tree Kernel (SSTK) function to the syntactic and semantic features.

Results show that the lexical feature plays the more important role for Chinese QC, while for Japanese the three features are of equal importance. The combined feature outperforms the others, with 80.95% accuracy for Chinese and 80.19% for Japanese.
Keywords/Search Tags: question answering, question classification, support vector machines, subset tree kernel, machine learning, natural language processing
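The abstract names the Subset Tree Kernel but does not spell out its computation. As a rough illustration, the sketch below follows the commonly cited Collins and Duffy style recursion, which counts the tree fragments two parse trees share without splitting any production rule. This is an assumption about the general technique, not the thesis's actual implementation: the nested-tuple tree encoding, the function names, the `decay` parameter, and the toy English trees are all hypothetical.

```python
from functools import lru_cache

# Hypothetical encoding: an internal node is a tuple (label, child, ...);
# a word (leaf) is a plain string, assumed to appear only under a pre-terminal.

def label(node):
    """Return the label of an internal node, or the word itself for a leaf."""
    return node if isinstance(node, str) else node[0]

def production(node):
    """Grammar production at a node: (parent label, tuple of child labels)."""
    return (node[0], tuple(label(child) for child in node[1:]))

def is_preterminal(node):
    """A pre-terminal node dominates only words."""
    return all(isinstance(child, str) for child in node[1:])

def internal_nodes(tree):
    """Collect every internal (non-word) node of a tree in pre-order."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes

def subset_tree_kernel(t1, t2, decay=1.0):
    """Sum, over all node pairs, the (decay-weighted) shared fragments rooted there."""

    @lru_cache(maxsize=None)
    def delta(n1, n2):
        if production(n1) != production(n2):
            return 0.0          # different rules: no shared fragment rooted here
        if is_preterminal(n1):
            return decay        # same pre-terminal rule: exactly one fragment
        result = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            if isinstance(c1, str) or isinstance(c2, str):
                continue        # words contribute nothing beyond the production match
            result *= 1.0 + delta(c1, c2)
        return result

    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))

# Toy trees (illustrative only, not taken from the thesis corpora).
tree_dog = ("NP", ("DT", "the"), ("NN", "dog"))
tree_cat = ("NP", ("DT", "the"), ("NN", "cat"))

print(subset_tree_kernel(tree_dog, tree_dog))  # fragments a tree shares with itself
print(subset_tree_kernel(tree_dog, tree_cat))  # fragments shared across the two trees
```

A kernel like this is typically handed to an SVM as a precomputed Gram matrix over the training questions, while lexical features such as bag-of-words vectors can use a plain dot product, which matches the linear kernel setting mentioned in the abstract.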