
Research On Question Classification In Community-based Question Answering Service

Posted on: 2018-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: D Zhang
GTID: 2348330542965251
Subject: Computer Science and Technology
Abstract/Summary:
Community question answering (Q&A) services provide concise, accurate answers to natural language questions raised by users. With the rapid growth of user interaction on the Internet, Q&A communities have attracted increasing attention. Question classification is one of the most basic tasks in a Q&A system: it effectively reduces the candidate space of answers and influences the answer extraction strategy. Specifically, the system adopts different answer selection strategies and knowledge bases for different types of questions.

Question classification can be regarded as a special case of text classification, but it differs from the general task in two clear ways. First, question text is usually short and contains very few words, so classifying on the question text alone suffers from a severe lack of information. In addition, the traditional bag-of-words representation cannot capture the semantic relations between words and therefore loses much useful information. Second, labeled questions are scarce, and annotating questions costs considerable time, manpower, and material resources. How to add auxiliary features that extend the question information, and how to make full use of unlabeled samples to improve classification performance, are therefore important and urgent problems in question classification. This thesis focuses on question classification in community Q&A services. Its main contents cover the following three aspects.

First, this thesis proposes a semi-supervised question classification method based on the label propagation algorithm. The core idea is as follows. First, answer features are combined with question features to represent each sample. Then, label propagation is applied, starting from the annotated questions, to automatically tag the categories of the unlabeled questions. Finally, the originally labeled questions and the automatically tagged questions are merged into one training set, and a maximum entropy model is built to classify the test text. The experimental results show that the proposed answer-assisted semi-supervised method makes full use of unlabeled samples to boost performance and is clearly superior to the other benchmark methods.

Second, this thesis proposes a semi-supervised question classification method with representation learning. Its distinguishing feature is that a question and its corresponding answer are treated as a joint context for learning distributed word representations. Specifically, a neural network language model is introduced to learn question and answer representations jointly, so that the question word vectors carry more information. In addition, large numbers of unlabeled questions and answers take part in word vector learning, which strengthens the representation capacity of the question word vectors. Finally, questions represented by word vectors serve as training samples, and a convolutional neural network is used to build the question classifier. The experimental results show that this method, with its synergetic representation learning, makes full use of word vectors and unlabeled samples to improve performance and outperforms other strong semi-supervised methods.

Third, this thesis proposes a new approach, a dual-channel LSTM model with bilingual information. First, the Chinese corpus and the English corpus are each extended with the translated version of the other, which effectively reduces the workload of annotating a monolingual corpus. Second, each sample is represented by both the original question text and its translated text, which enriches the information in the training samples. Finally, a question classifier is built with the dual-channel LSTM model. The experimental results show that this approach improves question classification performance and outperforms the other benchmark methods.
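
To make the first method more concrete, the following is a minimal sketch of label propagation over samples that combine question and answer text. It is an illustration under stated assumptions, not the thesis's implementation: the bag-of-words cosine similarity, the fully connected graph, the toy data, and the names `bow`, `cosine`, and `label_propagation` are all hypothetical choices, since the abstract does not specify the features or the graph construction. The final maximum entropy step is not shown.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words counts for a whitespace-tokenized string (assumed feature)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def label_propagation(samples, labels, classes, iterations=50):
    """samples: list of (question, answer); labels: class name or None."""
    # Each sample is represented by its question text plus its answer text.
    vecs = [bow(q + " " + a) for q, a in samples]
    n, k = len(vecs), len(classes)
    # Similarity graph with zero self-weight, then row-normalized transitions.
    weights = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total if total else 0.0 for w in row])

    def one_hot(label):
        return [1.0 if c == label else 0.0 for c in classes]

    # Labeled nodes start one-hot; unlabeled nodes start uniform.
    dist = [one_hot(l) if l is not None else [1.0 / k] * k for l in labels]
    for _ in range(iterations):
        new_dist = []
        for i in range(n):
            if labels[i] is not None:
                new_dist.append(one_hot(labels[i]))  # clamp known labels
            elif sum(trans[i]) == 0.0:
                new_dist.append(dist[i])  # isolated node: keep current belief
            else:
                new_dist.append([sum(trans[i][j] * dist[j][c] for j in range(n))
                                 for c in range(k)])
        dist = new_dist
    return [classes[max(range(k), key=lambda c: dist[i][c])] for i in range(n)]

# Toy data (invented for illustration): two labeled and two unlabeled samples.
samples = [
    ("how do i fix my laptop screen",
     "replace the screen panel or check the display cable"),
    ("what is a good diet for weight loss",
     "eat more vegetables and reduce sugar"),
    ("my laptop will not boot",
     "check the laptop battery and the power cable"),
    ("how to reduce sugar in my diet",
     "eat vegetables and drink water"),
]
labels = ["tech", "health", None, None]
classes = ["tech", "health"]

predicted = label_propagation(samples, labels, classes)
print(predicted)
```

In the pipeline the abstract describes, the automatically tagged questions would then be merged with the originally labeled ones to train the maximum entropy classifier.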
Keywords/Search Tags: Q&A Community, Question Classification, Representation Learning, Semi-Supervised Learning, Bilingual Information