Research On Answer Summarization Algorithm In Non-factoid Community Question Answering

Posted on: 2018-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Song
GTID: 2348330512990267
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the number of users of community question-answering (CQA) services has grown rapidly. CQA services give users a platform to post questions and find answers, and the rich data on these platforms has attracted researchers' interest. This thesis focuses on answer summarization in CQA. While most previous work addresses factoid question answering, we address non-factoid question answering. Factoid questions typically ask for an exact result and are answered with short sentences; non-factoid questions usually ask for opinions or advice and therefore require paragraphs or passages as answers. Compared with traditional multi-document summarization, which usually targets news articles, summarizing answers in non-factoid CQA faces specific challenges: the shortness and sparsity of answer sentences, and the diversity of topics across answers.

To tackle these challenges, we propose a sparse coding-based summarization strategy with three core ingredients: document expansion of short answer sentences, sentence vectorization, and a sparse-coding optimization framework. Specifically, we extend each answer sentence in a question-answering thread to a more comprehensive representation via entity linking and sentence ranking, using Wikipedia as an external resource. Each sentence of the extended answers is then represented as a feature vector produced by a short-text convolutional neural network model. We use these sentence vectors to estimate the saliency of candidate sentences via a sparse-coding framework that jointly considers candidate sentences and Wikipedia sentences as reconstruction items. Given the saliency scores of all candidate sentences, we extract the top-ranked sentences with a maximal marginal relevance algorithm to generate the answer summary.

Our contributions can be summarized as follows. We address the task of summarizing answers to non-factoid questions in CQA by tackling the shortness, sparsity, and diversity of answers. We evaluate the proposed method on a benchmark data collection against several state-of-the-art baselines. The experimental results confirm the effectiveness of the method and show a significant improvement over these baselines in terms of ROUGE metrics. Further analysis of the results also suggests that the approach is robust and scalable.
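The final extraction step described above can be illustrated with a minimal sketch of maximal marginal relevance (MMR) selection. This is not the thesis's implementation: the abstract does not give the similarity measure, the trade-off weight, or the summary length, so the cosine similarity, the `trade_off` parameter, and the function names `cosine` and `mmr_select` are all assumptions made for illustration. The saliency scores are taken as given inputs, standing in for the output of the sparse-coding step.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_select(vectors, saliency, k, trade_off=0.7):
    """Greedily pick up to k sentence indices, balancing saliency against redundancy.

    vectors   -- one feature vector per candidate sentence (assumed precomputed)
    saliency  -- one saliency score per candidate sentence (assumed precomputed)
    trade_off -- hypothetical weight in [0, 1]; 1.0 ignores redundancy entirely
    """
    selected = []
    remaining = list(range(len(vectors)))
    while remaining and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return trade_off * saliency[i] - (1.0 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: sentences 0 and 1 are near-duplicates, sentence 2 covers another topic.
toy_vectors = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
toy_saliency = [0.9, 0.85, 0.5]
print(mmr_select(toy_vectors, toy_saliency, k=2, trade_off=0.5))  # prints [0, 2]
```

In the toy run, the second most salient sentence is skipped because it nearly duplicates the first, which is the behaviour that makes MMR a natural fit for answers covering diverse topics.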
Keywords/Search Tags:Community question-answering, Sparse coding, Short text processing, Document summarization