
Research And Application Of Answer Ranking And Question Retrieval In Community Question Answering System

Posted on: 2019-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
GTID: 2428330542494197
Subject: Control Science and Engineering
Abstract/Summary:
A Community Question Answering (CQA) system is a higher-level information retrieval system. It differs from a search engine (SE) in two main ways: a CQA system has a high-quality knowledge base that is answered and maintained by its community over the long term, and it returns the answer the user is looking for, whereas an SE returns a list of related web pages. In recent years CQA has become a research focus in the information retrieval field, but extracting information efficiently remains a difficult problem. The main work of this dissertation falls into three parts:

1) A modified topic model is proposed. The traditional LDA model does not account for spam topics or similar topics, which reduces the accuracy of topic similarity. In Chapter 3, to retrieve semantic topics, spam topic filtering algorithms and topic similarity detection algorithms are proposed to remove spam topics and detect similar topics. Experimental results on the Fudan corpus verify that the modified topic model can improve text classification accuracy. Chapter 3 also introduces the model into the CQA system. Experimental results on the SemEval dataset determine the optimal feature combinations for the answer ranking and question retrieval tasks, and show that the model can improve the accuracy of answer classification.

2) An information enhancement method for CQA is proposed. An annotated dataset is constructed from original questions, similar questions, and answers, and is integrated into the original dataset, which enriches the semantics of the sentence pairs. Five deep neural network modeling methods for CQA are explored. The experiments show that the BiLSTM network and the Attention network achieve higher overall answer classification accuracy and MRR. They also show that data cleaning is necessary in most scenarios, but that in some cases (such as a smaller dataset or the BiLSTM network) data cleaning degrades the performance of the CQA system.

3) Flow charts of the CQA system are designed. After analyzing the existing problems and key technologies, the solution to each problem is summarized as follows. For the high computational complexity of information retrieval over the knowledge base, a two-stage ranking method is designed: the first stage uses Elasticsearch to construct a reduced candidate question set, and the second stage uses machine learning to further rank the questions within that set. For frequent content updates and highly overlapping requests, the Spark engine is used to rank answers and update hot questions in real time; pre-computed partial text features and hot questions are cached in NoSQL databases, and the hot questions are selected and updated using the LRU algorithm. These flow charts combine the proposed methods with big data frameworks to reduce overall system response latency and improve the accuracy of answer ranking and question retrieval.
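
The two-stage ranking and LRU-cached hot questions described in the abstract can be sketched roughly as follows. This is only an illustration under loud assumptions, not the thesis's implementation: plain token overlap stands in for the Elasticsearch first stage, Jaccard similarity stands in for the learned second-stage ranker, an in-memory `OrderedDict` stands in for the NoSQL cache, and Spark is not modeled at all. All names (`LRUCache`, `first_stage`, `second_stage`, `retrieve`) are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-memory LRU cache (assumed stand-in for the NoSQL hot-question cache)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

    def __contains__(self, key):
        return key in self._items


def tokenize(text):
    return text.lower().split()


def first_stage(query, questions, k):
    """Cheap candidate selection by shared-token count (stand-in for Elasticsearch)."""
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(text))), qid) for qid, text in questions.items()]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [qid for _, qid in scored[:k]]


def second_stage(query, candidates, questions):
    """Rerank the reduced set by Jaccard similarity (stand-in for a learned ranker)."""
    q = set(tokenize(query))

    def jaccard(qid):
        t = set(tokenize(questions[qid]))
        return len(q & t) / len(q | t)

    return sorted(candidates, key=lambda qid: (-jaccard(qid), qid))


def retrieve(query, questions, cache, k=3):
    """Serve hot queries from the cache; otherwise run both ranking stages."""
    cached = cache.get(query)
    if cached is not None:
        return cached
    ranked = second_stage(query, first_stage(query, questions, k), questions)
    cache.put(query, ranked)
    return ranked
```

The point of the split is that the expensive ranker only ever sees the `k` candidates kept by the cheap first stage, and a repeated query skips both stages entirely.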
Keywords/Search Tags: community question answering, answer ranking, question retrieval, topic model, deep learning, information enhancement, flow chart design