
Research On Answer Summary Of IT Question Answering System Based On Deep Learning

Posted on: 2024-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Cao
Full Text: PDF
GTID: 2568307115477194
Subject: Electronic information
Abstract/Summary:
Answer summarization typically uses algorithms to integrate answers from multiple perspectives in order to obtain high-quality, low-redundancy, and complete answers. The rapid growth of the Internet has produced a tremendous influx of data, and the amount of knowledge that users access has grown sharply. Posts on Stack Overflow contain a large amount of useless or redundant information, and most of it is incomplete, which leads to problems such as poor answer search efficiency and a long search process. Because developers need to obtain information quickly and accurately, automatic document summarization is needed to summarize posts on Stack Overflow and generate high-quality answer summaries. To generate high-quality, concise answer summaries more efficiently, this thesis conducts the following research on answer summarization algorithms and answer summarization systems:

(1) To address the current lack of high-quality question and answer (Q&A) pair data, this thesis constructs the SO Q&A pair dataset. Crawler technology was used to obtain 58,907 Q&A pairs from Stack Overflow, covering four programming languages: Java, Python, JavaScript, and C. After cleaning, filtering, deduplication, labeling, and other processing, the resulting SO Q&A pair dataset contains 3,070 questions and 36,407 Q&A pairs.

(2) The software engineering community suffers from low-quality answer posts, posts containing a large amount of redundant information, and incomplete posts. This thesis proposes an answer summarization model (ITSum) with three key modules. The relevance ranking module uses an answer relevance ranking model based on the pre-trained BERT model to retrieve answer sentences related to technical questions. The multi-feature sentence extraction module combines sentence content features and user-oriented features, fusing information entropy, answer sentence position, answer vote count, and other features to extract answer sentences useful for technical questions. The answer generation module uses the Maximal Marginal Relevance (MMR) algorithm to remove redundant answer sentences from the candidate set and then selects five diverse, high-quality sentences to form an answer summary. Finally, a comparative experiment on the TechSumBench dataset verified that the answer summaries generated by the ITSum model are of higher quality than those of the other models compared.

(3) Design and implementation of an online answer summarization system. The system is built on a microservice architecture and incorporates the ITSum algorithm proposed in this thesis. Through task processing modules for data processing, model training, answer summary generation, and others, combined with a user interaction interface, the system returns high-quality, concise, and complete answer summaries to users, which effectively improves the efficiency of users' question searches.
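
The redundancy-removal step in (2) can be illustrated with a small, generic MMR sketch. This is not the ITSum implementation. The bag-of-words cosine similarity, the function names (`bow`, `cosine`, `mmr_select`), the trade-off weight `lam`, and the assumption that relevance scores arrive precomputed from an upstream ranker are all illustrative choices. Only the idea of picking five diverse sentences comes from the abstract.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (a stand-in for a real sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(sentences, relevance, k=5, lam=0.7):
    """Greedily pick up to k sentences, trading relevance against redundancy.

    relevance[i] is a precomputed score for sentences[i] (assumed to come from
    an upstream ranker); lam weights relevance against similarity to the
    sentences already chosen. Both are hypothetical parameters.
    """
    vectors = [bow(s) for s in sentences]
    remaining = list(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < k:
        def mmr_score(i):
            # Redundancy is the highest similarity to any already-chosen sentence.
            redundancy = max((cosine(vectors[i], vectors[j]) for j in chosen), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        chosen.append(best)
        remaining.remove(best)
    return [sentences[i] for i in chosen]
```

With a lower `lam`, a near-duplicate of an already-selected sentence is penalized heavily and a less relevant but novel sentence is preferred. With `lam` close to 1, the selection reduces to plain relevance ranking.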
Keywords/Search Tags: CQA Q&A, BERT model, Multi-document automatic summary, Answer summary