Research And Implementation Of Automatic Abstract Generation System Based On Deep Learning

Posted on: 2022-02-18 | Degree: Master | Type: Thesis
Country: China | Candidate: T Ma | Full Text: PDF
GTID: 2518306314451744 | Subject: Software engineering
Abstract/Summary:
With the gradual improvement of network infrastructure, the rapid development of the Internet industry, and the growing number of network users, the amount of online information has increased dramatically. Major Internet companies such as Baidu, Alibaba, and Tencent have built massive cloud databases to store it. Faced with data at this scale, extracting high-quality information becomes very important. Automatic summarization produces concise text from redundant text, reducing data redundancy and improving users' reading efficiency, so generating summaries efficiently has become an active research topic.

Automatic summarization is mainly divided into extractive and generative (abstractive) approaches. Extractive summarization selects important sentences from an article and uses them as the summary. Although it retains much of the information in the original text, it does not deeply understand the text, and the resulting summary does not conform to the conventions of a summary. As deep learning has been introduced into natural language processing, generative summarization has become the mainstream of current research. However, long texts contain a large amount of redundant information, which prevents the encoder from accurately extracting features of the input text and causes a long-distance dependency problem. Eventually the model fails to converge and the generated summary is poor.

This thesis proposes a two-stage summary generation algorithm. To reduce redundant information while retaining as much of the original content as possible, the first stage uses the BM25 algorithm to compute the BM25 similarity between sentences, obtains sentence semantic vectors with the BERT model, and computes the semantic similarity of each sentence pair with the cosine similarity formula. The BM25 similarity and the semantic similarity are then combined by a weight ratio, and the combined similarity is fed into the TextRank algorithm, which iterates to extract the key sentences. In the second stage, the extracted key sentences are fed into an attention-based seq2seq model to produce the summary. The experimental results show that the two-stage algorithm outperforms both the purely generative and the purely extractive methods.

The text summary generation system is designed on a B/S (browser/server) architecture. Its main functions are a summary generation module and a summary search module. Testing shows that the summary generation algorithm designed in this thesis improves summary quality and meets users' needs. The implementation process of the system is as follows:

1. Requirement analysis: analyze the users' requirements and determine the goal of the summary generation system; assess whether the development cost is low, whether the technology is mature, and whether the system runs smoothly and reliably; and analyze the non-functional requirements of the system.
2. System design: first, design the overall structure of the system and build its framework; second, use IPO tables for the detailed design, designing each functional module top-down; finally, design the conceptual structure and the tables of the database.
3. Automatic summary generation algorithm: this thesis proposes a two-stage summary generation algorithm. In the first stage, key sentences are extracted with a graph-model algorithm; in the second stage, the extracted key sentences are fed into a deep learning model to produce the summary.
4. System implementation and testing: on the MyEclipse platform, JavaScript is used for the front-end interface, Python for the deep learning model, and Java for the key sentence extraction code. The ROUGE metric is used to evaluate summary quality.
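
The first-stage key sentence extraction described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation (which the abstract says was written in Java); it rests on several assumptions that the abstract does not specify: the mixing weight `alpha`, the BM25 parameters `k1` and `b`, the symmetrization and max-normalization of BM25 scores, and the damping factor `d`. The helper `bow_vectors` is a hypothetical bag-of-words stand-in for BERT sentence vectors so that the example stays self-contained.

```python
import math
from collections import Counter


def bm25_matrix(docs, k1=1.5, b=0.75):
    """scores[i][j]: BM25 score of sentence i when sentence j is the query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
    tfs = [Counter(d) for d in docs]
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        norm = k1 * (1 - b + b * len(docs[i]) / avgdl)
        for j in range(n):
            if i != j:
                scores[i][j] = sum(
                    idf[t] * tfs[i][t] * (k1 + 1) / (tfs[i][t] + norm)
                    for t in set(docs[j]) if t in tfs[i]
                )
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def bow_vectors(docs):
    """Hypothetical stand-in for BERT sentence vectors (assumption)."""
    vocab = sorted({t for d in docs for t in d})
    counts = [Counter(d) for d in docs]
    return [[c[t] for t in vocab] for c in counts]


def combined_similarity(docs, vectors, alpha=0.5):
    """Weighted mix of normalized BM25 and cosine similarity (alpha is assumed)."""
    n = len(docs)
    bm = bm25_matrix(docs)
    peak = max(max(row) for row in bm) or 1.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            lexical = (bm[i][j] + bm[j][i]) / (2 * peak)
            # Clamp at 0 so real embeddings cannot yield negative edge weights.
            semantic = max(0.0, cosine(vectors[i], vectors[j]))
            sim[i][j] = sim[j][i] = alpha * lexical + (1 - alpha) * semantic
    return sim


def textrank(sim, d=0.85, iters=50):
    """Weighted TextRank by fixed-count power iteration."""
    n = len(sim)
    out_sum = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [
            (1 - d) + d * sum(
                sim[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def extract_key_sentences(sentences, k=2, alpha=0.5):
    """Return the k highest-ranked sentences in their original order."""
    docs = [s.lower().split() for s in sentences]
    sim = combined_similarity(docs, bow_vectors(docs), alpha)
    ranks = textrank(sim)
    order = sorted(range(len(sentences)), key=lambda i: ranks[i], reverse=True)
    return [sentences[i] for i in sorted(order[:k])]
```

In a pipeline like the one the abstract describes, the sentences returned by `extract_key_sentences` would then be passed to the second-stage seq2seq model; replacing `bow_vectors` with real BERT sentence embeddings would only change the `vectors` argument.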
Keywords/Search Tags: Abstract generation, TextRank algorithm, deep learning, seq2seq, BM25