
Automatic Summarization Of Academic Literature Based On Deep Learning

Posted on: 2019-04-30    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y Z Wang    Full Text: PDF
GTID: 1368330572968601    Subject: Management Science and Engineering
Abstract/Summary:
With the arrival of the big data era, online academic resources are growing explosively, and an increasing number of scholars find themselves adrift in a vast ocean of literature. How to automatically summarize a collection of literature from a particular discipline into a concise yet comprehensive report has therefore become a hot issue in the study and practice of knowledge management. As an important natural language processing technology, automatic summarization presents the most critical information in a concentrated form close to users' needs, helping researchers "stand on the shoulders of giants".

This dissertation focuses on improving the automatic summarization of academic literature and develops a research system for "Automatic Summarization of Academic Literature Based on Deep Learning". The system involves deep learning theories and methods such as neural-network-based text representation and Seq2Seq-based summarization, as well as classical text mining algorithms: two statistical topic models (LDA and Labeled-LDA) and two link analysis methods (PageRank and PageRank with Priors). For the numerical experiments, this study selects a considerable portion of the computer science literature from the ACM (Association for Computing Machinery) Digital Library to validate the proposed models. The main contents of this dissertation are as follows:

1. This dissertation formulates literature review generation as a sequential text generation problem and proposes a Seq2Seq model based on hierarchical neural networks, consisting mainly of a hierarchical document encoder and an attention-based decoder. Specifically, the encoder derives sentence-level and document-level semantic representations through a CNN and an RNN respectively, which not only reflects the hierarchical structure of an article but also mitigates the vanishing gradients and information loss caused by long word sequences. During the decoding phase, the saliency and novelty of each candidate sentence are considered simultaneously, minimizing the redundancy of the generated summary while maximizing its representativeness.

2. Because a literature review is context-aware, this dissertation puts forward a Seq2Seq model that fuses contextual information. To characterize the contextual relevance between each candidate sentence and its target document more accurately, Labeled-LDA is first utilized to infer the topic distribution of each sentence; the sentence topics are then integrated into the document encoding process; finally, the source texts are also encoded and included in the decoding phase.

3. Since a static analysis of contextual relevance cannot capture the fact that the text corpus changes dynamically, this dissertation investigates the importance of graph context for literature review generation from an information network perspective and proposes a Seq2Seq model with a joint context-driven attention mechanism. Specifically, Node2vec is first employed to vectorize every node of the heterogeneous bibliography network; the connectivity distance within the graph context is then measured for every pair of papers; finally, two context relevance measures, derived from the texts and from the heterogeneous bibliography network respectively, are introduced into the decoding phase simultaneously.
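The hierarchical encoder in point 1 can be illustrated with a heavily simplified, dependency-free sketch: a 1-D convolution with max-over-time pooling stands in for the CNN sentence encoder, and a plain tanh RNN over the resulting sentence vectors stands in for the document-level RNN. The function names, weight layout, and toy dimensions below are illustrative assumptions, not the dissertation's actual implementation.

```python
import math

def conv1d_maxpool(word_vecs, filters, width=2):
    """CNN-style sentence encoder stand-in: slide each filter over windows
    of `width` consecutive word vectors, apply tanh, then take the max
    activation over time. Returns one feature per filter."""
    feats = []
    for f in filters:  # each filter: flat weights of length width * word_dim
        acts = []
        for t in range(len(word_vecs) - width + 1):
            window = [x for vec in word_vecs[t:t + width] for x in vec]
            acts.append(math.tanh(sum(w * x for w, x in zip(f, window))))
        feats.append(max(acts))
    return feats

def rnn_encode(sent_vecs, W_h, W_x):
    """RNN-style document encoder stand-in: a plain tanh recurrence over
    the sentence vectors; the final hidden state serves as the
    document-level representation."""
    h = [0.0] * len(W_h)
    for x in sent_vecs:
        h = [math.tanh(sum(W_h[i][j] * h[j] for j in range(len(h))) +
                       sum(W_x[i][j] * x[j] for j in range(len(x))))
             for i in range(len(h))]
    return h
```

In a real model the filters and recurrence weights would be learned jointly with the decoder; here they are just fixed matrices so the data flow (words → sentence vectors → document vector) is visible.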
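The saliency-versus-novelty trade-off in the decoding phase of point 1 resembles maximal marginal relevance (MMR) selection. The following sketch assumes sentences and the document are represented as vectors and that relevance is measured by cosine similarity; the greedy scoring rule and the `lam` trade-off parameter are illustrative assumptions, not the dissertation's exact decoder.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_sentences(sent_vecs, doc_vec, k, lam=0.7):
    """Greedy MMR-style selection: trade off saliency (similarity to the
    document vector) against redundancy (similarity to sentences already
    chosen), so the summary stays representative yet non-redundant."""
    selected, candidates = [], list(range(len(sent_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            saliency = cosine(sent_vecs[i], doc_vec)
            redundancy = max((cosine(sent_vecs[i], sent_vecs[j])
                              for j in selected), default=0.0)
            return lam * saliency - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two near-duplicate salient sentences, only one of the pair is kept and the next slot goes to a sentence that adds new information, which is exactly the redundancy-minimizing behavior described above.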
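The joint context-driven attention in point 3 combines two relevance signals per candidate: one measured from the texts and one from the heterogeneous bibliography network (e.g. similarity between Node2vec embeddings of the citing and cited papers). A minimal sketch, assuming the two scores are blended linearly with a mixing weight `beta` and normalized by softmax (the blending rule is an assumption for illustration, not the dissertation's published formulation):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def joint_context_attention(text_rel, graph_rel, beta=0.5):
    """Blend a text-based relevance score and a graph-based relevance
    score for each candidate, then normalize into attention weights."""
    mixed = [beta * t + (1 - beta) * g for t, g in zip(text_rel, graph_rel)]
    return softmax(mixed)
```

A candidate that is relevant both textually and in the citation graph thus receives more attention mass than one supported by only a single signal.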
Keywords/Search Tags:Deep Learning, Automatic Summarization, Seq2Seq Model, Context Relevance, Heterogeneous Bibliography Network