Research On Automatic Title Summarization Based On Knowledge Graph And Deep Learning

Posted on: 2021-04-28  Degree: Master  Type: Thesis
Country: China  Candidate: S Zhao  Full Text: PDF
GTID: 2518306107493514  Subject: Engineering (Computer Technology)
Abstract/Summary:
The news industry and short-review articles require large numbers of headline summaries. This demand places a heavy load on news editors, strongly affects the audience's user experience, and limits the intelligent development of the news media industry. Headline summarization technology is therefore essential to the development of news media communication. Automatic title summarization can be regarded as a branch of traditional long-text summarization: its core task is to extract or generate a high-quality title that summarizes the full text. Extractive summaries consist of high-importance sentences selected from the original text by scoring the importance of each sentence, whereas generative summaries use a series of natural language processing techniques and consist of more concise, refined sentences generated by the computer. Compared with traditional extractive methods, deep learning networks can retain more semantic information. Compared with extractive summarization, generative summarization is closer to human writing habits and is simpler, more flexible, and more diverse. This thesis studies summary generation methods based on deep learning and knowledge graphs, and designs two summary generation systems based on deep learning. The main work is as follows:

(1) Summary generation based on the deep learning Seq2Seq framework. After the Tsinghua news dataset and a crawled news dataset are cleaned and classified, the research proceeds along two routes, character vectors and word vectors. Convolutional neural networks, LSTM networks, and the BERT pre-training model are used for deep feature extraction from the text, and the models are optimized with an attention mechanism, a pointer-generator network, Beam Search, and other techniques. Through experimental comparison, the thesis proposes an architecture that can generate better-quality titles.

(2) A method that combines traditional generative and extractive summarization is proposed, using the TextRank and TF-IDF algorithms in the text pre-processing stage. Experiments show that this method effectively improves data utilization and gives downstream tasks more and higher-quality data.

(3) The semantic knowledge of the knowledge graph is effectively incorporated, so that the quality of the generated title summaries improves significantly. To improve the professionalism and readability of the summaries, the knowledge features of knowledge triples are integrated into the LSTM network and the BERT pre-training model respectively, and both models are optimized accordingly. Experiments show that fusing the knowledge features of knowledge triples can generate higher-quality professional titles.
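The abstract does not say how TF-IDF is applied during pre-processing, so the following is only a minimal sketch of one plausible reading of item (2): score each sentence of an article by TF-IDF and keep the top-scoring sentences as a condensed input for a downstream generative model. The function names, the naive sentence splitter, the English tokenizer, and the smoothed IDF formula are all assumptions made for illustration and are not taken from the thesis.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (illustrative only)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase alphanumeric tokens; Chinese text would need a word segmenter."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_sentence_scores(sentences):
    """Score = sum over distinct terms of (relative term frequency in the
    sentence) x (smoothed IDF), treating each sentence as one 'document'."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        score = 0.0
        for term, count in tf.items():
            # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
            idf = math.log((1 + n) / (1 + doc_freq[term])) + 1.0
            score += (count / len(tokens)) * idf
        scores.append(score)
    return scores


def extract_top_sentences(text, k):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = tfidf_sentence_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    article = (
        "Stocks rose today. Stocks rose today. "
        "Regulators approved the merger."
    )
    print(extract_top_sentences(article, 1))
```

In this sketch the extracted sentences would replace the full article as the encoder input, which shortens the sequence the generative model has to read. A TextRank variant would rank sentences by graph centrality instead of by TF-IDF sums.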
Keywords/Search Tags: LSTM Network, BERT Pre-training Model, Knowledge Graph, Automatic Title Generation