
Research And Implementation Of Key Technology Of Chinese Automatic Summarizing

Posted on: 2019-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: H R Zhang
Full Text: PDF
GTID: 2428330566497300
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, a large volume of text data is generated every day. A summary captures the main content of a text, and automatic summarization offers a quick way to understand the original. It also has broad and important applications, such as result snippets in Web search engines, knowledge fusion in question answering systems, and hot-spot and topic tracking in public-opinion monitoring systems. Research on automatic summarization therefore contributes to the development of Natural Language Processing as a whole.

This thesis studies extractive and abstractive automatic summarization for Chinese. For extractive summarization, five common families of methods are investigated and implemented: rule-based and statistical methods, graph-based models, integer linear programming, a word-vector bag method, and machine learning. The main focus is the graph-based approach, for which several improvements to the sentence similarity calculation are implemented; compared with the traditional graph-based method, the results improve clearly. In the machine learning method, lexical features, dependency syntax features, named-entity features, word vectors, and statistical features are fused into a rich, representative 115-dimensional feature vector space. Summarization is treated as a regression problem, which avoids the drawbacks of labeling sentences as samples of a two-class problem, including the inability to produce long summaries. A new method for computing the regression labels is also proposed.

For abstractive summarization, the thesis uses a deep learning sequence-to-sequence (Seq2Seq) model. The encoder builds an abstract representation of the source text, and the decoder predicts the sequence of target words from that representation. It is this abstract representation that makes generating a summary possible. Although abstractive summarization is implemented with the deep learning model, drawbacks remain, such as repeated words in the generated output. Finally, for demonstration purposes, a Django system is implemented that calls the experiment interfaces and presents the summary produced by each method.
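
The graph-based extractive approach named in the abstract can be illustrated with a minimal TextRank-style sketch. The abstract does not describe the thesis's improved sentence similarity, so the character-set Jaccard overlap below is only a stand-in, and the function names (`split_sentences`, `similarity`, `rank_sentences`, `summarize`), the punctuation-based sentence splitter, and the damping value are all assumptions made for illustration.

```python
import re


def split_sentences(text):
    """Split text on sentence-ending punctuation (an assumed, simple rule)."""
    parts = re.split(r"[。！？!?]", text)
    return [p.strip() for p in parts if p.strip()]


def similarity(a, b):
    """Jaccard overlap of character sets.

    Stand-in only: the thesis's improved similarity is not given in the abstract.
    """
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # Edge weight between two different sentences is their similarity.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on a share of its score proportional
            # to the weight of the edge j -> i.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, k=2)` on a short passage keeps the two sentences most strongly connected to the rest of the text. Replacing `similarity` with a different measure, for example one built on word vectors, changes the ranking without touching the rest of the pipeline, which is the part of the method the thesis reports improving.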
Keywords/Search Tags: automatic summarization, feature vector space, Seq2Seq, regression