
Automatic Summarization Method Based On Sentence Vector And Statistical Features

Posted on: 2019-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Chen
Full Text: PDF
GTID: 2428330566497552
Subject: Computer Science and Technology
Abstract/Summary:
Automatic text summarization has received extensive attention as an important research topic in Natural Language Processing (NLP). With the rapid growth of information, techniques for condensing text are becoming increasingly important. Although automatic text summarization has been studied extensively, several problems remain. In terms of data, high-quality data is scarce, especially for Chinese. In terms of methods, traditional techniques such as statistics-based, graph-based and machine-learning-based summarization cannot analyze semantics, while NLP-based and deep-learning-based summarization require additional data or a large amount of training data to do so.

To address the data problem, we collect Chinese summarization data from Weibo. To address the semantic problem, we propose two autoencoder models: one uses part-of-speech (POS) information, and the other uses the similarity between sentences. We also design several statistical features to help generate higher-quality summaries. In the POS model, we build the network according to POS information and expand the training data by combining different sentences. In the similarity model, we build the network using the similarity between sentences, train it on bag-of-words vectors, and update its parameters according to an encoding loss and a similarity loss; the training data is expanded by sentence combination in the same way. In addition, we extract statistical features such as entity number and semantic location by applying a semantic graph.

In the experiments, we compare against Lead, TextRank and a summarization method that integrates importance, non-redundancy and coherence. On ROUGE-2, ROUGE-3 and ROUGE-4, the model based on POS and statistical features achieves improvements of 10.154%, 15.779% and 18.253%, respectively. The model based on similarity and statistical features obtains better results, with improvements of 13.327%, 19.399% and 22.058% on the same metrics.
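
For readers unfamiliar with the ROUGE-2, ROUGE-3 and ROUGE-4 metrics used in the evaluation, the following is a minimal Python sketch of ROUGE-N recall: the fraction of reference n-grams that also appear in the candidate summary, with counts clipped. It is an illustration only, not the thesis's evaluation code. The function names (ngrams, rouge_n_recall) and the English example sentences are hypothetical, and the abstract does not state whether recall, precision or F-score was reported, nor how the Chinese text was tokenized.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: clipped n-gram overlap divided by the reference n-gram count."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        # Reference is shorter than n tokens, so there is nothing to match.
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical example; tokens are whitespace-separated words here.
# For Chinese text, tokens would come from characters or a word segmenter.
reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
scores = {n: rouge_n_recall(candidate, reference, n) for n in (2, 3, 4)}
```

In this example a single substituted word breaks more of the longer n-grams, so the score falls as n grows (0.6 for bigrams, 0.5 for trigrams, about 0.33 for 4-grams). Higher-order ROUGE scores are therefore stricter measures of how much of the reference phrasing a summary preserves.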
Keywords/Search Tags: summarization, autoencoder, sentence vector