Automatic Summarization Method Based On Sentence Vector And Statistical Features

Posted on:2019-01-31

Degree:Master

Type:Thesis

Country:China

Candidate:Z B Chen

Full Text:PDF

GTID:2428330566497552

Subject:Computer Science and Technology

Abstract/Summary:

Automatic text summarization is an important research topic in Natural Language Processing (NLP) and has received extensive attention. As the volume of information grows rapidly, techniques for condensing text are becoming increasingly important. Automatic summarization has been studied extensively, but several problems remain. First, high-quality data is scarce, especially for Chinese. Second, traditional approaches cannot analyze semantics. These include statistics-based, graph-based and machine-learning-based summarization. NLP-based and deep-learning-based summarization can analyze semantics, but they require additional data or a large amount of training data to do so.

To address the data problem, we collect Chinese summarization data from Weibo. To address the semantic problem, we propose two autoencoder models that analyze semantics. One uses part-of-speech (POS) information and the other uses the similarity between sentences. We also design several statistical features to help generate higher-quality summaries.

In the POS model, we build the network according to POS information and expand the training data by combining different sentences. In the similarity model, we build the network from the similarity between sentences. This model is trained on bag-of-words vectors, and its parameters are updated according to an encoding loss and a similarity loss. Its training data is expanded by sentence combination in the same way. In addition, we use a semantic graph to extract statistical features such as the number of entities and semantic location.

In the experiments, we compare our models against three baselines: Lead, TextRank, and a summarization method that integrates importance, non-redundancy and coherence. On ROUGE-2, ROUGE-3 and ROUGE-4, the model based on POS and statistical features achieves improvements of 10.154%, 15.779% and 18.253%, respectively. The model based on similarity and statistical features obtains better results, improving ROUGE-2, ROUGE-3 and ROUGE-4 by 13.327%, 19.399% and 22.058%, respectively.
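
The results above are reported in ROUGE-2, ROUGE-3 and ROUGE-4, which count how many n-grams of a reference summary also appear in a generated summary. The following is a minimal sketch of ROUGE-N recall as commonly defined. It is not the evaluation code used in the thesis. The abstract does not state the tokenization (characters or words for Chinese), whether recall or F-score is reported, or how multiple references are handled. The sketch therefore assumes pre-tokenized input and a single reference, and the function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams (as tuples) in a token list."""
    # If len(tokens) < n the range is empty and the Counter is empty.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall for one reference (assumed setup, see text).

    Counts reference n-grams that also occur in the candidate, clipping each
    n-gram at its count in the candidate, and divides by the total number of
    reference n-grams.
    """
    ref = ngrams(reference, n)
    cand = ngrams(candidate, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each n-gram (clipping).
    overlap = sum((ref & cand).values())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    candidate = "the cat lay on the mat".split()
    print(rouge_n_recall(candidate, reference, 2))
```

In this example, the candidate shares three of the five reference bigrams ("the cat", "on the", "the mat"), so its ROUGE-2 recall is 0.6. For Chinese text, the same function can be applied to character or word sequences, for example list("自动文本摘要"). The choice of unit changes the scores, so it must match the setup being compared against.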

Keywords/Search Tags:

summarization, autoencoder, sentence vector

Related items

1	Research Of Automatic Summarization Based On Named Entity
2	A Study Of Automatic Summarization For English Document By Citing Sentences
3	Text summarization using concept hierarchy
4	Research On The Method Of Differential Summarization Of Bilingual News
5	Automatic Text Summarization Using Importance of Sentences for Email Corpus
6	Study On Multi-Document Summarization Algorithm Based On Fusing Topic Sentences Semantic
7	Research In Sub-Topic Based Multi-Document Summarization
8	Research On Chinese Automatic Summarization Based On Clustering Algorithm
9	Research On Chinese Automatic Summarization System
10	Research On Adaptive Video Summarization Algorithms