Research On Extractive Text Summarization Method Based On Unsupervised Ensemble Learning

Posted on:2021-05-31

Degree:Master

Type:Thesis

Country:China

Candidate:J F Long

Full Text:PDF

GTID:2428330602989105

Subject:Computer technology

Abstract/Summary:

With the growth of user-created content on social media, the number and size of electronic documents available on the Internet have become enormous. Analyzing such a large volume of generated data requires natural language processing (NLP) applications. Automatic text summarization (ATS) is an increasingly important and challenging task in the NLP field. Its goal is to produce a condensed version of a large document while retaining the main ideas of the original. Traditional automatic text summarization methods are mostly based on supervised learning, which requires a large amount of manually annotated data. At the same time, high-dimensional and sparse data representations make it difficult to capture semantic information. To address these problems, this thesis explores an extractive text summarization method based on unsupervised ensemble learning, and designs and implements a method that integrates unsupervised deep neural networks with word embeddings to improve the quality of automatic text summarization.

1) First, this thesis uses the Word2Vec word embedding model. Compared with the traditional bag-of-words (BOW) model, it converts high-dimensional data into a low-dimensional vector representation. It is also a more expressive representation, because the generated vectors capture the semantic relatedness of words through their context.
2) Second, this thesis combines Word2Vec with TF-IDF coefficients to improve the Sentence2Vec sentence vector representation method.
3) In addition, this thesis proposes an adaptive K-value extractive summarization algorithm, which improves the accuracy of the clustering algorithm by automatically determining the number K of cluster centers in the text, thereby improving the accuracy of the extractive summaries.
4) Because large amounts of labeled data are lacking, unsupervised methods are more suitable. The unsupervised models used in this thesis are the autoencoder (AE), the variational autoencoder (VAE), and the extreme learning machine autoencoder (ELM-AE). These three unsupervised feature learning methods are combined to explore sentence similarity and improve the quality of automatic text summarization.

ROUGE metrics are used to evaluate and compare the results on the relevant data sets. In addition, this thesis designs and implements a text information extraction system, and the core functions of the system are compared with existing methods. Experimental results show that the system has better practical application value than some existing methods and open-source systems.
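The second contribution in the abstract, weighting Word2Vec vectors by TF-IDF to form a sentence vector, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption: the smoothed IDF, the normalization by the sum of weights, the function names `tfidf_weights` and `sentence_vector`, and the toy 2-dimensional vectors, which stand in for embeddings from a trained Word2Vec model.

```python
import math
from collections import Counter


def tfidf_weights(sentences):
    """Return one {word: tf-idf weight} dict per tokenized sentence.

    Each sentence is treated as a "document". The smoothed IDF
    (an assumption, not taken from the thesis) keeps every weight positive.
    """
    n = len(sentences)
    # Document frequency: number of sentences that contain each word.
    df = Counter(word for sent in sentences for word in set(sent))
    weights = []
    for sent in sentences:
        tf = Counter(sent)
        weights.append({
            word: (count / len(sent)) * (math.log((1 + n) / (1 + df[word])) + 1)
            for word, count in tf.items()
        })
    return weights


def sentence_vector(word_weights, word_vectors, dim):
    """TF-IDF-weighted average of the word vectors of one sentence.

    Words without an embedding are skipped; a sentence with no known
    words maps to the zero vector.
    """
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in word_weights.items():
        vec = word_vectors.get(word)
        if vec is None:
            continue
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


# Hypothetical toy embeddings; a real pipeline would load trained Word2Vec vectors.
toy_vectors = {
    "cats": [1.0, 0.0],
    "sleep": [0.0, 1.0],
    "dogs": [0.8, 0.2],
    "bark": [0.1, 0.9],
}
toy_sentences = [["cats", "sleep"], ["dogs", "bark"], ["cats", "bark"]]

toy_weights = tfidf_weights(toy_sentences)
toy_sentence_vectors = [
    sentence_vector(w, toy_vectors, dim=2) for w in toy_weights
]
```

Under this weighting, words that occur in fewer sentences pull the sentence vector more strongly toward their own embedding than words that occur everywhere, which is the usual motivation for combining TF-IDF with word embeddings. The resulting vectors could then be passed to a clustering step such as the adaptive K-value algorithm mentioned in the third contribution.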

Keywords/Search Tags:

extractive text summarization, unsupervised learning, ensemble learning, word2vec, deep neural network

Related items

1	Research On Extractive Multi-document Summarization Using Supervised Deep Learning
2	Research On Extractive Summarization Of Scientific And Technological Information Text Based On Deep Learning
3	Research On Extractive Summarization Methods For Cambodian Language Multi-documents
4	Research On Meeting Text-oriented Extractive Summarization
5	Research Of Hybrid Text Summarization User Dynamic Interest Model Technology Based On Deep Learning
6	The Research And Implementation Of Automatic Text Summarization System For New Media
7	Text Summarization Based On Neural Network Joint Learning
8	Research On Extractive Text Summarization Based On Maximal Marginal Relevance
9	Research And Application Of Automatic Text Summarization Technology Based On Deep Learning
10	Research And Application Of Text Summarization Model Based On Deep Learning