
Research On Extractive Text Summarization Method Based On Unsupervised Ensemble Learning

Posted on: 2021-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: J F Long
Full Text: PDF
GTID: 2428330602989105
Subject: Computer technology
Abstract/Summary:
With the growth of user-generated content on social media, the number and size of electronic documents available online have become enormous. Analyzing this volume of data requires natural language processing (NLP) applications. Automatic text summarization (ATS) is an increasingly important and challenging NLP task. Its goal is to produce a condensed version of a long document while retaining the main ideas of the original. Traditional automatic text summarization methods are mostly based on supervised learning, which requires large amounts of manually annotated data. At the same time, high-dimensional and sparse data representations make it difficult to capture semantic information. To address these problems, this thesis explores extractive text summarization based on unsupervised ensemble learning, and designs and implements a method that combines unsupervised deep neural networks with word embeddings to improve summary quality.

1) First, this thesis uses the Word2Vec word embedding model. Compared with the traditional bag-of-words (BOW) model, it converts high-dimensional data into a low-dimensional vector representation that is more expressive, and the generated vectors capture the semantic relatedness of words in context.
2) Second, this thesis combines Word2Vec with TF-IDF coefficients to improve the Sentence2Vec sentence vector representation.
3) In addition, this thesis proposes an adaptive K-value summary extraction algorithm, which improves the accuracy of the clustering algorithm by automatically determining the number of cluster centers K for a text, thereby improving the accuracy of extractive summaries.
4) Because large amounts of labeled data are lacking, unsupervised methods are more suitable. The unsupervised models used in this thesis are the autoencoder (AE), the variational autoencoder (VAE), and the extreme learning machine autoencoder (ELM-AE). These three unsupervised feature learning methods are combined to explore sentence similarity and improve the quality of automatic text summarization.

ROUGE evaluation metrics are used to evaluate and compare the results on the relevant data sets. In addition, this thesis designs and implements a text information extraction system, whose core functions are compared with existing methods. Experimental results show that the system has better practical application value than some existing methods and open source systems.
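
As a rough illustration of the second contribution, the sketch below builds sentence vectors as a TF-IDF-weighted average of word vectors. The abstract does not say exactly how Word2Vec and TF-IDF are combined, so the weighted average is an assumption, as is the smoothed IDF formula (one common variant). The names `tfidf_weights`, `sentence_vectors`, `cosine`, and the tiny `toy_embeddings` table (a stand-in for a trained Word2Vec model) are hypothetical and used only for illustration.

```python
import math
from collections import Counter


def tfidf_weights(sentences):
    """Return one {word: tf-idf} dict per tokenized sentence.

    Each sentence is treated as a document. The smoothed IDF used here
    is an assumption, not necessarily the formula used in the thesis.
    """
    n = len(sentences)
    df = Counter(word for sent in sentences for word in set(sent))
    weights = []
    for sent in sentences:
        tf = Counter(sent)
        weights.append({
            w: (tf[w] / len(sent)) * (math.log((1 + n) / (1 + df[w])) + 1.0)
            for w in tf
        })
    return weights


def sentence_vectors(sentences, embeddings, dim):
    """Compute a TF-IDF-weighted average of word vectors per sentence."""
    vectors = []
    for sent_weights in tfidf_weights(sentences):
        acc = [0.0] * dim
        total = 0.0
        for word, weight in sent_weights.items():
            vec = embeddings.get(word)
            if vec is None:  # skip out-of-vocabulary words
                continue
            for i in range(dim):
                acc[i] += weight * vec[i]
            total += weight
        # Normalize by the total weight; an all-OOV sentence stays a zero vector.
        vectors.append([x / total for x in acc] if total > 0 else acc)
    return vectors


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical 2-dimensional embeddings standing in for Word2Vec output.
toy_embeddings = {
    "cats": [1.0, 0.0], "dogs": [0.9, 0.1], "purr": [0.8, 0.2],
    "bark": [0.7, 0.3], "stocks": [0.0, 1.0], "fell": [0.1, 0.9],
}
toy_sentences = [["cats", "purr"], ["dogs", "bark"], ["stocks", "fell"]]
vecs = sentence_vectors(toy_sentences, toy_embeddings, dim=2)
```

With these toy inputs, the two animal sentences end up closer to each other in cosine similarity than either is to the finance sentence. This is the kind of sentence similarity signal that a clustering step, such as the adaptive K-value algorithm described above, could then operate on.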
Keywords/Search Tags: extractive text summarization, unsupervised learning, ensemble learning, word2vec, deep neural network