Research On Text Classification And Automatic Summarization Based On Distributed Representation

Posted on:2019-09-04

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhu

GTID:2428330572461857

Subject:Engineering

Abstract/Summary:

How to represent text for machine processing is the first problem in natural language processing, and mapping text to vectors is an effective way to solve it. The distributed representation method maps text elements to fixed-length vectors through neural network training, so that the distance between vectors characterizes the semantic relatedness of the text elements. It overcomes two problems of one-hot vectors: their dimensionality is too high, and they cannot characterize the similarity between text elements. Based on the distributed representation of text, this thesis designs a new text classification algorithm and a multi-document automatic summarization algorithm.

To deal with the high dimensionality and extreme sparsity of the text graph model, this thesis designs a concept directed graph model for text by combining the distributed representation of words with the text graph representation. First, the words in a text are mapped to word vectors, and words with high semantic relatedness are grouped into concepts by clustering the word vectors. Then the concept directed graph is constructed according to the order of the words in the text, and the adjacency matrix of each text's concept directed graph is saved as a grayscale image. In this way the text is mapped to a grayscale image, and the natural language processing task is transformed into an image processing task. Finally, a three-layer convolutional neural network is designed to classify the text grayscale images. Compared with three other text classification algorithms, the proposed algorithm achieves better classification results.

To address the redundancy of summary sentences in Chinese multi-document automatic summarization, this thesis combines the distributed representation of sentences with spectral clustering and designs an extractive multi-document summarization algorithm based on spectral clustering. First, the sentences in the texts are mapped to sentence vectors; the sentence vectors are then clustered with the spectral clustering algorithm, which divides the documents into sub-topic documents. Next, a sentence relation graph is built within each sub-topic document, and the sentence weights are computed iteratively with the TextRank algorithm. Finally, the sentence with the largest weight is extracted as a summary sentence, and the summary is assembled according to the positions of the extracted sentences in the original text.

Intrinsic evaluation methods for summaries often require manual participation and therefore cannot be efficient or objective. This thesis therefore proposes an automatic summary evaluation method based on text information entropy: the quality of a summary is measured by the ratio of the information entropy of the summary to the information entropy of the original document. This method does not require humans to write a reference summary. Evaluated in this way, the proposed multi-document automatic summarization algorithm performs better than the two other automatic summarization algorithms it is compared with.
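The entropy-ratio evaluation described above can be illustrated with a minimal sketch. The abstract does not say how the information entropy is computed, so this example assumes Shannon entropy over the empirical word-frequency distribution and uses simple whitespace tokenization; the function names `shannon_entropy` and `entropy_ratio` are hypothetical and are not taken from the thesis. For Chinese text, a word segmenter would be needed in place of `split()`.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution.

    Assumption: the thesis's "text information entropy" is modeled here
    as word-frequency entropy; the actual definition may differ.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio(summary, document):
    """Ratio of summary entropy to source-document entropy.

    Assumption: whitespace tokenization with lowercasing. No reference
    summary is needed, only the summary and the original document.
    """
    summary_tokens = summary.lower().split()
    document_tokens = document.lower().split()
    document_entropy = shannon_entropy(document_tokens)
    if document_entropy == 0.0:
        return 0.0  # avoid division by zero for empty or one-word documents
    return shannon_entropy(summary_tokens) / document_entropy


if __name__ == "__main__":
    document = "the cat sat on the mat while the dog slept on the rug"
    summary = "the cat sat on the mat"
    print(entropy_ratio(summary, document))
```

Under these assumptions, two candidate summaries of the same document set could be compared by computing `entropy_ratio` for each against the concatenated source documents.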

Keywords/Search Tags:

Distributed Representation Model, Concept Directed Graph Model, Text Classification, Multi-Document Summarization, Text Information Entropy

Related items

1	Research On Text Summarization Technology Based On Abstract Meaning Representation Graph
2	Research On Graph Model-based Short Text Classification Algorithm
3	Research And Application Of Multi-document Extractive Summarization
4	Research On Text Classification And Its Related Technologies
5	Multi-Document Automatic Summarization Based On The Term-Sentences—Document Tri-layer Graph Model
6	The Research And Implementation Of Single-document Chinese Text Summarization System
7	Multi-document summarization using concept chain graphs
8	Research On Multi-label Text Classification Based On Improved Seq2seq Model
9	Study On Chinese Text Automatic Summarization Based On Concept Extension And Integrated Evaluation Method
10	Research On The Construction And Application Of Event-Oriented Text Representation Model