Research On Extract Summary Based On Document Multi-dimensional Feature Integration

Posted on:2021-02-14

Degree:Master

Type:Thesis

Country:China

Candidate:L Shen

Full Text:PDF

GTID:2428330629486198

Subject:Computer technology

Abstract/Summary:

With the arrival of the 5G Internet era, text data such as news, comments, and literature is growing explosively. People must spend a great deal of time finding the information they need, so there is an urgent need to produce effective summaries of these massive texts. Automatic text summarization by computer is one effective way to address this problem. Since summarization is, in essence, the understanding of document semantics, this thesis studies how to improve summary quality by exploiting the deep semantic features of documents. It presents an extractive summarization method based on the integration of multi-dimensional document features. The main work is as follows:

(1) A representation model based on the multi-dimensional semantics of documents is proposed, to address the reliance of extractive summarization on heuristic and shallow semantic features. The importance of a sentence is closely related to the semantics of its document, and these semantics are expressed differently along different dimensions. The proposed model therefore builds the semantic representation of a document from three dimensions: topic, granularity, and context.
- Topic: an LDA model first analyzes the topics of the document and generates the corresponding topic words. An affective preference analysis is then carried out so that function words do not distort the document topic.
- Granularity: the document is divided into different granularities, and a CNN layer builds semantic representations of its words, sentences, and paragraphs. These representations reflect the hierarchy between the levels.
- Context: a Bi-LSTM layer captures the contextual relationships among the sentences in the document.

The resulting deep semantic features from the different dimensions represent the document and prepare it for the subsequent extraction step.

(2) An extractive summarization model based on redundancy control is proposed. In existing methods, sentence scoring and sentence extraction are performed as two separate stages of summary generation, and redundancy is judged in an overly simplistic way. The traditional way to eliminate redundancy is to compute the similarity between two sentences directly. If the similarity exceeds a threshold, one of the two sentences is discarded at random, which may lose information and make the summary inaccurate. The proposed model instead works under redundancy and diversity constraints: it scores and extracts sentences at the same time and orders the extracted sentences by importance. This reduces redundancy as far as possible while retaining as much of the document's semantics as possible.

Finally, the LCSTS short-text dataset was used as experimental data. ROUGE-1, ROUGE-2, and ROUGE-L were used to evaluate the quality of the generated summaries, and the proposed method was compared with TextRank, RNN-based, and topic-based extraction methods. The experimental results show that the proposed extractive summarization model based on multi-dimensional document feature fusion represents document semantics in depth and effectively controls redundancy. This verifies the effectiveness of the proposed model for automatic summarization.
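The abstract does not spell out how the joint scoring-and-extraction step works. For comparison only, the following minimal Python sketch implements Maximal Marginal Relevance (MMR). MMR is a standard greedy technique swapped in here; it is not the thesis's model. At each step it weighs a sentence's importance against its similarity to the sentences already chosen. Redundancy is therefore judged inside the selection decision, not in a separate threshold-and-discard pass. The importance scores, the character-set Jaccard similarity, and the weight `lam` are hypothetical placeholders for what a learned model would supply.

```python
from typing import List, Sequence


def char_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two sentences' character sets.

    Hypothetical stand-in for a learned semantic similarity.
    """
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def marginal_gain(i: int, selected: List[int], sentences: Sequence[str],
                  scores: Sequence[float], lam: float) -> float:
    """MMR objective: weighted importance minus redundancy against chosen sentences."""
    redundancy = max((char_jaccard(sentences[i], sentences[j]) for j in selected),
                     default=0.0)
    return lam * scores[i] - (1 - lam) * redundancy


def mmr_select(sentences: Sequence[str], scores: Sequence[float],
               k: int, lam: float = 0.7) -> List[int]:
    """Greedily pick up to k sentence indices, then order them by importance."""
    selected: List[int] = []
    remaining = list(range(len(sentences)))
    while remaining and len(selected) < k:
        # max() returns the first maximum, so ties go to the earlier sentence.
        best = max(remaining,
                   key=lambda i: marginal_gain(i, selected, sentences, scores, lam))
        selected.append(best)
        remaining.remove(best)
    # Order the extracted sentences by importance, as the abstract describes.
    return sorted(selected, key=lambda i: scores[i], reverse=True)


sentences = ["今天北京下雨了", "北京今天下雨了", "股市大幅上涨"]
scores = [0.9, 0.85, 0.6]  # hypothetical importance scores
print(mmr_select(sentences, scores, k=2))           # [0, 2]
print(mmr_select(sentences, scores, k=2, lam=1.0))  # [0, 1]
```

With `lam=0.7`, the near-duplicate second sentence loses to the less important but novel third one. With `lam=1.0`, the procedure reduces to ranking by importance alone and keeps both near-duplicates. Unlike the random-discard rule, the decision is deterministic and takes importance into account.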

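ROUGE-N measures overlapping n-grams between a generated summary and a reference summary. ROUGE-L uses their longest common subsequence (LCS). The abstract does not state the ROUGE configuration used. The Python sketch below therefore makes three assumptions: character-level tokens (often used for Chinese text such as LCSTS, but an assumption here), a single reference summary, and F1 with equal weight on precision and recall. Reported scores are normally produced with an established ROUGE toolkit, so this only illustrates the metric definitions.

```python
from collections import Counter
from typing import Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, cand_total: int, ref_total: int) -> float:
    """F1 from overlap counts; 0.0 when either side is empty or nothing overlaps."""
    if cand_total == 0 or ref_total == 0 or overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    """ROUGE-N F1 over character n-grams, with clipped (min) counts."""
    cand, ref = ngrams(list(candidate), n), ngrams(list(reference), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Longest common subsequence length via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 (beta = 1) over characters."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


cand, ref = "北京今天下雨", "今天北京下雨了"
print(round(rouge_n(cand, ref, 1), 3))  # 0.923
print(round(rouge_n(cand, ref, 2), 3))  # 0.545
print(round(rouge_l(cand, ref), 3))     # 0.615
```

Swapping 北京 and 今天 leaves ROUGE-1 high but lowers ROUGE-2 and ROUGE-L. This is one reason the three metrics are reported side by side: they reward word choice, local order, and overall sequence order differently.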
Keywords/Search Tags:

Automatic summary, topic model, semantic features, multidimensional features

Related items

1	Topic Discovery From Social Network Texts With Heterogeneous Semantic Features
2	Keyphrase Extraction Using LDA Topic Models
3	Automatic Summarization For Chinese Text Based On Sub Topic Partition And Sentence Features
4	Research On Semantic Reinforcement Based On Topic And Word Features For RNN Language Model
5	Research And Realization Of Web Information Mining Model Based On Topic Features
6	Establishing Of The Car Styling Features Database Base On Mapping Between Features And Semantic
7	Medical Image Retrieval Based On Low Level Features And Semantic Features
8	Research On Multi-features Link Prediction Based On Matrix
9	Research On Multi-Features Link Prediction Based On Matrix
10	Research And Application Of Text Classification Model Combining Character Features And Topic Features