
Research On Extract Summary Based On Document Multi-dimensional Feature Integration

Posted on: 2021-02-14  Degree: Master  Type: Thesis
Country: China  Candidate: L Shen  Full Text: PDF
GTID: 2428330629486198  Subject: Computer technology
Abstract/Summary:
With the arrival of the 5G Internet era, text data such as news, comments and literature is growing explosively, and people spend a great deal of time finding the information they need, so producing effective summaries of these massive texts is urgent. Automatic text summarization by computer is one effective way to address this problem. The essence of summarization is understanding document semantics. This thesis therefore studies how to improve summary quality by exploiting the deep semantic features of documents, and presents an extractive summarization method based on the integration of multi-dimensional document features. The main contributions are as follows:

(1) To address the reliance of extractive summarization on heuristic and shallow semantic features, a representation model based on the multi-dimensional semantics of documents is proposed. The importance of a sentence is closely related to the semantics of its document, which are expressed differently in different dimensions. The model constructs the semantic representation of a document from three dimensions: topic, granularity, and context. Specifically, the LDA model is first used to analyze the topics of the document and generate the corresponding topic words, and affective preference analysis is then carried out to prevent function words from affecting the document topics. Next, the document is divided at different granularities, and a CNN layer builds semantic representations of its words, sentences and paragraphs, which reflect the hierarchical relationships between the levels of the document. Finally, a Bi-LSTM layer builds contextual features of the sentences in the document. Together, these deep semantic features across the different dimensions represent the document in preparation for the subsequent extraction step.

(2) To address the problems that sentence scoring and sentence extraction are separated into two stages during summary generation, and that redundancy is judged by a single criterion, an extractive summarization model based on redundancy control is proposed. The traditional way to eliminate redundancy is to compute the similarity of two sentences directly; if the similarity exceeds a threshold, one of them is discarded at random, which may lose information and make the summary inaccurate. Under constraints of redundancy and diversity, the proposed model scores and extracts sentences at the same time and orders the extracted sentences by importance, so as to reduce redundancy as much as possible while retaining as much of the document semantics as possible.

Finally, the LCSTS short-text data set is used as experimental data, with ROUGE-1, ROUGE-2 and ROUGE-L as quality metrics for the generated summaries, and the model is compared with TextRank, RNN-based and topic-based extraction methods. The experimental results show that the proposed extractive summarization model based on multi-dimensional document feature fusion expresses the deep semantics of documents effectively and controls redundancy well, which verifies the effectiveness of the proposed model for automatic summarization.
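The idea in contribution (2), scoring and extracting sentences in one pass under a redundancy constraint, can be illustrated with a small sketch. The abstract does not give the thesis's actual scoring function, so the code below substitutes the well-known Maximal Marginal Relevance (MMR) heuristic with bag-of-words cosine similarity. The function names, the `importance` scores and the `trade_off` parameter are all hypothetical and stand in for whatever the learned model would supply.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(sentences, importance, k=2, trade_off=0.7):
    """Greedy MMR-style selection (an assumed stand-in, not the thesis model).

    At each step the candidate with the best balance of importance and
    novelty is extracted, so scoring and extraction happen together
    instead of discarding one of two similar sentences at random.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    selected = []
    candidates = set(range(len(sentences)))

    while candidates and len(selected) < k:
        def marginal(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return trade_off * importance[i] - (1 - trade_off) * redundancy

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=marginal)
        selected.append(best)
        candidates.remove(best)

    # Present the extracted sentences in order of importance.
    selected.sort(key=lambda i: importance[i], reverse=True)
    return [sentences[i] for i in selected]
```

With `trade_off=0.5`, two near-duplicate sentences with high importance scores will not both be chosen if a less similar sentence is available. A lower `trade_off` penalizes redundancy more strongly, and a value of 1.0 reduces the procedure to plain ranking by importance.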
Keywords/Search Tags: automatic summary, topic model, semantic features, multidimensional features