Automatic Multi-Document Summarization Based on Semantic Word Vectors

Posted on: 2019-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q Li
GTID: 2348330542998298
Subject: Electronic Science and Technology
Abstract/Summary:
Multi-document automatic summarization is an important research topic in natural language processing. Its aim is to address information fragmentation and redundancy by applying information extraction techniques to compress the content of multiple text documents, so that users facing massive amounts of information can efficiently obtain concise, readable text. This reduces their information load and improves reading efficiency. Traditional multi-document summarization mostly selects important sentences with high relevance to the documents as summary candidates, but in generating the summary it lacks semantic judgment between words and ignores the fact that the same words in a different order can carry different semantics. The development of deep learning provides a new theoretical foundation and technical support for research on automatic summarization. More and more scholars are turning their attention to automatic summarization and to implementing it with new techniques, and natural language processing based on deep learning has achieved new breakthroughs.

This thesis combines traditional summary generation techniques with deep learning, integrating an improved method based on semantic word vectors into the traditional multi-document summarization approach. The summarization system is applied to a specific scenario: keywords entered by the user are semantically matched against a large collection of arbitrary input documents to obtain a semantically related document set, and a semantic multi-document summary is then generated from that set, so that the result is more semantically relevant and less redundant. The main research work is as follows:

1) The thesis builds a summary generation system oriented to user queries, combining document retrieval with an ordinary summary extraction system. It proposes a document matching method based on semantic keywords and, as part of the implementation, an improved keyword extraction method based on semantic word vectors, which addresses the lack of semantics in the keyword extraction process.
2) The thesis studies document semantic matching for user queries and proposes a document matching algorithm based on a semantic vector package.
3) The thesis studies automatic summary generation based on word vectors. It proposes an improved clustering method based on sentence vector packets, extracts summary sentences around the central topic sentence according to the user topic and sentence weights, and uses sentence vector packets to study sentence redundancy.
4) The thesis studies evaluation methods for summary sentence extraction. It evaluates the summary sentences extracted from randomly selected test files of the whole-network news corpus and compares the results with other summary extraction methods to analyze the system's performance.
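
The general idea of query-driven, redundancy-aware sentence selection with word vectors can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not define its "vector packets" or its clustering step, so the sketch substitutes a plain average of word vectors as the sentence representation and a greedy cosine-similarity filter for redundancy. The vectors in `TOY_VECTORS`, the function names, and both thresholds are invented for illustration; a real system would load trained word vectors and use a proper tokenizer.

```python
import math

# Hypothetical 4-dimensional word vectors, made up for this example only.
TOY_VECTORS = {
    "flood":    [0.9, 0.4, 0.0, 0.0],
    "rain":     [0.9, 0.1, 0.0, 0.0],
    "river":    [1.0, 0.0, 0.0, 0.0],
    "damage":   [0.1, 0.9, 0.1, 0.0],
    "homes":    [0.0, 0.8, 0.2, 0.0],
    "market":   [0.0, 0.1, 0.9, 0.0],
    "stocks":   [0.0, 0.0, 1.0, 0.0],
    "football": [0.0, 0.0, 0.0, 1.0],
}


def sentence_vector(text):
    """Average the vectors of the known words; None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(query, sentences, max_sentences=3,
              min_relevance=0.3, redundancy_threshold=0.95):
    """Pick sentences close to the query, skipping near-duplicates."""
    query_vec = sentence_vector(query)
    if query_vec is None:
        return []

    # Score every sentence by its similarity to the user query.
    scored = []
    for sentence in sentences:
        vec = sentence_vector(sentence)
        if vec is None:
            continue
        relevance = cosine(query_vec, vec)
        if relevance >= min_relevance:
            scored.append((relevance, sentence, vec))
    scored.sort(key=lambda item: item[0], reverse=True)

    # Greedily keep a sentence only if it is not too similar to one already kept.
    chosen = []
    for _, sentence, vec in scored:
        if all(cosine(vec, kept) < redundancy_threshold for _, kept in chosen):
            chosen.append((sentence, vec))
        if len(chosen) == max_sentences:
            break
    return [sentence for sentence, _ in chosen]
```

Calling `summarize("flood damage", sentences)` on a list of candidate sentences returns at most `max_sentences` of them, ordered by similarity to the query. Averaging word vectors discards word order, which is exactly the limitation the abstract criticizes in traditional methods, so this sketch should be read as a baseline rather than as the improved method.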
Keywords/Search Tags:multi-document auto-digest, keyword extraction, clustering, word vector, semantic vector