Research And Application Of Multi Document Summarization Based On Big Data

Posted on: 2018-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
Full Text: PDF
GTID: 2348330542474246
Subject: Computer application technology
Abstract/Summary:
With the arrival of the knowledge economy era, people place ever higher demands on the speed and quality of obtaining information, and multi-document summarization technology emerged in response. It has become a hot research topic in the field of Natural Language Processing. However, with the explosive growth of online information, the processing performance of traditional multi-document summarization technology can no longer meet practical needs. This thesis therefore combines the Hadoop platform with traditional document extraction technology and proposes a ranking strategy based on the adjacency degree between sentences. It aims to design an efficient, scalable, high-quality summary generation system. The main work covers the following three aspects:

(1) Research on summary extraction technology and its parallelization. The purpose of summary extraction is to extract the text or paragraphs with wide coverage and low redundancy, which determines the summary quality of the whole system. In this thesis, the traditional TF*ISF algorithm is combined with the MapReduce model, and a parallel sentence feature modeling algorithm is designed. On this basis, the graph ranking algorithm is also implemented in parallel. Finally, a redundancy removal module is designed to solve the problem of information redundancy. The experimental results show that the algorithm has good scalability and speedup.

(2) Research on sentence ordering technology. The order of the extracted sentences affects the acceptability of the generated summary. To conform to the logic of the text, we use the topic document collection as the basis for ordering and learn the adjacency degree of sentence pairs from conditional entropy and context. Finally, we combine this with the maximum weight reduction ordering algorithm to obtain the final result. The experimental results show that the proposed algorithm achieves higher precision than earlier sentence ordering strategies.

(3) Application of multi-document summarization technology in an enterprise public opinion monitoring system. The system uses a crawler to collect enterprise-related information from the Internet and generates hot topics, helping enterprises deal with the negative impact of unexpected events. Multi-document summarization shortens the time users need to obtain information by compressing collections of similar documents, and helps companies make quick decisions.
Keywords/Search Tags: Big Data, MapReduce, multi-document summarization, graph ranking extraction algorithm, sentence adjacency ranking algorithm
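
The abstract does not give the details of the parallel sentence feature modeling step, so the following is only a minimal sketch of how a TF*ISF-style sentence weighting (term frequency times inverse sentence frequency, assumed here to be the weighting the abstract names) could be phrased as a map step and a reduce step. It runs in a single Python process and does not use Hadoop. The function names, the whitespace tokenizer, and the choice to score a sentence by summing its term weights are all illustrative assumptions, not the thesis's implementation.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    # Assumption: naive lowercase whitespace tokenizer, for illustration only.
    return sentence.lower().split()


def map_phase(sent_id, sentence):
    """Map step: emit (term, (sent_id, tf)) once per distinct term in a sentence."""
    for term, tf in Counter(tokenize(sentence)).items():
        yield term, (sent_id, tf)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(postings, n_sentences):
    """Reduce step for one term: ISF = log(N / number of sentences containing it)."""
    isf = math.log(n_sentences / len(postings))
    for sent_id, tf in postings:
        yield sent_id, tf * isf


def tf_isf_scores(sentences):
    """Score each sentence as the sum of the TF*ISF weights of its terms."""
    n = len(sentences)
    pairs = [kv for i, s in enumerate(sentences) for kv in map_phase(i, s)]
    scores = defaultdict(float)
    for postings in shuffle(pairs).values():
        for sent_id, weight in reduce_phase(postings, n):
            scores[sent_id] += weight
    return [scores[i] for i in range(n)]


if __name__ == "__main__":
    docs = [
        "hadoop runs mapreduce jobs",
        "mapreduce jobs scale",
        "summaries help users",
    ]
    for sentence, score in zip(docs, tf_isf_scores(docs)):
        print(f"{score:.3f}  {sentence}")
```

In this sketch a term that occurs in every sentence gets an ISF of zero and so contributes nothing to any sentence score, while terms confined to a single sentence contribute the most. Under the same assumptions, the resulting scores could serve as input to a later graph ranking step.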