
Research On Document Summarization Based On LDA Model

Posted on: 2016-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Bian
Full Text: PDF
GTID: 2298330452465358
Subject: Control Science and Engineering
Abstract/Summary:
With the explosion of web information, obtaining the required information efficiently has become increasingly urgent. The aim of this thesis is to design and implement an automatic summarization system for documents downloaded from web pages. The thesis first introduces the concept of summarization, the current methods, and the constituent parts of a summarization system. Then, weighing the advantages and disadvantages of automatic summarization based on the VSM (Vector Space Model) and on the LexRank algorithm, it proposes a new method based on the LDA (Latent Dirichlet Allocation) model.
1) Following an analysis of automatic summarization based on the LDA model, a new sentence-ranking method is proposed. To find the sentences that cover the most topic content, the method computes sentence importance as the similarity between each sentence's topic distribution and the topic-importance distribution. It then selects sentences in order of importance to make up the summary.
2) Weighing the advantages and disadvantages of sentence compression methods based on probabilistic statistical models and on syntactic heuristics, a sentence compression method is proposed. Combining syntactic heuristics with constituent significance, the method removes constituents of low significance and thereby improves the compression ratio without losing the semantic information of the original sentences.
3) An automatic summarization system is implemented. In tests on DUC2006 (a standard test data set), the accuracy, recall, and other indicators of the system improve. The test results indicate that the sentence-ranking and sentence-compression methods effectively improve the performance of the automatic summarization system.
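The sentence-ranking step in item 1) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it assumes the per-sentence topic distributions and the topic-importance distribution have already been obtained from a trained LDA model, and it uses cosine similarity because the abstract does not say which similarity measure is used. The names `cosine`, `rank_sentences`, and `build_summary` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sentences(sentence_topics, topic_importance):
    """Return sentence indices sorted from most to least important.

    sentence_topics: one topic distribution per sentence (assumed to come
    from an LDA model trained elsewhere).
    topic_importance: one weight per topic (assumed to be given).
    """
    scores = [cosine(dist, topic_importance) for dist in sentence_topics]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def build_summary(sentences, sentence_topics, topic_importance, max_sentences):
    """Pick the top-ranked sentences, in order of importance."""
    order = rank_sentences(sentence_topics, topic_importance)
    return [sentences[i] for i in order[:max_sentences]]


if __name__ == "__main__":
    # Toy values for illustration only; they are not from the thesis.
    sentences = ["Sentence A.", "Sentence B.", "Sentence C."]
    sentence_topics = [
        [0.9, 0.1, 0.0],
        [0.2, 0.3, 0.5],
        [0.1, 0.8, 0.1],
    ]
    topic_importance = [0.2, 0.3, 0.5]
    print(build_summary(sentences, sentence_topics, topic_importance, 2))
```

A real system would also need a length budget and some handling of redundant sentences; the abstract does not describe those details, so they are left out here.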
Keywords/Search Tags: LDA model, constituent significance, sentence ranking, sentence compression, automatic summarization