Research On Multi Lingual Multi-document Summarization Based On Determinantal Point Processes

Posted on: 2019-06-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Zhang
GTID: 2348330545961548
Subject: Intelligent Science and Technology
Abstract/Summary:
The development of informatization has greatly improved users' working efficiency, but it has also greatly increased the scale of information on the network. This information appears in many languages and styles. It is therefore important to compress a set of documents on the same topic, written in different languages, into a concise summary in the corresponding language that contains the main information. Multi-lingual multi-document summarization provides an effective solution. This thesis takes it as the main research object, with the aim of using unsupervised methods to generate summaries that are language-independent, accurate, informative and readable from a limited corpus. The thesis identifies four issues in multi-lingual multi-document summarization: language difference, limited corpus, redundancy and diversity. With this goal in mind, topic diversity and syntactic diversity are defined, and summaries with both properties are called diversity summaries. Different methods are formulated to reduce the linguistic difference for different categories of languages, and each document is represented as a set of minimal semantic unit sequences. On this basis, hierarchical Latent Dirichlet Allocation and Determinantal Point Processes are chosen as the main research objects. Based on the former, the latent topic information of the documents is mined in depth, hierarchical topic model features for the multi-lingual environment are put forward, and a multi-feature fusion scheme is proposed for corpora in different fields to model the quality and similarity of sentences. Combining the two methods, we propose the L-DPPs sampling algorithm based on sentence length, the Sum-DPPs sampling algorithm for summarization, a secondary sentence filtering algorithm, and the Topic-DPPs algorithm, which takes the minimal semantic unit as its basic unit, to enhance the diversity of summaries. Finally, we propose an extensible Unsupervised language-Independent multi-Document Summarization framework called UIDS. Experiments on the corpora of the MMS-2015, BIRNDL-17 and MSS-2017 summarization tasks verify the effectiveness of the proposed methods.
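The abstract does not spell out how the L-DPPs, Sum-DPPs or Topic-DPPs algorithms work, so the following is only a minimal sketch of the general idea of modeling sentence quality and similarity with a determinantal point process. It assumes the common decomposition L[i][j] = q_i * S_ij * q_j, with cosine similarity for S, and a simple greedy selection that maximizes the determinant of the selected submatrix. The function names, the toy quality scores and the toy feature vectors are all hypothetical and are not taken from the thesis.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def build_kernel(quality, features):
    """Assumed quality/similarity decomposition: L[i][j] = q_i * S_ij * q_j."""
    n = len(quality)
    return [[quality[i] * cosine(features[i], features[j]) * quality[j]
             for j in range(n)] for i in range(n)]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def greedy_select(kernel, max_items):
    """Greedily add the sentence that gives the largest submatrix determinant."""
    selected = []
    remaining = list(range(len(kernel)))
    while remaining and len(selected) < max_items:
        best_item, best_det = None, 0.0
        for i in remaining:
            idx = selected + [i]
            sub = [[kernel[a][b] for b in idx] for a in idx]
            d = determinant(sub)
            if d > best_det:
                best_item, best_det = i, d
        if best_item is None:  # no candidate keeps the determinant positive
            break
        selected.append(best_item)
        remaining.remove(best_item)
    return selected

# Toy data (hypothetical): sentences 0 and 1 are near-duplicates, 2 is different.
quality = [1.0, 0.9, 0.8]
features = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0]]
summary_ids = greedy_select(build_kernel(quality, features), max_items=2)
print(summary_ids)
```

In this toy case the second-best sentence by quality is skipped because it is almost identical to the first, which illustrates how a determinant-based objective trades quality against redundancy.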
Keywords/Search Tags: determinantal point processes, hierarchical topic model, multi-lingual multi-document summarization, unsupervised learning