The Research On Topic-oriented Multi-document Summarization

Posted on: 2014-02-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Li
GTID: 1228330392460339
Subject: Computer application technology
Abstract/Summary:
With the rapid rise of the mobile Internet, users often need to retrieve useful information from huge data sets on mobile devices. This motivates information service providers to offer fast, deep mining of such data sets and to present the useful information to users concisely. A user can send a request to a summarization service from a mobile device; the service then extracts the information of interest from multiple documents and presents it to the user organized by topic. A high-quality automatically generated summary has a well-defined structure and good readability, and it presents the important content of a particular event according to its topics. This saves the user's browsing time and reduces the burden of reading multiple documents to obtain complete information. Following this trend, we explore topic-oriented multi-document summarization.

This dissertation contributes the following theories and approaches:

1. We propose a novel LDA-based modeling process for capturing topics in multiple documents. To quantitatively evaluate the effectiveness of the LDA model, we implement a novel approach that uses it to generate templates for topic-oriented summarization. We first develop an entity-topic LDA model that simultaneously clusters both sentences and words into topics. We then apply frequent subtree pattern mining to the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that represent the topics well. To quantitatively evaluate the automatically generated templates, we use them to construct summaries for new Wikipedia entities.

2. We propose an unsupervised approach to the automatic generation of topic-oriented summaries from multiple documents. In this method, we propose an event-topic model based on the traditional LDA model. It improves sentence clustering by computing the probability distributions of words over both the domain and the specific news event. We then use an extended LexRank algorithm to rank the sentences in each cluster and select representative sentences using Integer Linear Programming. The advantage of our approach is that it unifies the clustering, ranking, and selection components. We also propose a new rule-based sentence compression algorithm that uses the dependency tree to reduce redundancy effectively.

3. We propose a novel approach to the automatic generation of topic-oriented summaries with a natural language generation model. We first extract important information items from the dependency parse tree of each sentence, then generate new sentences from these information items using English grammatical knowledge. Using the grammatical relations in the dependency parse tree, we translate each information item into the input format of the natural language generation engine. Finally, we select topic-oriented sentences from the list of generated sentences with Integer Linear Programming.

4. We propose a cross-collection topic-aspect model to jointly model topics and aspects. We then generate complementary summaries by a random walk on a bipartite graph with iterative mutual reinforcement.

Based on the theories and methodologies proposed above, we implement a topic-oriented summarization system. We evaluate the system on the TAC guided summarization task, in which we participated in the past two years, and it performs well.
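The rank-then-select step described in item 2 can be illustrated with a small sketch. It is not the dissertation's implementation: it uses plain LexRank (power iteration on a thresholded cosine-similarity graph) rather than the extended variant, and it replaces the Integer Linear Programming solver with an exhaustive search over the same kind of 0/1 selection problem, which is only practical for a handful of sentences. The function names, thresholds, word budget, and sample sentences are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iters=50):
    """Plain LexRank: PageRank-style scores on a sentence similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(s.lower().split()) for s in sentences]
    # Unweighted edge between two distinct sentences that are similar enough.
    adj = [[1.0 if i != j and cosine(bags[i], bags[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n + damping * sum(
                adj[j][i] * scores[j] / degree[j]
                for j in range(n) if degree[j] > 0)
            for i in range(n)
        ]
    return scores


def select_sentences(sentences, scores, budget, max_sim=0.5):
    """Pick the subset with the highest total score, subject to a word
    budget and a pairwise redundancy limit. Exhaustive search stands in
    for an ILP solver here (assumption: the input is small)."""
    bags = [Counter(s.lower().split()) for s in sentences]
    lengths = [len(s.split()) for s in sentences]
    best, best_score = (), 0.0
    for r in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), r):
            if sum(lengths[i] for i in subset) > budget:
                continue  # violates the length constraint
            if any(cosine(bags[i], bags[j]) > max_sim
                   for i, j in combinations(subset, 2)):
                continue  # violates the redundancy constraint
            total = sum(scores[i] for i in subset)
            if total > best_score:
                best, best_score = subset, total
    return [sentences[i] for i in best]


# Hypothetical sample input: three related sentences and one unrelated one.
SENTENCES = [
    "the earthquake struck the city at dawn",
    "the earthquake damaged many buildings in the city",
    "rescue teams reached the city after the earthquake",
    "stock markets rose sharply on friday",
]

if __name__ == "__main__":
    ranked = lexrank(SENTENCES)
    for sentence in select_sentences(SENTENCES, ranked, budget=15):
        print(sentence)
```

In this sketch the redundancy limit keeps the summary from containing several near-duplicate sentences from the same cluster, which is the role the selection constraints play in a formulation of this kind. A real system would hand the same objective and constraints to an ILP solver instead of enumerating subsets.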
Keywords/Search Tags: Multi-Document Summarization, Topic Model, Dependency Parse Tree, Integer Linear Programming, Recognizing Textual Entailment