Research On Document Summarization Algorithms And Their Applications

Posted on:2012-06-30

Degree:Doctor

Type:Dissertation

Country:China

Candidate:F Jin

GTID:1118330362467997

Subject:Computer Science and Technology

Abstract/Summary:

Document summarization is one possible way to address the information overload problem caused by the "information explosion". It can also be used to produce concise text for widely used portable devices with small, low-resolution screens, such as cell phones, to help mitigate the inconvenience of reading large sections of text. In this thesis, we focus on some key issues in document summarization, including the ranking and selection of text units, structure-aware summary generation, sentence compression, and gene text summarization. The major contributions of this thesis include:

Firstly, we present a systematic comparative study of the two key problems in extractive summarization methods: the ranking and selection of text units. Experimental results on standard datasets show the superiority of pairwise and listwise learning to rank methods, as well as of Integer Linear Programming (ILP) based selection strategies. We then propose a joint learning framework for summarization that combines generalized perceptron learning and ILP, and experimental results show the effectiveness of this approach. We proceed to study the performance upper bound of extractive methods.

Secondly, we present a novel diversification framework to generate structure-aware summaries with the guidance of pre-defined aspects. The proposed framework attempts to maximize the expected satisfaction of all aspects during summary generation. The aspects and the given document collection are modeled through Labeled Latent Dirichlet Allocation (LDA). The importance of each aspect and the relevance of each sentence to each aspect are then calculated based on probabilistic inference.

Thirdly, we propose a Markov Logic Network based sentence compression method, which compresses English sentences by deleting unimportant words. Through first-order logic formulae, the method is able to incorporate local linguistic features and capture global dependencies between word deletion operations to determine whether a word should be removed. Experimental results on both written and spoken news corpora show that the proposed approach outperforms two state-of-the-art methods.

Finally, we present GeneSum, a gene text summarization system that automatically extracts representative sentences from biomedical documents. The system employs the ListNet learning to rank method and incorporates various text features and biological data resources to rank sentences; an Integer Linear Programming based method is then used to choose salient sentences to construct a summary. We evaluate the system on a large dataset of 7,294 genes and conduct an in-depth analysis of the test results.
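As a rough illustration of what an ILP-style selection step can look like, the sketch below picks the subset of sentences that maximizes total ranking score under a length budget. It is not the formulation used in the thesis: the objective, the single length constraint, the function name `select_sentences`, and the toy scores and lengths are all assumptions made for this example, and the 0/1 program is solved by exhaustive search rather than by an ILP solver, which is only practical for a handful of sentences.

```python
from itertools import combinations


def select_sentences(scores, lengths, budget):
    """Solve a tiny 0/1 selection problem by exhaustive search.

    maximize   sum(scores[i] * x[i])
    subject to sum(lengths[i] * x[i]) <= budget,  x[i] in {0, 1}

    Hypothetical helper for illustration only; a real system would hand
    the same objective and constraint to an ILP solver.
    """
    n = len(scores)
    best_value, best_subset = 0, ()
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            # Skip subsets that exceed the summary length budget.
            if sum(lengths[i] for i in subset) > budget:
                continue
            value = sum(scores[i] for i in subset)
            if value > best_value:
                best_value, best_subset = value, subset
    return list(best_subset), best_value


# Toy data (assumed): ranking scores and sentence lengths in words.
scores = [9, 6, 5, 1]
lengths = [60, 30, 30, 20]
chosen, total = select_sentences(scores, lengths, budget=80)
print(chosen, total)  # [1, 2, 3] 12
```

On this toy input, a greedy pass that takes sentences in descending score order would pick sentence 0 first and end with a total score of 10, whereas the exact search finds the subset with score 12. This is one way to see why a global selection step may outperform greedy selection.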

Keywords/Search Tags:

document summarization, learning to rank, structure-aware summarization, sentence compression, gene text summarization

Related items

1	Research On Deep Neural Networks Based Automatic Text Summarization
2	The Approach For Event-based Multi-document Automatic Summarization
3	Joint Scoring Automatic Text Summarization Generation Based On TextRank Algorithm
4	Research On Sentence Compression And Its Application
5	Research On Extractive Multi-document Summarization Using Supervised Deep Learning
6	Chinese Query-Focused Multi-document Summarization Based On Cloud Model
7	Automatic Summarization Of Multimedia Information And Related Technology Research
8	Research And Application Of Multi-document Automatic Summarization
9	Research On Generation Method Of Evolutionary Multi-document Summarization Based On Sub-topic Enhancement
10	Research Of Web Multi-document Automatic Summarization