
Distributional semantics for robust automatic summarization

Posted on: 2015-11-11
Degree: Ph.D
Type: Dissertation
University: University of Toronto (Canada)
Candidate: Cheung, Jackie Chi Kit
Full Text: PDF
GTID: 1478390017993775
Subject: Computer Science
Abstract/Summary:
Large text collections are an important source of information about the world, containing everything from movie reviews and research papers to news articles about current events. Yet the sheer size of such collections presents a challenge for applications that must make sense of this data and present it to users. Automatic summarization is one potential solution, which aims to shorten one or more source documents while retaining the important information. Summarization is a complex task that requires inferences about the form and content of the summary using a semantic model.

This dissertation examines the feasibility of distributional semantics as the core semantic representation for automatic summarization. In distributional semantics, the meanings of words and phrases are modelled by the contexts in which they appear. These models are easy to train and have found successful applications, but until recently they have not been seriously considered as contenders to support semantic inference for complex NLP tasks such as summarization, because of a lack of evaluation methods that would demonstrate their benefit.

I argue that current automatic summarization systems avoid relying on semantic analysis by focusing instead on replicating the source text to be summarized, but that substantial progress will not be possible without semantic analysis and domain knowledge acquisition. To overcome these problems, I propose an evaluation framework for distributional semantics based on first principles about the role of a semantic formalism in supporting inference. My experiments show that current distributional semantic approaches can support semantic inference at a phrasal level, invariant to the constituent syntactic constructions, better than a word-overlap baseline.

I then present a novel technique to embed distributional semantic vectors into a generative probabilistic model for domain modelling. This model achieves state-of-the-art results in slot induction, which also translates into better summarization performance. Finally, I introduce a text-to-text generation technique called sentence enhancement that combines parts of heterogeneous source sentences into a novel sentence, producing summary sentences that are more informative and grammatical than those of a previous sentence fusion approach. The success of this approach relies crucially on distributional semantics to determine which parts may be combined.

These results lay the groundwork for the development of future distributional semantic models, and demonstrate their utility in determining the form and content of automatic summaries.
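To make the "contexts as meaning" idea concrete, the sketch below (an editorial illustration, not a model from the dissertation) represents each word by the counts of the context words it co-occurs with and compares words by cosine similarity; the toy corpus, the window size, and all names in the code are assumptions for illustration only. The models evaluated in the thesis are far richer phrasal and probabilistic variants of this same principle.

    # Minimal distributional-semantics sketch (illustrative only):
    # represent each word by a vector of co-occurrence counts within a
    # fixed context window, then compare words by cosine similarity.
    # The toy corpus and window size are assumptions, not the thesis setup.

    import math
    from collections import Counter, defaultdict

    corpus = [
        "the critic praised the film in a glowing review",
        "the reviewer praised the movie in an enthusiastic article",
        "researchers published a new paper on text summarization",
    ]

    WINDOW = 2  # context words considered on each side of a target word

    # Build word -> Counter of co-occurring context words.
    cooc = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
            for j in range(lo, hi):
                if j != i:
                    cooc[word][tokens[j]] += 1

    def cosine(u, v):
        """Cosine similarity between two sparse count vectors (Counters)."""
        dot = sum(u[k] * v[k] for k in u if k in v)
        norm_u = math.sqrt(sum(x * x for x in u.values()))
        norm_v = math.sqrt(sum(x * x for x in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    # Words that occur in similar contexts get similar vectors, so on this
    # toy corpus "film" and "movie" score higher than "film" and "paper".
    print(cosine(cooc["film"], cooc["movie"]))
    print(cosine(cooc["film"], cooc["paper"]))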
Keywords/Search Tags: Semantic, Distributional, Automatic