Generating coherent extracts of single documents using latent semantic analysis

Posted on: 2004-08-12
Degree: M.Sc
Type: Thesis
University: University of Toronto (Canada)
Candidate: Miller, Tristan
Full Text: PDF
GTID: 2468390011968963
Subject: Computer Science
Abstract/Summary:
A major problem with automatically produced summaries in general, and extracts in particular, is that the output text often lacks textual coherence. Our goal is to improve the textual coherence of automatically produced extracts. We developed and implemented an algorithm that builds an initial extract composed solely of topic sentences, and then recursively fills in the lacunae by inserting linking material from the original text between semantically dissimilar sentences. Our summarizer differs in architecture from most others in that it measures semantic similarity with latent semantic analysis (LSA), a factor analysis technique based on the vector-space model of information retrieval. We believed that the deep semantic relations discovered by LSA would assist in the identification and correction of abrupt topic shifts in the summaries. However, our experiments did not show a statistically significant difference in the coherence of summaries produced by our system as compared with a non-LSA version.
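
The recursive gap-filling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the whitespace tokenizer, the raw term counts, the similarity threshold of 0.5, and the rule for choosing a linking sentence (the intermediate sentence whose weaker similarity to its two neighbours is highest) are all assumptions made for illustration; the abstract specifies none of them.

```python
import numpy as np


def lsa_sentence_vectors(sentences, k=2):
    """Project sentences into a k-dimensional LSA space via truncated SVD."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-by-sentence count matrix (rows: terms, columns: sentences).
    counts = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            counts[index[w], j] += 1
    _, sing, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(sing))
    # One row per sentence, scaled by the top-k singular values.
    return (vt[:k, :] * sing[:k, None]).T


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def fill_gaps(vectors, left, right, threshold):
    """Return indices of linking sentences strictly between left and right."""
    if right - left < 2 or cosine(vectors[left], vectors[right]) >= threshold:
        return []  # adjacent in the source, or already similar enough
    # Assumed bridging rule: maximise the weaker of the two similarities.
    best = max(
        range(left + 1, right),
        key=lambda m: min(cosine(vectors[left], vectors[m]),
                          cosine(vectors[m], vectors[right])),
    )
    return (fill_gaps(vectors, left, best, threshold)
            + [best]
            + fill_gaps(vectors, best, right, threshold))


def build_extract(vectors, topic_indices, threshold=0.5):
    """Start from topic sentences and recursively add linking sentences."""
    topics = sorted(set(topic_indices))
    extract = []
    for left, right in zip(topics, topics[1:]):
        extract.append(left)
        extract.extend(fill_gaps(vectors, left, right, threshold))
    if topics:
        extract.append(topics[-1])
    return extract


if __name__ == "__main__":
    doc = [
        "cats chase mice in the barn",
        "mice eat grain stored in the barn",
        "grain prices rose on the market",
        "the market closed higher today",
    ]
    vecs = lsa_sentence_vectors(doc, k=2)
    # Treat the first and last sentences as the topic sentences.
    print(build_extract(vecs, [0, 3]))
```

Sentence indices are returned in source order, so the extract preserves the original ordering of the document. A non-LSA baseline of the kind the abstract mentions could be approximated in this sketch by passing the raw term-count columns to `build_extract` in place of the LSA vectors.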
Keywords/Search Tags:Extracts, Semantic, Summaries