Research On Text Summarization Based On Latent Semantic Analysis

Posted on: 2015-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wang
GTID: 2268330431454945
Subject: Computer system architecture
Abstract/Summary:
The widespread use of the Internet has dramatically increased the amount of accessible information. Search engines filter out the majority of irrelevant documents, yet users still hesitate over which particular search result to navigate to. Text summarization is an effective way of tackling this information overload. With the substantial increase in portable devices, searching on a small screen often leaves users feeling overwhelmed. To display as much of the needed information as possible, text summarization systems are urgently needed.

Latent Semantic Analysis (LSA) is an algebraic method that extracts the hidden dimensions of the semantic representation of words and sentences. In this paper, we systematically demonstrate how to use LSA for text summarization. The main work of this paper is as follows:

(1) We first give a framework for summarization algorithms based on topic models, then put forward the concept of neighbor weight and give a new weighting scheme. We verify this model through experiments, and the results show that the model can better reveal the latent semantic space.
(2) In the latent space, we consider not only the relationship between latent concepts and sentences but also the relationship between latent concepts and terms. We propose a comprehensive LSA-based text summarization algorithm that combines a term description with a sentence description for each topic. We also refine the selected sentences to keep redundancy as low as possible.
(3) We conduct experiments on the DUC2002 and DUC2004 datasets. In summarization performance, our approach obtains higher ROUGE scores than previous LSA-based methods and several current summarization methods.
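As a rough illustration of the general idea of LSA-based extractive summarization, the following sketch builds a term-by-sentence count matrix, applies a singular value decomposition, and picks one sentence per leading latent concept. It is a generic baseline written under stated assumptions, not the algorithm proposed in the thesis: the abstract does not define the neighbor weight or the new weighting scheme, so plain term counts are used instead, and the function name `lsa_summarize`, the whitespace tokenizer, and the selection rule are all hypothetical choices made for this example.

```python
import numpy as np


def lsa_summarize(sentences, num_sentences=2):
    """Pick sentences using a plain SVD of a term-by-sentence matrix.

    Assumption: raw term counts stand in for the thesis's weighting
    scheme, which the abstract does not specify.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}

    # A[i, j] = number of times term i occurs in sentence j.
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1.0

    # Rows of Vt describe how strongly each sentence expresses each
    # latent concept, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)

    chosen = []
    for k in range(min(num_sentences, Vt.shape[0])):
        # Take the sentence with the largest magnitude for concept k
        # that has not been selected already.
        for j in np.argsort(-np.abs(Vt[k])):
            if int(j) not in chosen:
                chosen.append(int(j))
                break

    # Return the selected sentences in their original document order.
    return [sentences[j] for j in sorted(chosen)]
```

Calling `lsa_summarize(list_of_sentences, num_sentences=2)` returns two distinct sentences from the input, kept in document order. Skipping sentences that were already chosen is only a crude guard against duplicates; it does not reproduce the redundancy refinement or the combined term and sentence descriptions mentioned in the abstract.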
Keywords/Search Tags: Text summarization, Topic Model, Latent Semantic Analysis, Singular Value Decomposition, Weighting Scheme