Scalable Multi-Document Summarization Using Natural Language Processing

Posted on: 2015-12-25
Degree: M.S.
Type: Thesis
University: Rochester Institute of Technology
Candidate: Prabhala, Bhargav
Full Text: PDF
GTID: 2478390017495540
Subject: Computer Science
Abstract/Summary:
In the age of the Internet, Natural Language Processing (NLP) techniques are key to providing the information that users require. However, with the extensive use of available data, a secondary level of wrappers that interact with NLP tools has become necessary. These wrappers must extract a concise summary from the primary data set retrieved, and obtaining this secondary level of information is the main reason for using text summarization techniques. Text summarization using NLP techniques is an interesting area of research with various implications for information retrieval.

This report deals with the use of Latent Semantic Analysis (LSA) for generic text summarization and compares it with other available models. It proposes text summarization using LSA in conjunction with open-source NLP frameworks such as Mahout and Lucene. Using these frameworks, the LSA algorithm can be scaled to multiple large documents. The performance of this algorithm is then compared with that of other models commonly used for summarization, using Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores. This project implements a text summarization framework that uses available open-source tools and cloud resources to summarize documents in multiple languages; in this study, English and Hindi.
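
As a rough illustration of the ROUGE-based comparison mentioned above, the following minimal sketch computes ROUGE-N recall: the fraction of reference n-grams that also appear in a candidate summary, with clipped counts. The function names (`ngrams`, `rouge_n_recall`) and the lowercase whitespace tokenization are assumptions made for this example; the abstract does not describe the evaluation setup actually used in the thesis.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: matched reference n-grams / total reference n-grams.

    Tokenization here is a simple lowercase whitespace split, which is an
    assumption of this sketch rather than part of the source.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match by how often the n-gram occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(rouge_n_recall(candidate, reference, n=1))
    print(rouge_n_recall(candidate, reference, n=2))
```

A higher recall means the candidate summary covers more of the reference summary's wording; comparing summarizers this way requires human-written reference summaries for the same documents.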
Keywords/Search Tags: Summarization, NLP