
The Research Of Automatic Single Text Summarization Based On Latent Semantic Analysis

Posted on: 2012-12-31 | Degree: Master | Type: Thesis
Country: China | Candidate: X Liu
GTID: 2218330338463054 | Subject: Computer software and theory
Abstract/Summary:
With the rapid development of network technology, the Internet has become the main place where people exchange information, but the explosive growth of information on the Internet makes retrieval difficult. Summaries speed up search and retrieval, which matters both to information users and to search engines, so automatic summarization techniques have been widely valued and studied.

Automatic text summarization based on discourse structure has developed rapidly in recent years. Latent semantic analysis is a method for analyzing text structure, and its core is a model of the text generation mechanism, that is, the topic model. A good topic model should capture the topics of a text and the transitions between them, and should organize the right words into sentences under each topic. Furthermore, if the length of each topic is known, the summary can cover the content in a balanced way, which is important for single-text summarization. Because of polysemous words, word frequencies counted from surface word forms are inaccurate, so word disambiguation is needed to obtain a good topic model.

The main work and contributions of the thesis are:
1. It improves current methods for computing word relatedness, puts forward the concept of sentence coherence and a measure of it for the word disambiguation task, and implements word disambiguation based on WordNet.
2. After word disambiguation, sentences are clustered by sentence relatedness instead of sentence similarity, which gives a more reasonable initialization of the latent semantic analysis model.
3. It proposes the DHMM, an extension of the HMM. This dynamic Bayesian model can analyze the semantic structure of a text and give the length of each topic, ensuring the coverage, balance and diversity of the summary.
Keywords/Search Tags:word relatedness, word disambiguation, sentence relatedness, latent semantic analysis, topic model, HMM, automatic text summarization
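
The abstract does not specify the DHMM, so the following is only a minimal sketch of the general idea behind the third contribution, using a plain HMM instead: each sentence is assigned a topic by Viterbi decoding, and the resulting topic path is collapsed into per-topic run lengths. The function names `viterbi` and `topic_lengths`, and the inputs (per-sentence topic log-likelihoods, a transition matrix, start probabilities), are assumptions made for illustration and do not come from the thesis.

```python
import math


def viterbi(log_emit, log_trans, log_start):
    """Return the most likely topic index for each sentence (plain HMM).

    log_emit[t][k]  : log-likelihood of sentence t under topic k (assumed given)
    log_trans[j][k] : log-probability of moving from topic j to topic k
    log_start[k]    : log-probability of starting in topic k
    """
    n_topics = len(log_start)
    score = [log_start[k] + log_emit[0][k] for k in range(n_topics)]
    back = []  # back[t-1][k] = best previous topic for topic k at sentence t
    for t in range(1, len(log_emit)):
        prev = score
        pointers, score = [], []
        for k in range(n_topics):
            best_j = max(range(n_topics), key=lambda j: prev[j] + log_trans[j][k])
            pointers.append(best_j)
            score.append(prev[best_j] + log_trans[best_j][k] + log_emit[t][k])
        back.append(pointers)
    # Trace the best path backwards from the best final topic.
    state = max(range(n_topics), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def topic_lengths(path):
    """Collapse a topic path into (topic, number_of_sentences) segments."""
    segments = []
    for topic in path:
        if segments and segments[-1][0] == topic:
            segments[-1][1] += 1
        else:
            segments.append([topic, 1])
    return [tuple(s) for s in segments]
```

Given segment lengths like these, a summarizer could, for example, pick sentences from each segment in proportion to its length, which is one simple way to aim for the coverage and balance the abstract mentions. How the thesis itself uses topic lengths is not stated in the abstract.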