Study Of Multi-WebPages Automatic Abstracting Based On Latent Semantic Analysis

Posted on: 2009-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y He
Full Text: PDF
GTID: 2178360245467363
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, web resources are updated very frequently. We cannot, however, ignore the problem of abundant information but scarce knowledge, and finding valuable information is becoming more and more important. At present, the search engine is the basic tool for accessing information, but it often returns a large amount of redundant information, and reading through it wastes a great deal of manpower and material resources. Multi-WebPages automatic abstracting emerged in response. It aims to give users comprehensive yet concise documents directly, and it clearly improves the efficiency of information access.

The goal of this thesis is to study multi-WebPages automatic abstracting, focusing on its theory and technology based on Latent Semantic Analysis (LSA). Based on LSA theory, the thesis partitions multiple web pages into semantic paragraphs, clusters the sentences, generates a primary abstract, and then obtains the final abstract by improving the primary abstract. First, we propose the notion of a multi-WebPages semantic paragraph and a semantic paragraph partition algorithm based on LSA. The weight calculation algorithm is redesigned to make semantic paragraph partitioning more efficient. Then, we improve the K-Medoids clustering algorithm and use it to cluster both paragraphs and sentences. We also improve the sentence weight calculation by considering the sentence's length, the importance of the keywords it contains, and whether those keywords appear in the title of the web page, and we compute the similarity between sentences with HowNet.
Next, we analyze the main features of the LSA-based multi-WebPages automatic abstracting system, including its module design, implementation methods, and key technologies. Finally, we implement the system and run experiments on it. The results show that the abstracts it produces are more consistent and comprehensive.
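
The abstract names two steps that are easy to illustrate in code: weighting a sentence by its length, keyword importance, and keyword overlap with the page title, and clustering sentences with K-Medoids. The following Python sketch is only an illustration under stated assumptions, not the thesis's method. It uses the basic K-Medoids procedure rather than the improved variant, and it substitutes bag-of-words cosine distance for the HowNet-based similarity. The function names, the `title_bonus` and `ideal_len` parameters, and the form of the length factor are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_weight(sentence, title, keyword_weights, title_bonus=0.5, ideal_len=12):
    """Score a sentence from keyword importance, title overlap, and length.

    Assumption: the combination rule below is illustrative only; the
    thesis does not state its formula in the abstract.
    """
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    title_terms = set(tokenize(title))
    score = 0.0
    for tok in tokens:
        weight = keyword_weights.get(tok, 0.0)
        if weight and tok in title_terms:
            weight *= 1.0 + title_bonus  # keyword also appears in the title
        score += weight
    # Hypothetical length factor: 1.0 at ideal_len, smaller for shorter or longer sentences.
    length_factor = min(len(tokens), ideal_len) / max(len(tokens), ideal_len)
    return score * length_factor


def cosine_distance(a, b):
    """Bag-of-words cosine distance (a stand-in for HowNet similarity)."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Basic K-Medoids on a precomputed distance matrix."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    medoids = list(range(k))  # deterministic start: the first k items
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if members:
                # New medoid: the member with the smallest total distance to its cluster.
                updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
            else:
                updated.append(medoids[c])  # keep the old medoid for an empty cluster
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


sentences = [
    "latent semantic analysis uses singular value decomposition",
    "singular value decomposition underlies latent semantic analysis",
    "search engines return redundant web pages",
    "redundant web pages are returned by search engines",
]
# Pairwise distance matrix with an exact zero diagonal.
dist = [[0.0 if i == j else cosine_distance(a, b) for j, b in enumerate(sentences)]
        for i, a in enumerate(sentences)]
medoids, labels = k_medoids(dist, k=2)
```

In this toy run the two sentences about singular value decomposition share one label and the two about search engines share the other. In a summarizer of the kind described, the highest-weighted sentence of each cluster could then be taken as a candidate for the primary abstract, though that selection rule is likewise an assumption.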
Keywords/Search Tags: Automatic Abstraction, Latent Semantic Analysis, Semantic Paragraph, Singular Value Decomposition, Weight Compute, Cluster Analysis