Research And Implementation On Chinese Web Pages Summarization

Posted on: 2008-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Han
Full Text: PDF
GTID: 2178360212476077
Subject: Computer application technology
Abstract/Summary:
The rapid development of the Internet has put a huge amount of information online, and 90% of it exists in the form of plain text. This has greatly stimulated the development of natural language processing (NLP) technology, which has become a focus of attention for many scholars. At the same time, demands on the underlying technology have grown: people want to retrieve the information they need from this mass of online content swiftly and precisely.

This paper focuses on automatic web page summarization and aims to provide users with succinct and precise information in the form of summaries. It divides summarization into two classes. The first is multi-document summarization of web pages that share one topic. To remove the redundant information in these pages, we put forward a clustering-based summarization method. Experiments show that the algorithm outperforms other clustering-based ones and removes much of the redundant information in web pages. The second class targets news about breaking events. For this kind of web page, we adopt a template-based event summarization...
Keywords/Search Tags: Multi-document Summarization, Clustering, K-Means algorithm, Event Summarization, Template, Information Fusion
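
The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea behind clustering-based multi-document summarization with K-Means: sentences from several pages are grouped by similarity, and one representative per cluster is kept so that near-duplicate content appears once. Everything here is an assumption made for illustration, not the thesis's algorithm: the function names (`vectorize`, `cosine`, `mean_vector`, `kmeans`, `summarize`), the term-frequency weighting, the cosine measure, and the choice of the first k sentences as seeds. Tokens are split on whitespace, so Chinese text would first need a word segmenter.

```python
import math
from collections import Counter


def vectorize(sentence):
    """Term-frequency vector over lowercased whitespace tokens (assumed tokenizer)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return Counter({term: w / len(vectors) for term, w in total.items()})


def kmeans(vectors, k, iterations=10):
    """Plain K-Means using cosine similarity; seeds are the first k vectors."""
    centroids = [Counter(v) for v in vectors[:k]]
    assignment = []
    for _ in range(iterations):
        # Assign every sentence to its most similar centroid.
        new_assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no sentence changed cluster
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignment, centroids


def summarize(sentences, k):
    """Keep the sentence closest to each centroid, in original order."""
    vectors = [vectorize(s) for s in sentences]
    assignment, centroids = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(max(members, key=lambda i: cosine(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    pages = [
        "the earthquake struck the city at dawn",
        "stock markets fell sharply on monday",
        "the earthquake struck the city early at dawn",
        "markets fell sharply on monday as stock prices dropped",
    ]
    for line in summarize(pages, k=2):
        print(line)
```

With k set to the number of distinct subtopics expected, each near-duplicate pair above collapses to a single sentence. A practical system would likely replace the fixed seeding and raw term frequencies with something more robust, such as TF-IDF weights and several random restarts.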