
Research On Classification And Automatic Summarization Of Web Information

Posted on: 2007-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 2178360212971594
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the explosive growth of web resources makes it increasingly difficult for users to find useful information. To address this problem, information must be collected, analyzed, classified and evaluated in a dynamic environment so that helpful information services can be provided.

In view of the diversity and structural complexity of web information, this thesis studies classification and automatic summary generation for web documents from both theoretical and practical angles. The text extracted from web documents is segmented with the MM and RMM algorithms, and the extracted information is classified by combining K-means clustering with Bayes classification; on this basis an automatic classifier for news web documents is implemented. The structure and composition of keywords and all hypertext tags are also analyzed. Nine important tags are assigned higher priority, and a keyword priority coefficient formula is designed. Each sentence is then scored according to its position. Using these sentence scores together with the Luhn and LSA algorithms, summaries are extracted from documents, and an automatic summary extraction system for web documents is implemented.

According to the experiments, after the keyword evaluation is improved, summary generation using the Luhn and LSA algorithms reaches 70% accuracy and recall, 72.5% using the K-means algorithm, and 90% using the Bayes algorithm.
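The segmentation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that MM and RMM refer to forward maximum matching and reverse maximum matching, the dictionary-based methods commonly used for Chinese word segmentation, and that the input is Chinese text. The function names, the toy dictionary `VOCAB`, and the `max_len` parameter are hypothetical and are not taken from the thesis.

```python
# Hypothetical toy dictionary; a real system would load a large lexicon.
VOCAB = {"研究", "研究生", "生命", "起源"}


def forward_max_match(text, vocab, max_len=4):
    """MM: scan left to right, taking the longest dictionary word at each step."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_max_match(text, vocab, max_len=4):
    """RMM: scan right to left, taking the longest dictionary word ending at j."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


if __name__ == "__main__":
    sentence = "研究生命起源"
    print(forward_max_match(sentence, VOCAB))  # ['研究生', '命', '起源']
    print(reverse_max_match(sentence, VOCAB))  # ['研究', '生命', '起源']
```

On this sentence the two directions disagree, which is one reason a segmenter might run both and compare the results. How the thesis resolves such disagreements is not stated in the abstract.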
Keywords/Search Tags:Data-Mining, Automatic Classification, Automatic Summarization, Thematic Words