
Network Hot News Event Mining And Tracking Analysis Methods

Posted on: 2011-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: L H Liu
Full Text: PDF
GTID: 2208360305493584
Subject: Software engineering
Abstract/Summary:
We construct a system that integrates information acquisition, extraction, mining and visualization. It gathers web information automatically and uses text classification and clustering for recommendation. Classification and clustering are applied together to improve precision and recall. The system generates a label for each information cluster to help readers understand its content. Finally, the results are displayed on a web page, and the evolution of the information is illustrated in a line chart.
The web spider is improved to support timed, targeted capture of information, which speeds up the update process without reducing the amount of information collected. The gathered news is first classified and then clustered, so that news of the same kind stays together and clustering noise is reduced. In this process, smoothing of word frequencies, compression of the vector space and an improved KMedoids method all contribute to higher efficiency, accuracy and recall. Accuracy and recall are both no lower than 70%, which reduces the probability of invalid data, and the degree of accuracy reaches 86%. The system produces cluster labels that mark the meaning of each group of news, with an accuracy of 92.5%. Finally, the evolution of the information is presented to users through a website UI together with various figures. By integrating the key techniques of Web processing, we build a small system on real data that offers a hot news tracking service. In practical day-to-day observation, the output of the system is reliable.
The system serves the following purposes. First, users gain easier access to what they like to browse. Second, site builders can design a website better based on users' preferences. Last, it facilitates the study of how information on the internet evolves and the monitoring of it. Overall, it is an interesting attempt to integrate and analyze hot news.
In short, our work is a worthwhile attempt at information integration in the new era, and the system works efficiently.
Keywords/Search Tags:Web Crawler, Main Content Extraction, Hot News Analysis, KNN Classification, KMedoids Cluster
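
The abstract does not say how the improved KMedoids method differs from the standard one, so the following is only a minimal sketch of textbook k-medoids clustering over term-frequency vectors with cosine distance. It is an illustration, not the thesis's algorithm. The function names (tf_vector, cosine_distance, k_medoids), the whitespace tokenizer and the farthest-first initialization are all assumptions made for this example; the KNN classification step that precedes clustering in the thesis is omitted.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector over lowercase whitespace tokens (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def assign(dist, medoids):
    """Label each item with the position of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(vectors, k, max_iter=100):
    """Textbook alternating k-medoids; returns (medoid_indices, labels)."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

    # Assumed initialization: start at item 0, then repeatedly add the item
    # farthest from all medoids chosen so far (deterministic, no randomness).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(min(members,
                                   key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids

    return medoids, assign(dist, medoids)


if __name__ == "__main__":
    # Made-up headlines, used only to exercise the sketch.
    docs = [
        "earthquake hits city rescue teams arrive",
        "rescue teams search city after earthquake",
        "football final team wins cup",
        "cup final football fans celebrate team",
    ]
    medoids, labels = k_medoids([tf_vector(d) for d in docs], k=2)
    for doc, label in zip(docs, labels):
        print(label, doc)
```

Unlike k-means, each cluster center here is an actual document, so the medoid itself can serve as a representative item for the cluster. Running the sketch separately within each class produced by a classifier would mirror the classify-then-cluster order described in the abstract.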