Internet News Hot Mining System Research And Implementation

Posted on: 2011-03-14  Degree: Master  Type: Thesis
Country: China  Candidate: W H Peng  Full Text: PDF
GTID: 2178330338489607  Subject: Computer Science and Technology
Abstract/Summary:
With the emergence of the Internet and its rapid development in recent years, people have entered an era of information explosion, and reading news online has become an increasingly important way to follow current events. Faced with such a vast volume of Internet news, extracting useful information quickly and accurately, especially about the most recent major events and hot spots, has become an urgent need for users and a focus of research. This thesis implements a hot-news mining system for Internet news. It first classifies the news using text classification, and then uses topic detection and tracking technology to automatically generate a set of topics, each represented by a news title, a group of relevant words, and the event trend. The system rates each topic by computing an attention score and finally presents the hottest topics to users. In this way, users can easily choose topics in particular areas according to their interests.
This study covers the following aspects:
(1) It analyzes the problems of feature selection in text classification and, on that basis, proposes a new feature selection method based on the category aspect to improve classification results.
(2) It improves the traditional topic detection and tracking algorithm by proposing an algorithm based on dual time windows that combines pool-type single-pass clustering with hierarchical clustering as a second clustering stage. The algorithm introduces a time attenuation factor, incremental inverse document frequency, and a time-based similarity distance calculation, and it updates event templates to deal effectively with topic drift.
(3) Using the proposed methods, it designs and implements the Internet news hot-spot mining system to find the most recent hot news and events.
In the text classification evaluation, the feature domain method is compared with mutual information, information gain, and the CHI (chi-square) test to demonstrate its superiority. In the topic detection and tracking evaluation, experiments on three data sets compare the proposed method, which is based on time windows and two-stage clustering, with K-means, single-pass clustering, and hierarchical clustering. The experimental results show that the proposed method outperforms the other methods and achieves the desired results.
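As a rough illustration of the single-pass clustering stage with a time attenuation factor and template updating, the following minimal Python sketch may help. It is not the thesis's algorithm: the function names, the half-life form of the decay, the threshold value, and the use of raw term counts in place of incremental inverse document frequency are all assumptions made for the example, and the time windows and second clustering stage are omitted.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(stories, threshold=0.3, half_life_days=3.0):
    """Assign each (day, tokens) story to a topic in arrival order.

    threshold and half_life_days are hypothetical parameters.
    Returns one topic index per story.
    """
    topics = []  # each topic: {"centroid": Counter, "last_day": number}
    labels = []
    for day, tokens in stories:
        vec = Counter(tokens)
        best_idx, best_sim = -1, 0.0
        for idx, topic in enumerate(topics):
            # Assumed time attenuation: similarity halves every half_life_days
            # since the topic last received a story.
            gap = max(0.0, day - topic["last_day"])
            decay = 0.5 ** (gap / half_life_days)
            sim = cosine(vec, topic["centroid"]) * decay
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            # Update the topic template so it can follow topic drift.
            topics[best_idx]["centroid"].update(vec)
            topics[best_idx]["last_day"] = day
            labels.append(best_idx)
        else:
            # No sufficiently similar recent topic: start a new one.
            topics.append({"centroid": Counter(vec), "last_day": day})
            labels.append(len(topics) - 1)
    return labels


if __name__ == "__main__":
    demo = [
        (0, ["earthquake", "japan", "tsunami"]),
        (1, ["earthquake", "japan", "rescue"]),
        (1, ["football", "league", "final"]),
        (30, ["earthquake", "japan", "tsunami"]),
    ]
    print(single_pass(demo))
```

In this sketch the last story shares its vocabulary with the first topic but arrives 29 days later, so the decayed similarity falls below the threshold and it opens a new topic. That is the intended effect of a time attenuation factor: old events stop absorbing new stories.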
Keywords/Search Tags: Topic Detection and Tracking, Text Classification, Feature Domain, Hierarchical Agglomerative Clustering, Single-Pass Clustering