
Design and Implementation of a News Hot Topic Discovery System Based on Multi-Class Text

Posted on: 2019-11-21   Degree: Master   Type: Thesis
Country: China   Candidate: D K Hua
GTID: 2428330590975438   Subject: Software engineering
Abstract/Summary:
With the continuous spread and development of the Internet, platforms such as news portals, forums and Weibo provide ever richer information. However, as Chinese-language data of all kinds grows rapidly, people struggle to find the news they need in this vast amount of information. Internet supervision departments also find it difficult to identify current topics when facing such a huge volume of data and information flow. Hot topic discovery has therefore become a widely studied research problem. This thesis targets different categories of Chinese news and uses their category information to design and implement a news hot topic discovery system based on multi-class text. The main work is as follows:

(1) The basic framework and steps of a web crawler are discussed in detail, and data are crawled from Sohu, Wangyi, Tencent and ChinaNews. The data are labeled to build the experimental corpus. Preprocessing of the news data is studied, and lexical analysis is used to select and output the feature words of each news item.

(2) The thesis explains the traditional TF-IDF weighting method and improves TF-IDF using the structural and category attributes of news. It studies clustering based on affinity propagation and improves the computation of the similarity matrix according to the time attribute of news and the characteristics of the text.

(3) A method for computing the representative value of words is proposed, and each topic is described by its top-ranked words. Building on topic detection, a method for computing news topic heat is proposed from three factors: the topic's time density, its spatial density and its cluster proportion.

(4) A news hot topic discovery system based on multi-class text is implemented. The system comprises a data acquisition module, a data preprocessing module, a feature vector representation module, a topic detection module and a hot topic discovery module. Performance evaluation and functional tests were carried out to verify the feasibility of the system.

In the hot topic discovery system, news passes through a series of automatic processing steps. The system greatly reduces the manpower and material resources needed to organize news and saves users considerable time in finding hot news. Ordinary users can find hot topics of interest through the system and obtain information more conveniently. Network supervision departments can use the system to identify current public hot topics and better grasp trends in public opinion. The system and its underlying techniques therefore have considerable market and social value.
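The abstract names an improved TF-IDF weighting that uses the structural and category attributes of news but does not give its formula. The Python sketch below is a rough illustration only. It computes standard smoothed TF-IDF and adds two hypothetical adjustments. Title tokens count `title_weight` times, standing in for the "structure" attribute. Document frequencies are counted only over news items of the same category, standing in for the "category" attribute. The function name, the `title_weight` parameter and the `{"title": [...], "body": [...]}` document layout are all assumptions, not the thesis's method.

```python
import math
from collections import Counter


def tfidf_weights(doc, category_corpus, title_weight=2.0):
    """Weight the terms of `doc` against the news items of its own category.

    Hypothetical sketch: each document is a dict with pre-segmented
    'title' and 'body' token lists; title tokens get an extra boost.
    """
    counts = Counter()
    for term in doc["body"]:
        counts[term] += 1.0
    for term in doc["title"]:
        counts[term] += title_weight  # structural boost for title words
    total = sum(counts.values())
    n = len(category_corpus)
    weights = {}
    for term, c in counts.items():
        # Document frequency within the same category only.
        df = sum(1 for d in category_corpus if term in d["title"] or term in d["body"])
        idf = math.log((n + 1) / (df + 1)) + 1.0  # smoothed IDF
        weights[term] = (c / total) * idf
    return weights


doc = {"title": ["flood"], "body": ["flood", "city", "rain"]}
other = {"title": ["match"], "body": ["city", "team"]}
w = tfidf_weights(doc, [doc, other])
print(sorted(w, key=w.get, reverse=True))
```

Here "flood" ranks first because it appears in the title and only in this document. "city" ranks last because every item in the category contains it.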
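The abstract says the similarity matrix for affinity propagation is improved using the time attribute of news, without giving the formula. The sketch below assumes one plausible form: cosine similarity multiplied by an exponential time decay `exp(-decay * |t_i - t_j|)`. It passes the result to a plain NumPy implementation of affinity propagation. That implementation uses damped responsibility/availability updates, with the preference set to the median off-diagonal similarity, a common default. The names `time_aware_similarity` and `affinity_propagation` and the parameters `decay`, `damping` and `max_iter` are hypothetical.

```python
import numpy as np


def time_aware_similarity(vectors, timestamps, decay=0.1):
    """Cosine similarity damped by exp(-decay * |t_i - t_j|) (assumed form)."""
    X = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero vectors as zero instead of dividing by 0
    Xn = X / norms
    cos = Xn @ Xn.T
    t = np.asarray(timestamps, dtype=float)
    gap = np.abs(t[:, None] - t[None, :])  # pairwise time differences
    return cos * np.exp(-decay * gap)


def affinity_propagation(S, damping=0.5, max_iter=200, preference=None):
    """Return (exemplar indices, cluster label per item) for similarity matrix S."""
    S = np.array(S, dtype=float)  # copy so the caller's matrix is untouched
    n = S.shape[0]
    if preference is None:
        # Median of off-diagonal similarities as the self-preference.
        preference = np.median(S[~np.eye(n, dtype=bool)])
    np.fill_diagonal(S, preference)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)  # no exemplar emerged
    labels = np.argmax(S[:, exemplars], axis=1)  # most similar exemplar
    labels[exemplars] = np.arange(exemplars.size)  # each exemplar labels itself
    return exemplars, labels
```

Consider a hand-built similarity matrix with two groups of three items, where items 1 and 4 are each the central item of their group. Under these settings the sketch should select items 1 and 4 as exemplars. Unlike k-means, affinity propagation does not need the number of topics in advance, which suits a news stream whose topic count changes over time.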
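The abstract computes topic heat from time density, spatial density and cluster proportion, but only names these factors. The sketch below assumes hypothetical definitions for all three. Time density is reports per hour over the topic's time span, normalized by the largest value among topics. Spatial density is the share of crawled sites that reported the topic. Cluster proportion is the topic's share of all clustered news items. Heat is a weighted sum with illustrative weights `(0.4, 0.3, 0.3)`. The `Topic` class, `topic_heats` and all parameters are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Topic:
    timestamps: List[float]  # publication time of each report, in hours (assumed unit)
    sources: List[str]       # site each report was crawled from


def topic_heats(topics, total_sources, weights=(0.4, 0.3, 0.3), min_span=1.0):
    """Hypothetical heat score per topic as a weighted sum of three factors."""
    total_docs = sum(len(t.timestamps) for t in topics)
    time_density = []
    for t in topics:
        # Clamp very short spans so a burst of reports does not divide by ~0.
        span = max(max(t.timestamps) - min(t.timestamps), min_span)
        time_density.append(len(t.timestamps) / span)
    max_td = max(time_density)
    w_t, w_s, w_p = weights
    heats = []
    for t, td in zip(topics, time_density):
        spatial = len(set(t.sources)) / total_sources  # share of sites covering it
        proportion = len(t.timestamps) / total_docs    # share of clustered news
        heats.append(w_t * td / max_td + w_s * spatial + w_p * proportion)
    return heats


a = Topic([0, 1, 2, 3], ["sohu", "wangyi", "tencent", "chinanews"])
b = Topic([0, 10], ["sohu", "sohu"])
print(topic_heats([a, b], total_sources=4))
```

In this example the first topic is reported quickly by all four sites and scores about 0.9. The second topic trickles in from a single site and scores about 0.235. Normalizing time density by the maximum keeps all three factors on a comparable 0 to 1 scale before weighting.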
Keywords/Search Tags: Internet, Topic detection, Hot spot discovery, Feature vector representation, Affinity propagation clustering