The Analysis Of Hot Topic Diffusion On The Internet

Posted on: 2014-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Lv
Full Text: PDF
GTID: 2248330398472192
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of Web 2.0, the Internet has become the main channel through which people obtain information. As the information on the Internet grows explosively, extracting the core information and grasping meaningful information accurately has become a problem for everyone. This paper focuses on the analysis of web-information diffusion and on the key factors that drive web information to spread across the whole network and eventually evolve into hot topics.
First, we present an incremental clustering algorithm based on both the Single-pass and K-means methods, which not only clusters dynamically updated web information effectively but also eliminates the defects of the K-means method. The algorithm converts the problem of setting the value of K and choosing the initial cluster centers in K-means into that of judging the similarity between documents and classes. The similarity threshold can be tuned through experiments. With more accurate results and lower cost, the algorithm is better suited to real problems. Furthermore, the algorithm takes into account the unbounded volume of web information and the problem of topic drift.
In detecting hot topics, this paper considers two aspects. One is media attention to the news, measured by number, duration, velocity, and range; the other is user behavior, including clicks, reading speed, number of comments, and aggregate number. We define a topic attention degree, which helps us obtain real-time hot topics. The topic diffusion model aims to analyze the diffusion trail of hot topics on the Internet. We build a model that considers the activity of discrete time nodes. By comparing the content similarity between web pages, we detect the relationship between the documents and the diffusion trail.
Finally, the information sources are found and the diffusion map is drawn. On the one hand, building on the widely discussed concept of the "water army", we introduce the concept of the "water topic" and, with the help of the diffusion map, identify topics that are speculative or promoted purely in pursuit of commercial interests. On the other hand, by analyzing how hot topics with different backgrounds and patterns diffuse, we explore the effects of external environmental factors, such as site activity, site contribution, duration, geographic region, and media type of the topic source, on the topic diffusion trail. The results show that the proposed model effectively reflects the dynamic characteristics of the topic diffusion process. It identifies the key factors that affect the topic diffusion trail and provides a basis for estimating the state and trends of diffusion.
The model and the diffusion map help the public grasp current hot topics and understand the emergence of popular opinion, as well as its diffusion process and patterns. As a result, people may respond more rationally, and the relevant departments can obtain data and technical support for the construction and management of the Internet.
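The threshold-based clustering idea described in the abstract can be illustrated with a minimal Single-pass sketch. This is not the thesis's algorithm: the abstract does not specify how Single-pass and K-means are combined, so the sketch shows only the part where a similarity threshold replaces the choice of K and of the initial centers. The function names `cosine` and `single_pass`, the whitespace tokenizer, the raw term-count vectors, and the default threshold of 0.3 are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(docs, threshold=0.3):
    """Cluster documents in arrival order.

    Each document joins the most similar existing cluster if the
    similarity reaches the threshold; otherwise it opens a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    centroids = []  # summed term counts per cluster (hypothetical representation)
    members = []    # document indices per cluster
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())  # naive whitespace tokenizer (assumption)
        best, best_sim = -1, 0.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            # Cosine is scale-invariant, so summed counts act like a mean centroid.
            centroids[best].update(vec)
            members[best].append(i)
        else:
            centroids.append(Counter(vec))
            members.append([i])
    return members


docs = [
    "earthquake hits city",
    "city earthquake rescue",
    "stock market falls",
    "market stock rally",
]
print(single_pass(docs))  # → [[0, 1], [2, 3]]
```

Because documents are processed one at a time and the number of clusters is not fixed in advance, new web pages can be added without reclustering. A higher threshold yields more, smaller clusters; with `threshold=1.0` every document in the example above ends up in its own cluster.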
Keywords/Search Tags: public opinion, hot topic, cluster, topic diffusion