
Research And Implementation On Hot Topic Of Chinese Webpage Retrieving System

Posted on: 2010-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: L W Hao
Full Text: PDF
GTID: 2178360302961990
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, the network has become an important platform for expressing public opinion. To maintain social stability and harmony, relevant government departments need to detect hot topics promptly and guide public opinion appropriately. Studying online public opinion is therefore of great significance. This thesis focuses on techniques for mining Internet public opinion. Our work is as follows:

(1) Webpage information collection: This thesis studies how the Hypertext Transfer Protocol and the Hypertext Markup Language appear in network transmission, and extracts webpage titles by filtering IP packets.

(2) Chinese word segmentation: Based on a study of the characteristics of webpage structure, we use a noun-based Chinese word segmentation method and represent the result as numeric sequences to achieve high efficiency and low memory cost.

(3) Frequent itemset mining over data streams: Because data streams are unbounded and continuously flowing, an algorithm called FIM-SW is proposed to mine frequent itemsets over a sliding window. The algorithm adopts a vertical database representation in which each item is represented by a bit vector, and uses the Apriori property to generate frequent itemsets. The experimental results show that it markedly improves mining efficiency.

Based on these studies, we implement a hot topic retrieval system for Chinese webpages, which includes modules for webpage topic collection, Chinese word segmentation, and hot topic counting. Experiments show that the system can detect hot topics in a network data stream. In addition, through testing we analyze the effects of different parameters on system performance, which provides a basis for obtaining optimal performance.
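The combination of a sliding window, per-item bit vectors, and Apriori pruning described in (3) can be illustrated with a minimal Python sketch. It is not the FIM-SW algorithm itself, whose details the abstract does not give. The class name `SlidingWindowMiner`, its parameters `window_size` and `min_support`, and the use of integer word IDs as items are all assumptions made for illustration.

```python
from itertools import combinations


def popcount(vector):
    """Number of set bits, i.e. the support count of an itemset in the window."""
    return bin(vector).count("1")


class SlidingWindowMiner:
    """Illustrative sketch only (hypothetical names), not the thesis's FIM-SW."""

    def __init__(self, window_size, min_support):
        self.window_size = window_size
        self.min_support = min_support        # absolute count inside the window
        self.mask = (1 << window_size) - 1    # keeps only the newest window_size bits
        self.bits = {}                        # item -> bit vector; bit 0 = newest transaction

    def add(self, transaction):
        """Slide the window by one transaction (an iterable of item IDs)."""
        items = set(transaction)
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask  # oldest bit falls off
            if shifted or item in items:
                self.bits[item] = shifted
            else:
                del self.bits[item]           # item no longer occurs in the window
        for item in items:
            self.bits[item] = self.bits.get(item, 0) | 1

    def frequent_itemsets(self):
        """Return {itemset tuple: support} for the current window."""
        level = {
            (item,): vec
            for item, vec in self.bits.items()
            if popcount(vec) >= self.min_support
        }
        result = {}
        while level:
            result.update({k: popcount(v) for k, v in level.items()})
            next_level = {}
            for a, b in combinations(sorted(level), 2):
                if a[:-1] != b[:-1]:
                    continue                  # join only itemsets sharing a prefix
                candidate = a + (b[-1],)
                # Apriori property: every (k-1)-subset must itself be frequent.
                subsets = combinations(candidate, len(candidate) - 1)
                if any(sub not in level for sub in subsets):
                    continue
                vec = level[a] & level[b]     # bitwise AND intersects occurrences
                if popcount(vec) >= self.min_support:
                    next_level[candidate] = vec
            level = next_level
        return result


miner = SlidingWindowMiner(window_size=3, min_support=2)
for title_words in ([1, 2], [1, 2, 3], [2, 3], [1, 3]):
    miner.add(title_words)
hot = miner.frequent_itemsets()
```

In this sketch the support of a candidate itemset is obtained by a bitwise AND of the bit vectors of two of its frequent subsets, so counting does not require rescanning the transactions in the window.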
Keywords/Search Tags:net public opinion, data stream mining, frequent itemset, sliding window, Chinese word segment