
Internet Public Opinion Mining Based On Hidden Markov Model

Posted on: 2013-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: S S Zhou
GTID: 2218330371498983
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the number of netizens and new media outlets keeps growing, which has made network public opinion a powerful force in society's reaction to hot-spot events. If it is not guided well, it can pose a serious threat to public security. Mining network public opinion helps maintain social stability, promote development, and build a harmonious network environment. Network public opinion mining includes public opinion collection, topic detection, analysis, and prediction. Among these steps, collection is the most important: only when real, reliable, and complete network public opinion data are collected in a timely manner can the in-depth analysis be reliable and of practical significance.

The main subject of this Master's thesis is the collection technology for network public opinion. Based on an analysis of the characteristics of network public opinion and its sources, this thesis proposes improvements that address the problems of existing collection technology. A literature study indicated that the Hidden Markov Model (HMM) is feasible to apply to network public opinion collection.

Firstly, network public opinion collection means collecting netizens' comments about events on the same theme, so a focused crawler can perform the collection. A focused crawler can effectively capture relevant webpages by predicting and extracting URLs, but existing focused crawlers cannot meet the analysis requirements for timeliness and completeness. We analyzed existing HMM crawlers and, targeting their shortcomings, proposed improvements in three aspects in order to improve HMM crawler performance: the webpage clustering strategy for the training set, the topic relevance recognition algorithm, and the HMM modeling approach.

Secondly, network public opinion lives on new network carriers such as micro-blogs, blogs, BBS, and news comments. Most of these adopt asynchronous, interactive AJAX technology to enhance the user experience, which prevents traditional crawlers from collecting the dynamic information and greatly reduces the coverage of network public opinion collection. To solve this problem, this thesis adds an AJAX page crawl unit to the HMM crawler so that it can collect AJAX pages.

Finally, we studied the open-source Nutch system and integrated it with our crawler, called AHMMCrawler (AJAX HMM Crawler), which replaces Nutch's original crawler. We built the experimental model, listed the experimental environment and the detailed experimental steps, and ran the experiments. The experimental results verify the accuracy and validity of the approach and indicate that the designed crawler has both theoretical value and broad application prospects.
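As a rough illustration of how an HMM can steer a focused crawler, the sketch below scores candidate links with the forward algorithm. It is not the thesis's model: the abstract does not give the state definition, the clustering scheme, or any parameters, so everything here is assumed. Hidden state `i` is taken to mean "about `i` links away from an on-topic page", observations are hypothetical page-cluster ids, and the probability tables are made-up numbers.

```python
# Assumed hidden states: 0 = on-topic page, 1 = one link away, 2 = two or more.
STATES = [0, 1, 2]

# Illustrative parameters only; a real crawler would estimate them from a
# training set of clustered pages.
START = [0.2, 0.3, 0.5]
TRANS = [  # TRANS[p][s]: probability of moving from state p to state s
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.5, 0.4],
]
EMIT = [  # EMIT[s][c]: probability that a page in state s belongs to cluster c
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
]


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def forward(observations):
    """Forward algorithm: state beliefs after a sequence of cluster ids."""
    belief = normalize([START[s] * EMIT[s][observations[0]] for s in STATES])
    for obs in observations[1:]:
        belief = normalize([
            EMIT[s][obs] * sum(belief[p] * TRANS[p][s] for p in STATES)
            for s in STATES
        ])
    return belief


def link_priority(path_clusters):
    """Estimated probability that the next page on this path is on-topic."""
    belief = forward(path_clusters)
    return sum(belief[p] * TRANS[p][0] for p in STATES)


def rank_frontier(frontier):
    """Order URLs best first; frontier maps each URL to its cluster path."""
    return sorted(frontier, key=lambda url: link_priority(frontier[url]),
                  reverse=True)
```

Under these assumptions, a crawler would call `rank_frontier` on its queue of unvisited URLs and fetch the highest-scoring ones first, so paths that have passed through on-topic-looking clusters are followed before paths through unrelated ones.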
Keywords/Search Tags: HMM, AJAX, Collection of public opinion on Internet, AHMMCrawler