
Analysis And Design Of Micro-blog Public Opinion Collection System Based On Node Crawler

Posted on: 2019-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Li
GTID: 2428330548987726
Subject: Computer Science and Technology

Abstract/Summary:
Hundreds of millions of new pieces of information appear on the Internet every day, and public opinion has shifted from traditional carriers to the network. With the arrival of the Web 3.0 era, the main venues of online public opinion have moved from the large news websites to social networks built around forums, micro-blogs, blogs, and post bars, which have become very popular with Internet users. Micro-blog users can publish their views anonymously on political, economic, educational, and other issues and discuss them with other users. Because information on the network can be published anonymously and spreads rapidly, online public opinion is often deviant and sudden, and without guidance and control it may have adverse effects. It is therefore necessary to monitor and manage online public opinion. In this paper, a micro-blog public opinion collection system is designed for Sina micro-blog, which has a large number of users and a large volume of information. A purpose-built web crawler collects the information; the captured data is processed with information extraction, feature extraction, text segmentation, clustering, and other techniques; and the final results are displayed to users as web pages. The research work is as follows:

1. Given the characteristics of online public opinion, users of a monitoring system are assumed to focus mainly on public opinion within a particular field. The information acquisition module therefore uses a topic (focused) crawler, which captures and analyzes data in the field of interest.

2. Sina micro-blog requires users to log in, and its comments are loaded asynchronously, which makes them difficult to obtain. Taking advantage of Node's asynchronous loading and its convenient control of the browser, a crawler was designed that simulates user operations and retrieves page information asynchronously in order to capture Sina micro-blog content. The crawled web pages are then parsed, and Chinese word segmentation, the TF-IDF feature extraction algorithm, and the BIRCH clustering algorithm are used to process and analyze hot topics and comment information.
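Item 1 does not say how the topic crawler decides which pages belong to the field of interest. One common approach is a best-first (focused) crawl that scores each fetched page against weighted topic keywords and only follows links from pages above a relevance threshold. The sketch below assumes that approach; the keyword weights, the `fetch` callable, the threshold, and the page limit are hypothetical, pages are assumed to arrive already segmented into words, and the thesis's own crawler is written in Node rather than Python.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical fetcher: maps a URL to (segmented word list, outgoing links).
Fetch = Callable[[str], Tuple[List[str], List[str]]]


def relevance(tokens: List[str], keywords: Dict[str, float]) -> float:
    """Sum of the weights of topic keywords found in the page, divided by its length."""
    if not tokens:
        return 0.0
    return sum(keywords.get(t, 0.0) for t in tokens) / len(tokens)


def focused_crawl(seeds: List[str], fetch: Fetch, keywords: Dict[str, float],
                  threshold: float = 0.1, max_pages: int = 50) -> List[str]:
    """Best-first topic crawl returning the URLs judged on-topic (assumed design)."""
    # heapq is a min-heap, so priorities are stored negated; seeds get top priority.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    kept: List[str] = []
    visited = 0
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        visited += 1
        tokens, links = fetch(url)
        score = relevance(tokens, keywords)
        if score < threshold:
            continue  # off-topic: neither keep the page nor follow its links
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # A child's priority is estimated from its parent's relevance.
                heapq.heappush(frontier, (-score, link))
    return kept
```

Pruning the links of off-topic pages is what keeps such a crawl inside the chosen field; with a toy link graph where page `a` links to an off-topic page `b` and an on-topic page `c`, only `a` and `c` are kept and `b`'s outgoing links are never visited.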
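The abstract names TF-IDF as the feature extraction step applied after Chinese word segmentation. A minimal sketch of the textbook formulation follows, assuming each post has already been segmented into a word list (for example by a segmenter such as jieba; the thesis does not say which tool it uses) and assuming relative-frequency TF with an unsmoothed logarithmic IDF, since the exact variant in the thesis is not stated.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} map per pre-segmented document.

    tf(t, d) = count(t, d) / len(d)
    idf(t)   = log(N / df(t)), where df(t) is the number of documents containing t
    """
    n = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


# Three hypothetical, already segmented posts.
posts = [["学费", "上涨", "学校"], ["学校", "改革"], ["足球", "比赛"]]
weights = tf_idf(posts)
```

Terms that appear in fewer posts receive a higher IDF, so in the first post "学费" (in one post) outweighs "学校" (in two posts); a term occurring in every post would get weight zero.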
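BIRCH summarizes each subcluster as a clustering feature CF = (N, LS, SS): the point count, the linear sum of the points, and the sum of their squared norms. These three values suffice to compute a subcluster's centroid and radius, and two CFs merge by simple addition. The sketch below is a deliberately simplified, flat version of BIRCH's first phase, not the thesis's implementation: it keeps a plain list of CF entries instead of the height-balanced CF tree (no branching factor, node splits, or global clustering phase), and the threshold value is an assumption. It expects numeric vectors, such as TF-IDF weights mapped onto a fixed vocabulary.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CF:
    """Clustering feature (N, LS, SS) as used by BIRCH."""
    n: int
    ls: List[float]  # linear sum of the points
    ss: float        # sum of the squared norms of the points

    @classmethod
    def of(cls, x: List[float]) -> "CF":
        return cls(1, list(x), sum(v * v for v in x))

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def merged(self, other: "CF") -> "CF":
        # CF additivity: merging two subclusters is component-wise addition.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def radius(self) -> float:
        c = self.centroid()
        # R^2 = SS/N - ||centroid||^2; clamp tiny negatives caused by rounding.
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))


def birch_leaf_pass(points: List[List[float]], threshold: float) -> Tuple[List[CF], List[int]]:
    """Absorb each point into the nearest CF if the merged radius stays within
    `threshold`; otherwise open a new CF. Returns the CFs and a label per point."""
    cfs: List[CF] = []
    labels: List[int] = []
    for x in points:
        new = CF.of(x)
        if cfs:
            i = min(range(len(cfs)), key=lambda k: math.dist(cfs[k].centroid(), x))
            candidate = cfs[i].merged(new)
            if candidate.radius() <= threshold:
                cfs[i] = candidate
                labels.append(i)
                continue
        cfs.append(new)
        labels.append(len(cfs) - 1)
    return cfs, labels
```

For the full tree-based algorithm, scikit-learn's `sklearn.cluster.Birch` (parameters `threshold`, `branching_factor`, `n_clusters`) is a readily available implementation.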
Keywords/Search Tags: Internet public opinion, micro-blog, crawler, information collection, cluster analysis