Research Of Data Acquisition And Filtering Technology Based On Weibo

Posted on: 2017-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: B G Li
Full Text: PDF
GTID: 2348330482984341
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of social networking platforms such as Weibo and WeChat, people increasingly rely on them to send messages and express emotions. Social issues of all kinds are exposed and spread quickly through public opinion on these platforms and become hot topics. Because Sina Weibo spreads messages in real time and in a fission-like manner, it has in recent years gradually become an important platform in China for the spread of major events and outbreaks of public opinion. Opinion leaders, as the key actors in public opinion, have large numbers of followers and a high profile. Their fame and influence are amplified during public opinion events, and their posts or reposts are more likely to become hot topics and drive peaks of public discussion. Acquiring opinion leaders' microblog data in real time therefore both secures the data needed for topic clustering analysis and serves as an important method for public opinion analysis.

At present there are two major methods for acquiring opinion leaders' microblog data.
(1) Sina API: It retrieves messages quickly and easily, but its interfaces impose different acquisition rate limits on users at different authorization levels. Only a limited amount of data can be obtained, so it is difficult to guarantee the accuracy of public opinion analysis.
(2) Directional Web crawler technology: By repeatedly reading the list of opinion leaders' URLs, a Web crawler can obtain more complete data, but it is hard to analyze public opinion information in real time. In addition, visiting many opinion leaders' pages within a short time risks having the Sina Weibo account closed.

This thesis analyzes the background and current state of Weibo crawling research. On this basis, it proposes a new microblog data acquisition method based on follow-group mode that makes full use of Weibo's working features. The method receives the list of opinion leaders' microblog messages that the system pushes automatically. The Weibo data crawled with the new method is divided into two classes using the SVM classification algorithm: entertainment information is removed and social microblog data is retained, which realizes a preliminary filtering of Weibo data. A Weibo-based data acquisition and preliminary filtering system is designed in this thesis.

Finally, the thesis experimentally compares three data acquisition methods: the traditional API method, the directional Web crawler, and the proposed follow-group method. The experiments show that the follow-group method is practical and feasible, and that it can guarantee the integrity and real-time availability of microblog data. Preliminary filtering of the data with the SVM classification algorithm also achieves a good classification effect. Overall, the system performs well and provides comprehensive and accurate data support for public opinion analysis.
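
The preliminary filtering step described in the abstract (separating social posts from entertainment posts with an SVM) can be illustrated with a minimal sketch. The abstract does not state the features, kernel, or library the thesis uses, so everything below is an assumption made for illustration: bag-of-words features, a linear SVM trained by plain stochastic subgradient descent on the L2-regularized hinge loss, invented English training sentences, and hypothetical function names such as `train_linear_svm` and `filter_social`.

```python
import random
from collections import Counter

def tokenize(text):
    # Assumption: whitespace tokenization; real Weibo text would need a
    # Chinese word segmenter.
    return text.lower().split()

def build_vocab(texts):
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    # Sparse bag-of-words vector: {feature index: count}; unknown words are dropped.
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}

def score(x, w):
    return sum(w[j] * v for j, v in x.items())

def train_linear_svm(samples, labels, dim, lam=0.01, eta=0.1, epochs=100, seed=0):
    """Stochastic subgradient descent on lam/2*||w||^2 + hinge loss (no bias term)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            violated = y * score(x, w) < 1.0
            # Gradient of the regularizer shrinks every weight.
            w = [wj * (1.0 - eta * lam) for wj in w]
            # Hinge-loss subgradient applies only inside the margin.
            if violated:
                for j, v in x.items():
                    w[j] += eta * y * v
    return w

def filter_social(posts, w, vocab):
    # Keep posts scored as social (+1 side); posts with score <= 0 are removed.
    return [p for p in posts if score(vectorize(p, vocab), w) > 0]

# Invented toy training data: +1 = social, -1 = entertainment.
TRAIN = [
    ("flood relief volunteers needed in the city", 1),
    ("court ruling on pollution case sparks debate", 1),
    ("food safety scandal prompts official investigation", 1),
    ("residents protest factory pollution near school", 1),
    ("singer releases new album and concert tour", -1),
    ("actor wins award for new movie", -1),
    ("celebrity wedding photos delight fans", -1),
    ("movie premiere draws fans and stars", -1),
]

vocab = build_vocab(text for text, _ in TRAIN)
samples = [vectorize(text, vocab) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
w = train_linear_svm(samples, labels, dim=len(vocab))

if __name__ == "__main__":
    incoming = ["pollution protest near the school", "new movie premiere and concert"]
    print(filter_social(incoming, w, vocab))
```

In this sketch a post made up only of unknown words scores zero and is filtered out; whether such posts should be kept or dropped is a design choice the abstract does not address.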
Keywords/Search Tags: data acquisition, follow-group mode, filtration