Data Acquisition Technology Of Microblogging Public Opinion Research System

Posted on: 2015-06-07  Degree: Master  Type: Thesis
Country: China  Candidate: T G Lv  Full Text: PDF
GTID: 2298330431487547  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the mobile Internet, more and more information is published to the Web. Online information, which reflects people's intentions, influences and even incites Internet users. As a result, online public opinion is receiving increasing attention. Government departments need to predict, detect and guide public opinion in order to foster a healthy Internet environment. Because of the rapid development of microblogging, more and more public opinion events are first exposed on Twitter. Microblogging now plays a critical role for both government departments and enterprises.

This thesis analyzes the problems of collecting data from microblogging services and proposes a method that simulates web-page login to solve them. We then use a priority queue to capture the more influential microblog users first.

Firstly, we analyze the two current approaches to crawling microblog data, general Web crawlers and methods based on the microblog API, and find that neither meets the demands of a current public opinion system in terms of data volume or real-time requirements. We therefore propose simulating a browser login to crawl page data, which gives easy, high-speed access to the data of any microblog user.

Secondly, we take the large data volume into account and build a microblog user network to handle it. We construct a large user network by abstracting each microblog user as a node and each follower, following, repost or comment relationship as an edge between two nodes. This network helps us discover new microblog users and ensure data integrity.

Finally, we collect data from the Web efficiently by using a priority queue algorithm. Efficient collection here means collecting data according to user influence: we first collect data from high-influence users and then from lower-influence users. The priority is computed by a calculation model in which influence depends on a user's follower count, following count, activity, communication and timestamp. We also collect data from inactive users by calculating time intervals. To analyze the collected Web pages effectively, we design a parsing program that locates information directly by characteristic values without parsing the page tags. From the resulting "clean" microblog data, we obtained some interesting information after simple analyses.

Experimental results show that the method is versatile and complete, requires no manual intervention, and obtains high-quality data at higher speed.
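
The influence-ordered collection described above can be illustrated with a minimal Python sketch. It is not the thesis's actual model: the `User` fields, the `WEIGHTS` values, the logarithmic scaling and the `CrawlQueue` class are all assumptions made for illustration. The abstract only states that priority depends on follower count, following count, activity, communication and timestamp, and that time intervals are used to reach inactive users.

```python
import heapq
import itertools
import math
from dataclasses import dataclass


@dataclass
class User:
    name: str
    followers: int        # "fans number" in the abstract
    following: int        # "attention" in the abstract
    activity: float       # assumed: posts per day
    communication: float  # assumed: reposts and comments per post
    last_crawled: float   # timestamp of the previous crawl, in seconds


# Hypothetical weights; the thesis does not publish its coefficients here.
WEIGHTS = {
    "followers": 1.0,
    "following": 0.1,
    "activity": 0.5,
    "communication": 0.5,
    "wait_hours": 0.2,
}


def influence(user: User, now: float) -> float:
    """Score a user; the waiting-time term lets inactive users surface eventually."""
    wait_hours = max(0.0, now - user.last_crawled) / 3600.0
    return (
        WEIGHTS["followers"] * math.log1p(user.followers)
        + WEIGHTS["following"] * math.log1p(user.following)
        + WEIGHTS["activity"] * user.activity
        + WEIGHTS["communication"] * user.communication
        + WEIGHTS["wait_hours"] * wait_hours
    )


class CrawlQueue:
    """Max-priority queue of users, built on heapq (a min-heap) with negated scores."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()  # tie-breaker so User objects are never compared

    def push(self, user: User, now: float) -> None:
        heapq.heappush(self._heap, (-influence(user, now), next(self._counter), user))

    def pop(self) -> User:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

With these assumed weights, a user who has not been crawled for a long time can overtake a recently crawled user of modest influence, which is one simple way to realize the time-interval idea for inactive users.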
Keywords/Search Tags:Microblogging Data, Analog Login, User Network, User Influence, Priority Queue