
The Design And Implementation Of The Data Acquisition And Analysis System For Micro-Blog

Posted on: 2014-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: D Feng
GTID: 2248330398970735
Subject: Computer Science and Technology
Abstract/Summary:
With the rise of social networking, micro-blogs have become one of the most important places for people to interact with each other. On a micro-blog, anyone can share opinions with anyone else, but this also produces massive volumes of highly fragmented information. In this thesis, we design a data acquisition and analysis system based on these characteristics of micro-blogs. Our main work is to determine users' authority from the data collected from the micro-blog platform and then to mine the hot micro-blogs and hot words on the internet. Specifically, the key work is as follows:

I. After studying the design and application of web crawlers, we designed a new system for collecting web information. It can create different kinds of crawlers, so researchers can use it to crawl whatever information they need. On the one hand, we use multi-threading to dramatically improve the efficiency of the crawlers. On the other hand, to work around Sina's API restriction rules, we designed a multi-user authorization mechanism that keeps the crawlers running without interruption. Experimental results showed that the system could acquire 3,000,000 relations between micro-blog users within three consecutive days.

II. After an in-depth analysis of the characteristics of the micro-blog user network and of traditional network node evaluation algorithms, we proposed two new concepts, "relative authority of users" and "user vitality", and used them to evaluate the importance of micro-blog users. Experiments show that the new algorithm's evaluation results improve on those of traditional algorithms by more than 20%. The results are also more reasonable and more in line with the actual situation.

III. We proposed a method for evaluating the hotness of a single micro-blog. The method is based on forwarding and commenting, which are the basic actions on the internet, and this helps ensure the accuracy of the evaluation. In addition, we use the number of layers in the discussion tree to adjust users' authority, which makes the evaluation more realistic. After calculating the hotness of the micro-blogs, we use text processing methods to extract buzzwords.

Finally, the system integrates micro-blog data acquisition, micro-blog user authority evaluation, and micro-blog hot content discovery into a single piece of software. Because the software updates its data in real time, researchers can use it to query micro-blog data and user authority, and general users can use it to check the content that is currently popular on micro-blogs.
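
The combination of multi-threaded crawling and multi-user authorization described in the first item can be illustrated with a minimal sketch. The abstract does not give the actual design, so everything here is an assumption made for illustration: the names `TokenPool`, `RateLimited`, `crawl`, and `make_fake_fetch` are hypothetical, the fetcher is a stand-in that simulates a per-token request quota, and no real Sina API call is made. The idea shown is only that several worker threads share a pool of access tokens (one per authorized account) and switch to the next token when the current one reaches its limit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class RateLimited(Exception):
    """Raised by a fetcher when a token has used up its request quota."""


class TokenPool:
    """Thread-safe pool of access tokens, one per authorized account (hypothetical)."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._exhausted = set()
        self._lock = threading.Lock()

    def acquire(self):
        # Return the first token that has not been marked as rate limited.
        with self._lock:
            for token in self._tokens:
                if token not in self._exhausted:
                    return token
        raise RuntimeError("all tokens are rate limited")

    def mark_exhausted(self, token):
        with self._lock:
            self._exhausted.add(token)


def crawl(user_ids, fetch, pool, workers=4):
    """Fetch relations for every user id in parallel, rotating tokens on rate limits."""

    def task(uid):
        while True:
            token = pool.acquire()
            try:
                return uid, fetch(uid, token)
            except RateLimited:
                # This token is used up; retry the same user with another one.
                pool.mark_exhausted(token)

    with ThreadPoolExecutor(max_workers=workers) as executor:
        return dict(executor.map(task, user_ids))


def make_fake_fetch(quota_per_token):
    """Build a stand-in fetcher that allows `quota_per_token` calls per token."""
    used = {}
    lock = threading.Lock()

    def fetch(uid, token):
        with lock:
            if used.get(token, 0) >= quota_per_token:
                raise RateLimited(token)
            used[token] = used.get(token, 0) + 1
        # Made-up follower ids; a real crawler would call the platform API here.
        return [uid + 1, uid + 2]

    return fetch


if __name__ == "__main__":
    pool = TokenPool(["token-a", "token-b", "token-c"])
    relations = crawl(range(6), make_fake_fetch(quota_per_token=3), pool)
    print(len(relations), "users crawled")
```

In this sketch an exhausted token is never reinstated. A long-running crawler would presumably return tokens to the pool once the platform's quota window resets, but the abstract does not say how the thesis handles this.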
Keywords/Search Tags: Micro-Blog, Information Collection System, Relative Importance, User's Activeness, Popular Content Extraction