Design And Implementation Of Microblog Collection System Based On HITS Algorithm

Posted on: 2019-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: J X Qiao
GTID: 2438330545993141
Subject: Engineering
Abstract/Summary:
Weibo is short for microblog, a form of blog. It is a broadcast-style social network in which users share short, real-time information through follow relationships. Weibo has grown explosively in recent years and has become one of the most popular social sites on the Internet. As of September 2017, Sina Weibo had 376 million monthly active users and 165 million daily active users, and its active user base continues to grow steadily. The influence of Weibo is also increasing: governments, enterprises, schools, celebrities, and major news media outlets have all opened Weibo accounts. As more people participate, a large amount of new information is generated on Weibo every day. To make full use of this mass of microblog information and tap its potential value, it is necessary to collect the posts of key users, especially users with many followers and high influence in the microblog network, for online public opinion analysis. This thesis therefore studies microblog information collection and the analysis of microblog user influence, and designs and implements a microblog collection system based on the HITS algorithm. The system's main functions are keyword-based collection of Weibo content and ranking of the collected results by the influence of the posting users before presenting them to the system user. The main work of this thesis covers the following aspects:

(1) A large body of literature and related material was reviewed to gain a preliminary understanding of the current state of research on microblog information collection and result ranking algorithms. Building on the research background and significance of microblog information collection and ranking, the system requirements were analyzed and two functions that the system must implement were defined: information collection and ranking of the collected results. Related technologies were also studied, including web page information collection, API calls, and web page ranking algorithms.

(2) The web link analysis algorithm HITS is applied to the calculation of microblog user influence. The follow and follower relationships between microblog users are treated as the link relationships between web pages. The algorithm is then adapted to the characteristics of microblog users, yielding a microblog user influence evaluation algorithm based on HITS. The improved algorithm increases the accuracy of the ranking results.

(3) Based on the requirements analysis, the functional modules of the HITS-based microblog information collection system were designed: the microblog content acquisition module, the user information acquisition module, the user relationship acquisition module, and the result ranking module based on the improved HITS algorithm. Specifically, the microblog content acquisition module implements keyword-based collection of microblog content. The user information acquisition module takes the user names found in the previous step and collects information about each user, including the number of accounts the user follows, the number of followers, and the number of posts. The user relationship acquisition module collects the follower relationships between users. The result ranking module analyzes the collected data, calculates each user's influence, sorts the collected results by that influence, and displays them in the system interface. Finally, the database tables were designed to ensure the integrity and accuracy of the stored data.

(4) At the system level, following the principle of high cohesion and low coupling, the system uses a three-layer architecture: a data access layer, a domain layer, and a presentation layer. The database is Microsoft SQL Server 2008 and the development environment is Visual Studio 2010. Finally, testing shows that the system designed in this thesis performs the data collection function well and provides reasonably accurate ranking results.
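The idea in item (2), treating follow relationships as links and scoring users with HITS, can be illustrated with a minimal sketch. It implements only the standard HITS iteration, not the improved variant proposed in the thesis, whose details the abstract does not give. The function names `hits` and `rank_posts`, the toy data, and the choice to rank posts by the author's authority score are all assumptions made for illustration.

```python
from math import sqrt


def hits(follows, iterations=50):
    """Standard HITS on a follow graph (illustrative, not the thesis's variant).

    follows maps each user to the set of users they follow,
    so an edge u -> v means "u follows v".
    Returns (authority, hub) dicts with L2-normalised scores.
    """
    users = set(follows)
    for followed in follows.values():
        users.update(followed)

    authority = {u: 1.0 for u in users}
    hub = {u: 1.0 for u in users}

    for _ in range(iterations):
        # Authority: sum of the hub scores of the accounts following the user.
        new_auth = {u: 0.0 for u in users}
        for follower, followed in follows.items():
            for target in followed:
                new_auth[target] += hub[follower]
        norm = sqrt(sum(s * s for s in new_auth.values())) or 1.0
        authority = {u: s / norm for u, s in new_auth.items()}

        # Hub: sum of the authority scores of the accounts the user follows.
        new_hub = {u: sum(authority[t] for t in follows.get(u, ())) for u in users}
        norm = sqrt(sum(s * s for s in new_hub.values())) or 1.0
        hub = {u: s / norm for u, s in new_hub.items()}

    return authority, hub


def rank_posts(posts, authority):
    """Sort collected (author, text) pairs by the author's authority score."""
    return sorted(posts, key=lambda p: authority.get(p[0], 0.0), reverse=True)


# Hypothetical toy data: "star" is followed by every other user.
follows = {
    "alice": {"star", "bob"},
    "bob": {"star"},
    "carol": {"star", "alice"},
    "star": set(),
}
posts = [("bob", "post B"), ("star", "post S"), ("alice", "post A")]

authority, hub = hits(follows)
ranked = rank_posts(posts, authority)
```

In this toy graph the most-followed account ends up with the highest authority score, so its post is listed first. A real system would build `follows` from the user relationship data it has collected and could weight the scores with profile data such as follower or post counts.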
Keywords/Search Tags: microblog information collection, HITS algorithm, microblog user influence, result ranking