Research On Online Social Network Data Collection Strategy

Posted on: 2021-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: Q W Zhao
GTID: 2428330602482627
Subject: Engineering
Abstract/Summary:
Over recent decades the Internet has changed dramatically, and online social networks (OSNs) have gradually moved face-to-face communication into virtual, online form. Because the number of registered OSN users is enormous, research on this kind of network data is attracting growing attention. At the same time, the sheer volume of data and the complex network structure make studying an entire OSN very costly in human and material resources. A reliable OSN sampling algorithm lets researchers collect a small, representative sample network from the whole network, which is of great significance for OSN data research. Existing network sampling algorithms, such as BFS and MHRW, can already collect samples from large networks such as Twitter. However, BFS is biased toward high-degree nodes, and MHRW easily becomes trapped in well-connected sub-areas. Because these shortcomings leave the collected sample network unable to reflect the characteristics of the original network, this thesis focuses on the network sampling algorithm. The main research contents are as follows.

1. MHRW cannot collect nodes from weakly connected parts of a social network and easily becomes trapped in well-connected sub-areas during sampling, so some nodes are oversampled and the characteristics of the sampled node set deviate greatly from those of the original network. To address this, a multi-hop unbiased vertex sampling algorithm (MJU) is proposed by adding hop parameters for the node storage area, global nodes, and the storage area. MJU not only remedies the sampling defect of MHRW but also collects enough sample nodes at a lower sampling cost. Based on the Twitter and Epinions data sets, sampling experiments with several algorithms are carried out to evaluate network characteristics such as node update rate, the degree distribution of the sample network, and algorithm convergence. The experimental results show that the MJU sampling algorithm collects samples close to the characteristics of the original network, and that the small samples it collects match the original network best, accurately reflecting the nature of the original network data.

2. Taking the MJU algorithm as the core of the controller, an online social network crawler system is designed. The thesis describes the framework and structure of the crawler system in detail, as well as its workflow for collecting network data. Taking the Zhihu network as an example, the controller's URL manager is designed around the MJU sampling algorithm to determine the crawling path; the system then downloads and parses the web pages and, after analysis and cleaning, stores the data in the resource library. User data collected with this crawler system can represent the whole network, which makes it convenient to study network characteristics.

To sum up, the MJU sampling algorithm studied in this thesis is more efficient and feasible, and the samples it collects closely match the original network. The web crawler system designed around the MJU algorithm can effectively crawl network information.
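For readers unfamiliar with the MHRW baseline named in the abstract, the following is a minimal sketch of a standard Metropolis-Hastings random walk on an undirected graph. It is not the thesis's MJU algorithm, whose storage area and hop parameters are not specified here. The function name `mhrw_sample`, the adjacency-list dictionary `adj`, and the star-shaped toy graph are illustrative assumptions.

```python
import random
from collections import Counter


def mhrw_sample(adj, start, steps, rng=None):
    """Return the list of nodes visited by a Metropolis-Hastings random walk.

    adj   : dict mapping each node to a list of its neighbours (undirected,
            connected graph assumed; every node needs at least one neighbour)
    start : node at which the walk begins
    steps : number of transitions to attempt
    rng   : optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    current = start
    visited = [current]
    for _ in range(steps):
        # Propose a neighbour uniformly at random.
        candidate = rng.choice(adj[current])
        # Accept with probability min(1, deg(current) / deg(candidate)).
        # This correction removes the plain random walk's bias toward
        # high-degree nodes, so the target distribution is uniform over nodes.
        if rng.random() < len(adj[current]) / len(adj[candidate]):
            current = candidate
        # A rejected proposal keeps the walk in place and counts as a sample.
        visited.append(current)
    return visited


# Hypothetical toy graph: hub 0 connected to four leaves.
toy_graph = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
walk = mhrw_sample(toy_graph, start=0, steps=10_000, rng=random.Random(1))
print(Counter(walk))
```

On this toy graph a plain random walk would spend about half of its time at the hub, whereas the acceptance step pushes the visit frequencies toward equal shares. The repeated self-samples produced by rejections are one reason such a walk can linger in parts of a graph, which is the kind of behaviour the abstract says MJU is meant to address.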
Keywords/Search Tags:OSNs, vertex sampling, MHRW, double hopping, unbiasedness