UNI64 Sampling Method on Online Social Networks

Posted on: 2015-06-13; Degree: Master; Type: Thesis
Country: China; Candidate: H Li; Full Text: PDF
GTID: 2298330467958018; Subject: Control engineering
Abstract/Summary:
The rapid development of online social networks (OSNs) has attracted many researchers to study their features. Empirical studies of OSNs must be based on node and structure data from real-world networks, but collecting the data of an entire network is difficult. To support follow-up studies, it is therefore necessary to obtain a representative sample of a real network, which requires studying network sampling methods.
Although many network sampling methods have been proposed, an unbiased, uniform sample is needed as the "ground truth" for evaluating their strengths and weaknesses. The UNI method is a sampling method that can provide such an unbiased, uniform sample of a network. However, upgrades to the user ID systems of OSN sites have dramatically enlarged the sampling range, so the hit rate of the UNI method has dropped to nearly zero and the method is no longer usable.
In this thesis, we systematically summarized the background and significance of OSN sampling methods, analyzed the problems and drawbacks of the UNI method when applied to real networks, and put forward a hypothesis intended to solve the problem that the UNI method is unusable in a 64-bit user ID system. Taking Sina Weibo as an example, we then collected nearly 100 million user IDs and analyzed their distribution. We discovered the ID distribution pattern of Sina Weibo and verified the hypothesis that OSN user IDs are not sparsely distributed across the user ID space. We then proposed the UNI64 sampling method, which is based on the ideas of hierarchical clustering and greedy algorithms. By analyzing a set of original ID samples, the method extracts a group of valid intervals from the entire user ID space and restricts the original UNI method to sampling within those intervals. This improves the hit rate and so addresses the drawback of the original UNI method.
Finally, the effectiveness of the UNI64 method and the quality of the sample dataset were evaluated in a real-world OSN environment. The results showed that the hit rate of the UNI64 method met the target hit rate and that the distribution of sampled IDs matched the actual distribution.
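The idea of restricting UNI-style sampling to valid intervals can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the hierarchical clustering and greedy steps, so a simple gap-based merge of seed IDs stands in for them here. The names `build_intervals`, `draw_candidate`, `uni_sample`, the `max_gap` threshold, and the `exists` predicate (standing in for a query to the OSN) are all hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def build_intervals(seed_ids, max_gap):
    """Merge seed IDs into closed intervals (assumed stand-in for UNI64's
    clustering step): a new interval starts when the gap exceeds max_gap."""
    ids = sorted(set(seed_ids))
    if not ids:
        return []
    intervals = []
    start = prev = ids[0]
    for x in ids[1:]:
        if x - prev > max_gap:
            intervals.append((start, prev))
            start = x
        prev = x
    intervals.append((start, prev))
    return intervals


def draw_candidate(intervals, rng):
    """Draw one ID uniformly from the union of disjoint closed intervals."""
    lengths = [hi - lo + 1 for lo, hi in intervals]
    cumulative = list(accumulate(lengths))
    offset = rng.randrange(cumulative[-1])   # uniform over all covered IDs
    i = bisect_right(cumulative, offset)     # interval containing the offset
    prior = cumulative[i - 1] if i else 0
    return intervals[i][0] + (offset - prior)


def uni_sample(intervals, exists, n, rng, max_tries=100_000):
    """Rejection sampling: keep candidates for which exists(id) is true.
    Returns the accepted IDs (with replacement) and the observed hit rate."""
    hits, tries = [], 0
    while len(hits) < n and tries < max_tries:
        candidate = draw_candidate(intervals, rng)
        tries += 1
        if exists(candidate):                # hypothetical validity check
            hits.append(candidate)
    hit_rate = len(hits) / tries if tries else 0.0
    return hits, hit_rate


if __name__ == "__main__":
    valid_ids = {1, 3, 100}                  # toy stand-in for real accounts
    intervals = build_intervals([1, 2, 3, 100, 101], max_gap=5)
    sample, rate = uni_sample(intervals, valid_ids.__contains__, 10,
                              random.Random(0))
    print(intervals, sample, rate)
```

Because every candidate is drawn uniformly from the union of the intervals and accepted only if it exists, the accepted IDs are uniform over the valid IDs that fall inside the intervals. Valid IDs outside every interval are never sampled, so in this sketch the quality of the sample depends on how well the intervals cover the real ID space.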
Keywords/Search Tags: Online social network, Sampling method, Sina Weibo, User ID, UNI64