
Data Mining On Online Social Networks

Posted on: 2016-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Chen
Full Text: PDF
GTID: 2308330503456362
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, online social networks, especially micro-blogging services, have grown rapidly. Various types of online social networks have appeared; they have not only influenced people's lives but also attracted researchers' attention, and analyzing and mining social networks has become a hot topic. This thesis studies how to build an efficient and stable framework for collecting data from online social networks, and how to visualize and mine the collected data.

First, a Master-Slave framework for collecting data from online social networks is designed and implemented. The framework runs on multiple computers, is easy to deploy and expand, is highly stable, and overcomes the restrictions of the APIs. Since its deployment it has run for nearly two years and has collected 160 million user profiles, 6.9 billion user relations, 10 million micro-blogs of 2,000 core users, and daily hot topics from Sina Weibo. Data interfaces are developed on top of the collected data, including underlying read and write classes, graph data processing interfaces, and text data processing interfaces.

Second, a new visualization method is proposed, and the collected data are visualized and analyzed. To visualize large-scale online social networks, the method combines the intersection of followers with spectral clustering. Users are divided into core users and ordinary users according to the power-law property, and a separate visualization method is designed for each group. This makes it possible to visualize hundreds of millions of users and billions of relations while maintaining the structural properties of the network. Sina Weibo data are then visualized to analyze geographical distribution, the correlation between penetration and regional indicators, the connections between different regions, and the structure of the whole network.

Finally, a new method is proposed to measure the balance property, and the structural features of Sina Weibo and Twitter are compared. The new method, called Edge Balance Ratio, measures the balance property of a single edge and of the whole network. Comparative studies are then performed on basic structural features, such as degree distribution, micro-blog distribution, the correlation between degrees and micro-blogs, the average path length, and user ranking. The analysis also focuses on the following preferences of Sina Weibo and Twitter users: friends' homophily, following distribution, assortative mixing, and Edge Balance Ratio are studied. The results show that Sina Weibo users have a more pronounced social hierarchy than Twitter users, which is believed to be caused by cultural background.

In summary, the data collection framework has practical engineering significance, and the visualization and data mining of social networks reveal their structural features, users' preferences, the differences and correlations between social networks and real life, and how users' backgrounds affect their behavior in social networks, which has important scientific value.
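
The abstract describes the visualization method only as being based on the intersection of followers and spectral clustering. As a rough illustration of the first ingredient, the sketch below computes a pairwise follower-overlap score between core users. The function name `follower_overlap`, the toy data, and the Jaccard normalization are all assumptions made for this example; the abstract does not state how the thesis normalizes or uses the intersection.

```python
from itertools import combinations


def follower_overlap(followers):
    """Return pairwise follower-overlap scores keyed by (user_a, user_b).

    `followers` maps each core user to a set of follower IDs.
    Dividing by the union size (Jaccard) is an assumption for this
    sketch, not a detail taken from the thesis.
    """
    affinity = {}
    for a, b in combinations(sorted(followers), 2):
        shared = len(followers[a] & followers[b])
        union = len(followers[a] | followers[b])
        affinity[(a, b)] = shared / union if union else 0.0
    return affinity


# Hypothetical toy data: three core users and their follower ID sets.
followers = {
    "core_a": {1, 2, 3, 4},
    "core_b": {3, 4, 5},
    "core_c": {9},
}

affinity = follower_overlap(followers)
for pair, score in sorted(affinity.items()):
    print(pair, round(score, 3))
```

Scores like these could be arranged into a symmetric affinity matrix and passed to a spectral clustering routine so that core users with many shared followers are grouped together; that step is left out here to keep the sketch dependency-free.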
Keywords/Search Tags: Social Network, Visualization, Network Structural Analysis, Data Mining, Edge Balance Ratio