Research On Sina Weibo User Information Based On Two Improved Clustering Algorithm

Posted on:2015-03-14

Degree:Master

Type:Thesis

Country:China

Candidate:Z Zhao

Full Text:PDF

GTID:2267330428460390

Subject:Applied statistics

Abstract/Summary:

In recent years, Weibo (a microblogging service launched by Sina) has grown rapidly and has become a necessary part of people's daily lives. As a platform for disseminating information, Weibo helps people get first-hand information in a timely manner. As a social platform, it helps people make friends with one another in a new way. Because users play the core role on the Weibo platform, partitioning and refining the user base is an extremely important step for advertising and marketing, public opinion monitoring, and other Weibo-related work.

This paper takes Weibo user information data as its research object and partitions users into groups based on each user's number of fans, number of weibos, number of followers, number of friends, and Weibo age. First, the paper visualizes the data to get an overall understanding of its distribution and standardizes the data as a preprocessing step. Because the data set is large (21,481 records) and has more than three dimensions, its cluster tendency cannot be observed directly. The paper therefore applies two improved clustering algorithms. The first is an improved K-Means algorithm that adds the C-H index to the traditional K-Means algorithm so that the algorithm can select the number of clusters autonomously. The second is the TwoStep algorithm, a combination of hierarchical clustering and the BIRCH algorithm that can handle very large data sets. The paper names the clusters produced by each of the two algorithms.

Finally, the paper measures the quality of the clusters with three different indexes. The results show that the improved K-Means algorithm performs better. There are two possible reasons for this result: first, the pre-clustering step of the TwoStep algorithm loses some of the information in the data; second, the threshold T may have been chosen unsuitably.
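The idea of letting K-Means pick its own number of clusters with the C-H index can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes "C-H index" refers to the Calinski-Harabasz index and that standardization means z-scoring each column, and it swaps in a deterministic farthest-point initialization so that runs are reproducible. The function names (`standardize`, `kmeans`, `calinski_harabasz`, `kmeans_auto_k`) and the candidate range of k are hypothetical.

```python
def standardize(rows):
    """Z-score each column (assumed meaning of 'standardize' in the abstract)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def farthest_first(points, k):
    """Deterministic init (an assumption): start from the first point,
    then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = centroid(members)
    return labels


def calinski_harabasz(points, labels, k):
    """CH = (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    n = len(points)
    overall = centroid(points)
    between = within = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def kmeans_auto_k(points, k_range=range(2, 9)):
    """Run K-Means for each candidate k and keep the highest CH score."""
    best = None
    for k in k_range:
        if k >= len(points):  # CH is undefined when n - k <= 0
            break
        labels = kmeans(points, k)
        score = calinski_harabasz(points, labels, k)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best  # (score, chosen k, labels)
```

Calling `kmeans_auto_k(standardize(rows))` on a list of five-feature user rows would return the best score, the selected number of clusters, and one label per user. A higher C-H value indicates clusters that are compact relative to their separation, which is why the largest score is used to choose k in this sketch.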

Keywords/Search Tags:

Sina Weibo, Information of users, Clustering, K-Means algorithm, TwoStep algorithm, Cluster validation

Related items

1	Research On Interests Of Sina Weibo Users Based On LDA Topic Model
2	Research On Weibo Users’ Attitudes Toward Homosexuality
3	Research On Application Level Of Information Technology In Special Education Schools Based On FCM Clustering Algorithm
4	Research On Spatiotemporal Behavior Of Campus Network Users Based On Clustering Algorithm
5	Research On The Emotion Mining And Communication Of Weibo Users In The World Cup Situation
6	Research On The Present Situation And Countermeasures Of The Value Orientation Of Sina Weibo Blogger
7	Investigation On Sina Sports Weibo Dissemination Of Content
8	Studies Of Social Network Structure And The Influence Of Its Users
9	Improved Spectral Clustering Algorithm And Its Application In Risk-model
10	Subculture Research On Sina Weibo