An Enhanced Clustering Algorithm With Parallelization Improvement And Its Application In Micro-blog User Clustering

Posted on:2015-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:R X Zhang

Full Text:PDF

GTID:2298330452964002

Subject:robotic learning

Abstract/Summary:

The K-means algorithm is one of the most popular clustering algorithms and is widely used in computer vision, text mining, customer analysis, and other fields. It is simple and efficient, but it suffers from two main problems: it is sensitive to the initial cluster centers, and it needs the user to specify the value of K in advance. The agglomerative fuzzy K-means algorithm is not sensitive to the initial cluster centers and can find the real number of clusters through an agglomerative procedure. However, it has a disadvantage in time cost, because it takes many iterations to find the best K.

In this thesis, we first propose an enhanced algorithm based on agglomerative fuzzy K-means. In the enhanced algorithm, we replace the random initial value selection used in agglomerative fuzzy K-means with a new initial center selection method to reduce the algorithm's time cost. We also present a MapReduce implementation of the enhanced algorithm to improve its ability to handle large-scale datasets. We further study the methods and problems involved in clustering micro-blog users, and introduce a topic-model-based method to obtain user vectors. The topic model is trained on Chinese Wikipedia and then applied to micro-blog data. Finally, we apply the enhanced agglomerative fuzzy K-means to micro-blog user clustering. Experimental results show that the new algorithm can reduce the time cost. An analysis of the Weibo user clustering results shows that the algorithm produces suitable clusters of users.
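To make the MapReduce idea concrete, the following is a minimal sketch of one way a clustering iteration can be split into a map step and a reduce step. It is not the thesis's implementation: it uses plain (hard) K-means rather than the enhanced agglomerative fuzzy variant, it runs in a single Python process rather than on a cluster, and the function names `map_phase`, `reduce_phase`, and `kmeans` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from math import dist  # requires Python 3.8+


def map_phase(points, centers):
    """Map: emit (index of nearest center, (point, 1)) for every point."""
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
        yield nearest, (p, 1)


def reduce_phase(pairs, centers):
    """Reduce: sum the points assigned to each center and return the new means."""
    sums = {}
    counts = defaultdict(int)
    for j, (p, n) in pairs:
        if j in sums:
            sums[j] = [a + b for a, b in zip(sums[j], p)]
        else:
            sums[j] = list(p)
        counts[j] += n
    # Assumption: a center that received no points keeps its old position.
    return [
        tuple(s / counts[j] for s in sums[j]) if j in sums else tuple(centers[j])
        for j in range(len(centers))
    ]


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Repeat map and reduce until no center moves by more than tol."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        new_centers = reduce_phase(map_phase(points, centers), centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers
```

Because the result depends on the `centers` argument passed in, this sketch also shows where an initial center selection method would plug in: it only changes the starting value, not the map or reduce steps.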

Keywords/Search Tags:

Clustering Algorithm, MapReduce, User Clustering Analysis, Topic Model, Micro-blog

Related items

1	Micro-blog Hot Topics Detection Method Based On Hybrid Clustering
2	Research On Topic Recommendation Based On User Clustering In Micro-blog
3	Topic Clustering Analysis Of Popular Micro Blog Event
4	Research On Chinese Micro-blog Hot Topic Detection
5	The Research And Implementation Of Text Clustering Based On The Platform Of Micro-blog
6	Research On Particle Swarm Optimization Clustering Algorithm Oriented To Micro-blog Topic
7	Research Of The Micro-blog Hot Topic Detection Based On VSM-BTM Topic Model
8	Design And Implementation Of The Micro-blog Topic Detection System Based On Incremental Clustering
9	The Research And Implementation Of The Micro-blog Influence Evaluation Model Based On Clustering Algorithm
10	Research On Micro-blog Recommendation Method Based On Topic Model