
Research And Application Of K-means Algorithm Based On Density And Distance

Posted on: 2017-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
GTID: 2348330536976780
Subject: Computer technology
Abstract/Summary:
Data mining is the exploration of large data sets to reveal the rules implied in them. It is an important branch of computer science that combines many technologies. Cluster analysis is one of the important techniques in data mining: it divides data samples into different clusters according to their similarity. This paper studies the K-means algorithm, the most basic clustering algorithm in data mining. Its advantage is that it is easy to operate, but it has many shortcomings: for example, the number of clusters K must be specified by the user, the initial cluster centers are selected randomly, and the algorithm can only find sphere-like clusters. The work of this paper consists of three main parts. First, in the theoretical study of the K-means algorithm, the isolated points that affect the clustering result are eliminated and the selection of the initial cluster centers is improved; in addition, the data are assigned reasonably to each cluster once the initial cluster centers have been determined. Second, the improved algorithm is implemented on the Spark platform in order to handle massive data. Finally, the improved algorithm is applied to mobile customer segmentation. The experiments show that the improved K-means algorithm produces more accurate clustering results than the traditional K-means algorithm. Implemented on the Spark platform, the improved algorithm reduces execution time without affecting accuracy. Based on the similarity of the collected data, mobile customer data can be divided into different categories by selecting different segmentation variables, which helps mobile data analysts adopt different marketing strategies for different customer groups.
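The abstract does not give the exact formulas used for isolated-point removal or for choosing the initial centers, so the following is only a minimal sketch of one common density-and-distance scheme, not the author's method. It assumes that density is the number of neighbors within a fixed radius, that points with too few neighbors are treated as isolated, and that each new center maximizes density times distance to the nearest center already chosen. The names `radius`, `min_neighbors`, `drop_isolated`, and `pick_centers` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_density(points, radius):
    """Assumed density measure: count of other points within `radius`."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def drop_isolated(points, radius, min_neighbors):
    """Remove points whose density is below `min_neighbors` (assumed rule)."""
    dens = local_density(points, radius)
    return [p for p, d in zip(points, dens) if d >= min_neighbors]


def pick_centers(points, k, radius):
    """Pick k initial centers: densest point first, then the points that
    maximize density * distance to the nearest center chosen so far."""
    dens = local_density(points, radius)
    first = max(range(len(points)), key=lambda i: dens[i])
    centers = [points[first]]
    while len(centers) < k:
        best = max(
            range(len(points)),
            key=lambda i: dens[i] * min(dist(points[i], c) for c in centers),
        )
        centers.append(points[best])
    return centers


def assign(points, centers):
    """Assign every point to its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[idx].append(p)
    return clusters


def kmeans(points, centers, max_iter=100):
    """Standard K-means iterations starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)
```

A typical call chain would be `clean = drop_isolated(points, radius, min_neighbors)`, then `kmeans(clean, pick_centers(clean, k, radius))`. The neighbor counting here is quadratic in the number of points, which is one reason a distributed platform such as Spark becomes attractive for massive data.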
Keywords/Search Tags:Data mining, cluster analysis, K-means algorithm, Spark, Customer segmentation