Research And Application Of K-means Algorithm Based On Density And Distance

Posted on:2017-03-10

Degree:Master

Type:Thesis

Country:China

Candidate:L Li

GTID:2348330536976780

Subject:Computer technology

Abstract/Summary:

Data mining is the exploration of large data sets to uncover the rules implicit in them. It is an important branch of computer science that combines many technologies. Cluster analysis, one of the important techniques in data mining, divides data samples into different clusters according to their similarity. This paper studies the K-means algorithm, the most basic clustering algorithm in data mining. Its advantage is that it is easy to apply, but it has several shortcomings: the number of clusters K must be specified by the user, the initial cluster centers are selected at random, and the algorithm can only find roughly spherical clusters. The work of this paper consists of three parts. First, in the theoretical study of the K-means algorithm, the isolated points that affect the clustering result are eliminated and the selection of the initial cluster centers is improved; in addition, the data are assigned to clusters in a reasonable way once the initial cluster centers have been determined. Second, the improved algorithm is implemented on the Spark platform so that it can handle massive data. Finally, the improved algorithm is applied to mobile customer segmentation. The experiments show that the improved K-means algorithm produces more accurate clustering results than the traditional K-means algorithm, and that the Spark implementation reduces execution time without affecting accuracy. Based on the similarity of the collected data, mobile customer data can be divided into different categories by selecting different segmentation variables, which helps analysts of mobile customer data adopt different marketing strategies for different customer groups.
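The abstract does not give the exact formulas used, so the following is only a minimal sketch of one common way to combine density and distance when seeding K-means; it is not the thesis's method. It assumes that density is the number of neighbours within a fixed radius, that points with too few neighbours count as isolated, and that each new center maximizes density times distance to the nearest chosen center. The names `radius`, `min_neighbors`, `local_density` and `choose_initial_centers` are hypothetical. The Spark implementation is not shown.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def local_density(points, radius):
    """Assumed density measure: count of other points within `radius`."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and euclidean(p, q) <= radius)
        for i, p in enumerate(points)
    ]

def choose_initial_centers(points, k, radius, min_neighbors=1):
    """Drop isolated points, then pick k dense, mutually distant seeds."""
    dens = local_density(points, radius)
    # Assumption: a point with fewer than `min_neighbors` neighbours is isolated.
    kept = [(p, d) for p, d in zip(points, dens) if d >= min_neighbors]
    if len(kept) < k:
        raise ValueError("fewer non-isolated points than clusters")
    # First seed: the densest remaining point (ties keep input order).
    kept.sort(key=lambda pd: -pd[1])
    centers = [kept[0][0]]
    while len(centers) < k:
        # Next seed: maximize density * distance to the nearest chosen seed.
        best = max(
            kept,
            key=lambda pd: pd[1] * min(euclidean(pd[0], c) for c in centers),
        )
        centers.append(best[0])
    return centers, [p for p, _ in kept]

def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
            clusters[idx].append(p)
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: centers stopped moving
            break
        centers = new_centers
    return centers, clusters

# Two tight groups plus one isolated point at (50, 50).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
seeds, kept = choose_initial_centers(points, k=2, radius=2.0)
final_centers, clusters = kmeans(kept, seeds)
```

In this toy data set the isolated point is filtered out before seeding, so it cannot pull a center away from the two dense groups. That is the effect the abstract attributes to eliminating isolated points.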

Keywords/Search Tags:

Data mining, cluster analysis, K-means algorithm, Spark, Customer segmentation

Related items

1	Research On Air Customer Segmentation Based On Spark Platform
2	Design And Implementation Of E-commerce Customer Segmentation System Based On Cluster Analysis
3	K-means Clustering-based Fusion Algorithm And The Mobile Customer Segmentation
4	Research On Mobile Phone Customer Segmentation Based On Data Mining
5	Research On Segmentation Of Spring Festival Return Customers Based On Data Mining
6	Application Of Improved K-MEANS Algorithm In Customer Segmentation
7	Data Mining Technology And Its Application In The Supermarket In Crm
8	The K-means Algorithm Improvement And Its Application In The Customer Segmentation Of The Communications Industry
9	Research And Design Of Telecommunications Customer Segmentation System Based On Cluster Analysis
10	Research On Parallelization Of Data Mining Algorithm Based On Distributed Platforms Spark And YARN