Application Research Of K-Means Algorithm In Customer Segmentation

Posted on: 2008-09-02; Degree: Master; Type: Thesis
Country: China; Candidate: L W Xing; Full Text: PDF
GTID: 2189360215455413; Subject: Statistics
Abstract/Summary:
Growing competition in the telecom and finance industries has made companies recognize the importance of customers and treat them as key assets. Retaining existing customers and acquiring new ones are their main tasks. One-to-one marketing is replacing mass marketing as a way to raise customer satisfaction and increase profits. Effective business development strategies often begin with market segmentation, which involves grouping customers and non-customers with similar characteristics. Segmentation is useful to the extent that customers within a segment have similar purchasing behavior and/or profitability that differs from that of customers in other segments.

Traditionally, marketers have often segmented their customers on a single attribute. For example, bankers commonly divide their customers into high, medium, and low groups according to their deposits. This method is simple, convenient, and accessible, but it cannot keep pace with the diversification of customer needs or with technical progress. Behavioral customer segmentation can. This newer methodology can handle hundreds of variables, and its results help marketers understand their customers better. It is also called "segmentation based on data mining".

Cluster analysis is commonly used for customer segmentation, and K-Means clustering is one of the most popular data mining algorithms. K-Means can handle large data sets and produce good results. However, there are few studies of K-Means applied to clustering customer data, so a theoretical and experimental analysis of K-Means in customer segmentation is worthwhile from a data miner's perspective.

Chapter 1 sets forth the research background, motivation, and process of customer segmentation. Chapter 2 reviews the literature on general segmentation methodology, K-Means clustering, result evaluation, and principal component analysis.
Chapter 3 designs the experimental process according to the Cross-Industry Standard Process for Data Mining (CRISP-DM) and SAS SEMMA (Sample, Explore, Modify, Model, Assess). Chapter 4 analyzes customer data from a domestic bank using SAS/STAT and EM, applies SAS FASTCLUS to customer segmentation, and explores the general data mining process.

The chapter focuses on how to choose initial starting points for K-Means clustering. Two families of methods are discussed. The four synthetic methods are random centroids, scrambled midpoint, scrambled median, and unscrambled midpoint. The four methods based on actual sample data are replace full, random, breakup, and feature value sums. Overall, I find that the synthetic methods perform better than the actual-sample methods because their starting points are more representative. For a given number of clusters, with any of the eight methods the smallest within-cluster sum of squares found keeps decreasing as the number of runs increases. It is therefore necessary to run K-Means many times with different initial starting points to avoid settling for a poor local minimum.

To evaluate the validity of the K-Means clustering results, I apply Kohonen's Self-Organizing Map to the same data. It shows that the K-Means result is robust and effective. Linear discriminant analysis of the cluster result strengthens this conclusion. Finally, the chapter discusses how to deploy the segmentation model and which aspects of model application strategy deserve attention. Chapter 5 offers a summary and concluding remarks.

The main contributions and innovations of this paper are as follows.

1) Many experimental papers summarize all aspects of data mining in general terms but pay little attention to the details of applying one specific data mining algorithm such as K-Means clustering. Most current experimental papers pay more attention to business strategies after clustering than to the details of the K-Means clustering process.
The K-Means technique itself is also very important: with proficient skill in K-Means clustering, we can achieve better results and develop more meaningful business strategies.

2) SAS is powerful, yet few people investigate customer segmentation with SAS FASTCLUS in practice. This paper provides SAS code for the eight methods of choosing initial starting points, which data miners who want to perform customer segmentation with SAS K-Means can refer to. The validity of the K-Means clustering results is evaluated with Kohonen's Self-Organizing Map and linear discriminant analysis.

3) Initial starting points have a strong effect on the K-Means clustering result. This paper shows that a K-Means result should be evaluated not only by algorithm convergence but also by the smallest within-cluster sum of errors over many runs. At the end of each run the within-cluster sum of errors is calculated, and the run with the minimum value is chosen as the final result. This paper also proposes that changing the initial starting points enough times, regardless of which method is used, can overcome poor local minima and achieve a better result.
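
The run-many-times idea in contribution 3 can be illustrated with a minimal sketch. It is written in Python rather than the SAS FASTCLUS code the thesis provides, and it is not the author's implementation. It assumes starting points are drawn at random from the data, which only loosely resembles the "random" actual-sample method named above, and the function names (`kmeans_once`, `best_of_runs`) and the toy data are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_once(points, k, rng, max_iter=100):
    """One K-Means run; returns (centroids, within-cluster sum of squares).

    Assumption: starting points are k distinct observations sampled at random.
    """
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged, possibly to a local minimum
            break
        centroids = new_centroids
    wcss = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, wcss


def best_of_runs(points, k, n_runs=20, seed=0):
    """Repeat K-Means with new starting points and keep the lowest-WCSS run."""
    rng = random.Random(seed)
    best_centroids, best_wcss, history = None, None, []
    for _ in range(n_runs):
        centroids, wcss = kmeans_once(points, k, rng)
        if best_wcss is None or wcss < best_wcss:
            best_centroids, best_wcss = centroids, wcss
        history.append(best_wcss)  # best value seen so far, never increases
    return best_centroids, best_wcss, history


if __name__ == "__main__":
    # Toy data: three small, well-separated groups of 2-D points.
    data = [(x + dx, y + dy)
            for x, y in [(0, 0), (10, 10), (20, 0)]
            for dx in (0, 1) for dy in (0, 1)]
    centroids, wcss, history = best_of_runs(data, k=3)
    print("best WCSS:", wcss)
    print("best-so-far WCSS per run:", history)
```

Each single run stops at whatever local minimum its starting points lead to. The `history` list records the best within-cluster sum of squares seen so far, so it can only stay level or decrease as more runs are added, which is the behavior the abstract describes.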
Keywords/Search Tags: Data Mining, Customer Segmentation, K-Means, Initial Starting Points