
Customer Segmentation And Outlier Detection Of Auto Client Based On Sampling Matrix

Posted on: 2013-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Wang
Full Text: PDF
GTID: 2248330371970917
Subject: Computer Science and Technology
Abstract/Summary:
Large companies across many industries are increasingly focused on customer relationships. To deliver products and services accurately and maximize profit, a company must fully understand customer requirements. The most basic element of this is customer segmentation through data mining, which uses cluster analysis and outlier detection to learn how customer needs can best be met. This is of great significance to customer relationship management in commercial markets.

This thesis takes automobile customer data with typical features and performs customer segmentation and outlier detection using an improved DBSCAN clustering algorithm and distance-based outlier detection. It studies how to simplify parameter determination in order to improve both density-based DBSCAN clustering and distance-based outlier detection. The main research work covers the following aspects:

(1) The fitness of the chosen algorithms: The two algorithms are combined on the basis of their common principle and then applied to the data sets. Because no clustering algorithm is best in general, only most appropriate for given data, the method should be selected according to the properties of the data. The first step is therefore to confirm that the chosen algorithm is the most appropriate one. The experimental results show that DBSCAN is the only suitable method. Given its similarity in principle to DBSCAN, distance-based outlier detection is likewise taken to suit the automobile customer data set and is not discussed further here.

(2) Determining the parameters from extracted data: To save time and space while preserving clustering quality, the thesis proposes extracting part of the data to determine the parameters and then performing cluster analysis and outlier detection on all the data. This requires first selecting a suitable sampling method and then verifying the accuracy of the resulting parameters. The experimental results show that the distribution of data obtained by systematic sampling is more similar to the distribution of the full data, and that the parameters are essentially the same as those obtained by the original method.

(3) Determining the parameters of distance-based outlier detection from DBSCAN: Density-reachability, the basic condition for forming clusters in DBSCAN, is taken as the starting point: an outlier must not satisfy this condition. On that basis, the thesis proposes that distance-based outlier detection reuse the DBSCAN parameters, which simplifies the calculation. The experimental results show that the outlier results are good and that the outlier detection rate on two labeled UCI data sets is high.
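The pipeline described in aspects (2) and (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the k-distance quantile used to pick Eps, the sampling step, and the toy grid data are all assumptions made for illustration. The sketch draws a systematic sample, estimates Eps from the sample's k-distances, and then reuses the same Eps and MinPts to flag distance-based outliers in the full data.

```python
import math


def systematic_sample(points, step, start=0):
    """Systematic sampling: keep every `step`-th point, beginning at `start`."""
    return points[start::step]


def k_distances(points, k):
    """Distance from each point to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def estimate_eps(sample, min_pts, quantile=0.9):
    """Assumed heuristic (not from the thesis): take a high quantile of the
    sample's k-distances as Eps."""
    kd = sorted(k_distances(sample, min_pts))
    return kd[min(len(kd) - 1, int(quantile * len(kd)))]


def distance_based_outliers(points, eps, min_pts):
    """Indices of points with fewer than `min_pts` other points within `eps`.
    Such points cannot be core points under the same DBSCAN parameters."""
    outliers = []
    for i, p in enumerate(points):
        neighbours = sum(
            1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= eps
        )
        if neighbours < min_pts:
            outliers.append(i)
    return outliers


# Toy data (hypothetical): two dense 10x10 grids plus two isolated points.
grid_a = [(x * 0.1, y * 0.1) for y in range(10) for x in range(10)]
grid_b = [(5 + x * 0.1, 5 + y * 0.1) for y in range(10) for x in range(10)]
points = grid_a + grid_b + [(2.5, 2.5), (10.0, 0.0)]

MIN_PTS = 4  # assumed value for illustration
sample = systematic_sample(points, step=4)
eps = estimate_eps(sample, MIN_PTS)
outliers = distance_based_outliers(points, eps, MIN_PTS)
print("eps:", eps, "outlier indices:", outliers)
```

Because sampling thins the data, k-distances measured on the sample tend to be larger than on the full set. The sketch makes no correction for this, and the abstract does not say how the thesis handles it.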
Keywords/Search Tags:Automobile Dataset, Data Extraction, DBSCAN, Distance-based Outlier, Parameters Simplification