
Research on and Improvement of Clustering Algorithms in Data Mining and Their Application

Posted on: 2015-11-26; Degree: Master; Type: Thesis
Country: China; Candidate: Y Liu; Full Text: PDF
GTID: 2298330467975480; Subject: Computer technology
Abstract/Summary:
Clustering is currently one of the most important data mining techniques and is used mainly to group the objects in a data source into classes. Because clustering plays an important role in the data mining process, it has attracted growing attention from both the research community and industry. Clustering can also uncover hidden relationships in massive data sets accurately and efficiently. Its applications are extremely broad, ranging from general information indexing and artificial intelligence to data throughput analysis, and intrusion detection systems are also based on clustering.

This thesis first introduces common clustering algorithms and analyzes their characteristics and application scenarios. It then analyzes the K-means clustering algorithm in detail. K-means has several defects. First, the number of clusters must be fixed at the start of the algorithm. Second, K-means is unstable: for the same set of data objects, different initial cluster centers lead to different clustering results. As a result, the final result is easily only a local optimum rather than the global optimum.

To address these problems, this research improves K-means in three respects: excluding isolated points (outliers), selecting the initial cluster centers, and determining the final number of clusters. A corresponding improved algorithm is proposed for each. The ultimate goal is to ensure that the clustering results of the improved algorithm are accurate.
In the improved algorithm, the number of clusters in the final result is determined with an improved average silhouette coefficient index function, the initial cluster centers are determined with an improved max-min distance method, and outliers are excluded by combining the method with a density-based clustering algorithm.

Finally, the improved algorithm is applied to customer segmentation in the CRM system of an electronics enterprise. A CRM customer segmentation model is established first, and data preprocessed through the CRM system is used as the test data set. Clustering then divides the customers into several categories, and a corresponding marketing program is drawn up for each category.
Keywords/Search Tags: Data Mining, Cluster analysis, K-means algorithm