Research On Incremental Clustering Algorithm In Data Mining

Posted on: 2017-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2348330482984845
Subject: Computer technology

Abstract/Summary:
Cluster analysis is one of the most important methods in data mining; it has been studied extensively in recent years and has developed rapidly. The clustering problem is to divide a set of data objects into a number of categories according to some similarity criterion function, so that similarity within a group is as high as possible and differences between groups are as large as possible, and then to look for relationships among the resulting groups for further processing. Current cluster analysis methods can be divided into the following categories: partitioning clustering, hierarchical clustering, grid-based clustering, density-based clustering, fuzzy clustering (for example, the FCM algorithm), and model-based clustering. K-means is a partitioning algorithm; because it is easy to implement, easy to operate, simple, and efficient, it is widely used by researchers. However, K-means also has defects: because the algorithm selects its initial cluster centers randomly, the clustering result changes with the choice of initial centers. This thesis therefore analyzes the K-means clustering algorithm, studies its existing problems, and improves it in the following two respects:

1. The traditional K-means clustering method is studied, and a new KD-tree-based method is proposed to improve the selection of initial cluster centers. The method introduces the KD-tree data structure: a KD-tree is built over the data set, and K representative initial cluster centers are picked through the KD-tree's rectangular-cell partitioning together with calculation and sorting operations, which effectively improves clustering quality. On this basis, a new KD-tree is built from the K optimized initial cluster centers and the incremental data, and nearest-neighbor search assigns each incremental data point to its corresponding class, completing the dynamic process of incremental clustering.

2. To address problems of traditional collaborative filtering algorithms, such as data sparsity and cold start, the thesis combines cluster analysis with collaborative filtering to produce product recommendations. After a thorough study of cluster analysis and the traditional K-means algorithm, the defects of K-means are addressed using Kruskal's minimum spanning tree algorithm, yielding an improved algorithm called Krus K-means. The Krus K-means algorithm is then used to cluster the attributes and characteristics of the user-item matrix, which reduces the computational dimension and improves recommendation efficiency. On top of this, an initial prediction based on the clustering results is combined with a final prediction based on the user clustering results, and the generated recommendation set is presented to the user, which improves recommendation accuracy.
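The incremental step of the first contribution (assigning newly arrived data to existing classes by nearest-neighbor search over a KD-tree) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give the exact tree construction, so the sketch assumes the tree is built over the K cluster centers only, uses Euclidean distance, and does not update the centers after assignment. The function names `build_kdtree`, `nearest`, and `assign_incremental`, and the sample data, are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def build_kdtree(items, depth=0):
    """Build a KD-tree from (point, label) pairs as nested tuples (or None)."""
    if not items:
        return None
    axis = depth % len(items[0][0])          # cycle through the dimensions
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2                    # median item becomes this node
    return (
        items[mid],
        axis,
        build_kdtree(items[:mid], depth + 1),
        build_kdtree(items[mid + 1:], depth + 1),
    )


def nearest(node, query, best=None):
    """Return the (point, label) pair in the tree that is closest to query."""
    if node is None:
        return best
    item, axis, left, right = node
    if best is None or dist(query, item[0]) < dist(query, best[0]):
        best = item
    diff = query[axis] - item[0][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < dist(query, best[0]):
        best = nearest(far, query, best)
    return best


def assign_incremental(centers, new_points):
    """Assumed scheme: label each new point with the index of its nearest center."""
    tree = build_kdtree([(tuple(c), k) for k, c in enumerate(centers)])
    return [nearest(tree, tuple(p))[1] for p in new_points]


# Hypothetical data: three existing cluster centers and four incremental points.
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
new_points = [(1.0, 0.5), (9.0, 11.0), (-1.0, 8.0), (6.0, 6.0)]
labels = assign_incremental(centers, new_points)
print(labels)  # [0, 1, 2, 1]
```

With only K centers in the tree, a linear scan would be just as fast; the tree pays off when, as the abstract describes, the structure also holds data points and many incremental points must be placed.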
Keywords/Search Tags:cluster analysis, K-means, incremental clustering, KD-tree, collaborative filtering recommendation