
Semi-supervised K-means Clustering Algorithm in Data Mining

Posted on: 2014-02-22    Degree: Master    Type: Thesis
Country: China    Candidate: J K Sun
GTID: 2248330398495862    Subject: Basic mathematics
Abstract/Summary:
The arrival of the digital age confronts us with an explosion of data but a shortage of knowledge, and data mining technology arose to meet this growing demand. Cluster analysis is an important branch of data mining, and semi-supervised clustering has been an active research topic in recent years. Semi-supervised clustering combines the advantages of supervised and unsupervised learning: it uses a small amount of labeled data to constrain and guide the clustering process, without requiring large amounts of data to be labeled. Semi-supervised clustering algorithms are easy to implement, closer to practical situations, and offer high precision.

This thesis systematically studies and improves the semi-supervised K-means clustering algorithm. The specific research work is organized as follows:

(1) We discuss the background of data mining and its supporting technologies, and point out the significance, application background, and future development of data mining research.
(2) For the kernel K-means clustering algorithm, we discuss the properties of kernel functions, propose a kernel function construction method, and systematically discuss the construction theory of multiple-kernel methods as well as the problem of selecting and optimizing multiple-kernel parameters. We analyze the advantages and disadvantages of several typical multiple-kernel methods and point out directions for further research.
(3) We use the multidimensional scaling (MDS) transform to reduce dimensionality and address the curse of dimensionality in processing high-dimensional data, and compare it with PCA and other dimension reduction methods. This method preserves the intrinsic relationships among the data.
(4) We propose a new method of measuring data similarity that takes into account both within-class and between-class similarity, and introduce an adaptive search method for the best clustering based on it. We first use a tree clustering method to estimate the number of clusters, which reduces computational complexity, and then use the adaptive method to minimize the objective function and obtain the optimal number of clusters.
(5) Previous semi-supervised clustering algorithms can only deal with data that have complete labels. To overcome this limitation, this thesis introduces a new method that can handle data without complete labels, and improves the search algorithm for the optimal cluster centers. Compared with the max-min distance method, this method greatly reduces computational complexity.

Finally, we summarize the work and outline directions for future research.
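
To make the idea of guiding K-means with a small labeled subset concrete, here is a minimal sketch of seeded, constrained K-means, a standard semi-supervised variant. The abstract does not give the thesis's own algorithm, so this is not a reconstruction of it. The function names, the use of `None` for unlabeled points, and the toy data are all assumptions made for illustration, as is the requirement that every cluster has at least one labeled seed.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def seeded_kmeans(points, labels, k, max_iter=100):
    """Cluster `points` into k groups, guided by a partially labeled subset.

    labels[i] is a cluster index in range(k) for a labeled point, or None
    for an unlabeled one (hypothetical convention for this sketch).
    Assumes every cluster has at least one labeled seed.
    Returns (assignments, centers).
    """
    # Initialize each center as the mean of that cluster's labeled seeds.
    centers = []
    for c in range(k):
        seeds = [p for p, l in zip(points, labels) if l == c]
        if not seeds:
            raise ValueError(f"cluster {c} has no labeled seed")
        centers.append(mean(seeds))

    assignments = None
    for _ in range(max_iter):
        # Labeled points keep their label; unlabeled points go to the
        # nearest current center.
        new_assignments = [
            l if l is not None
            else min(range(k), key=lambda c: dist(p, centers[c]))
            for p, l in zip(points, labels)
        ]
        if new_assignments == assignments:
            break  # converged: no assignment changed
        assignments = new_assignments
        # Recompute each center from all of its current members.
        centers = [
            mean([p for p, a in zip(points, assignments) if a == c])
            for c in range(k)
        ]
    return assignments, centers


# Toy data: two well-separated groups, one labeled seed per group.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = [0, None, None, 1, None, None]
assignments, centers = seeded_kmeans(points, labels, k=2)
```

On this toy data the two seeds alone fix the initial centers, so no random initialization or separate center search is needed. Handling clusters that have no labeled seed at all would require an additional initialization step, which this sketch deliberately leaves out.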
Keywords/Search Tags: Data Mining, clustering, Kernel function, K-means clustering, semi-supervised clustering