Semi-supervised K-means Clustering Algorithm In Data Mining

Posted on:2014-02-22

Degree:Master

Type:Thesis

Country:China

Candidate:J K Sun

Full Text:PDF

GTID:2248330398495862

Subject:Basic mathematics

Abstract/Summary:

The arrival of the digital age confronts us with an explosion of data but a shortage of knowledge, and data mining technology arose to meet this growing demand. Cluster analysis is an important branch of data mining, and semi-supervised clustering has been an active research topic in recent years. Semi-supervised clustering combines the advantages of supervised and unsupervised learning: it uses a small amount of labeled data to constrain and guide the clustering process, without requiring large amounts of data to be labeled. Such algorithms are easy to implement, close to practical conditions, and achieve high precision.

This thesis systematically studies and improves the semi-supervised K-means clustering algorithm. The specific work is organized as follows:

(1) We discuss the background and supporting technologies of data mining, and point out its research significance, application background, and future development.

(2) For the kernel K-means clustering algorithm, we discuss the properties of kernel functions, propose a method for constructing kernel functions, and systematically discuss the construction theory of multiple-kernel methods together with the problem of selecting and optimizing multiple-kernel parameters. We analyze the advantages and disadvantages of several typical multiple-kernel methods and point out directions for further research.

(3) We use the multidimensional scaling (MDS) transform to reduce dimensionality and so address the curse of dimensionality in processing high-dimensional data, and compare it with PCA and other dimension reduction methods. This method preserves the intrinsic relationships among the data.

(4) We propose a new measure of data similarity that takes into account both within-class and between-class similarity, and introduce an adaptive search method for the best clustering based on this measure. We first use a tree clustering method to estimate the number of clusters, which reduces computational complexity, and then use the adaptive method to minimize the objective function so as to obtain the optimal number of clusters.

(5) Previous semi-supervised clustering algorithms can only handle data with complete labels. To overcome this limitation, this thesis introduces a new method that can handle data with incomplete labels, and improves the search algorithm for the optimal cluster centers. Compared with the max-min distance method, this method greatly reduces computational complexity.

Finally, we summarize the work and outline directions for future research.
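
The idea in item (5), letting a small set of labeled points guide K-means over mostly unlabeled data, can be illustrated with a minimal sketch. The abstract does not describe the thesis's actual algorithm, so the code below is not that method: it is a generic seeded, constrained K-means under stated assumptions. The function name `seeded_kmeans`, the use of `None` to mark unlabeled points, and the toy data are all hypothetical choices made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def seeded_kmeans(X, labels, k, n_iter=100, seed=0):
    """Generic seeded K-means (illustrative, not the thesis's algorithm).

    labels[i] is a cluster index in 0..k-1 for a labeled point,
    or None for an unlabeled point (an assumed convention).
    """
    rng = random.Random(seed)

    # Initialise each center from the labeled points of that cluster;
    # fall back to a random data point if a cluster has no labeled seed.
    centers = []
    for j in range(k):
        seeds = [x for x, lab in zip(X, labels) if lab == j]
        centers.append(mean(seeds) if seeds else list(rng.choice(X)))

    assign = None
    for _ in range(n_iter):
        # Labeled points keep their given label; unlabeled points
        # go to the nearest current center.
        new_assign = [
            lab if lab is not None
            else min(range(k), key=lambda j: sq_dist(x, centers[j]))
            for x, lab in zip(X, labels)
        ]
        if new_assign == assign:
            break  # assignments are stable, so the centers are too
        assign = new_assign

        # Recompute each center as the mean of its current members.
        for j in range(k):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:
                centers[j] = mean(members)
    return assign, centers


# Toy data: two well-separated groups, one labeled point in each.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
labels = [0, None, None, 1, None, None]
assign, centers = seeded_kmeans(X, labels, k=2)
print(assign)  # [0, 0, 0, 1, 1, 1]
```

Initialising the centers from the labeled points replaces the random initialisation of plain K-means, so only two of the six points need labels here. The thesis's own treatment of incomplete labels and its cluster-center search may differ from this sketch.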

Keywords/Search Tags:

Data Mining, clustering, Kernel function, K-means clustering, semi-supervised clustering

Related items

1	Semi Supervised Clustering Algorithm And Its Application And Research
2	Research And Improvement For Semi-supervised K-means Clustering Algorithm In Data Mining
3	Distributed Clustering And Evolutionary Clustering Algorithm Based On Semi-supervised Learning
4	Research On Clustering Ensemble And Semi-Supervised Clustering In Data Mining
5	Research On K-means Clustering Algorithm Based On Semi-Supervised Good Point Set And Leader
6	Research On Deep Semi-supervised Clustering Algorithm
7	Research On Robust Image Segmentation Algorithm Based On Neutrosophic Clustering
8	Research On Risk Degree-Based Safe Semi-Supervised Fuzzy Clustering Algorithm
9	Extended Researches On Convex Clustering
10	FCM Clustering And Research Of Its Increment Algorithm