Study On The Spectral Clustering Algorithm Based On Mixed Data

Posted on: 2017-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: H Ma
GTID: 2308330509455317
Subject: Software Engineering Technology
Abstract/Summary:
With the development of Internet technology, everyday life generates large amounts of data. Most traditional clustering algorithms in data mining are designed for data with a single attribute type, but in many fields the data have mixed attribute types, which traditional clustering algorithms handle poorly. Spectral clustering is derived from spectral graph theory. It has a strong theoretical foundation, does not require initial cluster centers to be preset, and can find a globally optimal solution. Because of these advantages, this thesis builds on spectral clustering and improves its distance measure model so that it can handle mixed data. The main contributions are as follows.

First, this thesis studies the similarity measure for mixed data and improves the similarity function of the traditional spectral clustering algorithm, which can only handle data with numeric attributes. Based on spectral graph theory, a similarity measure for mixed data is established: each mixed data object is regarded as a vertex of an undirected graph, the similarity between two objects is mapped to the weight of the edge between the corresponding vertices, and the optimal clustering is then sought by graph partitioning. The construction of the similarity matrix takes the characteristics of mixed data into account: a dissimilarity measure is first constructed over the numerical and categorical attributes, and the similarity measure is then derived from it. Because spectral clustering can obtain a globally optimal solution, the approach avoids the tendency of traditional mixed data clustering algorithms to become trapped in local optima.
Experiments on real data sets from UCI give good results and demonstrate the validity of the algorithm.

Second, this thesis studies the distance measure model for mixed data, addresses the shortcomings of the existing mixed distance model, and thereby improves the distance function of the spectral clustering algorithm. The classical mixed data clustering algorithm, K-Prototypes, is simple and efficient, but it is sensitive to the initial centers and therefore unstable. Building on the mixed data distance model of K-Prototypes, this thesis proposes a probability distance method that improves the similarity function of spectral clustering so that it can handle mixed data clustering. The resulting algorithm resolves the instability of K-Prototypes, but the mixed distance model contains a weight parameter that must be chosen manually, and experiments confirm that this parameter strongly influences the clustering results. A method for determining the weight parameters of the mixed distance model is therefore proposed, based on rough set theory and information entropy. Rough set theory is a mathematical method for analyzing uncertainty and incompleteness in a system; combined with information entropy, it is used to assign a weight to each attribute. Combining these weights with the probability-based mixed distance model yields a spectral clustering algorithm that determines the weights automatically. Experiments on real data sets from UCI give good results and demonstrate the validity of the algorithm.
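To make the idea of a mixed distance feeding a spectral clustering similarity matrix concrete, the following is a minimal sketch and not the thesis's own method. It assumes the standard K-Prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight times the number of categorical mismatches) and a Gaussian kernel to turn dissimilarity into edge weights. The names `mixed_distance`, `affinity_matrix`, `gamma`, and `sigma` are illustrative choices, and the probability distance and the rough set weighting proposed in the thesis are not reproduced here.

```python
import math


def mixed_distance(x, y, num_idx, cat_idx, gamma):
    """K-Prototypes-style dissimilarity between two mixed records.

    Squared Euclidean distance over the numeric attributes plus
    gamma times the number of mismatching categorical attributes.
    gamma is the manually chosen weight parameter (assumed name).
    """
    numeric = sum((x[j] - y[j]) ** 2 for j in num_idx)
    categorical = sum(1 for j in cat_idx if x[j] != y[j])
    return numeric + gamma * categorical


def affinity_matrix(data, num_idx, cat_idx, gamma=1.0, sigma=1.0):
    """Symmetric similarity matrix for spectral clustering.

    Each record is a graph vertex; the edge weight is a Gaussian
    kernel of the mixed dissimilarity (an assumed choice). The
    diagonal is left at zero, a common convention.
    """
    n = len(data)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            d = mixed_distance(data[i], data[k], num_idx, cat_idx, gamma)
            W[i][k] = W[k][i] = math.exp(-d / (2.0 * sigma ** 2))
    return W


# Toy records: two numeric attributes and one categorical attribute.
records = [
    (0.10, 0.20, "red"),
    (0.15, 0.22, "red"),
    (0.90, 0.80, "blue"),
    (0.85, 0.90, "blue"),
]
W = affinity_matrix(records, num_idx=[0, 1], cat_idx=[2], gamma=0.5, sigma=0.5)
```

The resulting matrix `W` could then be passed to any spectral clustering routine that accepts a precomputed affinity. Varying `gamma` changes how strongly categorical mismatches separate records, which is the sensitivity the abstract describes.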
Keywords/Search Tags: spectral clustering, mixed data, distance measure