The Research And Application Of Data Competition Clustering Algorithm

Posted on: 2017-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: H Su
Full Text: PDF
GTID: 2308330488982491
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, we are entering the DT era. To extract useful information from vast amounts of data, data analysis has become a research hotspot, and cluster analysis, an important approach to data analysis, has developed rapidly. In 2012, Lu Zhimao and others proposed a new clustering algorithm based on an aggregation field model, called the data competition (DC) clustering algorithm. It places all data points in the aggregation field and computes the aggregation energy of every point, then lets the points compete with one another in a prescribed competition order to generate center points and member points. The DC clustering algorithm is simple, is not sensitive to initialization, and is efficient in real applications. It still has shortcomings, however, and further study is worthwhile. Because DC clustering is a center-based method, its results on datasets with complex structure, manifold structure, or inhomogeneous density are not ideal, and it cannot handle multi-view datasets efficiently. To address these shortcomings, this thesis proposes several new methods based on the original DC clustering algorithm. The key innovations and work are as follows.

First, two distance measures, a local distance and a global distance, are designed according to the prior consistency assumption about datasets: nearby points are more similar, and points in the same structure are more similar. A neighbor graph is built on the dataset, with the data points as its vertices. The length of the shortest path between a pair of vertices (the global distance) is used as the distance between them, and the similarity of the two points is then obtained from it.
As a result, data points in the same distribution region are separated by a short distance and have high similarity, while data points in different distribution regions are separated by a larger distance and have lower similarity. Replacing the Euclidean distance in the DC clustering algorithm with the local and global distances yields a density sensitive data competition (DSDC) clustering method. DSDC is more accurate than the original DC method, especially on complex datasets.

Second, a view correlation factor based multi-view data competition (VCF-MDC) clustering method is proposed. A view correlation factor is first introduced to connect the different views. The view correlation factors are then combined with the spectral method to build a joint objective function that can use the information from different views. Solving the function majorization problem yields the embedding matrix of each view, and these embedding matrices are then used in the DC clustering algorithm. Because the joint objective function combines the spectral method with the view correlation characteristic and uses the information provided by the different views, VCF-MDC can obtain better clustering results on the embedding matrices.

Third, a density adaptive data competition (DA-DC) method is designed. It defines a density adaptive similarity measure that conforms more closely to the natural cluster distribution of a dataset than the Euclidean distance measure does. This similarity is then used in the data competition method. DA-DC handles datasets with inhomogeneous density better.

Fourth, the two improved data competition clustering algorithms, DA-DC and VCF-MDC, are applied to image clustering.
Three kinds of image features are extracted as multi-view data, the two algorithms are used to cluster the image datasets, and the resulting image clusters are compared with those produced by the original DC clustering algorithm, the K-means algorithm, and several multi-view clustering algorithms.
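
The global distance in the first contribution (shortest path on a neighbor graph) can be illustrated with a minimal Python sketch. It assumes a symmetric k-nearest-neighbor graph with Euclidean edge weights and uses Dijkstra's algorithm for the shortest paths. The abstract does not specify the graph construction, the edge weights, or how distance is converted to similarity, so the function names knn_graph and global_distances, the value of k, and the sample points are all hypothetical and are not taken from the thesis.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph (assumed construction)
    whose edge weights are plain Euclidean distances (also an assumption)."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d  # store both directions so the graph is undirected
    return graph


def global_distances(graph, source):
    """Dijkstra's algorithm: shortest-path length from source to each vertex.
    Vertices that cannot be reached keep the distance math.inf."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph[u].items():
            candidate = d + w
            if candidate < dist[v]:
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical sample: nine points along a U shape; the two arm tips are
# close in space but far apart along the structure.
points = [
    (0.0, 0.0), (0.0, 1.0), (0.0, 2.0),
    (1.0, 2.0), (2.0, 2.0), (3.0, 2.0),
    (4.0, 2.0), (4.0, 1.0), (4.0, 0.0),
]
graph = knn_graph(points, k=2)
from_first_tip = global_distances(graph, 0)

euclidean_tip_to_tip = math.dist(points[0], points[8])
global_tip_to_tip = from_first_tip[8]
print(euclidean_tip_to_tip, global_tip_to_tip)
```

On this sample the two arm tips are 4 apart in Euclidean distance, while the path through the graph has length 8. This is the behavior the abstract describes: points in the same structure stay close under the global distance, and points that are only spatially near each other do not.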
Keywords/Search Tags: DC clustering, similarity matrix, density inhomogeneous, multi-view clustering, image clustering