Font Size: a A A

Research On The Extension Of Correlation Clustering And Its Applications

Posted on:2015-02-19Degree:MasterType:Thesis
Country:ChinaCandidate:Y B WangFull Text:PDF
GTID:2298330428499788Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Intelligentization is the development trend of all kinds of facilities and applications in this big data age, and the technologies of data mining are adopted to achieve this goal. In big data age, despite of its promising future, there are many great challenges. Firstly, the cost of data collection and data storing is decreasing, while the speed of processing data of human beings can not catch up with the speed of data growth. Secondly, the quality of the collected data suffers from high noise, heterogeneity and lack of attributes etc., which prevent us from analyzing the data very well. One thing to note is that most of data in real world is unsupervised, where the predictor variables are unknown. Fast and efficient unsupervised learning methods can not only alleviate the pressure of data processing, but also help the follow-up work of data collection, benefiting from the patterns discovered.Clustering analysis is one of the important topics in unsupervised learning, and it has a long history. Compared with other clustering methods, an advantage of correlation clustering is that it could group data points automatically without the number of clusters given a prior. This special property makes it more suitable for real data. However, solving the objective of correlation clustering is hard which prevents it being widely applied in the past years. For that reason, this paper has conducted the following research on correlation clustering:(1) We rewrite the objective of correlation clustering, minimum k-cut problem and quadratic semi-assignment problem respectively, to show the difference and relation between them and discuss the difficulty of solving them further more. Moreover, it can be inferred that it is almost impossible to break through the algorithm of correlation clustering by adoping the related relaxation technologies, so we should look for new ways.(2) By discussing the concrete difficulty of correlation clustering, we introduce the clustering indicator matrix and clustering assignment vector to reformulate the problem, and propose an iterative algorithm called Pseudo Expectation-Maximization (Pseudo-EM) whose time complexity is0(|V|+|E|) in each iteration by relaxing the non-critical constraints. Moreover, we discuss a heuristic initialization of clustering indicator matrix and sparsity issue of the data. At last, the experiments are conducted to demonstrate the effectiveness of the proposed Pseudo-EM algorithm. (3) Finally, we explore some applications related to correlation clustering including image segmentation and community detection. In image segmentation, the Pseudo-EM could generate more natural segmentation than spectral clustering. In community detection, the algorithm could recover the true cluster number and achieve a higher quality than the others in terms of communities.
Keywords/Search Tags:Clustering analysis, Correlation clustering, Pseudo Expectation-Maxmization, Applications of clustering, Image segmentation, Community detection
PDF Full Text Request
Related items