Research On Multi-View Subspace Clustering Ensemble And Its Distributed Implementation

Posted on: 2017-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: Q Deng
Full Text: PDF
GTID: 2308330485477440
Subject: Computer Science and Technology
Abstract/Summary:
Many real-world datasets are complex yet valuable, and people want to extract useful information from them. With the development of computing technologies such as cloud computing and big data, data is becoming increasingly important. Clustering analysis can divide seemingly messy data into several categories, each of which may correspond to a real class in the original data. In machine learning, clustering analysis is an important unsupervised learning method: when data labels are unknown, a clustering algorithm divides the data into groups, and each group is called a cluster. As clustering has come into wide use, demand for clustering multi-view data has emerged. Multi-view clustering has become a branch of clustering analysis and has attracted the attention of many researchers. Multi-view data are datasets that describe the same objects from multiple sides or perspectives. For example, different image features can be seen as different views of the image data, and different sensors capture different perspectives of the same data source. Multi-view clustering takes into account both the differences and the complementarity between views in order to reach a consistent result.

Subspace clustering is one way to cluster high-dimensional data. Traditional subspace clustering is divided into hard and soft subspace clustering. In hard subspace clustering, each cluster is found in a sub-dataset whose attributes are a subset of the original attributes; in soft subspace clustering, each cluster is found by learning a weight vector that yields an attribute-weighted dataset. Based on the idea of soft subspace clustering, this thesis presents a locally adaptive attribute weighting method for multi-view data. The algorithm improves the locally adaptive metrics clustering algorithm (LAC), adds a new view weight vector, and introduces a balancing factor for each view based on the differences among the views' attributes. In this way, the algorithm handles high-dimensional clustering and overcomes the curse of dimensionality. At the same time, the algorithm has low time complexity and converges quickly. Experimental results show that the proposed algorithm achieves better clustering results than other existing multi-view clustering algorithms.

Clustering ensembles are an effective way to improve the robustness, stability, and accuracy of clustering. This thesis presents an improved multi-view clustering ensemble algorithm based on the link-based ensemble method, which uses both single-view and multi-view clustering to obtain diverse base clusterings. Experimental results show that the algorithm outperforms the other algorithms it was compared with.

Large-scale datasets are becoming more common, and the ability to process big data is one of the performance indicators of a clustering algorithm. This thesis designs a multi-view soft subspace clustering algorithm and implements a distributed multi-view clustering ensemble on Spark, a big data processing platform. Experiments on Spark clusters show that the distributed algorithm can process large-scale multi-view data in parallel, even at the gigabyte level, and improves the efficiency of clustering.
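
The soft subspace idea in the abstract can be illustrated with a minimal Python sketch of LAC-style attribute weighting for one cluster in a single view. It is not the thesis's algorithm: the view weight vector and the balancing factor are not specified in this abstract and are left out. The function names, the bandwidth parameter h, and the sum-to-one normalization are assumptions made for illustration.

```python
import math


def lac_style_weights(points, centroid, h=1.0):
    """Return one weight per attribute for a single cluster (hypothetical helper).

    Attributes along which the cluster is tight receive large weights;
    attributes along which it is spread out receive small weights.
    """
    dims = len(centroid)
    # Average squared deviation from the centroid along each attribute.
    spread = [
        sum((p[j] - centroid[j]) ** 2 for p in points) / len(points)
        for j in range(dims)
    ]
    # Subtract the smallest spread before exponentiating; this leaves the
    # normalized weights unchanged and avoids underflow of every term.
    base = min(spread)
    raw = [math.exp(-(s - base) / h) for s in spread]
    total = sum(raw)
    # Assumption: normalize so the weights sum to one.
    return [r / total for r in raw]


def weighted_sq_distance(x, centroid, weights):
    """Attribute-weighted squared distance used to assign a point to a cluster."""
    return sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, centroid))


# Toy cluster: tight along attribute 0, widely spread along attribute 1.
cluster_points = [(1.0, 0.0), (1.1, 5.0), (0.9, -5.0)]
cluster_centroid = (1.0, 0.0)
weights = lac_style_weights(cluster_points, cluster_centroid)
```

In this toy cluster almost all of the weight falls on attribute 0, so a point that differs from the centroid only along attribute 1 still counts as close under `weighted_sq_distance`. That is the sense in which each cluster lives in its own softly selected subspace.
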
Keywords/Search Tags:Multi-view Clustering, Subspace Clustering, Clustering Ensemble, Distributed Computation, Spark