Multi-Kernel Spectral Clustering Based on Incomplete Multiple Views and Its Distributed Implementation

Posted on: 2019-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhang
Full Text: PDF
GTID: 2348330569988915
Subject: Computer Science and Technology
Abstract/Summary:
In this era of information explosion, the amount of data keeps increasing, and cluster analysis is widely used to mine valuable information from it. Cluster analysis is an important technique in data mining and machine learning for grouping data whose labels are unknown. Meanwhile, as the structure of data becomes more complex and its sources more diverse, traditional clustering methods cannot process data from different perspectives. Hence, multi-view clustering has attracted the attention of many researchers. Multi-view data is a kind of dataset that describes the same objects through several different sets of features. By studying the relationships within and between views, multi-view clustering analyzes the characteristics of the data and uncovers important hidden information.

As data dimensionality grows, popular clustering methods can no longer obtain an effective partition. Kernel functions are a way of handling high-dimensional data: they map linearly inseparable data through a nonlinear mapping, so that cluster analysis can be performed in a high-dimensional space with good results. However, a single kernel function is not flexible enough to handle the heterogeneity of multi-view data, so multi-kernel learning is introduced to handle the different characteristics of the data with different kernel functions. These kernel functions are then combined effectively in order to find more of the latent information in the data.

In practical applications, multi-view data is often incomplete, so the study of incomplete-view data has become a current research focus. In incomplete-view clustering, the key questions are how to estimate the missing data and how to improve clustering performance on incomplete views. First, mean estimation is used to initialize the missing data. Then, because spectral clustering can handle a wider variety of data types, the spectral clustering algorithm and the estimation of the incomplete multi-kernel matrices are combined into a single procedure and updated iteratively. Experiments show that the clustering performance on incomplete views is improved and is more stable across different completeness rates.

At present, with the explosive growth of data, it is increasingly difficult to cluster such large-scale data on a single machine, whereas cloud computing technology can handle it effectively. Therefore, a distributed multi-kernel spectral clustering algorithm for incomplete views, based on the Spark distributed platform, is proposed in this thesis. The algorithm is implemented on a Spark cluster, and the results show that the parallel algorithm processes large-scale data efficiently and improves the efficiency of the clustering algorithm.
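The abstract describes initializing missing kernel entries by mean estimation and then running spectral clustering on a combination of per-view kernels. A minimal single-machine sketch of that idea follows. It rests on several assumptions that are not stated in the abstract: RBF kernels for every view, uniform kernel weights, a normalized-affinity spectral embedding, and a small deterministic k-means. It performs only the mean-estimation initialization followed by one clustering pass; the iterative update of the kernel estimate and the Spark implementation are not reproduced. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """RBF (Gaussian) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def mean_fill_kernel(X, observed, gamma):
    """Kernel for one view; entries involving a missing sample are set
    to the mean of the observed kernel entries (assumed initialization)."""
    n = X.shape[0]
    idx = np.flatnonzero(observed)
    K_obs = rbf_kernel(X[idx], gamma)
    K = np.full((n, n), K_obs.mean())
    K[np.ix_(idx, idx)] = K_obs
    np.fill_diagonal(K, 1.0)  # an RBF kernel has a unit diagonal
    return K

def spectral_embedding(K, k):
    """Top-k eigenvectors of D^{-1/2} K D^{-1/2}, with rows normalized."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    A = K * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    U = vecs[:, -k:]
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def cluster_incomplete_views(views, masks, k, gamma=0.5):
    """Average the mean-filled kernels (uniform weights are an assumption)
    and cluster the spectral embedding of the combined kernel."""
    kernels = [mean_fill_kernel(X, m, gamma) for X, m in zip(views, masks)]
    K = np.mean(kernels, axis=0)
    return kmeans(spectral_embedding(K, k), k)

# Synthetic demo: two clusters, two views, six samples missing from view 2.
rng = np.random.default_rng(0)
n_per = 20
view1 = np.vstack([rng.normal(0.0, 0.3, (n_per, 2)),
                   rng.normal(5.0, 0.3, (n_per, 2))])
view2 = np.vstack([rng.normal(0.0, 0.3, (n_per, 3)),
                   rng.normal(4.0, 0.3, (n_per, 3))])
obs1 = np.ones(2 * n_per, dtype=bool)
obs2 = np.ones(2 * n_per, dtype=bool)
obs2[[3, 7, 15, 22, 30, 38]] = False
view2[~obs2] = np.nan  # missing rows are never read by mean_fill_kernel

labels = cluster_incomplete_views([view1, view2], [obs1, obs2], k=2)
```

Samples missing from a view still receive an affinity to every other sample in that view, so the complete view can decide their cluster. A learned (non-uniform) kernel weighting and repeated re-estimation of the filled entries would be the natural next refinements, in line with the iterative scheme the abstract mentions.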
Keywords/Search Tags: Multi-view Clustering, Multi-kernel Learning, Incomplete Views, Distributed Computation, Spark