
Research And Application Of Soft Subspace Clustering Algorithms

Posted on: 2019-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Fan
Full Text: PDF
GTID: 2348330545981040
Subject: Computer Science and Technology
Abstract/Summary:
Cluster analysis automatically discovers hidden clusters in datasets and is a challenging research area in data mining. With the rapid development of information science and technology, many fields have accumulated large volumes of complex high-dimensional data. Because of the curse of dimensionality and the inherent sparsity of such data, traditional clustering algorithms based on distance metrics degrade sharply in performance and efficiency on high-dimensional datasets, and may fail altogether. Research on high-dimensional clustering methods has therefore become an important direction for improving clustering validity and extending its applications. Subspace clustering is currently the focus of that research because it can find clusters in different subspaces of high-dimensional data. It falls into two categories: hard subspace clustering and soft subspace clustering. Whereas the former seeks the exact subspace of each cluster, the latter finds soft subspaces that contain the clusters and uses feature weighting to determine the contribution of each dimension. Most soft subspace clustering methods are built on partition clustering and add an extra step that computes a weight for each feature, with the aim of finding the optimal solution of an objective function. Soft subspace clustering is, however, very sensitive to the initial cluster centers, and the objective functions of most methods are incomplete and have many parameters that are difficult to determine. There is also a lack of subspace clustering algorithms that can handle big data.

Motivated by these problems and challenges, this thesis first proposes a new initial cluster center selection method that selects an appropriate number of well-distributed, high-density points as initial cluster centers for high-dimensional partition clustering. Experiments show that the method is insensitive to changes in the size, dimensionality, and clusters of the dataset, and that it can improve the robustness and quality of high-dimensional partition clustering. Second, the thesis proposes a new soft subspace clustering algorithm whose improved objective function accounts for intra-class compactness, inter-class separation, and the quality of the projection subspace, and which adjusts its parameters adaptively. Experiments show that the algorithm improves the effectiveness of soft subspace clustering, adapts to different kinds of datasets, and maintains stable clustering performance. Finally, against the background of the big data era, the thesis proposes a distributed parallel subspace clustering algorithm based on MapReduce, which greatly improves scalability with respect to data size and dimensionality and makes the algorithm applicable to practical big data mining environments.
Keywords/Search Tags: high-dimensional clustering, initial cluster center, subspace clustering, MapReduce
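
The abstract describes soft subspace clustering as partition clustering with an extra step that computes a weight for each feature. The thesis's own objective function is not given here, so the following is only a minimal sketch of that general idea, using a generic entropy-style weight update in the spirit of entropy-weighted k-means rather than the proposed algorithm. The function name `soft_subspace_kmeans`, the parameter `gamma`, and the random initialization are all assumptions made for illustration.

```python
import numpy as np

def soft_subspace_kmeans(X, k, gamma=1.0, n_iter=20, seed=0):
    """Illustrative feature-weighted k-means (NOT the thesis's algorithm).

    Each cluster keeps its own weight vector over the dimensions, so every
    cluster lives in its own "soft" subspace. gamma is a hypothetical
    smoothing parameter: larger values give more uniform weights.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumption: plain random initialization; the thesis proposes a
    # density-based selection method that is not reproduced here.
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    weights = np.full((k, d), 1.0 / d)
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Assignment step: weighted squared distance to every center.
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2   # shape (n, k, d)
        dist = np.einsum("nkd,kd->nk", diff2, weights)        # shape (n, k)
        labels = dist.argmin(axis=1)

        for c in range(k):
            members = X[labels == c]
            if len(members) == 0:
                continue  # keep the previous center and weights
            # Center update: per-cluster mean.
            centers[c] = members.mean(axis=0)
            # Extra weighting step: dimensions with small within-cluster
            # dispersion receive large weights (softmax of -dispersion/gamma).
            dispersion = ((members - centers[c]) ** 2).sum(axis=0)
            logits = -dispersion / gamma
            logits -= logits.max()            # numerical stability
            w = np.exp(logits)
            weights[c] = w / w.sum()

    return labels, centers, weights


# Small synthetic demo: two clusters, each compact in a different dimension.
demo_rng = np.random.default_rng(42)
a = np.column_stack([demo_rng.normal(0.0, 0.1, 50), demo_rng.normal(0.0, 3.0, 50)])
b = np.column_stack([demo_rng.normal(5.0, 3.0, 50), demo_rng.normal(8.0, 0.1, 50)])
demo_labels, demo_centers, demo_weights = soft_subspace_kmeans(
    np.vstack([a, b]), k=2, gamma=50.0
)
print(demo_weights)  # one row of feature weights per cluster
```

Each row of the returned weight matrix sums to 1 and indicates how much each dimension contributes to that cluster. Like the methods the abstract criticizes, this sketch depends on its initial centers and on a hand-set parameter, which are the weaknesses the thesis sets out to address.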