Research And Application Of Soft Subspace Clustering Algorithms

Posted on:2019-01-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y D Fan

Full Text:PDF

GTID:2348330545981040

Subject:Computer Science and Technology

Abstract/Summary:

Cluster analysis automatically discovers hidden clusters in datasets and is a challenging research area in data mining. With the rapid development of information science and technology, all walks of life have accumulated large amounts of complex high-dimensional data. Because of the curse of dimensionality and the inherent sparsity of such data, traditional clustering algorithms based on distance metrics lose accuracy and efficiency sharply on high-dimensional datasets, and may fail altogether. Research on high-dimensional clustering methods has therefore become an important direction for improving clustering validity and extending its applications. Subspace clustering is currently the research focus of high-dimensional cluster analysis because it can find clusters in different subspaces of high-dimensional data. It falls into two categories: hard subspace clustering and soft subspace clustering. Whereas the former seeks the exact subspace of each cluster, the latter finds soft subspaces that contain clusters and uses feature weighting to determine the contribution of each dimension. Most soft subspace clustering methods are built on partition clustering and add an extra step that calculates a weight for each feature, with the aim of finding the optimal solution of the objective function. Soft subspace clustering is very sensitive to the initial cluster centers, and the objective functions of most methods are incomplete and have many parameters that are difficult to determine. There is also a lack of subspace clustering algorithms that can handle big data.

Motivated by these problems and challenges, this thesis first proposes a new initial cluster center selection method that picks an appropriate number of well-distributed, high-density points as the initial cluster centers for high-dimensional partition clustering. The experiments show that the method is insensitive to changes in the size, dimensionality, and clusters of the datasets, and that it improves the robustness and quality of high-dimensional partition clustering. Second, the thesis proposes a new soft subspace clustering algorithm whose improved objective function considers intra-class compactness, inter-class separation, and the quality of the projection subspace, and which adjusts its parameters adaptively. The experiments show that this algorithm improves the effectiveness of soft subspace clustering, adapts to different kinds of datasets, and maintains stable clustering performance. Finally, in the context of the big data era, the thesis proposes a distributed parallel subspace clustering algorithm based on MapReduce, which greatly improves scalability with respect to data size and dimensionality and makes the algorithm applicable to practical big data mining environments.
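The general pattern the abstract describes, partition clustering with an extra step that computes a weight for each feature, can be illustrated with a minimal sketch. This is not the algorithm proposed in the thesis, whose objective function is not given here. It is a generic entropy-weighted k-means variant, and the function name `soft_subspace_kmeans`, the smoothing parameter `gamma`, and the random initialization are all assumptions made for illustration.

```python
import numpy as np


def soft_subspace_kmeans(X, k, gamma=1.0, n_iter=20, seed=0):
    """Illustrative entropy-weighted k-means (hypothetical helper).

    Each cluster keeps its own weight vector over the features, so
    clusters can live in different "soft" subspaces.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumption: plain random initialization, not the thesis's method.
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    weights = np.full((k, d), 1.0 / d)  # start with uniform weights
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Step 1: assign each point to the nearest center under the
        # cluster-specific weighted squared Euclidean distance.
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2  # (n, k, d)
        dist = (diff2 * weights[None, :, :]).sum(axis=2)    # (n, k)
        labels = dist.argmin(axis=1)

        for c in range(k):
            members = X[labels == c]
            if len(members) == 0:
                continue  # keep the old center and weights
            # Step 2: update the center as the mean of its members.
            centers[c] = members.mean(axis=0)
            # Step 3 (the extra step): features with small within-cluster
            # dispersion get large weights via a softmax.
            dispersion = ((members - centers[c]) ** 2).sum(axis=0)
            logits = -dispersion / gamma
            logits -= logits.max()  # numerical stability
            w = np.exp(logits)
            weights[c] = w / w.sum()

    return labels, centers, weights
```

Each row of the returned `weights` array sums to 1 and indicates how much each dimension contributes to that cluster. A smaller `gamma` concentrates the weight on fewer dimensions, which illustrates the kind of hard-to-tune parameter the abstract refers to.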

Keywords/Search Tags:

high-dimensional clustering, initial cluster center, subspace clustering, MapReduce

Related items

1	Study On High-dimensional Data Subspace Clustering Analysis And Application
2	Research On Subspace Clustering Algorithm For High Dimensional Data
3	Research On Subspace Clustering Algorithms Based On Density
4	Research On Clustering Algorithms For High-Dimensional Data
5	Research On Improved Sparse Subspace Clustering Algorithm
6	Research On Key Technologies Of Clustering High-dimensional Data Based On Sparse Subspace And Their Applications
7	Ksummary Analysis Method Based On Adaptive Multiple Clustering
8	Research On Projective Clustering Algorithms With Applications For High-dimensional Data
9	Research On Clustering Methods For High Dimensional Data And Their Applications
10	Research On Mixed Attribute Clustering Technology Based On Cluster Center Selection Strategy