Font Size: a A A

Improved K-medoids Clustering Algorithm And Feature Selection Algorithm Based On Spectral Clustering

Posted on:2018-04-19Degree:MasterType:Thesis
Country:ChinaCandidate:Y ZhouFull Text:PDF
GTID:2358330542978421Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The field of data mining and machine learning is faced with big challenge when the age of big data is coming,and clustering and classification are important research problems of this field.We can discover the hidden patterns and rules in data by learning clustering algorithms.Partition clustering algorithm is very commonly used,but cannot recognize the clusters with arbitrary shape and detect the number of clusters automatically.K-medoids clustering is one of the typical partition clustering method.Graph clustering algorithm is also a kind of traditional clustering methods.It not only can recognize the clusters with arbitrary shape,but also can converge to global optimal solution.Feature selection is a kind of classification task which is widely used in medical,image and text etc.However,such data have high dimensional features and some features are redundant or have no relations with class labels.Especially for the gene expression data with high dimension,high redundancy and high noise,it is very important to eliminate the redundant features,reduce the feature dimension,and improve the quality of classification,and finally get effective feature subsets.In this paper,the proposed algorithms are to solve the problems that K-medoids clustering algorithm cannot detect the number of clusters automatically and cannot recognize the clusters with arbitrary shape;the selection,problem of the scale parameter in the kernel function;the feature importance methods in spectral feature selection algorithms.The main work and innovation of this paper are as follows:1.We proposed a fast K-medoids clustering algorithm based on the geodesic distance and a new statistics Fr:(1)define a new statistics Fr as clustering evaluation and iteration stopping criterion for detecting clusters of datasets automatically;(2)In addition,using the geodesic distance between samples on the minimum spanning tree(MST_path)instead of the direct Euclidean distance between samples to measure their similaritie for recognizing arbitrary shape clusters.The power of the proposed criterion Fr and the power of the adaptive fast K-medoids algorithm are tested in real world datasets from UCI machine learning repository and in the synthetic datasets.All the experimental results demonstrate that our proposed criterion Fr is a valid clustering criterion,and the adaptive fast K-medoids algorithm based on the Fr and geodesic distance can recognize the clusters with arbitrary shape,and can detect the number of clusters automatically,and is robust to noises as well.2.The typical spectral clustsering algorithm NJW proposed by Ng et.al used a global scaling parameter sigma give manually to construct the affinity matrix,and cannot discover the real clustering.Therefore Self-Tuning spectral clustering algorithm was proposed to overcome the weaknesss of NJW by a local scaling parameter ?i which is determined automatically via the p-th neareast neighbours of point i.However,?i may be affected by outliers.Above this problem,we propose a new spectral clustering algorithm based on the local standard deviation scaling parameter ?std_i.Several UCI datasets,typical synthetic datasets and noise datasets are used in our experiments to test the performance of SCSD algorithm by ACC,AMI and ARI.All the experimental results demonstrate that our proposed method has better performance than NJW;most results of SCSD are better than Self-Tuning's when their neighbor parameters are same;the results of SCSD of noise datasets are best of all.3.We proposed a new spectral feature selection algorithm(FSSC)based on SCSD spectral clustering.According to the sample local standard deviation parameter defined in section third,we apply this parameter in features as?fs_i;we adopt entropy and cosine similarity ranking to measure feature importance,then can get a proper feature subset.Support vector machine(SVM)classification accuracy and other evaluation methods are used to compare our method with MCFS and Laplacian on seven gene datasets.All experiments show that our proposed method had good performance and the cosine similarity ranking measure is very effective.
Keywords/Search Tags:K-medoids clustering algorithm, clustering evaluation index, spectral clustering, spectral feature selection
PDF Full Text Request
Related items