
Research On Clustering And Unsupervised Feature Selection Algorithms Based On Density Peaks

Posted on: 2017-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Qu
GTID: 2358330512468058
Subject: Computer software and theory
Abstract/Summary:
Clustering and classification are important research problems in data mining, machine learning and related fields. Clustering algorithms help discover the hidden knowledge, patterns and rules in data. Partition clustering algorithms are widely used in practice, and K-medoids is a typical example. The key issues in K-medoids clustering are how to determine the optimal initial cluster centers, the correct number of clusters and an effective clustering criterion function.

Feature selection is one of the important methods in data preprocessing and is widely used in fields such as medicine, image processing and text processing. Gene data, image data and text data are high dimensional, and many of their features may be redundant or irrelevant. Therefore, feature selection for high-dimensional data focuses on reducing the data dimension, eliminating redundant features and improving classification accuracy.

The algorithms proposed in this thesis address three defects: K-medoids clustering requires the number of clusters to be given manually, the selected initial cluster centers may lie in the same cluster, and unsupervised feature selection algorithms yield low classification accuracy. The main work and innovations of this thesis are as follows:

(1) K-medoids clustering algorithms with initial seeds optimized by density peaks are proposed. Inspired by the paper published in Science, the new algorithms define a new sample density and distance using K nearest neighbors, and then plot the decision graph of each sample's density against its distance. The points with both higher density and higher distance are chosen as the initial seeds for K-medoids, so that the seeds lie in different clusters and the number of clusters of the dataset is determined automatically. To obtain better clustering results, a new measure function is also proposed.
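The seed-selection idea in (1) can be illustrated with a minimal sketch. The abstract does not give the thesis's exact definitions of density and distance, so the ones used here are assumptions: the density of a sample is taken as the inverse of its mean distance to its `k` nearest neighbors, and its distance is the distance to the nearest sample of higher density, as in the original density-peaks method. The function name `density_peak_seeds` and the parameter `n_seeds` are hypothetical; the automatic choice of the number of clusters from the decision graph is not reproduced here.

```python
import math


def density_peak_seeds(points, k=3, n_seeds=2):
    """Return indices of candidate initial medoids (illustrative sketch).

    Assumes at least two points. The density and distance definitions
    below are assumptions, not the thesis's own formulas.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Assumed density: inverse of the mean distance to the k nearest neighbors.
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))

    # Distance: gap to the nearest sample of higher density; a sample with
    # no denser neighbor gets its largest distance to any sample.
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))

    # Samples that are both dense and far from denser samples rank first.
    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return ranked[:n_seeds]


# Two well-separated groups of five points each.
blob_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
blob_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
seeds = density_peak_seeds(blob_a + blob_b, k=3, n_seeds=2)
print(sorted(seeds))
```

Ranking by the product of density and distance is one simple way to pick points from the upper-right region of a decision graph; the returned indices would then be passed to a K-medoids routine as its starting medoids.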
The two proposed K-medoids algorithms were tested on real datasets from the UCI Machine Learning Repository and on synthetic datasets. The experimental results demonstrate that the proposed algorithms can recognize the number of clusters of a dataset, find proper initial seeds, reduce the clustering time, improve the clustering accuracy, and remain robust to noise.

(2) Unsupervised feature selection algorithms based on density peaks are proposed. The representativeness and discriminability of features are defined using K nearest neighbors, and in the two proposed algorithms the product of a feature's representativeness and its discriminability is taken as the significance of that feature. Support vector machines are used as the classification tool, and their classification accuracy is adopted to evaluate the power of the selected feature subsets. The proposed algorithms were tested on datasets from the UCI Machine Learning Repository, popular face datasets and gene datasets. All experimental results demonstrate that the feature subsets selected by the proposed feature selection algorithms possess good classification power.
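The feature-scoring idea in (2) can be sketched in the same spirit. The abstract does not state how representativeness and discriminability are computed, so this sketch assumes that each feature is treated as a point (its column of values), that representativeness is the inverse mean distance to its `k` nearest features, and that discriminability is the distance to the nearest more representative feature. The function name `rank_features` is hypothetical, and the SVM evaluation step is omitted.

```python
import math


def rank_features(X, k=2):
    """Rank feature indices by representativeness * discriminability.

    X is a list of samples (rows). Assumes at least two features.
    Both measures below are assumptions, not the thesis's own formulas.
    """
    cols = [list(c) for c in zip(*X)]  # one vector per feature
    d = len(cols)
    dist = [[math.dist(a, b) for b in cols] for a in cols]

    # Assumed representativeness: inverse mean distance to k nearest features.
    rep = []
    for j in range(d):
        nearest = sorted(dist[j][m] for m in range(d) if m != j)[:k]
        rep.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))

    # Assumed discriminability: distance to the nearest more representative
    # feature, or the largest distance if no such feature exists.
    dis = []
    for j in range(d):
        higher = [dist[j][m] for m in range(d) if rep[m] > rep[j]]
        dis.append(min(higher) if higher else max(dist[j]))

    # Significance of a feature is the product of the two measures.
    score = [rep[j] * dis[j] for j in range(d)]
    return sorted(range(d), key=lambda j: score[j], reverse=True)


# Six features over four samples: features 0-2 are near-duplicates of one
# another, as are features 3-5.
columns = [
    [1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 3.0, 4.0], [1.0, 2.1, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0], [4.1, 3.0, 2.0, 1.0], [4.0, 3.1, 2.0, 1.0],
]
X = [list(row) for row in zip(*columns)]
top_two = rank_features(X, k=2)[:2]
print(sorted(top_two))
```

Under these assumptions a redundant feature sits close to a more representative one and therefore receives a low score, so the top-ranked subset tends to contain one feature per group of near-duplicates.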
Keywords/Search Tags: clustering, initial seeds, density peaks, K-medoids algorithms, unsupervised feature selection