
Dimension Reduction And Clustering For High-Dimensional Data

Posted on: 2017-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: X L Sun
GTID: 2308330503961503
Subject: Software engineering
Abstract/Summary:
Dimension reduction and cluster analysis are both important applications in pattern recognition and data mining. In recent years, with the development and widespread use of the internet, both the quantity and the variety of images, videos, files, and other data have grown exponentially. As the amount of data increases, more and more features must be extracted, which leads to a dramatic growth in the dimensionality of the data. However, analyzing and processing data in a high-dimensional space is very complex, because the data contain a large amount of superfluous information and hidden correlations. Visualizing high-dimensional data is also difficult.

The main goal of dimension reduction is to transform high-dimensional data into a more compact and meaningful representation in a low-dimensional space, thereby reducing the computational cost and making the structure of the data easy to visualize. An efficient dimension reduction method is therefore urgently needed.

Cluster analysis is not only an independent data analysis tool but also a preprocessing step in data mining. It plays a significant role in many fields of science, such as image processing, statistical analysis, and psychology. Because of the curse of dimensionality, traditional clustering methods usually fail to produce meaningful results for high-dimensional data, so research on high-dimensional data clustering is also very important.

Many dimension reduction methods have been developed and widely used, such as PCA, LDA, LLE, Isomap, and SNE. In this thesis, we briefly introduce several classical dimension reduction methods and then describe the LLE method in detail. LLE is a nonlinear dimension reduction method that preserves the local configurations of nearest neighbors.
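To make the idea of preserving local neighbor configurations concrete, the following is a minimal sketch of the standard three-step LLE formulation: find nearest neighbors, solve for reconstruction weights, and compute the embedding from the bottom eigenvectors. It is an illustration only, not the thesis's implementation. The function name `lle`, its parameters, and the regularization constant `reg` are assumptions chosen for this example, and neighbors are selected with plain Euclidean distance.

```python
import numpy as np


def lle(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Minimal LLE sketch (hypothetical helper, not the thesis's code).

    X is an (n_samples, n_features) array. Returns the embedding Y of shape
    (n_samples, n_components) and the reconstruction weight matrix W.
    """
    n = X.shape[0]

    # Step 1: k nearest neighbors under squared Euclidean distance.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 2: weights that best reconstruct each point from its neighbors,
    # constrained so that every row of W sums to one.
    W = np.zeros((n, n))
    ones = np.ones(n_neighbors)
    for i in range(n):
        Z = X[nbrs[i]] - X[i]            # neighbors centered on point i
        C = Z @ Z.T                      # local Gram matrix
        # Regularize: C is singular when n_neighbors > n_features.
        C = C + reg * max(np.trace(C), 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(C, ones)
        W[i, nbrs[i]] = w / w.sum()

    # Step 3: embedding from the bottom eigenvectors of (I - W)^T (I - W),
    # skipping the first one (eigenvalue near zero, typically constant).
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    Y = vecs[:, 1:n_components + 1]
    return Y, W
```

In this sketch the only place a distance measure enters is Step 1, so swapping in a different measure changes which neighbors are selected while Steps 2 and 3 stay the same.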
In this thesis, the Rank-order distance measure replaces the traditional Euclidean distance measure in order to find better nearest-neighbor candidates. The Rank-order distance between two data points is calculated from the ranking orders of their neighbors, and it is shown to improve the clustering of high-dimensional data. The proposed method, which combines the Rank-order distance with LLE, is called Rank-order based LLE (RLLE).

The RLLE method is evaluated by comparing it with the original LLE, ISO-LLE, and IED-LLE on two handwritten datasets. The results show that the effectiveness of a distance measure in the LLE method is closely related to whether it can find good nearest neighbors. The experimental results show that the proposed RLLE method effectively improves the dimension reduction process, and that the C-index is another good candidate for evaluating dimension reduction results.

The main approach to high-dimensional data clustering is subspace clustering, and hypergraph partition is believed to be a promising method for clustering high-dimensional data. In this thesis, we propose a new high-dimensional data clustering method based on hypergraph partition. It first constructs a graph G from the data by defining an adjacency relationship between data points using Shared Reverse k Nearest Neighbors (SRNN). A hypergraph is then created from G by taking all the maximal cliques of G as hyperedges. Once the hypergraph has been produced, a powerful hypergraph partition method called dense subgraph partition (DSP), combined with the k-medoids method, is used to produce the final clustering results. The proposed method is called DSP+k-medoids. It is evaluated on several real high-dimensional datasets, and the experimental results show that it improves the clustering results for high-dimensional data compared with applying the k-medoids method directly to the original data.
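The graph and hypergraph construction steps described above can be sketched as follows. The abstract does not give the exact SRNN adjacency rule, so this example assumes that two points are adjacent when their reverse k-nearest-neighbor sets share at least `min_shared` points; that rule, the parameter `min_shared`, and all function names are hypothetical. Maximal cliques are enumerated with the Bron-Kerbosch algorithm with pivoting. The DSP and k-medoids stages are not shown.

```python
import numpy as np


def reverse_knn(X, k):
    """rnn[p] = set of points that have p among their k nearest neighbors."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # exclude self-matches
    knn = np.argsort(d2, axis=1)[:, :k]
    rnn = [set() for _ in range(n)]
    for q in range(n):
        for p in knn[q]:
            rnn[int(p)].add(q)
    return rnn


def srnn_graph(X, k, min_shared=1):
    """Adjacency sets under the ASSUMED rule: a and b are adjacent when
    they share at least `min_shared` reverse k nearest neighbors."""
    rnn = reverse_knn(X, k)
    n = len(rnn)
    adj = {i: set() for i in range(n)}
    for a in range(n):
        for b in range(a + 1, n):
            if len(rnn[a] & rnn[b]) >= min_shared:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(R, P, X):
        if not P and not X:
            if R:  # skip the empty clique of an empty graph
                cliques.append(frozenset(R))
            return
        # Pivot on the vertex with the most neighbors still in P.
        pivot = max(P | X, key=lambda v: len(adj[v] & P))
        for v in list(P - adj[pivot]):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(set(), set(adj), set())
    return cliques


def build_hypergraph(X, k, min_shared=1):
    """Hyperedges are the maximal cliques of the SRNN graph."""
    adj = srnn_graph(X, k, min_shared)
    return adj, maximal_cliques(adj)
```

Calling `build_hypergraph(X, k=3)` on an `(n_samples, n_features)` array returns the adjacency sets of G and a list of hyperedges, each a `frozenset` of point indices. Enumerating maximal cliques can take exponential time in the worst case, so this sketch is only practical for small or sparse graphs.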
Keywords/Search Tags: cluster analysis, dimension reduction, high-dimensional data, hypergraph partition