
Research On Dimension Reduction Algorithms For Preserving Clustering Structures

Posted on: 2016-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: B Wang
Full Text: PDF
GTID: 2308330464965093
Subject: Computer technology
Abstract/Summary:
High-dimensional data are increasingly common. They are hard for people to interpret and hard for existing data-mining algorithms to process. Dimension reduction methods address this by mapping high-dimensional data into a low-dimensional subspace, which makes subsequent processing easier. The features of high-dimensional data are also often redundant and correlated. Dimension reduction can both eliminate redundant features and uncover the correlations among them, so that the features of the subspace are more representative.

Dimension reduction methods can be roughly divided into two types. The first attempts to preserve global properties of the original data and includes Principal Component Analysis, Linear Discriminant Analysis, and the Adaptive Dimension Reduction algorithm. The second attempts to preserve local properties of the original data and includes Locality Preserving Projections, Locally Linear Embedding, and Laplacian Eigenmaps. This thesis focuses on dimension reduction for data mining and makes the following contributions:

(1) An analysis of the Adaptive Dimension Reduction algorithm, which is based on Kmeans and Discriminant Analysis, shows that it considers only global structure information and neglects local structure information. To overcome this shortcoming, this thesis proposes a dimension reduction algorithm that preserves both the global and the local clustering structure. It is an unsupervised linear dimension reduction algorithm suited to data with a cloud-like distribution. Specifically, the algorithm uses Kmeans to obtain cluster labels in the original space, describes the global and local clustering structures, and then constructs a dimension reduction objective function suited to the distribution of the clusters. Solving this objective function gives the projection matrix and the corresponding subspace. These steps are repeated until the cluster labels no longer change. Experimental results on artificial data sets, UCI datasets, and the AR face dataset show the effectiveness of the algorithm.

(2) Building on this, the thesis proposes a dimension reduction algorithm for data with a manifold distribution that considers both the global and the local structure information of the data. The algorithm first obtains cluster labels with a clustering algorithm based on ranking on manifolds. It then constructs a dimension reduction objective function that accounts for the scatter between manifolds, the compactness of each manifold, and the locality of neighbors, and uses it to project the high-dimensional data into a low-dimensional subspace. In the subspace, a manifold clustering algorithm produces new cluster labels, and the process repeats until the cluster labels no longer change. Compared with existing algorithms, this algorithm produces a low-dimensional projection that reflects the manifold distribution effectively.

(3) To make the dimension reduction algorithms easier to apply, a dimension reduction toolbox is developed. Because recent versions of Matlab include a Java toolkit, a Matlab dimension reduction program can be conveniently compiled into a Java interface program. In the toolbox, the GUI is written in Java, and the Java program calls the Java interface program compiled by Matlab's Java toolkit. The results are returned in an array of objects.
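The iterative scheme in contribution (1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not give the objective function, so the sketch substitutes the classical Fisher discriminant criterion (between-cluster versus within-cluster scatter), which captures only global structure, as in the Kmeans plus Discriminant Analysis baseline. It also re-clusters in the projected subspace on each round, which the abstract does not specify. The function names, the farthest-point initialization, and the regularization constant are hypothetical choices made for the example.

```python
import numpy as np


def cluster_means(X, labels, k):
    """Mean of each cluster; an empty cluster falls back to the overall mean."""
    overall = X.mean(axis=0)
    return np.stack([X[labels == c].mean(axis=0) if np.any(labels == c) else overall
                     for c in range(k)])


def farthest_point_init(X, k):
    """Assumed initialization: start at X[0], then repeatedly add the point
    farthest from the centers chosen so far."""
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    return np.stack(centers)


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd's k-means from the given centers; returns the labels."""
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = cluster_means(X, labels, len(centers))
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def fit_projection(X, labels, k, dim, reg=1e-6):
    """Stand-in objective (an assumption): maximize between-cluster scatter Sb
    relative to within-cluster scatter Sw, i.e. solve Sb w = lambda Sw w."""
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sw, Sb = reg * np.eye(d), np.zeros((d, d))  # reg keeps Sw positive definite
    for c in range(k):
        Xc = X[labels == c]
        if len(Xc) == 0:
            continue
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    # Whiten with the Cholesky factor of Sw to get a symmetric eigenproblem.
    L = np.linalg.cholesky(Sw)
    M = np.linalg.solve(L, np.linalg.solve(L, Sb).T)
    M = (M + M.T) / 2  # symmetrize against round-off
    _, V = np.linalg.eigh(M)  # eigenvalues in ascending order
    top = V[:, ::-1][:, :dim]
    return np.linalg.solve(L.T, top)  # map back to the original coordinates


def adaptive_reduce(X, k, dim, max_rounds=20):
    """Alternate clustering and projection until the labels stop changing.
    max_rounds is assumed to be at least 1."""
    labels = kmeans(X, farthest_point_init(X, k))
    for _ in range(max_rounds):
        W = fit_projection(X, labels, k, dim)
        Y = X @ W
        # Seeding with the current cluster means keeps label ids comparable.
        new_labels = kmeans(Y, cluster_means(Y, labels, k))
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return W, labels
```

Calling `adaptive_reduce(X, k=2, dim=1)` on an `(n, d)` array returns a `(d, 1)` projection matrix and one cluster label per row. A local-structure term of the kind the thesis describes would enter in `fit_projection`; the abstract does not say what form it takes, so none is included here.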
Keywords/Search Tags: High-dimensional data, Dimension reduction, Cluster analysis, Manifold clustering, Matlab mixed programming