Neural Network Based Dimensionality Reduction And Its Application In High-dimensional Data Clustering

Posted on:2016-02-21

Degree:Master

Type:Thesis

Country:China

Candidate:X L Hou

GTID:2308330461967278

Subject:Computer software and theory

Abstract/Summary:

Clustering is an important unsupervised method for analyzing data: it divides unlabeled data into multiple groups, thereby giving the data class information. With the rapid development of information technology, there are more and more data sources for clustering and the data is becoming more complex, so more attributes are needed to describe it, which increases its dimensionality. Because of the characteristics of high-dimensional data and the limitations of traditional clustering algorithms, clustering high-dimensional data usually fails to produce satisfactory results. To address this problem, the study of high-dimensional data has become one of the main research directions.
In existing high-dimensional data clustering algorithms, the main idea is to first partition the space or reduce the dimension, and then complete the clustering with a traditional algorithm. This thesis mainly discusses dimension reduction algorithms. Traditional dimension reduction techniques generally fall into two categories: linear and nonlinear. Linear methods are ideal in only a few cases, because most data distributions in high-dimensional space are nonlinear and may be highly distorted, so nonlinear dimension reduction methods have become the focus of attention. The emergence of artificial neural networks offers a new approach, since they handle nonlinear problems well.
This thesis gives a comprehensive introduction to traditional clustering algorithms and high-dimensional data clustering algorithms, including their basic theory and common methods, and illustrates their existing defects and other problems. Our focus is high-dimensional data clustering. One way of dealing with high-dimensional data is based on neural network algorithms; we introduce the related material together with the advantages and disadvantages of the algorithm.
Because there are no established rules for choosing a neural network structure, we studied the effects of the number of network layers and the number of nodes in each layer, and found a new network structure that performs better than the original structure under the same objective function. In view of the difficulties that traditional clustering algorithms have with high-dimensional data, we exploited the structure found in our previous work to convert the data from high dimensions to lower dimensions, and then clustered the low-dimensional data with a traditional algorithm. A comparison between direct clustering and indirect clustering after dimension reduction shows the effectiveness of dimension reduction in high-dimensional data clustering. How to select the target dimension is also a problem in dimension reduction. We used maximum likelihood estimation to estimate the intrinsic dimension of the data and calculated the clustering results at that intrinsic dimension. We also compared these results with those obtained at other dimensions, which shows which dimension suits the data.
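The abstract does not say which maximum likelihood estimator of intrinsic dimension was used. As an illustration only, the sketch below assumes the nearest-neighbour estimator of Levina and Bickel, in which each point's local estimate is the inverse of the mean of log(T_k / T_j) over its k-1 nearest neighbours (T_j being the distance to the j-th neighbour), and the local estimates are then averaged. The function name `mle_intrinsic_dimension`, the helper `embed`, the neighbourhood size `k=10`, and the synthetic data are all assumptions made for this example and do not come from the thesis.

```python
import math
import random


def mle_intrinsic_dimension(points, k=10):
    """Average nearest-neighbour MLE of intrinsic dimension (assumed Levina-Bickel form)."""
    n = len(points)
    if not 2 <= k < n:
        raise ValueError("k must satisfy 2 <= k < number of points")
    local_estimates = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        t_k = nearest[-1]
        # log(T_k / T_j) for the k-1 closest neighbours; exact duplicates are skipped.
        terms = [math.log(t_k / t_j) for t_j in nearest[:-1] if t_j > 0]
        total = sum(terms)
        if total > 0:
            # Local estimate: inverse of the mean log distance ratio.
            local_estimates.append(len(terms) / total)
    if not local_estimates:
        raise ValueError("not enough distinct points to estimate a dimension")
    return sum(local_estimates) / len(local_estimates)


def embed(u, v):
    """Hypothetical test data: a 2-D plane linearly embedded in 10 dimensions."""
    return [u, v, u + v, u - v, 2 * u, 0.5 * v, u + 2 * v, 0.0, 0.0, 0.0]


random.seed(0)
sample = [embed(random.random(), random.random()) for _ in range(300)]
estimate = mle_intrinsic_dimension(sample, k=10)
print(f"estimated intrinsic dimension: {estimate:.2f}")
```

Although the sample has ten coordinates, it varies along only two directions, so the estimate should land near 2 rather than 10. In a pipeline like the one described, a rounded estimate of this kind could serve as the target dimension for the reduction step before clustering; the estimate depends on the choice of `k`, so it is worth checking a few values.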

Keywords/Search Tags:

Clustering, high-dimensional data, neural networks, dimension reduction

Related items

1	Research On Dimension Reduction Algorithms For Preserving Clustering Structures
2	Dimension Reduction And Clustering For High-Dimensional Data
3	Research On Dimension Reduction Methods Of High Dimensional Data
4	Dimension Reduction Of High Dimensional Data Based On The Autoencoder
5	Research On Dimension Reduction Methods For High-dimensional Complex Data
6	Research On Dimensionality Reduction Method Of High Dimensional Data For Trend Prediction
7	Research On Constructing Deep Structure Model For Dimension Reduction And Classification Of High-Dimensional Data
8	Research On And Design Of Dimensionality Reduction Algorithm For The High Dimensional Data
9	High-dimensional Anomaly Detection Based On Neural Networks Dimensionality Reduction And Support Vector Machine Classification
10	Cluster analysis of high dimensional data and dimension reduction for regression