
Neural Network Based Dimensionality Reduction And Its Application In High-dimensional Data Clustering

Posted on: 2016-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: X L Hou
GTID: 2308330461967278
Subject: Computer software and theory
Abstract/Summary:
Clustering is an important unsupervised method for analyzing data. It divides unlabeled data into multiple groups, thereby assigning class information to the data. With the rapid development of information technology, the sources of data to be clustered keep multiplying and the data itself grows more complex, so more attributes are needed to describe it, which raises its dimensionality. Because of the characteristics of high-dimensional data and the limitations of traditional clustering algorithms, clustering high-dimensional data directly usually fails to produce satisfactory results. To address this problem, high-dimensional data clustering has become one of the main research directions.

Existing high-dimensional data clustering algorithms mainly apply space partitioning or dimension reduction first and then complete the clustering with a traditional algorithm. This thesis focuses on dimension reduction. Traditional dimension reduction techniques fall into two broad categories: linear and nonlinear. Linear methods work well only in a few cases, because most data in high-dimensional space is distributed nonlinearly and may be highly distorted, so nonlinear methods have become the focus of attention. Artificial neural networks offer a new approach here, since they handle nonlinear problems well.

This thesis gives a comprehensive introduction to traditional clustering algorithms and high-dimensional data clustering algorithms, covering their basic theory and common methods, and points out their existing defects and other problems. Our focus is high-dimensional data clustering, and one way of handling such data is based on neural network algorithms. We introduce this approach together with its advantages and disadvantages.
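The reduce-then-cluster idea described above can be illustrated with a small sketch. The abstract does not state the thesis's network structure, objective function, or clustering algorithm, so everything here is an assumption made for illustration: a one-hidden-layer tanh autoencoder trained by stochastic gradient descent stands in for the neural network, plain k-means stands in for the traditional algorithm, and all function names and the toy data are hypothetical.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def make_autoencoder(n_in, n_code):
    """Small random weights for a one-hidden-layer autoencoder (no biases)."""
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_code)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_code)] for _ in range(n_in)]
    return enc, dec


def encode(enc, x):
    # Nonlinear map from the input space to the low-dimensional code.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in enc]


def decode(dec, h):
    # Linear map from the code back to the input space.
    return [sum(w * hi for w, hi in zip(row, h)) for row in dec]


def reconstruction_error(enc, dec, data):
    total = 0.0
    for x in data:
        y = decode(dec, encode(enc, x))
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)


def train(enc, dec, data, epochs=100, lr=0.05):
    """Stochastic gradient descent on the squared reconstruction error."""
    for _ in range(epochs):
        for x in data:
            h = encode(enc, x)
            y = decode(dec, h)
            err = [yi - xi for yi, xi in zip(y, x)]
            # Backpropagate the error through the decoder and the tanh.
            grad_h = []
            for j in range(len(h)):
                back = sum(err[i] * dec[i][j] for i in range(len(err)))
                grad_h.append(back * (1.0 - h[j] ** 2))
            for i in range(len(dec)):
                for j in range(len(h)):
                    dec[i][j] -= lr * err[i] * h[j]
            for j in range(len(enc)):
                for m in range(len(x)):
                    enc[j][m] -= lr * grad_h[j] * x[m]


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def blob(center, n):
    # Hypothetical toy data: Gaussian noise around a centre point.
    return [[c + rng.gauss(0.0, 0.1) for c in center] for _ in range(n)]


data = blob([0.5] * 8, 30) + blob([-0.5] * 8, 30)  # 60 points in 8-D
enc, dec = make_autoencoder(n_in=8, n_code=2)
loss_before = reconstruction_error(enc, dec, data)
train(enc, dec, data)
loss_after = reconstruction_error(enc, dec, data)

codes = [encode(enc, x) for x in data]  # 8-D -> 2-D
labels = kmeans(codes, k=2)             # cluster in the reduced space
print(loss_before, loss_after)
```

Running k-means on `data` directly and on `codes` gives the two label sets that a direct-versus-indirect comparison would start from.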
Because there are no established rules for choosing a neural network structure, we studied the effect of the number of network layers and the number of nodes in each layer, and found a new network structure that performs better than the original structure under the same objective function. In view of the difficulties that traditional clustering algorithms face with high-dimensional data, we used this structure to map the data from high dimensions to lower dimensions and then clustered the low-dimensional data with a traditional algorithm. Comparing the results of direct clustering with those of clustering after dimension reduction shows that dimension reduction is effective for high-dimensional data clustering. Choosing the target dimension is also a problem in dimension reduction. In this thesis, we used maximum likelihood estimation to estimate the intrinsic dimension of the data and computed the clustering results at that dimension. We also compared these results with those obtained at other dimensions, which shows which dimension suits the data.
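As an illustration of estimating intrinsic dimension by maximum likelihood, the sketch below uses the nearest-neighbour estimator of Levina and Bickel. The abstract does not say which estimator the thesis uses, so this choice is an assumption, as are the function name, the neighbourhood size `k`, and the toy data. The normalisation by `k - 2` is the bias-corrected variant of that estimator.

```python
import math
import random


def mle_intrinsic_dimension(points, k=10):
    """Average the per-point maximum likelihood estimates of local dimension.

    Assumes all points are distinct and that len(points) > k >= 3.
    """
    estimates = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        t_k = dists[k - 1]  # distance to the k-th nearest neighbour
        log_sum = sum(math.log(t_k / dists[j]) for j in range(k - 1))
        # Dividing k - 2 (rather than k - 1) by the log sum removes the bias.
        estimates.append((k - 2) / log_sum)
    return sum(estimates) / len(estimates)


# Hypothetical toy data: a 2-D plane embedded linearly in 5-D space,
# so the true intrinsic dimension is 2 even though each point has 5 attributes.
rng = random.Random(0)
data = []
for _ in range(300):
    u, v = rng.random(), rng.random()
    data.append((u, v, u + v, u - v, 2.0 * u))

estimated_dim = mle_intrinsic_dimension(data, k=10)
print(round(estimated_dim, 2))
```

The estimate is a real number, so it would typically be rounded to an integer before being used as the target dimension of the reduction step.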
Keywords/Search Tags: Clustering, high-dimensional data, neural networks, dimension reduction