
The Study And Application Of Clustering Method Based On Projection

Posted on: 2008-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: L G Huang
GTID: 2178360218452813
Subject: Computer application technology
Abstract/Summary:
With the development of information technology, the volume of data that people handle in information processing keeps growing. How can useful information, such as the distribution of the data or the trend of its development, be extracted from such massive data? Clustering was proposed as a data analysis tool for exactly this purpose. Clustering is the process of grouping a set of physical or abstract objects into classes, or clusters, made up of similar objects; the goal is to make the differences between individuals in the same cluster as small as possible and the differences between individuals in different clusters as large as possible. At present, most classic clustering algorithms are designed for low-dimensional data, whereas real-world data is mostly high-dimensional, which poses a challenge to these algorithms. Literature [2] points out that classic clustering algorithms such as K-Means and K-Medoids give unsatisfactory results on such data, and it describes reducing the dimensionality through feature extraction, for example with PCA; however, this approach easily loses information contained in the data. Recent research indicates that clusters in high-dimensional data are hidden in lower-dimensional subspaces. How can these effective subspaces be found? Agrawal [3] proposed projective clustering: the data set is projected into a low-dimensional subspace through a mapping transformation, and clusters are then identified in that subspace with an appropriate clustering method. Projection not only reduces the dimensionality of the data set effectively but also lowers the complexity of data processing. Existing projective clustering algorithms include CLIQUE [3], PROCLUS [4], ORCLUS [5] and EPCH [2].
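The project-then-partition idea can be illustrated with a minimal Python sketch. It assumes an axis-parallel subspace chosen by hand (dimension 0 only), an equal-width histogram, and hypothetical parameters `bins` and `min_count` (the minimum number of points for a bin to count as dense). It is not CLIQUE, PROCLUS, ORCLUS or EPCH: those algorithms search for the relevant subspaces themselves, and ORCLUS also allows subspaces that are not parallel to the axes. All function names here are illustrative.

```python
from typing import List, Sequence, Tuple


def project(points: Sequence[Sequence[float]], dims: Sequence[int]) -> List[Tuple[float, ...]]:
    """Axis-parallel projection: keep only the coordinates listed in `dims`."""
    return [tuple(p[d] for d in dims) for p in points]


def histogram_bins(values: Sequence[float], bins: int) -> List[int]:
    """Return the equal-width histogram bin index of every 1-D value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero when all values are equal
    # min(...) puts the maximum value into the last bin instead of bin `bins`
    return [min(int((v - lo) / width), bins - 1) for v in values]


def dense_runs(bin_of: Sequence[int], bins: int, min_count: int) -> List[Tuple[int, int]]:
    """Merge runs of adjacent bins holding at least `min_count` (>= 1) points."""
    counts = [0] * bins
    for b in bin_of:
        counts[b] += 1
    runs: List[Tuple[int, int]] = []
    start = None
    for i, c in enumerate(counts + [0]):  # trailing empty sentinel closes a final open run
        if c >= min_count and start is None:
            start = i
        elif c < min_count and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs


def label_points(bin_of: Sequence[int], runs: Sequence[Tuple[int, int]]) -> List[int]:
    """Cluster id = index of the dense run containing the point's bin; -1 marks noise."""
    return [next((k for k, (s, e) in enumerate(runs) if s <= b <= e), -1) for b in bin_of]


# Toy data: dimension 0 separates two groups; dimensions 1 and 2 carry no cluster structure here.
points = [
    (1.0, 7.3, 0.4), (1.1, 2.2, 9.1), (0.9, 9.8, 5.5), (1.2, 5.0, 2.7),
    (5.0, 1.1, 8.8), (5.1, 8.4, 0.3), (4.9, 3.3, 6.1), (5.2, 6.6, 3.9),
    (3.0, 4.4, 7.7),  # isolated point between the two groups
]
values = [v[0] for v in project(points, dims=[0])]
bin_of = histogram_bins(values, bins=5)
runs = dense_runs(bin_of, bins=5, min_count=2)
labels = label_points(bin_of, runs)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Points that fall into sparse bins (here the isolated value 3.0) receive the label -1 and are treated as noise; changing `bins` or `min_count` changes which regions count as dense.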
CLIQUE is the first algorithm to address projective clustering and the subspace problem, but it requires the subspaces to be parallel to the coordinate axes, and it uses a fixed threshold to partition subspaces of different dimensionality, which makes it inconsistent. PROCLUS and ORCLUS mainly use center points to obtain projected clusters and their associated subspaces. PROCLUS requires the projected subspaces to be parallel to the axes, whereas ORCLUS has no such restriction: its subspaces may have arbitrary orientation. The EPCH algorithm [2] addresses the same problem; compared with the earlier algorithms, it has lower complexity and clearly better validity and accuracy. After analyzing EPCH, we adopt a different method to partition the subspace and propose two subspace partitioning algorithms:
1) Projective clustering based on the Parzen window. It projects high-dimensional data into low-dimensional subspaces, uses a probability density function to model the distribution of the samples in each subspace, and obtains the clustering result by merging dense regions. Experiments show that it is more precise than EPCH.
2) Projective clustering based on Mean-Shift. It maps the high-dimensional data space to a low-dimensional data space using a kernel function. The data in the low-dimensional subspace are then divided into regions around center points, which are merged to obtain the result. Experiments confirm its validity.
This thesis mainly introduces the basic concepts of clustering, the various kinds of clustering algorithms, and the two improved algorithms I propose.
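Both proposed methods rest on kernel density estimation inside a projected subspace. The minimal Python sketch below shows the two ingredients on one-dimensional data that is assumed to be already projected: a Parzen-window density estimate with a Gaussian kernel, and Mean-Shift, which repeatedly moves a point to the kernel-weighted mean of the samples until it settles at a mode of that density; samples that reach the same mode form one cluster. The bandwidth `h`, the merge tolerance `merge_tol` and all function names are hypothetical choices for illustration, not the thesis's parameters, and the thesis's own projection and region-merging steps are not reproduced here.

```python
import math
from typing import List, Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used as the Parzen window K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_density(x: float, samples: Sequence[float], h: float) -> float:
    """Parzen-window estimate f(x) = (1 / (n*h)) * sum_i K((x - x_i) / h)."""
    return sum(gaussian_kernel((x - s) / h) for s in samples) / (len(samples) * h)


def mean_shift_mode(x: float, samples: Sequence[float], h: float,
                    tol: float = 1e-7, max_iter: int = 1000) -> float:
    """Move x to the kernel-weighted mean of the samples until it stops moving.

    With a Gaussian kernel each step does not decrease the Parzen estimate
    above, so x ends at (or very near) a local maximum, i.e. a density mode.
    """
    for _ in range(max_iter):
        weights = [gaussian_kernel((x - s) / h) for s in samples]
        shifted = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
        if abs(shifted - x) < tol:
            return shifted
        x = shifted
    return x


def mean_shift_clusters(samples: Sequence[float], h: float, merge_tol: float) -> List[int]:
    """Label each sample by the mode it converges to; modes closer than merge_tol are merged."""
    modes: List[float] = []
    labels: List[int] = []
    for s in samples:
        m = mean_shift_mode(s, samples, h)
        for k, mode in enumerate(modes):
            if abs(mode - m) < merge_tol:
                labels.append(k)
                break
        else:  # no existing mode is close enough: start a new cluster
            modes.append(m)
            labels.append(len(modes) - 1)
    return labels


# Toy 1-D data, assumed to be the output of an earlier projection step.
values = [1.0, 1.1, 0.9, 1.2, 5.0, 5.1, 4.9, 5.2]
h = 0.5  # hypothetical bandwidth (Parzen window width)
labels = mean_shift_clusters(values, h, merge_tol=h / 2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
print(parzen_density(1.05, values, h) > parzen_density(3.0, values, h))  # True
```

Grouping samples by the mode they climb to is one simple way to realize "merging density regions". The outcome depends strongly on `h`: a much larger bandwidth smooths the estimate until the two groups share a single mode.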
Keywords/Search Tags: Subspace partition, Histograms, Parzen window, Mean-Shift, Projective clustering, Clustering