Design And Implementation Of Clustering Analysis Algorithm Based On Dimension Reduction

Posted on: 2017-08-29  Degree: Master  Type: Thesis
Country: China  Candidate: Y S Chen  Full Text: PDF
GTID: 2348330518491657  Subject: Communication and Information System
Abstract/Summary:
With the development of information technology, large amounts of data are generated in daily life, and data mining technology emerged to extract useful information from these data. Cluster analysis, an important part of data mining, is widely used in various fields. Because data sets change dynamically and their dimensionality keeps growing, traditional cluster analysis algorithms can no longer meet the growing demand for cluster analysis. A reasonable and effective cluster analysis algorithm that can adapt to high-dimensional dynamic data sets is therefore required. For high-dimensional data sets, dimension reduction must be applied first, to simplify the computation and avoid the curse of dimensionality. The dynamically changing data sets are then clustered incrementally, which avoids repeating the iterative process and improves the efficiency of the algorithm. This thesis therefore presents a data dimension reduction algorithm and an incremental clustering algorithm to achieve cluster analysis of high-dimensional dynamic data.

To reduce the amount of computation on high-dimensional data sets, scholars generally adopt data dimension reduction methods. The LLE algorithm is a common dimension reduction algorithm that builds a local weight matrix from each point's adjacent points. As a result, its dimension reduction is strongly affected by noise data. Moreover, when building the local weight matrix, the LLE algorithm considers only the Euclidean distance between data points and ignores the density relations between them, so it cannot adapt to data sets with uneven density distribution. To avoid these defects, this thesis proposes DKLLE, a dimension reduction algorithm based on LLE that can adapt to data sets containing noise data and uneven density distribution. DKLLE adopts an improved Dijkstra distance and considers the density relations between data points, so it can effectively handle data sets with uneven density distribution. In addition, DKLLE adopts a K-neighbor graph to keep noise data among the adjacent points from influencing the dimension reduction results. Simulations show that the DKLLE algorithm is robust when processing data with uneven density.

The K-means algorithm is a classical clustering algorithm. Starting from preset cluster centers and a preset number of clusters, it updates the clustering result iteratively until the objective function converges, and then outputs the result. Because the K-means algorithm must iterate again whenever the data change, it is inefficient for clustering incremental data. In addition, the objective function is prone to falling into a local optimum during iteration. This thesis presents IK-means, an improved clustering algorithm based on K-means. The IK-means algorithm first caches the incoming data and reduces the dimension of the cached data with the DKLLE algorithm, then applies cluster analysis. During cluster analysis, the IK-means algorithm does not need the number of clusters to be set in advance; instead, it uses a buffer to adjust the number of clusters k dynamically. During iteration, the IK-means algorithm uses a simulated annealing algorithm to effectively avoid local optima. Simulations show that the IK-means algorithm can effectively cluster high-dimensional dynamic data sets and can avoid the local optimum.
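For reference, the classical K-means iteration that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the standard algorithm only, not of IK-means or DKLLE, whose details the abstract does not give. The function name `kmeans`, its parameters, and the sample points are assumptions made for the example.

```python
import math

def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Classical (Lloyd's) K-means with preset initial centers.

    Hypothetical helper for illustration; this is the baseline the
    abstract describes, not the IK-means algorithm of the thesis.
    """
    centers = [tuple(c) for c in centers]
    prev_obj = math.inf
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
        # Objective: sum of squared distances to the assigned centers.
        obj = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
        if prev_obj - obj <= tol:  # stop once the objective has converged
            break
        prev_obj = obj
    return centers, labels

# Two well-separated groups (made-up sample data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Both the number of clusters and the starting centers are fixed by the caller, and any change to `points` means running the whole loop again. These are the two limitations the abstract says IK-means addresses.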
Keywords/Search Tags: data mining, clustering, dimension reduction