Research And Design Of Clustering Method Based On Large Data And High Dimensional Data

Posted on: 2016-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Lv
Full Text: PDF
GTID: 2208330470455438
Subject: Computer system architecture
Abstract/Summary:
In data mining, traditional cluster analysis struggles with large volumes of data and with high-dimensional data spaces. As the amount of data grows significantly, the computation and memory that traditional clustering algorithms require become a major challenge. In fields such as biology, medicine, and shopping, one often has to cluster and classify large high-dimensional databases. Because of the high dimensionality, traditional clustering methods based on distance and density cannot separate such data into meaningful categories, so when the data have many attributes these methods perform poorly or fail to produce the desired results.

Building on traditional cluster analysis, this thesis studies the clustering of large amounts of data and dimensionality reduction. It designs experimental data sets to carry out local-first clustering followed by clustering integration, and applies a classic dimensionality reduction algorithm, with the aim of obtaining good clustering results; this is of practical significance for today's Internet applications. Several small data sets are used to simulate the subsets into which a large data set is decomposed. Each subset is clustered separately, and the local clustering results are then integrated into a global clustering of the large data set, so that the fusion proceeds from local to global. The stability of the fused results is also tested. For high-dimensional data spaces, principal component analysis (PCA), a representative current dimensionality reduction algorithm, is used to reduce a simulated 20-dimensional data cube to 13 dimensions, and a specific experimental analysis is given.

The experiments were implemented on the Visual Studio 2010 development platform, with the test programs written in C. The experimental results are displayed in a DOS console interface.
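The PCA step described above can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the thesis reports a C implementation, and it is not the author's code. The function names (`mean_center`, `covariance`, `top_eigenpair`, `pca`), the use of power iteration with deflation, and the iteration count are all assumptions made for this illustration.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(mat, iters=300, seed=0):
    """Power iteration: approximate dominant eigenvalue and unit eigenvector
    of a symmetric positive semi-definite matrix (assumed solver choice)."""
    rng = random.Random(seed)
    d = len(mat)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left in the remaining directions
            break
        v = [x / norm for x in w]
    mv = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
    lam = sum(v[i] * mv[i] for i in range(d))  # Rayleigh quotient
    return lam, v


def pca(rows, k, iters=300):
    """Project rows onto k approximate principal components.

    Returns (projected rows, components, variance along each component).
    """
    centered = mean_center(rows)
    mat = covariance(centered)
    d = len(mat)
    components, variances = [], []
    for _ in range(k):
        lam, v = top_eigenpair(mat, iters)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the direction just found before the next pass.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return projected, components, variances


if __name__ == "__main__":
    # Simulated 20-dimensional data reduced to 13 dimensions, as in the abstract.
    rng = random.Random(1)
    data = [[rng.gauss(0, 20 - j) for j in range(20)] for _ in range(60)]
    reduced, _, _ = pca(data, 13)
    print(len(reduced), len(reduced[0]))
```

Power iteration converges slowly when neighbouring eigenvalues are close, so a production implementation would more likely call a library eigensolver; the sketch only shows the sequence of centering, covariance, eigenvectors, and projection.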
To ensure high accuracy and reduce errors, the experimental design specifies the numerical precision used for the cluster center value of each local cluster.
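The local-to-global clustering fusion described in the abstract can be sketched as follows. The abstract does not name the clustering algorithm or the fusion rule, so this sketch assumes k-means (Lloyd's algorithm) for the local step and a size-weighted k-means over the local cluster centers for the global step. It is written in Python for brevity, whereas the thesis reports a C implementation, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start from the first point,
    then repeatedly add the point farthest from the chosen centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def weighted_kmeans(points, weights, k, iters=50):
    """Lloyd's algorithm with per-point weights.

    Returns (centers, total weight assigned to each center).
    """
    dim = len(points[0])
    centers = farthest_first(points, k)
    mass = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            c = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            mass[c] += w
            for j, x in enumerate(p):
                sums[c][j] += w * x
        # An empty cluster keeps its previous center.
        new_centers = [[s / mass[c] for s in sums[c]] if mass[c] > 0
                       else centers[c] for c in range(k)]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return centers, mass


def local_to_global(subsets, k_local, k_global):
    """Cluster each subset on its own, then fuse the local centers.

    Each local center enters the global step weighted by the number of
    points it represents, so the raw points are not revisited.
    """
    local_centers, local_sizes = [], []
    for subset in subsets:
        centers, mass = weighted_kmeans(subset, [1.0] * len(subset), k_local)
        for c, m in zip(centers, mass):
            if m > 0:
                local_centers.append(c)
                local_sizes.append(m)
    return weighted_kmeans(local_centers, local_sizes, k_global)


if __name__ == "__main__":
    # Two small subsets standing in for pieces of one large data set.
    subsets = [[[0, 0], [0, 2], [10, 0], [10, 2]], [[0, 4], [10, 4]]]
    centers, sizes = local_to_global(subsets, k_local=2, k_global=2)
    print(centers, sizes)
```

In this toy input the fused centers are (0, 2) and (10, 2), each representing three points, which matches clustering all six points at once. Weighting by cluster size is one design choice among several; it keeps the global step cheap because only the local centers and their sizes have to be held in memory.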
Keywords/Search Tags: large amounts of data, high-dimensional data space, cluster analysis, clustering fusion, dimensionality reduction