Research And Application Of Clustering Algorithm On The High Dimensional Datasets

Posted on:2018-07-03

Degree:Master

Type:Thesis

Country:China

Candidate:Z P Sun

Full Text:PDF

GTID:2348330512959261

Subject:Software engineering

Abstract/Summary:

In recent years, with the rapid development of information technology, cloud computing, and social networks, the data accumulated in every domain has become high-dimensional and is growing rapidly. This mass of data potentially contains a large amount of useful information, so how to analyze high-dimensional data effectively and extract that information has become both a research hotspot and a difficult problem. As an important data mining method, clustering analysis has been widely used in education, finance, scientific research, the Internet, and other fields. Although existing clustering algorithms achieve high clustering quality on low-dimensional data, their clustering validity can drop on high-dimensional data. Finding an approach to clustering large-scale, high-dimensional data has therefore become a key challenge. Based on an analysis of the characteristics of high-dimensional data, this thesis studies attribute reduction and membership functions in depth. The main work of the thesis is summarized as follows:

(1) The rough K-means algorithm has been widely used in data clustering analysis. To overcome the low clustering quality caused by setting the number of clusters manually and by the inaccurate description of data objects, the thesis proposes a self-adaptive rough K-means algorithm based on weighted distance, built on rough set theory. The improved algorithm applies attribute reduction to the high-dimensional data, adjusts the membership function, sets weights, and then determines the number of clusters adaptively, which lets it handle high-dimensional data effectively. Extensive experiments on UCI datasets show that the improved algorithm maintains efficiency while achieving noticeably higher accuracy.

(2) When existing clustering algorithms deal with high-dimensional data, they are extremely unstable because of the curse of dimensionality, and some cannot describe high-dimensional data effectively. To address this, the thesis proposes an improved algorithm for clustering high-dimensional data. The algorithm first applies attribute reduction, then computes a weighted similarity measure between data objects to obtain a similarity matrix, and finally performs agglomerative ("condensed") cluster analysis on the processed data according to the similarity matrix and a threshold value. Extensive experiments on artificial and UCI datasets demonstrate the effectiveness of the proposed algorithm.

(3) To further show the practical value of the proposed algorithms, they are applied to food safety testing, a real-world data mining task involving large-scale, high-dimensional datasets. Experimental results show that the proposed algorithm achieves higher clustering performance, so that clustering analysis of large-scale, high-dimensional datasets can be carried out effectively.
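The three steps of the second algorithm (attribute reduction, a weighted similarity matrix, and threshold-based merging) can be illustrated with a minimal sketch. It is not the thesis's implementation, and every function name below is hypothetical. Two substitutions are assumed: a simple variance filter stands in for rough-set attribute reduction, and the similarity is taken to be 1 / (1 + weighted Euclidean distance). The merging step is single-linkage agglomeration cut at the threshold, which may differ from the linkage rule the thesis actually uses.

```python
from math import sqrt


def drop_constant_attributes(rows, min_variance=1e-12):
    """Stand-in for attribute reduction (NOT rough-set reduction):
    keep only the columns whose variance exceeds min_variance."""
    n = len(rows)
    keep = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        if var > min_variance:
            keep.append(j)
    return [[r[j] for j in keep] for r in rows], keep


def weighted_similarity(x, y, weights):
    """Assumed measure: 1 / (1 + weighted Euclidean distance), in (0, 1]."""
    d = sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))
    return 1.0 / (1.0 + d)


def similarity_matrix(rows, weights):
    """Pairwise similarity between all data objects."""
    n = len(rows)
    return [[weighted_similarity(rows[i], rows[j], weights) for j in range(n)]
            for i in range(n)]


def threshold_clusters(sim, threshold):
    """Single-linkage agglomeration: two clusters are merged whenever they
    contain a pair of objects with similarity >= threshold."""
    n = len(sim)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(j)] = find(i)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def cluster(rows, weights=None, threshold=0.5):
    """Run the three steps in order and return one cluster label per row."""
    reduced, kept = drop_constant_attributes(rows)
    if weights is None:
        weights = [1.0] * len(kept)
    else:
        weights = [weights[j] for j in kept]
    return threshold_clusters(similarity_matrix(reduced, weights), threshold)


if __name__ == "__main__":
    data = [[0.0, 0.0, 7.0], [0.1, 0.0, 7.0], [5.0, 5.0, 7.0], [5.1, 5.0, 7.0]]
    print(cluster(data, threshold=0.5))
```

In this sketch the number of clusters is not fixed in advance: it follows from the threshold, so a higher threshold yields more, tighter clusters and a lower one yields fewer, looser clusters.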

Keywords/Search Tags:

clustering analysis, high dimensional datasets, attribute reduction, rough set theory, self-adaptive method, similarity measure

Related items

1	Similarity Measures And Attribute Reduction In Rough Set Theory
2	Study On Attribute Reduction Theory And Method Of Probabilistic Rough Sets
3	Study On Methods For Uncertainty Measure And Attribute Reduction Based On Rough Set Theory
4	Research On Similarity Rough Set Theory Based On Pansystems
5	Research On Accelerated Algorithm Of Attribute Reduction In Rough Sets And Its Neighborhood Model
6	Multi-granulation Rough Sets And Granular Reductions Based On Similarity Measure
7	Research On Subspace Clustering Based On Attribute Reduction
8	Research Of Attribute Reduction Algorithm Based On Rough Set, T-norm And Evidence Theory
9	Research And Application Of High Efficient Attribute Reduction For High Dimensional Data Based On Rough Sets
10	Research On Attribute Importance Measure Theory And Method Based On Data Coordination