The Research On Clustering Technology For Big Data

Posted on:2020-11-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Guo

Full Text:PDF

GTID:2428330602961126

Subject:Communication and Information System

Abstract/Summary:

Clustering is a technique that groups a collection of physical or abstract objects into multiple classes, each composed of similar objects. It is widely used in many fields and is an important research topic in data mining, pattern recognition, and related areas, where it plays an extremely important role in identifying the internal structure of data. With the development of the information industry, the attribute types of data are becoming more and more complex. Traditional clustering algorithms such as k-means can only process data with a single type of attribute, whereas the k-prototypes clustering algorithm can process mixed-attribute data, which greatly expands the application field of clustering algorithms and improves the efficiency of clustering analysis. With the advent of the big data era, traditional clustering methods can no longer process large-scale data. Combining clustering technology with a cluster computing environment has therefore become a new trend in processing massive data, making it possible to extract a large amount of valuable information. The main contents of this thesis are summarized as follows:

(1) This thesis presents an effective prototypes clustering algorithm, GK-prototypes. The algorithm is based on the classical k-prototypes clustering algorithm: the defuzzy-similarity matrix is used to construct the coarse particle sets, granular computing and the max-min distance method are used to determine the initial cluster centers, and the objective function is modified. Experimental results and theoretical analysis show that GK-prototypes is more accurate, more effective, and more robust than other prototypes clustering algorithms.

(2) This thesis proposes MK-prototypes, a clustering algorithm for big data. One characteristic of big data is that its attributes are of mixed types, including numerical attributes and categorical attributes. On this basis, the thesis presents a method for processing large-scale mixed data that uses the MapReduce model to parallelize the k-prototypes clustering algorithm. Experimental results and theoretical analysis show that, while maintaining clustering accuracy, the parallel clustering algorithm scales well as the data set size increases and achieves a nearly linear speedup.
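
The abstract does not give the details of MK-prototypes, so the following is only a minimal sketch of how one iteration of the standard k-prototypes algorithm could be organized in a map/reduce style. It assumes the usual k-prototypes dissimilarity (squared Euclidean distance on numerical attributes plus a weight `gamma` times the number of categorical mismatches). The function names, the `gamma` parameter, and the in-process simulation of the map and reduce phases are illustrative assumptions, not the thesis's implementation.

```python
from collections import Counter, defaultdict


def mixed_distance(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Standard k-prototypes dissimilarity (assumed, not taken from the thesis):
    squared Euclidean distance on numerical attributes plus gamma times the
    number of mismatching categorical attributes."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    mismatches = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * mismatches


def map_assign(record, prototypes, gamma=1.0):
    """Map phase: emit (index of nearest prototype, record)."""
    x_num, x_cat = record
    nearest = min(
        range(len(prototypes)),
        key=lambda k: mixed_distance(
            x_num, x_cat, prototypes[k][0], prototypes[k][1], gamma
        ),
    )
    return nearest, record


def reduce_update(records):
    """Reduce phase: new prototype = mean of the numerical attributes and
    mode of each categorical attribute over the records of one cluster."""
    n = len(records)
    num_dims = len(records[0][0])
    cat_dims = len(records[0][1])
    mean = tuple(sum(r[0][d] for r in records) / n for d in range(num_dims))
    mode = tuple(
        Counter(r[1][d] for r in records).most_common(1)[0][0]
        for d in range(cat_dims)
    )
    return mean, mode


def kprototypes_iteration(data, prototypes, gamma=1.0):
    """One iteration, simulated in a single process: map every record,
    group by cluster index (the shuffle step), then reduce each group.
    An empty cluster keeps its previous prototype."""
    groups = defaultdict(list)
    for record in data:
        k, rec = map_assign(record, prototypes, gamma)
        groups[k].append(rec)
    return [
        reduce_update(groups[k]) if k in groups else prototypes[k]
        for k in range(len(prototypes))
    ]


if __name__ == "__main__":
    # Each record is (numerical attributes, categorical attributes).
    data = [
        ((1.0, 2.0), ("a",)),
        ((1.2, 1.8), ("a",)),
        ((8.0, 9.0), ("b",)),
        ((8.4, 9.2), ("b",)),
    ]
    prototypes = [((0.0, 0.0), ("a",)), ((10.0, 10.0), ("b",))]
    print(kprototypes_iteration(data, prototypes))
```

In a real MapReduce job the map calls would run on separate data partitions and the grouping would be done by the framework's shuffle; because each record is assigned independently of the others, the assignment step parallelizes naturally, which is consistent with the near-linear speedup the abstract reports.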

Keywords/Search Tags:

Clustering Algorithm, Mixed Attribute Clustering, Granular Computing, MapReduce, Parallel Clustering

Related items

1	Parallel Clustering Algorithm Based On MapReduce
2	The Application Of Granular Computing In Clustering Analysis
3	Research On Mixed Attribute Clustering Technology Based On Cluster Center Selection Strategy
4	Research On The Clustering Algorithm Of Parallel Partition Based On MapReduce
5	Study On Granularity Clustering
6	Study On The Clustering Ensemble Algorithm Based On Granular Computing
7	Research Of Clustering Algorithms Based On Granular Computing
8	Research On Clustering Algorithm Based On Particle Computing
9	Research And Implementation Of MapReduce-based Graph Clustering Algorithm
10	Research On K-medoids Clustering Algorithm Based On Improved Granular Computing