
Research Of Density Based Clustering Algorithm Using The Three Way Decision Theory Based On Spark

Posted on: 2019-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Qu
Full Text: PDF
GTID: 2428330590465789
Subject: Computer technology
Abstract/Summary:
Clustering is an active research topic in machine learning and data analysis, and an important analysis method. As the dimensionality of sample sets and the volume of data in computer science and its applications keep growing, traditional clustering algorithms no longer adapt well. This thesis uses three-way decision theory to make a second decision on samples in the middle (boundary) region and the negative region, and combines the clustering algorithm with the Spark distributed parallel computing framework to address the long running time on large data sets. The main contents are as follows:

1. A new density clustering method that combines the ordering points to identify the clustering structure (OPTICS) algorithm with three-way decision theory. First, OPTICS is improved: the core clustering and the reachability distance are redefined within the range of the neighborhood radius. Then each cluster in the original data set is divided into a positive region, a middle region, and a negative region. For a sample point in the middle region, if its neighborhood radius contains sample points belonging to other clusters, a second decision is made on it; outlier points in the negative region are assigned to the nearest cluster by distance. Experiments on UCI and artificial data sets show that the algorithm completes the ordering and depicts the cluster structure of the data set, and that it judges middle-region sample points lying between clusters well, which improves accuracy.

2. A Spark-based version of the OPTICS clustering algorithm using three-way decision theory. Because density clustering has high computational complexity, the thesis discusses the feasibility of parallelizing this algorithm on Spark. The data set is partitioned, and the neighbor points of every sample point in each data block are computed. The OPTICS algorithm is then executed on each partition to obtain that partition's clusters, and the partition clusters are finally merged into the final result set. Experiments on UCI data sets show that the parallel density clustering algorithm addresses the long running time on large-scale data sets.
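
The three-region refinement described in the first contribution can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function name `three_way_refine`, the use of `-1` to mark outliers, and the majority vote used for the second decision are all assumptions made for illustration; the abstract does not state the actual decision rule. The preliminary labels are assumed to come from some earlier density clustering step.

```python
from collections import Counter
from math import dist


def three_way_refine(points, labels, eps):
    """Split points into positive / middle / negative regions and relabel.

    points: list of coordinate tuples
    labels: preliminary cluster id per point; -1 marks an outlier (assumed convention)
    eps:    neighborhood radius
    Returns (regions, refined_labels).
    """
    clustered = [j for j, lab in enumerate(labels) if lab != -1]
    regions, refined = [], []
    for i, p in enumerate(points):
        # Labels of clustered neighbors within eps, excluding the point itself.
        near = [labels[j] for j in clustered
                if j != i and dist(p, points[j]) <= eps]
        if labels[i] == -1:
            # Negative region: give the outlier the label of the nearest clustered point.
            regions.append("negative")
            if clustered:
                nearest = min(clustered, key=lambda j: dist(p, points[j]))
                refined.append(labels[nearest])
            else:
                refined.append(-1)  # nothing to attach to
        elif any(lab != labels[i] for lab in near):
            # Middle region: another cluster is present in the neighborhood.
            # Second decision (assumed rule): majority vote over the
            # neighbors' labels plus the point's own label.
            votes = Counter(near + [labels[i]])
            regions.append("middle")
            refined.append(votes.most_common(1)[0][0])
        else:
            # Positive region: the neighborhood agrees with the current label.
            regions.append("positive")
            refined.append(labels[i])
    return regions, refined


# Two small groups, one point mislabeled near the second group, one outlier.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (4.5, 4.5), (10, 10)]
labels = [0, 0, 0, 1, 1, 1, 0, -1]
regions, refined = three_way_refine(points, labels, eps=2.0)
```

This brute-force version compares every pair of points, so it only suits small data sets. In the parallel setting the abstract describes, the neighbor search would be carried out per partition.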
Keywords/Search Tags: three-way decision, density-based clustering, parallel, Spark