Research Of Density Based Clustering Algorithm Using The Three Way Decision Theory Based On Spark

Posted on:2019-02-01

Degree:Master

Type:Thesis

Country:China

Candidate:Y Qu

Full Text:PDF

GTID:2428330590465789

Subject:Computer technology

Abstract/Summary:

Clustering is an important analysis method and an active research topic in machine learning and data analysis. As sample sets in computer science and its applications grow in both dimensionality and volume, traditional clustering algorithms no longer cope well. This thesis uses three-way decision theory to make a second decision on samples in the middle region and the negative region, and uses the Spark distributed parallel computing framework to reduce the long running time of clustering on large data sets. The main contents are as follows:

1. A new density-based clustering method that combines Ordering Points To Identify the Clustering Structure (OPTICS) with three-way decision theory. First, OPTICS is improved: the core distance and the reachability distance are redefined within the range of the neighborhood radius. Then the original data set is divided, for each cluster, into a positive region, a middle region, and a negative region. A sample in the middle region whose neighborhood radius contains samples from other clusters is given a second decision, while outliers in the negative region are assigned to the nearest cluster by distance. Experiments on UCI and artificial data sets show that the algorithm completes the cluster ordering, which depicts the cluster structure of the data set, and that it judges middle-region samples lying between clusters well, thereby improving accuracy.

2. A Spark-based parallel version of the OPTICS clustering algorithm using three-way decision theory. To address the high complexity of density clustering, we discuss the feasibility of parallelizing this algorithm on Spark. The data set is partitioned, the neighbor points of each sample are computed within each data block, and OPTICS is then executed on each partition to obtain that partition's clusters. Finally, the partition clusters are merged into the final result set. Experiments on UCI data sets show that the parallel density clustering algorithm alleviates the long running time on large-scale data sets.
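
The region split and second decision described in item 1 can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the exact rules, so the sketch assumes that a first clustering pass has already produced labels (with `-1` marking outliers), that a middle-region point is re-decided by a majority vote among its labeled neighbors, and that a negative-region point takes the label of its nearest labeled point. The names `three_way_regions`, `second_decision`, `NOISE`, and the toy data are all hypothetical.

```python
import math
from collections import Counter

NOISE = -1  # assumed label for points the first clustering pass left unassigned


def neighbors(points, i, eps):
    """Indices of all other points within distance eps of points[i]."""
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= eps]


def three_way_regions(points, labels, eps):
    """Tag each point as 'positive', 'middle', or 'negative'.

    positive: labeled, and no neighbor belongs to another cluster
    middle:   labeled, but some neighbor belongs to another cluster
    negative: unlabeled (outlier) after the first pass
    """
    regions = []
    for i, label in enumerate(labels):
        if label == NOISE:
            regions.append("negative")
            continue
        other = [j for j in neighbors(points, i, eps)
                 if labels[j] not in (NOISE, label)]
        regions.append("middle" if other else "positive")
    return regions


def second_decision(points, labels, regions, eps):
    """Re-decide middle and negative points; positive points keep their label."""
    labeled = [j for j, lab in enumerate(labels) if lab != NOISE]
    final = list(labels)
    for i, region in enumerate(regions):
        if region == "middle":
            # Assumed rule: majority vote among labeled neighbors,
            # ties broken in favor of the point's current label.
            votes = Counter(labels[j] for j in neighbors(points, i, eps)
                            if labels[j] != NOISE)
            final[i] = max(votes, key=lambda c: (votes[c], c == labels[i]))
        elif region == "negative" and labeled:
            # Assumed rule: adopt the label of the nearest labeled point.
            nearest = min(labeled,
                          key=lambda j: math.dist(points[i], points[j]))
            final[i] = labels[nearest]
    return final


# Toy data: two groups, one misplaced border point at (2, 0), one outlier.
points = [(0, 0), (0, 1), (1, 0), (2, 0), (4, 0), (4, 1), (5, 0), (10, 0)]
labels = [0, 0, 0, 1, 1, 1, 1, NOISE]
regions = three_way_regions(points, labels, eps=2.1)
final = second_decision(points, labels, regions, eps=2.1)
print(regions)
print(final)
```

On this toy input the border point at (2, 0) is re-decided into cluster 0 because two of its three neighbors belong to that cluster, and the outlier at (10, 0) joins cluster 1, the cluster of its nearest labeled point. Votes are taken from the first-pass labels rather than the updated ones, so the result does not depend on the order in which points are visited.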

Keywords/Search Tags:

three-way decision, density-based clustering, parallel, Spark

Related items

1	Research On Parallel Decision Tree Algorithm Based On Spark
2	Research On The Classification Algorithm Of Unbalance Data Based On Spark
3	Research And Application On Three-Decision KNN Algorithm Based On Incremental Learning
4	A Clustering Algorithm Based On Density With Its Application In The Customer Cluster In The Field Of Telecom
5	Research On Density Peak-based Clustering Algorithm And Its Parallel Implementation
6	Research On Spark Oriented Fuzzy C-means Clustering Algorithm
7	Research Of Task Scheduling Strategy For Heterogeneous Cluster In Spark Computing Environment
8	Research On Two Improved Density Peaks Clustering Algorithms
9	Design And Implementation Of Parallel Data Mining System Based On Spark
10	Research And Improvement Of Big Data Parallel Clustering Algorithm Based On Spark