
Research On Cluster Analysis Technology Of Component Size Measurement Data Based On Spark

Posted on: 2019-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2438330572965389
Subject: Computer application technology
Abstract/Summary:
The interchangeability of parts is an important property affecting the production process, and matching work depends heavily on it. Grouping components significantly improves interchangeability within each group. Matching, the labor-intensive selection of two or more parts that fit together, is generally carried out with a group-selection scheme. Traditional grouping divides the tolerance band into sub-bands and assigns parts accordingly, but this method applies only to one-dimensional size data. We therefore adopt spectral clustering for the size measurement data: it handles multi-dimensional data, makes no assumptions about the data distribution, and greatly improves the interchangeability of the grouped components.

We improve the spectral clustering algorithm in two respects. First, a heap is used to find the k-nearest neighborhood of each point, which is far more efficient than a full scan. Second, neighborhood co-occurrence (consensus) information is used to remove unreliable similarity links, which improves clustering quality. Combining the two strategies yields the new algorithm HCKNNSC (K-Nearest Neighbor Spectral Clustering based on Heap and Consensus). Theoretical analysis and experiments show that the two strategies bring gains in time efficiency and produce more precise and practical cluster boundaries.

In addition, with the rapid development of industrial and automated measurement, the volume of component size data has grown greatly, while the scalability of clustering algorithms remains poor. We therefore design and implement a parallel version of the proposed algorithm, covering distributed computation of the distance matrix, fast parallel search of k-nearest neighbors, neighborhood accumulation, parallel construction of the similarity graph and Laplacian matrix, and parallel eigendecomposition of the Laplacian; finally, the traditional partition in the spectral feature space is performed. We implement the algorithm in Scala on Spark, optimizing it with the MLlib library, and analyze the complexity of every step. The results indicate that the new algorithm is greatly improved in both time and space, and that the Spark implementation is more efficient than the single-machine one in the scenario of this thesis: the larger the data and the more cluster nodes, the higher the efficiency, provided the data volume does not exceed the cluster memory.
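To illustrate the first strategy, here is a minimal Scala sketch of heap-based k-nearest-neighbor search; the names (HeapKnn, kNearest, dist) are illustrative, not taken from the thesis. A fixed-size max-heap on candidate distances bounds the per-point cost at O(n log k) heap operations instead of sorting all n distances.

```scala
import scala.collection.mutable.PriorityQueue

object HeapKnn {
  // Euclidean distance between two measurement vectors.
  private def dist(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // For each point, maintain a max-heap of at most k (distance, index)
  // pairs; a new candidate evicts the heap root only when it is closer,
  // so each point's k nearest neighbors are found without a full sort.
  def kNearest(points: Array[Array[Double]], k: Int): Array[Array[Int]] =
    points.indices.map { i =>
      val heap =
        PriorityQueue.empty[(Double, Int)](Ordering.by[(Double, Int), Double](_._1))
      for (j <- points.indices if j != i) {
        val d = dist(points(i), points(j))
        if (heap.size < k) heap.enqueue((d, j))
        else if (d < heap.head._1) { heap.dequeue(); heap.enqueue((d, j)) }
      }
      heap.dequeueAll.map(_._2).toArray
    }.toArray
}
```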
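The second strategy can be sketched as follows: an edge of the k-NN graph is kept only when its endpoints share enough common neighbors. This is a minimal sketch of the consensus idea; the names (ConsensusPrune, minShared) and the exact pruning criterion used in the thesis are assumptions.

```scala
object ConsensusPrune {
  // Keep the undirected edge (i, j) of the k-NN graph only when i and j
  // share at least minShared common neighbors; the shared-neighbor count
  // acts as a consensus vote on whether the similarity link is reliable.
  def prune(knn: Array[Array[Int]], minShared: Int): Seq[(Int, Int)] = {
    val neighborSets = knn.map(_.toSet)
    for {
      i <- knn.indices
      j <- knn(i)
      if i < j // the sketch visits each undirected edge from its smaller endpoint
      if (neighborSets(i) intersect neighborSets(j)).size >= minShared
    } yield (i, j)
  }
}
```

In HCKNNSC the surviving edges would then define the similarity graph whose Laplacian is decomposed in the later stages.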
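Finally, an end-to-end sketch of the parallel pipeline on Spark, assuming a Gaussian similarity kernel and using MLlib's RowMatrix.computeSVD and KMeans. For brevity it builds the full dense affinity matrix via cartesian rather than the pruned k-NN graph, and it uses the leading singular vectors of the affinity matrix in place of the Laplacian eigenvectors; all data, names, and parameter values (sigma, the number of clusters) are illustrative, not the thesis's implementation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.clustering.KMeans

object SparkSpectralSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("HCKNNSC-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Toy component-size records: (id, measurement vector).
    val points = sc.parallelize(Seq(
      0 -> Array(10.01, 5.02), 1 -> Array(10.02, 5.01),
      2 -> Array(10.52, 5.49), 3 -> Array(10.51, 5.50)
    ))
    val n = points.count().toInt
    val sigma = 0.5

    // Distance computation in parallel: all pairs via cartesian, then a
    // Gaussian kernel; a real implementation would restrict this to the
    // consensus-pruned k-NN edges instead of all n^2 pairs.
    val simRows = points.cartesian(points)
      .map { case ((i, a), (j, b)) =>
        val d2 = a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
        (i, (j, math.exp(-d2 / (2 * sigma * sigma))))
      }
      .groupByKey()
      .map { case (i, cols) =>
        val row = new Array[Double](n)
        cols.foreach { case (j, s) => row(j) = s }
        (i, Vectors.dense(row))
      }
      .sortByKey()
      .map(_._2)

    // Leading singular vectors of the symmetric affinity matrix stand in
    // for the Laplacian eigenvectors in this sketch.
    val svd = new RowMatrix(simRows).computeSVD(2, computeU = true)
    val embedded = svd.U.rows

    // Traditional partition in the spectral feature space via MLlib k-means.
    val model = KMeans.train(embedded, 2, 20)
    embedded.collect().zipWithIndex.foreach { case (v, i) =>
      println(s"point $i -> cluster ${model.predict(v)}")
    }
    spark.stop()
  }
}
```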
Keywords/Search Tags: Spectral Clustering, Heap, Neighborhood Consensus, Parallelization, Spark