
Research On Cluster Analysis Technology Of Component Size Measurement Data Based On Spark

Posted on: 2019-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2438330572965389
Subject: Computer application technology
Abstract/Summary:
The interchangeability of parts is an important property affecting the production process, and matching work depends heavily on it. Grouping components significantly improves interchangeability within each group. Matching, the labor-intensive selection of two or more parts that fit together, is generally carried out with a group-selection scheme. Traditional grouping divides the tolerance band into sub-bands and assigns parts accordingly, but this method applies only to one-dimensional size data. We therefore adopt spectral clustering for the size measurement data: it handles multi-dimensional data, makes no assumptions about the data distribution, and greatly improves the interchangeability of the grouped components.

We improve the spectral clustering algorithm in two respects. First, a heap is used to find the k-nearest neighborhood of each point, which is far more efficient than a full scan. Second, neighborhood co-occurrence (consensus) information is used to remove unreliable similarity links, which improves clustering quality. Combining the two strategies yields the new algorithm HCKNNSC (K-Nearest Neighbor Spectral Clustering based on Heap and Consensus). Theoretical analysis and experiments show that the two strategies bring gains in time efficiency and produce more precise and practical cluster boundaries.

In addition, with the rapid development of industrial and automated measurement, the volume of component size data has grown greatly, while the scalability of clustering algorithms remains poor. We therefore design and implement a parallel version of the proposed algorithm, covering distributed computation of the distance matrix, fast parallel search of k-nearest neighbors, neighborhood accumulation, parallel construction of the similarity graph and Laplacian matrix, and parallel eigendecomposition of the Laplacian; finally, the traditional partition in the spectral feature space is performed. We implement the algorithm in Scala on Spark, optimizing it with the MLlib library, and analyze the complexity of every step. The results indicate that the new algorithm is greatly improved in both time and space, and that the Spark implementation is more efficient than the single-machine one in the scenario of this thesis: the larger the data and the more cluster nodes, the higher the efficiency, provided the data volume does not exceed the cluster memory.
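To illustrate the first strategy, here is a minimal Scala sketch of heap-based k-nearest-neighbor search; the names (HeapKnn, kNearest, dist) are illustrative, not taken from the thesis. A fixed-size max-heap on candidate distances bounds the per-point cost at O(n log k) heap operations instead of sorting all n distances.

```scala
import scala.collection.mutable.PriorityQueue

object HeapKnn {
  // Euclidean distance between two measurement vectors.
  private def dist(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // For each point, maintain a max-heap of at most k (distance, index)
  // pairs; a new candidate evicts the heap root only when it is closer,
  // so each point's k nearest neighbors are found without a full sort.
  def kNearest(points: Array[Array[Double]], k: Int): Array[Array[Int]] =
    points.indices.map { i =>
      val heap =
        PriorityQueue.empty[(Double, Int)](Ordering.by[(Double, Int), Double](_._1))
      for (j <- points.indices if j != i) {
        val d = dist(points(i), points(j))
        if (heap.size < k) heap.enqueue((d, j))
        else if (d < heap.head._1) { heap.dequeue(); heap.enqueue((d, j)) }
      }
      heap.dequeueAll.map(_._2).toArray
    }.toArray
}
```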
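The second strategy can be sketched as follows: an edge of the k-NN graph is kept only when its endpoints share enough common neighbors. This is a minimal sketch of the consensus idea; the names (ConsensusPrune, minShared) and the exact pruning criterion used in the thesis are assumptions.

```scala
object ConsensusPrune {
  // Keep the undirected edge (i, j) of the k-NN graph only when i and j
  // share at least minShared common neighbors; the shared-neighbor count
  // acts as a consensus vote on whether the similarity link is reliable.
  def prune(knn: Array[Array[Int]], minShared: Int): Seq[(Int, Int)] = {
    val neighborSets = knn.map(_.toSet)
    for {
      i <- knn.indices
      j <- knn(i)
      if i < j // the sketch visits each undirected edge from its smaller endpoint
      if (neighborSets(i) intersect neighborSets(j)).size >= minShared
    } yield (i, j)
  }
}
```

In HCKNNSC the surviving edges would then define the similarity graph whose Laplacian is decomposed in the later stages.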
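Finally, an end-to-end sketch of the parallel pipeline on Spark, assuming a Gaussian similarity kernel and using MLlib's RowMatrix.computeSVD and KMeans. For brevity it builds the full dense affinity matrix via cartesian rather than the pruned k-NN graph, and it uses the leading singular vectors of the affinity matrix in place of the Laplacian eigenvectors; all data, names, and parameter values (sigma, the number of clusters) are illustrative, not the thesis's implementation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.clustering.KMeans

object SparkSpectralSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("HCKNNSC-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Toy component-size records: (id, measurement vector).
    val points = sc.parallelize(Seq(
      0 -> Array(10.01, 5.02), 1 -> Array(10.02, 5.01),
      2 -> Array(10.52, 5.49), 3 -> Array(10.51, 5.50)
    ))
    val n = points.count().toInt
    val sigma = 0.5

    // Distance computation in parallel: all pairs via cartesian, then a
    // Gaussian kernel; a real implementation would restrict this to the
    // consensus-pruned k-NN edges instead of all n^2 pairs.
    val simRows = points.cartesian(points)
      .map { case ((i, a), (j, b)) =>
        val d2 = a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
        (i, (j, math.exp(-d2 / (2 * sigma * sigma))))
      }
      .groupByKey()
      .map { case (i, cols) =>
        val row = new Array[Double](n)
        cols.foreach { case (j, s) => row(j) = s }
        (i, Vectors.dense(row))
      }
      .sortByKey()
      .map(_._2)

    // Leading singular vectors of the symmetric affinity matrix stand in
    // for the Laplacian eigenvectors in this sketch.
    val svd = new RowMatrix(simRows).computeSVD(2, computeU = true)
    val embedded = svd.U.rows

    // Traditional partition in the spectral feature space via MLlib k-means.
    val model = KMeans.train(embedded, 2, 20)
    embedded.collect().zipWithIndex.foreach { case (v, i) =>
      println(s"point $i -> cluster ${model.predict(v)}")
    }
    spark.stop()
  }
}
```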
Keywords/Search Tags: Spectral Clustering, Heap, Neighborhood Consensus, Parallelization, Spark