
Design And Implementation Of Spectral Clustering Algorithm For Large Scale Data

Posted on: 2021-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: L C Zhai
Full Text: PDF
GTID: 2518306308469664
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of data collection and storage technology, and especially the widespread use of the Internet, large amounts of data have accumulated in all walks of life. To mine more useful information and knowledge from this data, people combine machine learning and data mining for data analysis. As an important part of machine learning and data mining, clustering algorithms are widely used and studied, and improving their performance and accuracy has become a goal of researchers. In this thesis, the widely used spectral clustering algorithm is studied and analyzed in depth, and the algorithm is improved to make it more suitable for large scale data. This research provides technical support for users to quickly and efficiently extract useful information from massive data, and improves the efficiency of large scale data clustering analysis. The research of this thesis can be roughly divided into the following three points.
1. Design and implementation of a fast spectral clustering algorithm: in view of the shortcomings of the spectral clustering algorithm, this thesis proposes KSC, a fast spectral clustering algorithm based on Kmc2. KSC optimizes the data representation and final clustering steps and reduces the complexity of data similarity calculation and point selection, thereby improving the overall efficiency of spectral clustering.
2. Design and implementation of a parallel spectral clustering algorithm: a single-machine spectral clustering algorithm is still insufficient for large scale data. For the improved spectral clustering algorithm, this thesis designs parallel strategies for five steps: data representation, similarity matrix construction, Laplacian matrix construction, eigenvector decomposition, and K-means clustering, and completes the design and implementation of a parallel spectral clustering algorithm based on Spark.
3. Clustering analysis system for large scale data: to reduce the difficulty of using the algorithm and improve the efficiency of data mining with clustering analysis, this thesis designs and implements a fast, convenient, and user-friendly clustering analysis system for large scale data based on the existing parallel computing framework.
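
For orientation, the five steps named in the abstract (data representation, similarity matrix, Laplacian matrix, eigenvector decomposition, K-means) can be sketched on a single machine as follows. This is a minimal textbook version of normalized spectral clustering, not the thesis's KSC algorithm and not its Spark implementation. The function name `spectral_clustering`, the Gaussian kernel with width `sigma`, the symmetric normalized Laplacian, and the farthest-point seeding are all assumptions made for illustration.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Textbook normalized spectral clustering (illustrative sketch only)."""
    # Step 1: data representation. Here the raw points are used directly.
    X = np.asarray(X, dtype=float)
    n = len(X)

    # Step 2: similarity matrix, assuming a Gaussian (RBF) kernel.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Step 3: symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Step 4: eigenvectors of the k smallest eigenvalues (eigh sorts ascending).
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k].copy()
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

    # Step 5: K-means (Lloyd's iterations) on the rows of U.
    # Farthest-point seeding is an assumption chosen to keep the sketch deterministic.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[dist.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

The dense n-by-n similarity matrix and the full eigendecomposition in this sketch cost O(n^2) memory and roughly O(n^3) time, which is the kind of single-machine bottleneck that motivates faster and parallel variants for large scale data.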
Keywords/Search Tags: Cluster Analysis, Spectral Clustering, Large Scale Data, Parallel Computing, Spark