Design And Implementation Of Spectral Clustering Algorithm For Large Scale Data

Posted on:2021-03-06

Degree:Master

Type:Thesis

Country:China

Candidate:L C Zhai

GTID:2518306308469664

Subject:Computer technology

Abstract/Summary:

In recent years, with the rapid development of data collection and storage technology, and especially the widespread use of the Internet, large amounts of data have accumulated in all walks of life. To mine more useful information and knowledge from these data, people combine machine learning and data mining for data analysis. Clustering is an important part of machine learning and data mining and is widely used and studied, and improving the performance and accuracy of clustering algorithms has become a goal pursued by many researchers. This thesis studies and analyzes the widely used spectral clustering algorithm in depth and improves it so that it is better suited to large-scale data. The work provides technical support for users who need to extract useful information from massive data quickly and efficiently, and improves the efficiency of large-scale cluster analysis. The research can be divided into three parts:

1. Design and implementation of a fast spectral clustering algorithm. To address the shortcomings of standard spectral clustering, this thesis proposes KSC, a fast spectral clustering algorithm based on Kmc2. KSC optimizes the data representation and final clustering stages, reducing the cost of computing data similarities and of selecting points, and thereby improves the overall efficiency of spectral clustering.

2. Design and implementation of a parallel spectral clustering algorithm. A single-machine spectral clustering algorithm is still insufficient for large-scale data. For the improved algorithm, this thesis designs appropriate parallel strategies for each of its five steps (data representation, similarity matrix construction, Laplacian matrix construction, eigendecomposition, and K-means clustering) and completes the design and implementation of a parallel spectral clustering algorithm based on Spark.

3. A cluster analysis system for large-scale data. To make the algorithm easier to use and to improve the efficiency of data mining through cluster analysis, this thesis designs and implements a fast, convenient, and user-friendly cluster analysis system for large-scale data built on an existing parallel computing framework.
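The abstract does not say how Kmc2 is used inside KSC. Assuming "Kmc2" refers to K-MC², the Markov chain Monte Carlo approximation of k-means++ seeding, the sketch below shows how such seeding can choose k points without a full pass over the data for every new center. It is not the thesis's KSC code; the names `kmc2_seeding`, `sq_dist`, `min_sq_dist` and the default `chain_length` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_sq_dist(p, centers):
    """Squared distance from p to its nearest already-chosen center."""
    return min(sq_dist(p, c) for c in centers)


def kmc2_seeding(points, k, chain_length=50, seed=None):
    """Choose k initial centers with a K-MC^2-style Markov chain (hypothetical helper).

    Each new center is the last state of a Metropolis-Hastings chain with a uniform
    proposal whose stationary distribution is the k-means++ D^2 distribution, so a
    step needs only chain_length nearest-center distance evaluations rather than a
    full pass over all n points.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform, as in k-means++
    while len(centers) < k:
        x = rng.choice(points)
        dx = min_sq_dist(x, centers)
        for _ in range(chain_length - 1):
            y = rng.choice(points)
            dy = min_sq_dist(y, centers)
            # Metropolis-Hastings acceptance: move to y with probability min(1, dy / dx).
            if dx == 0 or dy / dx > rng.random():
                x, dx = y, dy
        centers.append(x)
    return centers
```

For example, `kmc2_seeding(points, k=3, chain_length=20, seed=1)` returns three points drawn from `points`. Longer chains approximate the k-means++ distribution more closely at a proportionally higher cost; with `chain_length=1` the method degenerates to uniform random sampling.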
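To make the five steps listed under point 2 concrete, here is a minimal single-machine sketch of a generic normalized spectral clustering pipeline in the Ng-Jordan-Weiss style. It assumes a dense Gaussian similarity matrix with a hand-chosen `sigma`, uses the raw coordinates as the data representation, and finds eigenvectors with plain orthogonal iteration so that only the standard library is needed. It includes neither the Kmc2-based optimizations nor the Spark parallelization described in the thesis, and all function names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_similarity(points, sigma):
    """Step 2: dense similarity W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with W_ii = 0."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
    return W


def normalized_affinity(W):
    """Step 3: M = D^{-1/2} W D^{-1/2}; the normalized Laplacian is L = I - M."""
    inv_sqrt_deg = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in map(sum, W)]
    n = len(W)
    return [[inv_sqrt_deg[i] * W[i][j] * inv_sqrt_deg[j] for j in range(n)] for i in range(n)]


def top_eigenvectors(M, k, iters=300, seed=0):
    """Step 4: orthogonal iteration on (I + M) / 2, whose eigenvalues lie in [0, 1].

    Returns k orthonormal vectors spanning the eigenspace of M with the largest
    eigenvalues, i.e. of L = I - M with the smallest eigenvalues.
    """
    n = len(M)
    rng = random.Random(seed)
    Q = [[rng.random() - 0.5 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Z = [[0.5 * (q[i] + sum(M[i][j] * q[j] for j in range(n))) for i in range(n)] for q in Q]
        Q = []
        for z in Z:  # Gram-Schmidt keeps the k vectors from collapsing onto one direction
            for q in Q:
                dot = sum(a * b for a, b in zip(z, q))
                z = [a - dot * b for a, b in zip(z, q)]
            norm = math.sqrt(sum(a * a for a in z))
            Q.append([a / norm for a in z])
    return Q


def kmeans(rows, k, iters=50):
    """Step 5: Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def spectral_clustering(points, k, sigma=1.0):
    # Step 1 (data representation): this sketch simply uses the raw coordinates.
    W = gaussian_similarity(points, sigma)
    Q = top_eigenvectors(normalized_affinity(W), k)
    rows = []
    for i in range(len(points)):
        # Spectral embedding of point i, scaled to unit length (Ng-Jordan-Weiss style).
        r = [q[i] for q in Q]
        norm = math.sqrt(sum(a * a for a in r)) or 1.0
        rows.append([a / norm for a in r])
    return kmeans(rows, k)
```

The iteration runs on (I + M) / 2 rather than on M because the eigenvalues of M = D^{-1/2} W D^{-1/2} lie in [-1, 1]; the shift maps them into [0, 1], so the dominant eigenvectors of the iteration are exactly the eigenvectors of L = I - M with the smallest eigenvalues. The dense O(n²) similarity construction in step 2 is the cost that the abstract's optimizations of similarity computation target.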
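Similarity matrix construction is one of the five steps the thesis parallelizes, but the abstract does not describe the partitioning scheme. A common approach, assumed here, is to split the matrix into blocks of rows so that each worker needs only its own row range plus a read-only copy of all points (in Spark, roughly a broadcast variable plus `mapPartitions` over row indices). The sketch below shows that structure on one machine with Python's `concurrent.futures.ThreadPoolExecutor`; the threads stand in for executors only to illustrate the partition, map, and collect pattern, and CPython threads will not speed up this pure-Python arithmetic. `row_block`, `parallel_similarity`, and `num_partitions` are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def row_block(points, start, stop, sigma):
    """Compute rows start..stop-1 of the Gaussian similarity matrix (W_ii = 0).

    A block depends only on its row range and a read-only copy of all points,
    which is what makes this step easy to partition across workers.
    """
    block = []
    for i in range(start, stop):
        row = []
        for j, q in enumerate(points):
            if i == j:
                row.append(0.0)
            else:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], q))
                row.append(math.exp(-d2 / (2 * sigma ** 2)))
        block.append(row)
    return block


def parallel_similarity(points, sigma=1.0, num_partitions=4):
    """Split the rows into contiguous partitions, compute each in a worker, then collect."""
    n = len(points)
    size = max(1, math.ceil(n / num_partitions))
    bounds = [(s, min(s + size, n)) for s in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        blocks = pool.map(lambda b: row_block(points, b[0], b[1], sigma), bounds)
        # Executor.map yields results in submission order, so concatenation restores row order.
        return [row for block in blocks for row in block]
```

Because every block is computed independently and results are concatenated in order, the output is identical for any `num_partitions`; only the distribution of work changes.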

Keywords/Search Tags:

Cluster Analysis, Spectral Clustering, Large Scale Data, Parallel Computing, Spark

Related items

1	Research On Parallel Clustering Algorithm For Large-Scale Data Set
2	Research On Fast Graph Clustering Algorithm On Large-Scale Data
3	Study On Three-way Decisions Clustering Ensemble Based On Spark
4	Large-scale Data Clustering Technology Research And To Achieve
5	Research On Cluster Analysis Technology Of Component Size Measurement Data Based On Spark
6	Research On Spectral Clustering Algorithm And Its Application
7	Research And Application Of Clustering Algorithms For Large Scale Data Sets
8	Research On Spectral Clustering Methods For Large Scale Datasets
9	Parallel Design And Implementation Of AP Clustering Algorithms Based On CUDA
10	Research On Key Technologies Of Parallel Optimization For Multi-computing Platforms For Large-scale Applications