
Research On Data Mining Clustering Algorithm Based On Improved SC

Posted on: 2019-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: X L Liu
Full Text: PDF
GTID: 2348330569478158
Subject: Control theory and control engineering
Abstract/Summary:
Spectral clustering is a clustering algorithm based on spectral graph partitioning. Its essence is to convert the clustering problem into an optimal graph partitioning problem. To cluster a given dataset, an undirected weighted graph is constructed first: the vertices represent the data samples, the vertices are connected by edges, and each edge carries a weight that describes the similarity of the two vertices it joins. Clustering the sample points into several classes is then equivalent to dividing the graph into several subgraphs, so that the edge weights inside each subgraph are as large as possible (high similarity within a subgraph) and the edge weights between subgraphs are as small as possible (low similarity between subgraphs).

With the rapid development of computers and information technology, data has become a valuable resource. As an effective method of data analysis, spectral clustering can mine the intrinsic relations among data and provide valuable information for decision makers. Because it can cluster sample spaces of arbitrary shape and converge to the global optimum, spectral clustering has been applied successfully in many fields. Despite these advantages, it is sensitive to the scale parameter and handles multi-scale datasets poorly when the similarity matrix is constructed, and its time and space complexity are too high for large-scale, complex data. Spectral clustering is therefore still under development, and many of its problems require further research and improvement. This thesis analyzes the shortcomings of spectral clustering algorithms with respect to the scale parameter, complexity, and large-scale data, and proposes corresponding solutions. The specific contents are as follows:

(1) To address the sensitivity to the scale parameter and the resulting poor clustering when the similarity matrix is constructed, an improved adaptive spectral clustering algorithm based on density sensitivity is proposed. First, the density difference between cluster structures is used to adjust the similarity of sample points, which yields a new similarity matrix function. The newly constructed similarity matrix is then used to build a Laplacian matrix. The eigenvectors corresponding to the K largest eigenvalues of the Laplacian matrix are used to construct a new vector space, in which the data points are in one-to-one correspondence with the original data. Finally, the K-means algorithm is used to cluster the points in this space. The proposed algorithm improves the handling of multi-scale data sets and reduces the sensitivity to the scale parameter. Simulation experiments on artificial data sets and UCI data sets show that the proposed algorithm gives better clustering results.

(2) Spectral clustering usually uses a Gaussian kernel as the similarity measure, and the similarity matrix is constructed from the Euclidean distance over all available features, so the complexity of the data set affects clustering performance. An improved spectral clustering algorithm based on AFS is therefore proposed. First, the AFS algorithm is used to identify features and obtain a similarity measure better suited to the data, which produces a stronger affinity matrix. The Nyström sampling algorithm is then used to compute the similarity matrix between the sampled points and the remaining points, which reduces the computational complexity. Finally, experiments on several data sets and on image segmentation verify the effectiveness of the proposed algorithm.

(3) Spectral clustering offers strong clustering performance: it can cluster data samples of arbitrary shape and converge to the global optimum. It performs well on smaller data sets, but on large-scale data sets it scales poorly in memory usage and computation time. To solve these problems, a fast kernel spectral clustering algorithm (FKSC) for large-scale data is proposed. The algorithm first samples a set of m points (m ≪ N) using the Nyström algorithm and then approximates the eigenspace of the original matrix using the eigenvectors of the sampled sub-matrix. This solves the optimization problem on the kernel matrix and effectively reduces the computational complexity. An improved weighted kernel PCA is then applied to carry out the optimization. Finally, the k-means algorithm is used to cluster the data. Comparison experiments with different clustering algorithms on different data sets demonstrate the effectiveness and speed of the proposed algorithm.
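
The baseline pipeline that the abstract builds on (Gaussian similarity matrix, Laplacian matrix, eigenvectors of the K largest eigenvalues, K-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the improved algorithms of the thesis: it assumes the normalized matrix D^(-1/2) W D^(-1/2) in the style of Ng, Jordan and Weiss (for which the largest eigenvalues are the relevant ones), a single fixed scale parameter sigma, and a simple deterministic k-means. All function names, including the two_rings test data helper, are hypothetical.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """Similarity w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with zero diagonal."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def kmeans(Y, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, k, sigma=1.0):
    """Baseline spectral clustering (assumed NJW-style normalization)."""
    W = gaussian_affinity(X, sigma)
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^(-1/2) W D^(-1/2)
    _, vecs = np.linalg.eigh(L)       # eigenvalues come back in ascending order
    U = vecs[:, -k:]                  # eigenvectors of the k largest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)               # row i of U corresponds to sample i of X


def two_rings(n):
    """Hypothetical test data: n points on a ring of radius 1, n on radius 4."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    ring = np.column_stack([np.cos(t), np.sin(t)])
    return np.vstack([ring, 4.0 * ring])


if __name__ == "__main__":
    labels = spectral_clustering(two_rings(40), k=2, sigma=0.5)
    print(labels)
```

On the two concentric rings, a small sigma keeps the affinity between the rings close to zero, so the two rings fall into separate clusters even though they are not linearly separable. A much larger sigma blurs the similarity matrix and can merge them, which is the scale parameter sensitivity that contribution (1) aims to reduce.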
Keywords/Search Tags:Spectral clustering, Similarity matrix, Clustering algorithm, Euclidean distance, Data mining