The Research And Application Of Spectral Clustering Algorithm In Data Mining

Posted on: 2016-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
Full Text: PDF
GTID: 2308330461969640
Subject: Computer application technology
Abstract/Summary:
The information age has brought great changes to how information is used: people expect to discover valuable information and knowledge in data, along with the regularities behind them. As an important tool of data mining, cluster analysis has become a major research subject for data mining researchers. Clustering algorithms are valuable not only within data mining; they also have irreplaceable value of their own.
Spectral clustering can outperform some traditional clustering algorithms in both speed and effectiveness, mainly because its complexity does not depend on the dimensionality of the data but only on the number of data points, which gives it an advantage in processing high-dimensional data. The algorithm is easy to implement: it converts the original data set into a graph, uses a matrix to store the features of the data, and carries out the clustering through eigenvalue decomposition. The solution to its objective function tends toward the global optimum rather than a local one, so it clusters well on data sets with complex shapes, such as non-convex distributions or uneven density. As a result, spectral clustering can address more practical application problems, and it has high research value and good prospects.
The thesis first introduces the concept of information entropy and proposes a new Rank algorithm for sorting the Laplacian eigenvectors. This replaces the traditional procedure of selecting the k eigenvectors with the largest eigenvalues, and improves the clustering effect and quality of spectral clustering on small and medium-sized data sets. Building on the Rank algorithm, the thesis then combines statistics and information theory to optimize feature selection for massive data. A new ReRank algorithm is proposed by further improving the Rank algorithm, and experimental results show that it makes spectral clustering on very large data sets more efficient.
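The abstract does not spell out the Rank algorithm, so the following is only a rough sketch of the general idea of ranking Laplacian eigenvectors by an entropy score instead of by eigenvalue order. Everything specific here is an assumption made for illustration: the Gaussian affinity, the symmetric normalized Laplacian, the histogram-based entropy estimate, the preference for low-entropy eigenvectors, and all function and parameter names (`rbf_affinity`, `histogram_entropy`, `entropy_ranked_embedding`, `sigma`, `bins`, `n_candidates`). It is not the thesis's method.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """Gaussian similarity graph over the rows of X (assumed affinity choice)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def normalized_laplacian(W):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def histogram_entropy(v, bins=10):
    """Shannon entropy (in bits) of a histogram of the entries of v."""
    counts, _ = np.histogram(v, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def entropy_ranked_embedding(X, k, n_candidates, sigma=1.0, bins=10):
    """Pick k of the first n_candidates eigenvectors by entropy score.

    Assumption: eigenvectors whose entries concentrate on a few values
    (low histogram entropy) are treated as more cluster-informative.
    """
    L = normalized_laplacian(rbf_affinity(X, sigma))
    _, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    candidates = eigvecs[:, :n_candidates]  # the usual eigenvalue-ordered pool
    scores = np.array([histogram_entropy(candidates[:, j], bins)
                       for j in range(n_candidates)])
    order = np.argsort(scores)              # lowest entropy first
    return candidates[:, order[:k]], scores

# Usage: two well-separated blobs, then embed with 2 of 5 candidate eigenvectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
embedding, scores = entropy_ranked_embedding(X, k=2, n_candidates=5)
```

The rows of `embedding` would then be passed to an ordinary clustering step such as k-means. With this Laplacian, the smallest eigenvalues correspond to the largest eigenvalues of the normalized affinity matrix, so the candidate pool matches the conventional eigenvalue-ordered selection; only the ordering within the pool changes.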
Keywords/Search Tags: Data mining, Clustering, Spectral Clustering, Feature Selection, Information Entropy