Research On Big Data Clustering Models And Algorithms Based On Sparse Representation

Posted on: 2018-03-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Xie
GTID: 1368330563950927
Subject: Computational Mathematics
Abstract/Summary:
Data are the quantized symbols of information. Data clustering is the process of partitioning a data collection according to a similarity measure in order to uncover useful information and hidden structural features. It is an important unsupervised data mining technique and is widely used in pattern recognition, machine learning, image processing, and other fields. In the era of Big Data, the rapid development of the economy, science, and technology continuously produces large amounts of valuable data. Unlike traditional data, Big Data is typically noisy, high-dimensional, and sparse, and it often fuses heterogeneous features. Constructing efficient clustering models and algorithms for Big Data is therefore an important and challenging research topic with both scientific and economic value. Addressing the problem of clustering Big Data with sparse representation techniques, this dissertation derives a continuous non-convex optimization model equivalent to K-means together with a corresponding algorithm, proposes an approach to K-means clustering of high-dimensional data in a feature space, studies Big Data clustering with a nonnegative matrix factorization model and a corresponding ADMM algorithm, and discusses the SON, SC, and K-indicator clustering models. The main contributions are as follows.

(1) The K-means algorithm is one of the top ten classical algorithms in data mining. The iterative refinement algorithm proposed by Lloyd is greedy; it has difficulty handling Big Data problems and lacks support from optimization theory. This dissertation proposes a non-convex optimization model, based on the clustering matrix, that is equivalent to K-means; uses matrix optimization theory to analyze the K-means process; and designs an efficient algorithm for Big Data with theoretical guarantees.

(2) To address the low efficiency of clustering high-dimensional data, we consider clustering in a feature space. Unlike traditional methods, the proposed algorithm guarantees that clustering accuracy is consistent before and after dimensionality reduction, and it accelerates K-means when the cluster centers and the distance function satisfy certain conditions. For high-dimensional data, the preprocessing step and the clustering step are fully matched, so the method improves efficiency while preserving accuracy.

(3) Sparse nonnegative matrix factorization (SNMF), a newer Big Data processing technique, generally relaxes the L0 constraint to an L1 constraint. In contrast to Hoyer's sparsity constraint, this dissertation uses the ADMM algorithm to solve the SNMF problem with an implicit L1 constraint, applies a variable-splitting technique so that every subproblem has a closed-form solution, and proves that the point to which the modified algorithm converges is a stationary point of the problem.

(4) To improve clustering models for Big Data, this dissertation discusses a Sparse Convolutional model, inspired by Convolutional Sparse Coding, applied to Big Data clustering; the K-indicator clustering model, which searches only over cluster indices; and the SOS clustering model with sparsity, orthogonality, and stochasticity constraints. These models are analyzed theoretically, and corresponding algorithms are given.
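Contribution (1) rewrites K-means in terms of a clustering matrix. The abstract does not state that model, so the sketch below relies on a standard identity as an assumption: with data rows X (n×d), a 0/1 cluster indicator matrix H (n×k), and its column-normalized form Z = H(HᵀH)^(-1/2), the K-means objective equals ‖X‖²_F − tr(ZᵀXXᵀZ). Relaxing the discrete constraint on Z is one route to a continuous non-convex model; whether the dissertation uses exactly this form is not stated here. The helper names (`kmeans_objective`, `matrix_form_objective`, `lloyd`) are hypothetical, and `lloyd` is a plain version of the classical greedy iteration the abstract contrasts with.

```python
import numpy as np


def kmeans_objective(X, labels, k):
    """Sum of squared distances from each row of X to the mean of its cluster."""
    total = 0.0
    for j in range(k):
        members = X[labels == j]
        if len(members):
            total += np.sum((members - members.mean(axis=0)) ** 2)
    return total


def matrix_form_objective(X, labels, k):
    """Same objective written with the normalized indicator matrix Z.

    Z = H (H^T H)^(-1/2), i.e. column j of H scaled by 1/sqrt(n_j); then
    objective = ||X||_F^2 - trace(Z^T X X^T Z).
    """
    n = X.shape[0]
    H = np.zeros((n, k))
    H[np.arange(n), labels] = 1.0                 # 0/1 cluster indicators
    sizes = H.sum(axis=0)                         # cluster sizes n_j
    Z = H / np.sqrt(np.where(sizes > 0, sizes, 1.0))  # empty clusters stay zero
    G = X @ X.T                                   # Gram matrix
    return np.sum(X ** 2) - np.trace(Z.T @ G @ Z)


def lloyd(X, k, iters=100, seed=0):
    """Classical greedy Lloyd iteration: assign to nearest center, recompute means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Usage: both forms of the objective agree for any labeling, including the one Lloyd's iteration returns.

```python
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels, _ = lloyd(X, 2)
print(np.isclose(kmeans_objective(X, labels, 2), matrix_form_objective(X, labels, 2)))  # True
```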
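Contribution (3) relies on variable splitting so that each ADMM subproblem has a closed-form solution. The abstract does not spell out the dissertation's exact SNMF formulation, so the sketch below assumes the common model min ½‖X − WH‖²_F + λ‖H‖₁ subject to W ≥ 0, H ≥ 0, and splits W = U, H = V with the nonnegativity and the L1 term moved onto the copies U and V. Under that assumption the four primal steps are two small linear solves, a projection, and a nonnegative soft-threshold. The function name `snmf_admm` and the parameters `lam`, `rho`, and `iters` are illustrative, and no convergence guarantee is claimed for this simplified version.

```python
import numpy as np


def snmf_admm(X, r, lam=0.1, rho=1.0, iters=500, seed=0):
    """Sketch of ADMM for sparse NMF (assumed model, not the dissertation's code).

    Solves  min 0.5*||X - W H||_F^2 + lam*sum(V)
            s.t. W = U, H = V, U >= 0, V >= 0,
    so the L1 penalty on the nonnegative copy V equals lam*||V||_1.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    U, V = W.copy(), H.copy()
    A, B = np.zeros_like(W), np.zeros_like(H)  # Lagrange multipliers (unscaled form)
    I = np.eye(r)
    for _ in range(iters):
        # W-step: W (H H^T + rho I) = X H^T + rho U - A   (closed form)
        W = np.linalg.solve(H @ H.T + rho * I, (X @ H.T + rho * U - A).T).T
        # H-step: (W^T W + rho I) H = W^T X + rho V - B   (closed form)
        H = np.linalg.solve(W.T @ W + rho * I, W.T @ X + rho * V - B)
        # U-step: projection onto the nonnegative orthant
        U = np.maximum(0.0, W + A / rho)
        # V-step: prox of lam*||.||_1 restricted to V >= 0 (nonnegative soft-threshold)
        V = np.maximum(0.0, H + B / rho - lam / rho)
        # Dual ascent on the splitting constraints
        A += rho * (W - U)
        B += rho * (H - V)
    return U, V
```

Usage: factor a small synthetic nonnegative matrix and inspect the relative residual and the share of exact zeros in V, which grows as `lam` increases.

```python
rng = np.random.default_rng(3)
X = rng.random((20, 3)) @ rng.random((3, 30))
U, V = snmf_admm(X, 3, lam=0.05, iters=300)
print(np.linalg.norm(X - U @ V) / np.linalg.norm(X), np.mean(V == 0))
```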
Keywords: Big Data, Sparse Representation, Clustering, Non-convex Optimization, Alternating Direction Method of Multipliers