
Research On Binary And Centralized Non-negative Matrix Factorization Algorithms

Posted on: 2022-04-12 | Degree: Master | Type: Thesis
Country: China | Candidate: X D Ma | Full Text: PDF
GTID: 2518306536980319 | Subject: Computer Science and Technology
Abstract/Summary:
In many research fields, such as image and text analysis, how to represent multi-dimensional observation data well is an important issue. A good representation method generally has two characteristics: it reduces the dimensionality of the data, and it uncovers the latent structure of the data. Compared with classic linear transformation methods, non-negative matrix factorization is widely used in real scenarios because its purely additive, sparse description has a clearer physical interpretation. However, although the algorithm can compress data effectively and express key information well, in most cases the key information cannot be known in advance, which greatly limits the flexibility of the model and the efficiency of the decomposition. In addition, the algorithm aims to reflect manifold features or geometric structures of the original data. In fact, the space spanned by the key vectors of the decomposed basis matrix is similar to the concept of a "class", which plays an important role in learning tasks such as clustering. How to make effective use of this key information in matrix factorization is therefore worth discussing. Based on hidden feature extraction and clustering information mining, this thesis proposes the following two improvements to the classic non-negative matrix factorization algorithm.

First, because the key feature information cannot be predicted in advance, the decomposition algorithm performs inefficiently. From a probabilistic perspective, this thesis uses the prior distribution over binary matrices with a finite number of rows and an unbounded number of columns, which is generated by the classic hidden feature model, the Indian Buffet Process. It then proposes an exponential Gaussian model that decomposes the non-negative matrix into a low-rank coding matrix with {0,1} constraints and a non-negative dictionary matrix, and uses variational Bayesian inference to approximate their true posterior. After the decomposition, the 0 and 1 codes in the coding matrix intuitively reflect which features each observed object possesses, and the method avoids the problem of presetting the initial rank. Comparative experiments on a synthetic dataset and on real datasets such as the Swimmer dataset, the Cora document dataset and the CBCL dataset all show the effectiveness of this method on hidden feature extraction tasks.

Second, to address how to use the "class" information in the basis matrix of non-negative matrix factorization effectively, this thesis proposes a centralized non-negative matrix factorization method. Building on the graph-regularized non-negative matrix factorization framework, it uses regularization terms so that the weight vectors preserve the latent manifold structure of the data, and, drawing on the idea of robust continuous clustering, constructs a K-nearest-neighbor graph to obtain the weights of the connections between points. As a useful supplement, the maximization of the class information in the basis matrix is then introduced into the objective function. The method uses the iteratively updated decomposition result to reconstruct the connectivity graph, which redistributes the classes and merges the information belonging to the same cluster in the factor matrix. Finally, on synthetic and real datasets, the centralized method is compared with several classical clustering methods to verify its clustering performance. The experiments and the t-SNE visualization show that the method greatly improves clustering accuracy and largely removes the problem of indistinct boundaries between the original clusters.
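As an illustration of the Indian Buffet Process prior mentioned in the abstract, the following minimal Python sketch draws a binary object-by-feature matrix in which the number of columns is not fixed beforehand. It covers only the standard sequential form of the prior, not the thesis's exponential Gaussian model or its variational inference. The names `sample_poisson`, `sample_ibp`, `num_objects`, `alpha` and `seed` are hypothetical and chosen for this example.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method (suited to small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_objects, alpha, seed=0):
    """Sample a binary matrix from the Indian Buffet Process prior (illustrative sketch)."""
    rng = random.Random(seed)
    rows = []            # each row holds the 0/1 feature indicators of one object
    feature_counts = []  # number of objects that currently own each feature
    for i in range(1, num_objects + 1):
        # Object i takes existing feature k with probability m_k / i.
        row = [1 if rng.random() < m / i else 0 for m in feature_counts]
        for k, z in enumerate(row):
            feature_counts[k] += z
        # Object i then introduces Poisson(alpha / i) new features.
        new = sample_poisson(alpha / i, rng)
        row.extend([1] * new)
        feature_counts.extend([1] * new)
        rows.append(row)
    # Pad earlier rows with zeros so that the matrix is rectangular.
    width = len(feature_counts)
    return [r + [0] * (width - len(r)) for r in rows]


Z = sample_ibp(num_objects=6, alpha=2.0, seed=1)
for row in Z:
    print(row)
```

Each row of `Z` plays the role of one row of a {0,1} coding matrix. The number of columns is determined by the draw itself rather than by a preset rank, and a larger `alpha` tends to produce more features.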
Keywords/Search Tags: Non-negative matrix factorization, Indian Buffet Process, Exponential Gaussian model, Single binary components, Feature extraction, Clustering