
Research On Non-negative Matrix Factorization And Its Application To Unbalanced Data Classification

Posted on: 2015-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Fan
Full Text: PDF
GTID: 2308330464466630
Subject: Signal and Information Processing
Abstract/Summary:
Non-negative Matrix Factorization (NMF) is a matrix factorization method for large-scale, high-dimensional data. Because of its non-negativity constraints and local (parts-based) representation, NMF has attracted considerable research attention and is widely used in data mining, computer vision, and pattern recognition. Many practical classification problems also involve unbalanced data, which may have uneven density, an unbalanced class distribution, or, most commonly, unequal numbers of samples per class. This thesis therefore studies NMF based on data structure information and its application to unbalanced data classification.

First, the basic theory of NMF and of unbalanced data classification is introduced, with emphasis on conventional NMF algorithms, their mathematical models, and some classic derivative algorithms. The difficulties of unbalanced data classification and the common sampling methods are then summarized.

Second, Graph Regularized Non-negative Matrix Factorization measures the samples’ neighborhood structure with Euclidean distance only. To address this limitation, we introduce a Neighborhood Sample Similarity metric and propose an NMF algorithm based on Neighborhood Sample Similarity (NSS-NMF). This method uses the neighborhood covariance matrix to compute the Neighborhood Sample Similarity, and assigns higher weights to the coefficient-matrix constraints of the decomposed sample points whose neighborhood structures are more similar, so as to adapt to uneven sample density. Further, by introducing Neighborhood Label Similarity and taking the orthogonality of the basis vectors into account, we propose an NMF algorithm based on Neighborhood Similarity (NS-NMF). Inheriting Neighborhood Sample Similarity, NS-NMF constructs the neighborhood class distribution matrix from the prior label information of the neighborhood samples. The combined neighborhood similarity accounts for unbalanced data density and unbalanced label distribution simultaneously. The experiments verify that the proposed algorithms outperform traditional methods in clustering and classification performance.

Lastly, for the most common form of unbalanced data, in which the number of samples differs across classes, we put forward a new Weighted Non-negative Matrix Factorization algorithm (WNMF). For each class, it computes the proportion of samples belonging to that class relative to the total number of samples, and uses the inverse of this proportion to weight the NMF. Consequently, WNMF not only maintains the classification accuracy on majority-class samples but also effectively improves classification performance on minority-class samples. In addition, building on NS-NMF, which respects neighborhood structure information, a Hybrid re-Sampling algorithm based on Non-negative Matrix Factorization (HS-NMF) is put forward. It first maps the data into a more separable subspace through NS-NMF, then uses classic over-sampling and under-sampling schemes to alleviate the data imbalance. The experimental results show that applying NMF-related methods to unbalanced data can yield better classification performance than traditional pure re-sampling methods.
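
The inverse class-proportion weighting described for WNMF can be illustrated with a minimal Python sketch. The abstract gives neither the thesis's objective function nor its update rules, so the following is an assumption rather than the author's algorithm: each sample's reconstruction error is scaled by the inverse of its class proportion, and the resulting weighted Frobenius objective is minimized with standard multiplicative updates. The function names (`inverse_class_weights`, `weighted_nmf`, `weighted_error`), the column-per-sample layout of `X`, and all parameters are hypothetical.

```python
import numpy as np

def inverse_class_weights(labels):
    """Per-sample weight = 1 / (class count / total count)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    per_class = labels.size / counts            # inverse of class proportion
    return per_class[np.searchsorted(classes, labels)]

def weighted_error(X, W, H, weights):
    """Weighted squared reconstruction error: sum_j w_j * ||x_j - W h_j||^2."""
    residual = X - W @ H
    return float(np.sum(weights * np.sum(residual ** 2, axis=0)))

def weighted_nmf(X, weights, rank, n_iter=200, eps=1e-10, seed=0):
    """Assumed sketch: NMF with per-sample (per-column) weights.

    X has shape (n_features, n_samples) and must be non-negative.
    Multiplicative updates for min sum_j w_j * ||x_j - W h_j||^2.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    Xw = X * weights                            # same as X @ diag(weights)
    for _ in range(n_iter):
        # The weights cancel column by column in the H update,
        # so it matches the unweighted rule.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # The W update sees each sample scaled by its weight.
        Hw = H * weights
        W *= (Xw @ H.T) / (W @ (Hw @ H.T) + eps)
    return W, H
```

With labels `[0, 0, 0, 1]`, for instance, the three majority samples each receive weight 4/3 and the single minority sample receives weight 4, so errors on the minority class count more heavily when the basis matrix `W` is updated.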
Keywords/Search Tags: Non-negative Matrix Factorization, unbalanced data classification, Neighborhood Similarity, re-sampling