Research on Robust Nonnegative Matrix Factorization Algorithms

Posted on: 2018-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: W K Lu
Full Text: PDF
GTID: 2348330512979432
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, the era of big data has quietly arrived. Every day, user behavior generates hundreds of millions of records, including social, shopping, and browsing information. This mass of data contains patterns of user behavior that are not normally visible, and these patterns can often bring greater benefit or higher efficiency. How to extract valuable information from massive data has therefore become a central question of the big data era, and data mining arose to meet this need. Matrix factorization is an important research area in data mining and is widely used in image and text mining. In practical applications, however, it runs into the fact that image pixel values cannot be negative and that negative values have no statistical meaning for documents. If negative values are not handled properly, the interpretability of the algorithm is greatly reduced. Nonnegative matrix factorization has gradually attracted attention as a way to improve interpretability. It imposes nonnegativity constraints on both the basis matrix and the coefficient matrix, which matches the fact that negative values are meaningless in some practical applications. In addition, it converges quickly and requires little storage, which makes it well suited to large-scale data processing.

However, classical nonnegative matrix factorization handles noisy data poorly: it amplifies the effect of noisy data on the result, which limits its use in real scenarios. A later improvement no longer squares the residual of each data point but simply sums the residuals. This reduces the effect of noisy data to some extent, but it does not adapt well to changes in the proportion of noise in the data, so it fails to produce the desired results on some data sets.

To address this problem, this thesis proposes two nonnegative matrix factorization algorithms: the capped robust nonnegative matrix factorization algorithm and the double capped robust nonnegative matrix factorization algorithm. The capped robust nonnegative matrix factorization algorithm builds on the robust nonnegative matrix factorization algorithm based on the L2,1 norm and introduces a truncation parameter for data points. The residual of each data point is compared with this parameter: if the residual is larger than the parameter, it is set to 0; otherwise it enters the computation as usual. Noisy data points are thereby eliminated and their impact on the result is reduced. At the same time, the parameter can be adjusted to the proportion of noisy data in the data set, which improves the robustness of the algorithm.

The double capped robust nonnegative matrix factorization algorithm builds on the capped algorithm and improves on it. It takes the structure of the data into account and introduces the ridge leverage score to improve the criterion for identifying noisy data. It also adds the handling of noisy attributes to the algorithm, with a truncation parameter that controls the number of noisy attributes. These improvements increase the accuracy of the results and enhance the robustness of the algorithm, allowing it to adapt to complex practical application scenarios and to be widely used.
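The abstract gives no formulas, so the following Python sketch is only one plausible reading of the capped scheme, not the thesis's actual algorithm. It assumes data points are the columns of X, uses the standard reweighted multiplicative updates for an L2,1 loss, and gives weight 0 to any point whose residual exceeds the truncation parameter. The names `capped_robust_nmf`, `theta`, `n_warmup`, and `lam` are hypothetical, and the warm-up phase is an assumption added so that a random start does not cap every point. `ridge_leverage_scores` implements the standard textbook definition of the score; how the thesis combines it with the cap is not stated in the abstract.

```python
import numpy as np


def column_residuals(X, W, H):
    """Euclidean norm of the reconstruction error of each data point (column of X)."""
    return np.linalg.norm(X - W @ H, axis=0)


def capped_weights(residuals, theta, eps=1e-12):
    """Per-point weights for a capped L2,1 loss.

    A point whose residual exceeds theta gets weight 0, so it drops out of the
    basis update; every other point gets the usual L2,1 weight 1 / (2 * residual).
    """
    weights = 1.0 / (2.0 * np.maximum(residuals, eps))
    weights[residuals > theta] = 0.0
    return weights


def capped_robust_nmf(X, rank, theta, n_iter=300, n_warmup=50, seed=0, eps=1e-12):
    """Illustrative capped robust NMF with reweighted multiplicative updates.

    X is a nonnegative (features x points) array. During the first n_warmup
    iterations the cap is disabled (an assumption of this sketch).
    Returns W, H, and a boolean mask of the points capped at the end.
    """
    rng = np.random.default_rng(seed)
    n_features, n_points = X.shape
    W = rng.random((n_features, rank)) + 0.1
    H = rng.random((rank, n_points)) + 0.1
    for it in range(n_iter):
        cap = theta if it >= n_warmup else np.inf
        d = capped_weights(column_residuals(X, W, H), cap, eps)
        # Weighted basis update: broadcasting scales column i of X and H by d[i].
        W *= ((X * d) @ H.T) / (W @ ((H * d) @ H.T) + eps)
        # A point's weight cancels in its own coefficient update, so H uses the
        # plain multiplicative rule and capped points still get an encoding.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H, column_residuals(X, W, H) > theta


def ridge_leverage_scores(X, lam=1.0):
    """Ridge leverage score x_i^T (X X^T + lam * I)^{-1} x_i for each column x_i."""
    gram = X @ X.T + lam * np.eye(X.shape[0])
    return np.sum(X * np.linalg.solve(gram, X), axis=0)
```

In this sketch `theta` should sit above the residual of a typical clean point after warm-up. If it is set too low, every point is capped and the basis update collapses to zero.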
Keywords/Search Tags:Data mining, Nonnegative matrix factorization, Noise data, Capped robust nonnegative matrix factorization, Double capped robust nonnegative matrix factorization