
Research On KNN Algorithm Based On Sparse Learning And Manifold Learning

Posted on: 2017-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: K Sun
Full Text: PDF
GTID: 2348330488973272
Subject: Computer application technology
Abstract/Summary:
The purpose of data mining is to extract useful rules from large volumes of complex data and to form them into useful models. With the development of science and technology, and especially with the recent emergence of the big-data concept, we are entering an era in which data are a resource to be developed. Data mining technology is increasingly widespread and plays an important role in many fields, such as industrial development, health care, and the information industry.

The KNN (k-nearest neighbor) algorithm is one of the "top ten classic algorithms of data mining". Because it is simple, clear, easy to implement, and performs well on complex data sets, it has been widely used in machine learning, data mining, and related fields. However, the KNN algorithm itself has several shortcomings. First, the determination of the value of K is an open problem. Usually K is set by the user, but this choice is largely arbitrary and depends to some extent on the user's experience. Other methods choose K by ten-fold cross-validation; although this avoids the blindness of an arbitrary choice, it suffers from a large amount of computation, low efficiency, and a failure to take the structural characteristics of the data itself into account. Second, real sample data may contain noise; the traditional KNN algorithm makes no attempt to remove noisy samples, so noise degrades both the learned rule model and its accuracy. Third, during projection transformation the KNN algorithm ignores the manifold structure of the sample space: a projection changes the spatial positions of the samples, but keeping the manifold structure relatively constant preserves their relative positions and thereby retains more information.
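The ten-fold cross-validation approach to choosing K that the abstract criticizes can be sketched in plain numpy. This is an illustrative sketch only (the function and parameter names are mine, not from the thesis), but it makes the cost concrete: every candidate K requires a full pass of nearest-neighbor classification over all ten folds.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Classify each test point by majority vote among its k nearest training points."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)       # Euclidean distances
        nearest = y_train[np.argsort(dists)[:k]]          # labels of k nearest
        preds.append(np.bincount(nearest).argmax())       # majority vote
    return np.array(preds)

def select_k_by_cv(X, y, candidate_ks, n_folds=10):
    """Pick the K with the highest mean accuracy over n_folds-fold cross-validation."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, n_folds)
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        accs = []
        for i in range(n_folds):
            val = folds[i]
            tr = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            pred = knn_predict(X[tr], y[tr], X[val], k)
            accs.append(np.mean(pred == y[val]))
        if np.mean(accs) > best_acc:
            best_k, best_acc = k, np.mean(accs)
    return best_k, best_acc
```

Note that this search runs `len(candidate_ks) * n_folds` rounds of distance computation, and the chosen K is global: the same value is used for every query point, regardless of the local structure of the data, which is exactly the limitation the thesis targets.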
Finally, the traditional KNN algorithm ignores the relationships between samples; in fact, samples of the same class are correlated with one another, and this correlation also carries a useful amount of information.

Based on sparse learning theory and manifold learning theory, this thesis reconstructs the test samples from the training samples to obtain a projection transformation matrix W, which is then used to determine the value of K for the KNN algorithm adaptively. The reconstruction process takes the relationships between samples into account, seeking out the useful information they share and making full use of the correlations between them. To give the transformation matrix W a good sparsity pattern and to remove noisy samples, a row-sparse norm is applied, which selects appropriate data samples and compresses the sample space. To keep the structure of the data space unchanged during reconstruction, the LPP algorithm from manifold learning theory is introduced. The resulting nearest-neighbor method, which learns the value of K by itself and thereby makes KNN self-adaptive, is called the SA-KNN method. SA-KNN is then applied to classification and regression tasks in a large number of comparative experiments. The experimental results show that the new method clearly outperforms both the traditional KNN algorithm and the Entropy-KNN algorithm based on attribute information entropy.
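Since only the abstract is available, the following is a minimal sketch of the general idea it describes: reconstruct test samples from training samples under a row-sparse (L2,1) penalty, then read an adaptive per-sample K off the nonzero coefficients. The objective and the iteratively reweighted solver here are the standard L2,1 formulation, not necessarily the thesis's exact one; all names (`l21_reconstruction`, `lam`, `tol`) are illustrative, and the LPP/manifold term the thesis adds is omitted.

```python
import numpy as np

def l21_reconstruction(X_train, X_test, lam=0.1, n_iter=50, eps=1e-8):
    """Reconstruct each test sample as a linear combination of training samples:
        min_W ||X_train.T @ W - X_test.T||_F^2 + lam * ||W||_{2,1}
    solved by iterative reweighting. Rows of W driven to (near) zero belong to
    training samples no test sample uses -- the 'noise removal' effect."""
    A = X_train.T                     # d x n: training samples as columns
    B = X_test.T                      # d x m: test samples as columns
    n = A.shape[1]
    AtA, AtB = A.T @ A, A.T @ B
    W = np.linalg.solve(AtA + lam * np.eye(n), AtB)     # ridge warm start
    for _ in range(n_iter):
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + eps))      # reweighting for ||W||_{2,1}
        W = np.linalg.solve(AtA + lam * D, AtB)
    return W

def adaptive_k(W, tol=1e-3):
    """For each test sample (column of W), K = number of training samples
    whose reconstruction coefficient exceeds tol in magnitude."""
    return (np.abs(W) > tol).sum(axis=0)
```

The key contrast with cross-validated KNN is that K here is decided per test sample by the data itself: a test point in a dense, clean region keeps many active coefficients, while one near noisy samples keeps few, because the L2,1 penalty zeroes out entire rows of W at once.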
Keywords/Search Tags: KNN, Correlation, Noise Removal, Manifold Learning, LPP, Sparse Learning, Data Mining, Classification, Regression