Research On Improved K-nearest Neighbor Method For Imbalanced Data Set Classification

Posted on: 2018-05-20  Degree: Master  Type: Thesis
Country: China  Candidate: P J Su
GTID: 2348330515968969  Subject: Mathematics
Abstract/Summary:
Over recent decades, social progress has been accompanied by an explosion of information, and efficiently extracting the information one needs from this mass of data has become an urgent problem. Researchers in machine learning, pattern recognition, artificial intelligence and related fields have studied the problem in depth, and years of sustained effort have produced many methods with good classification performance. However, these methods are mainly built around overall measures such as the global error rate, accuracy and recall. On imbalanced data sets, this focus tends to lower the recognition rate of minority classes and sparse classes. Because many real-world applications care most about minority-class accuracy, improving the recognition rate of the minority class while preserving the overall classification quality on imbalanced data has become an active research topic.

This thesis studies the K-nearest neighbor (KNN) method for imbalanced data classification. The specific work is as follows:

(1) Representative samples and thresholds are introduced into the traditional KNN method to address its slow classification, which is caused by the large number of similarity computations needed to search for the nearest neighbor samples. The nearest neighbors of each test sample are selected only from the classes that reach the corresponding threshold. This reduces the amount of neighbor computation to some extent and increases classification speed without affecting classification accuracy.

(2) The representativeness degree of a class and the representativeness degree of a sample are proposed to address the low accuracy of the traditional KNN method on imbalanced data sets. Neighbor samples with a high degree of representativeness, and neighbors from minority classes, are given larger weights. This reduces the influence of majority classes and widely distributed classes on the decision and thereby improves classification accuracy on imbalanced data sets.

Classification data sets from the UCI repository are used in the experiments. A comparison of the traditional KNN method with the improved KNN method shows that the improved method raises classification performance to a certain extent.
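The abstract does not define the representative samples, the thresholds, or the two representativeness measures, so the following is only a hypothetical sketch of how the two ideas could fit together, not the thesis's actual method. Every concrete choice is an assumption: a class's representative sample is its centroid; similarity is 1 / (1 + Euclidean distance); a class is searched only if its representative's similarity to the test sample is at least `ratio` times the best class similarity; the class weight is inverse class frequency; and a sample's representativeness is its similarity to its own class centroid. The names `similarity`, `fit`, `predict` and `ratio` are illustrative.

```python
import math
from collections import Counter, defaultdict


def similarity(a, b):
    """Hypothetical similarity: 1 / (1 + Euclidean distance), always in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def fit(X, y):
    """Precompute class representatives and the two (hypothetical) weights."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    # Assumed "representative sample" of each class: its centroid.
    reps = {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}
    counts = Counter(y)
    # Assumed class weight: inverse class frequency, so minority classes weigh more.
    class_w = {c: len(y) / (len(counts) * m) for c, m in counts.items()}
    # Assumed sample representativeness: closeness of a sample to its class centroid.
    sample_w = [similarity(xi, reps[yi]) for xi, yi in zip(X, y)]
    return reps, class_w, sample_w


def predict(x, X, y, model, k=3, ratio=0.8):
    """Classify x with threshold-filtered, representativeness-weighted KNN."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    reps, class_w, sample_w = model
    # Step 1: one similarity per class, against that class's representative.
    class_sim = {c: similarity(x, r) for c, r in reps.items()}
    threshold = ratio * max(class_sim.values())  # assumed threshold rule
    keep = {c for c, s in class_sim.items() if s >= threshold}
    # Step 2: search for neighbours only among samples of the retained classes.
    pool = sorted(
        ((similarity(x, xi), i) for i, (xi, yi) in enumerate(zip(X, y)) if yi in keep),
        reverse=True,
    )
    # Step 3: weighted vote; representative and minority-class neighbours count more.
    votes = defaultdict(float)
    for s, i in pool[:k]:
        votes[y[i]] += s * sample_w[i] * class_w[y[i]]
    return max(votes, key=votes.get)


# Usage on a toy imbalanced set: five majority points, two minority points.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4],
     [2.0, 2.0], [2.2, 1.9]]
y = ["majority"] * 5 + ["minority"] * 2
model = fit(X, y)
print(predict([2.1, 2.0], X, y, model))   # minority
print(predict([0.15, 0.2], X, y, model))  # majority
```

Because only one similarity per class is computed before the neighbor search, a test sample far from a class's representative skips that class's samples entirely. Since `ratio` is at most 1, the most similar class is always retained; setting `ratio` to 0 keeps every class and reduces the sketch to an ordinary weighted KNN over all training samples.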
Keywords: K-nearest neighbor method, imbalanced data sets, classification, sample representativeness, class representation