Research On Improved K-nearest Neighbor Method For Imbalanced Data Set Classification

Posted on:2018-05-20

Degree:Master

Type:Thesis

Country:China

Candidate:P J Su

GTID:2348330515968969

Subject:Mathematics

Abstract/Summary:

In recent decades, social progress has brought an explosion of information, and extracting the required information efficiently from this mass of data has become an urgent problem. Scholars in machine learning, pattern recognition, artificial intelligence, and related fields have studied the problem in depth, and after years of effort many methods with good classification performance are now available. However, these methods are mainly designed around overall measures such as classification error rate, accuracy, and recall. On imbalanced data sets they tend to reduce the recognition rate of minority and sparse classes. Because real-world applications often depend on the minority class, its classification accuracy is receiving more and more attention, and improving the recognition rate of the minority class while preserving the overall classification quality on imbalanced data sets has become a hot topic.

This thesis studies the K-nearest neighbor method for imbalanced data classification. The specific work is as follows:

(1) Representative samples and thresholds are introduced into the traditional K-nearest neighbor method to address its slow classification speed, which results from the large number of similarity computations needed to search for nearest neighbors. In general, the nearest neighbors of each test sample are selected only from the classes that meet the corresponding threshold. This reduces the amount of neighbor computation to a certain extent and improves classification speed without affecting classification accuracy.

(2) The representation degree of a class and the representation degree of a sample are proposed to address the low classification accuracy of the traditional K-nearest neighbor method on imbalanced data sets. Classification accuracy on imbalanced data sets is improved by giving larger weights to neighbor samples with a high degree of representation and to minority classes, which reduces the influence of majority classes and dispersed classes on the classification result.

UCI classification data sets are used as experimental data. A comparison between the traditional and the improved K-nearest neighbor methods shows that the improved method raises classification performance to a certain extent.
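
The two ideas above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the formulas, so every concrete choice here is an assumption made for illustration. The representative of a class is taken to be its centroid, similarity is taken to be 1 / (1 + Euclidean distance), the sample weight is the neighbor's similarity to the test sample, and the class weight is the inverse class frequency. The names classify, centroid, similarity, and threshold are hypothetical.

```python
import math
from collections import defaultdict


def centroid(points):
    """Mean point of a class; an assumed stand-in for its representative sample."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def similarity(a, b):
    """Assumed similarity measure: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def classify(x, train, k=5, threshold=0.0):
    """Classify x from train, a list of (features, label) pairs."""
    by_class = defaultdict(list)
    for feats, label in train:
        by_class[label].append(feats)

    # Idea (1): search for neighbors only in classes whose representative
    # is at least `threshold` similar to the test sample.
    kept = {c for c, pts in by_class.items()
            if similarity(x, centroid(pts)) >= threshold}
    if not kept:  # fall back to all classes if the threshold prunes everything
        kept = set(by_class)

    scored = [(similarity(x, f), lab) for f, lab in train if lab in kept]
    scored.sort(key=lambda t: t[0], reverse=True)
    neighbors = scored[:k]

    # Idea (2): weighted vote. Closer neighbors and rarer classes count more.
    n_total = len(train)
    votes = defaultdict(float)
    for sim, lab in neighbors:
        class_weight = n_total / len(by_class[lab])  # assumed: inverse frequency
        votes[lab] += sim * class_weight
    return max(votes, key=votes.get)
```

With six majority-class points near the origin and two minority-class points near (2, 2), a query at (1.3, 1.3) with k=5 has three majority and two minority neighbors, so an unweighted vote would choose the majority class, while the weighted vote in this sketch chooses the minority class.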

Keywords/Search Tags:

K-nearest neighbor method, imbalanced data sets, classification, sample representativeness, class representation

Related items

1	Research Of Nearest Neighbor Classification Algorithm Based On Sample Selection
2	Study On Improved LMS-KNN Nearest Neighbor Classification Method
3	Research On Classification Method Of High-dimensional Class-imbalanced Data Sets Based On SVM
4	Study On Generalized Nearest Neighbor Pattern Classification
5	Research On Classification Algorithm For Imbalanced Data
6	Efficient computation of k-nearest neighbor graphs for large high-dimensional data sets on gpu clusters
7	Imbalanced Classification Methods For Complex Distribution Characteristics
8	Collaborative Representation-Based Nearest Neighbor Classification
9	Granular Computing-oriented Dynamic Neighborhood Imbalanced Data Classification Algorithm
10	Imbalanced Data Classification Based On The Influence Of Training Instances