Granular Computing-oriented Dynamic Neighborhood Imbalanced Data Classification Algorithm

Posted on:2021-01-18

Degree:Master

Type:Thesis

Country:China

Candidate:H Y He

Full Text:PDF

GTID:2428330602489105

Subject:Engineering

Abstract/Summary:

Imbalanced data distribution is an active research topic in machine learning and data mining. In many real-world applications, the minority classes are the ones of greatest interest. Efforts to improve imbalanced data classification usually follow two directions: data sampling and modification of the classification algorithm. However, most data sampling methods can only rebalance the data set at the global level and cannot improve the local data distribution. Therefore, the nearest neighbor algorithm is improved using neighborhood rough set theory and three-way decision theory, so that the classifier is better suited to imbalanced data distributions.

First, this thesis uses a new neighborhood construction method to build a dynamic equal query neighborhood. The generated dynamic neighborhood is used to judge the degree of sparsity, and the positive posterior probability estimate is adjusted to refine the classification decision. This improves sensitivity to rare data and gives all data the same query opportunity without being overly biased toward rare classes. Neighborhood rough set theory is used to handle extreme distributions and to eliminate the uncertainty caused by the scarcity of rare data. Once the classification is decided from the refined instance distribution, the dynamic equal nearest neighbor classification algorithm based on the neighborhood rough set classifies query instances more accurately.

Second, this thesis proposes a dynamic equal nearest neighbor classification algorithm based on three-way decisions. It also constructs a dynamic equal query neighborhood and then uses three-way decision theory to classify test samples more accurately and carefully. The deterministic information in the positive and negative regions is used to first classify the samples with high certainty, and the local neighborhood data distribution is then refined in the boundary region. The adjusted posterior probability estimates make classification decisions according to the distribution of different data, which helps the algorithm classify query instances from increasingly imbalanced data sets more accurately and stably.

Imbalanced data classification is one of the difficult problems in data mining. To improve classification performance under imbalanced data distributions, this thesis proposes the above two granular computing-oriented dynamic neighborhood imbalanced data classification algorithms. Finally, experiments using multiple evaluation criteria show that the two algorithms usually outperform kNN-family classifiers and other commonly used classifiers on imbalanced data.
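The three-way decision step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: it uses a plain fixed-k nearest neighbor vote in place of the dynamic equal query neighborhood, and the function names and the thresholds `alpha` and `beta` are hypothetical values chosen only for illustration. It shows the general idea of accepting or rejecting a query with high certainty and deferring the rest to a boundary region for further refinement.

```python
import math


def knn_positive_posterior(query, points, labels, k, positive=1):
    """Estimate P(positive | query) as the share of positive labels
    among the k nearest training points (plain kNN, an assumption here)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    nearest = order[:k]
    hits = sum(1 for i in nearest if labels[i] == positive)
    return hits / len(nearest)


def three_way_decision(posterior, alpha=0.7, beta=0.3):
    """Map a posterior estimate to one of three regions.
    alpha and beta are illustrative thresholds, not values from the thesis."""
    if posterior >= alpha:
        return "positive"   # positive region: accept with high certainty
    if posterior <= beta:
        return "negative"   # negative region: reject with high certainty
    return "boundary"       # boundary region: defer for further refinement


# Toy data: label 1 is the rare (positive) class.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (3, 3)]
labels = [0, 0, 0, 1, 1, 1]

for q in [(0.2, 0.2), (5, 5.5), (2, 2)]:
    p = knn_positive_posterior(q, points, labels, k=3)
    print(q, round(p, 3), three_way_decision(p))
```

In this sketch, queries that fall in the boundary region are only flagged; a method like the one the abstract describes would then refine the local neighborhood distribution before making the final decision.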

Keywords/Search Tags:

imbalanced data, nearest neighbor classification, neighborhood rough set, three-way decision

Related items

1	Research On Improved K-nearest Neighbor Method For Imbalanced Data Set Classification
2	Study On Generalized Nearest Neighbor Pattern Classification
3	Research On Classification Algorithm For Imbalanced Data
4	Imbalanced Classification Methods For Complex Distribution Characteristics
5	Mining Research, Based On The Integration Algorithm Of The K-nearest Neighbor Classification
6	Imbalanced Data Classification Based On The Influence Of Training Instances
7	Research Of Imbalanced Data Over-sampling Technique Based On Rough Set Theory
8	Random K-Nearest Neighbor Algorithm With Application To Bankruptcy Prediction
9	Research On Neighborhood-aware Imbalanced Data Sampling Classification
10	A Research For Imbalanced Data Classification Algorithm Based On Neighborhood Rough Set And Hypernetwork