
Granular Computing-oriented Dynamic Neighborhood Imbalanced Data Classification Algorithm

Posted on: 2021-01-18  Degree: Master  Type: Thesis
Country: China  Candidate: H Y He  Full Text: PDF
GTID: 2428330602489105  Subject: Engineering
Abstract/Summary:
Imbalanced data distribution is an active research topic in machine learning and data mining. In many real-world applications, the minority (rare) classes are the ones of greatest interest. Efforts to improve classification on imbalanced data usually take one of two directions: data sampling or modification of the classification algorithm. Most data sampling methods, however, rebalance the data set only at the global level and cannot improve the local data distribution. For this reason, the nearest neighbor algorithm, neighborhood rough set theory, and three-way decision theory have been continually refined so that classification algorithms cope better with imbalanced data distributions.

This thesis first uses a new neighborhood construction method to build a dynamic equal query neighborhood. The generated dynamic neighborhood is used to judge the degree of sparsity, and the forward posterior probability estimate is adjusted to refine the classification decision. This approach increases sensitivity to rare data while giving all data the same query opportunity, so it is not overly biased toward the rare classes. Neighborhood rough set theory is used to handle extreme distributions and to eliminate the uncertainty caused by the scarcity of rare data. With the classification decision based on the refined distribution of instances, the dynamic equal nearest neighbor classification algorithm based on neighborhood rough sets assigns query instances to classes more accurately.

Second, this thesis proposes a dynamic equal nearest neighbor classification algorithm based on three-way decisions. It also constructs a dynamic equal query neighborhood and then applies three-way decision theory to classify test samples more accurately and carefully. The deterministic information in the positive and negative regions is used first to classify the samples with high certainty, and the local neighborhood data distribution is then refined in the boundary region. The adjusted posterior probability estimate makes the classification decision according to the distribution of the different data, which also helps the three-way-decision-based dynamic equal nearest neighbor classification algorithm classify query instances from increasingly imbalanced data sets more accurately and stably.

Imbalanced data classification is one of the difficult problems in data mining. To improve classification performance under imbalanced data distributions, this thesis proposes the two granular computing-oriented dynamic neighborhood imbalanced data classification algorithms described above. Experiments evaluated with multiple evaluation criteria show that both algorithms are usually better than classifiers of the kNN family and other commonly used classifiers at classifying imbalanced data.
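
The three-way decision step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes a plain fixed-k neighborhood instead of the dynamic equal query neighborhood, and the function names `knn_posterior` and `three_way_decide`, the thresholds `alpha` and `beta`, and the toy data are all hypothetical choices made only for illustration.

```python
import math

def knn_posterior(query, points, labels, k, positive=1):
    """Estimate P(positive | query) as the share of positive labels
    among the k nearest neighbors (fixed k; an assumption, not the
    thesis's dynamic neighborhood)."""
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(query, points[i]))
    nearest = order[:k]
    return sum(1 for i in nearest if labels[i] == positive) / len(nearest)

def three_way_decide(p, alpha=0.7, beta=0.3):
    """Map a posterior estimate to one of three regions.
    alpha and beta are hypothetical thresholds with beta < alpha."""
    if p >= alpha:
        return "positive"   # accept: confidently positive
    if p <= beta:
        return "negative"   # reject: confidently negative
    return "boundary"       # defer: needs further refinement

# Toy data: three majority-class (0) and three rare-class (1) points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (1.1, 1.0), (0.5, 0.5)]
labels = [0, 0, 0, 1, 1, 1]

def classify(query, k=3):
    """Posterior estimate followed by the three-way decision."""
    return three_way_decide(knn_posterior(query, points, labels, k))
```

With this toy data, `classify((1.0, 1.05))` falls in the positive region and `classify((0.0, 0.05))` in the negative region, while `classify((0.4, 0.4))` lands in the boundary region. In the approach the abstract describes, such boundary samples are the ones whose local neighborhood distribution would be refined further before a final decision.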
Keywords/Search Tags: imbalanced data, nearest neighbor classification, neighborhood rough set, three-way decision