Research On Threshold Strategy For Class Imbalance Decision-Making Considering Prior Distribution Of Samples

Posted on: 2024-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: M K Lu
GTID: 2568307154499244
Subject: Software engineering
Abstract/Summary:
Imbalanced learning aims to address the performance degradation of traditional supervised learning algorithms on imbalanced data distributions, and it has become a research hotspot in machine learning, data mining, and artificial intelligence. Decision threshold moving, a post-processing technique, has proven to be an effective strategy for addressing class imbalance. However, both experience-based and optimization-based decision threshold moving strategies determine a single compensatory threshold for all samples, so the classification hyperplane can only be shifted and its orientation cannot change. Within the same dataset, this may overcompensate on some data and undercompensate on others, which limits performance, especially on complex data with varying density. To improve on existing decision threshold moving strategies, this thesis proposes the Clustering-based Decision Threshold Moving (CDTM) algorithm. The algorithm partitions the majority class training instances into multiple regions of different density and performs decision threshold moving independently on each region to obtain the optimal combination of compensatory thresholds. Specifically, the training set is first segmented with the well-known Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, which effectively explores the distribution of samples in the feature space and adapts well to density variations. Then, optimization-based decision threshold moving is applied separately to clusters with different sample distributions, yielding an independent compensatory value for each cluster. During prediction, the cluster to which each test instance belongs is first determined using the Gaussian Naive Bayes (GNB) rule, and the decision threshold associated with that cluster is then used for classification. The effectiveness and superiority of the proposed CDTM algorithm are validated with Support Vector Machines (SVM) and Extreme Learning Machines (ELM) as base classifiers. Experimental results on 40 benchmark imbalanced datasets show that CDTM outperforms several state-of-the-art decision threshold moving algorithms in terms of the G-mean metric.

Imbalanced data distributions are common in many real-world domains. This study therefore further validates the effectiveness of imbalanced learning techniques and the robustness of the CDTM algorithm in the domain of used car transactions. First, vehicle information from the used car market is taken as the raw dataset, and its features and their meanings are studied thoroughly. Data preprocessing techniques such as missing value handling, discretization of continuous attributes, and min-max normalization are applied to ensure the integrity and accuracy of the dataset, providing a solid foundation for subsequent modeling and analysis. On the resulting used car dataset, the performance of the CDTM algorithm is compared with several classical decision threshold moving algorithms using evaluation metrics such as G-mean and F1-measure. The results show that, with either SVM or ELM, the CDTM algorithm achieves the strongest robustness and performance. The experiments also confirm that imbalanced learning techniques perform well in the domain of used car transactions, offering reliable solutions to the class imbalance problem in this field. This conclusion highlights the successful application of imbalanced learning techniques in the automotive industry and provides a useful reference for related research fields.
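To make the per-region threshold moving step concrete, the following is a minimal sketch in plain Python, not the thesis's implementation. It assumes that classifier decision scores and region assignments are already available (in the thesis these would come from SVM or ELM, and from DBSCAN plus the GNB rule, respectively). The function names, the toy data, the candidate grid, and the coordinate search used to pick thresholds are all illustrative assumptions; the abstract does not specify the optimization method. Label 1 denotes the minority class.

```python
from math import sqrt


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    return sqrt((tp / pos) * (tn / neg))


def predict(scores, regions, thresholds):
    """Classify each sample with the threshold of the region it falls in."""
    return [1 if s >= thresholds[r] else 0 for s, r in zip(scores, regions)]


def fit_region_thresholds(scores, regions, y_true, candidates, sweeps=2):
    """Hypothetical coordinate search: tune one region's threshold at a
    time, keeping the others fixed, to maximize the overall G-mean."""
    thresholds = {r: 0.0 for r in set(regions)}
    for _ in range(sweeps):
        for r in sorted(thresholds):
            best_t = thresholds[r]
            best_g = g_mean(y_true, predict(scores, regions, thresholds))
            for t in candidates:
                trial = dict(thresholds)
                trial[r] = t
                g = g_mean(y_true, predict(scores, regions, trial))
                if g > best_g:
                    best_t, best_g = t, g
            thresholds[r] = best_t
    return thresholds


# Toy data (assumed): in region 0 the minority scores sit below zero,
# so a single global threshold cannot separate both regions.
scores = [-1.6, -1.4, -1.5, -0.4, -0.6, -0.5, -0.6, 0.4, 0.6]
regions = [0, 0, 0, 0, 0, 1, 1, 1, 1]
y_true = [0, 0, 0, 1, 1, 0, 0, 1, 1]
candidates = [-1.0, -0.5, 0.0, 0.5]

# Baseline: the best single threshold shared by all samples.
global_best = max(
    g_mean(y_true, predict(scores, regions, {0: t, 1: t})) for t in candidates
)

# Per-region thresholds, in the spirit of CDTM.
thresholds = fit_region_thresholds(scores, regions, y_true, candidates)
regional = g_mean(y_true, predict(scores, regions, thresholds))

print("best global G-mean:", round(global_best, 3))
print("per-region thresholds:", thresholds, "G-mean:", round(regional, 3))
```

On this toy data no single shared threshold separates both regions, while independent thresholds per region do, which is the overcompensation and undercompensation effect the abstract describes.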
Keywords/Search Tags: Class Imbalance Learning, Decision Threshold Moving, Clustering, DBSCAN, Used Car Trading