| In data mining, machine learning methods learn inductive rules from raw data, and classification is one of the main practical applications of such rules. Conventional classification methods are designed for balanced data sets, where they achieve strong results. In an imbalanced setting, however, they cannot learn rules for the different classes fairly: some classes are disadvantaged in data volume or misclassification cost, and global performance metrics bias training toward the majority class. As a result, minority-class samples are not predicted accurately, which makes classifying imbalanced data difficult. This dissertation studies the classification of imbalanced data, proposes two improved classification methods, and designs a practical campus financial aid application for imbalanced scenarios based on campus big data. The main research contents are as follows: (1) At the data pre-processing level, existing undersampling methods do not consider the influence of minority-class samples on the majority class, which can cause information loss when the majority class is undersampled. This study uses kernel density estimation to learn the density distribution of the minority class and undersamples the majority class according to that distribution. The optimized method achieves better classification performance with lower resource consumption. (2) At the classification algorithm level, ignoring the differences between samples within a class lowers overall classification performance on imbalanced data. To address this, an adaptive weighted extreme learning machine is proposed, in which initial cost weights and extra cost weights are designed to construct an adaptive penalty matrix. This design takes the distribution of samples in the different classes into account and effectively improves the overall classification accuracy of the algorithm on imbalanced data sets. (3) The number of economically disadvantaged students on campus is far smaller than the number of other students, so identifying them is likewise a classification problem on imbalanced data, which makes applying predictive models in this area with campus big data a challenging task. This dissertation combines the proposed methods with practical application to design a financial aid decision system for economically disadvantaged students. |
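
The density-guided undersampling idea in item (1) can be illustrated with a minimal sketch. The abstract does not state the selection rule, so the one used here is an assumption: each majority sample is scored by the estimated minority-class density at its location, and the highest-scoring samples (those closest to the minority region) are kept. The function names `gaussian_kde_scores` and `kde_undersample`, the isotropic Gaussian kernel, and the fixed `bandwidth` are all hypothetical choices made for illustration, not the dissertation's method.

```python
import numpy as np


def gaussian_kde_scores(reference, query, bandwidth=1.0):
    """Density of `reference` evaluated at each row of `query`.

    Uses an isotropic Gaussian kernel with a fixed bandwidth
    (an illustrative assumption, not taken from the source).
    """
    # Pairwise squared distances, shape (n_query, n_reference).
    diffs = query[:, None, :] - reference[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=2)
    dim = reference.shape[1]
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return kernel.sum(axis=1) / (len(reference) * norm)


def kde_undersample(X, y, minority_label, n_keep, bandwidth=1.0):
    """Keep all minority samples and the `n_keep` majority samples
    with the highest estimated minority-class density.

    The "keep the highest density" rule is an assumption; other rules
    (for example, sampling with probability proportional to density)
    are equally compatible with the abstract.
    """
    minority_mask = y == minority_label
    X_min, X_maj = X[minority_mask], X[~minority_mask]
    y_min, y_maj = y[minority_mask], y[~minority_mask]

    # Score every majority sample under the minority density estimate.
    scores = gaussian_kde_scores(X_min, X_maj, bandwidth)

    # Indices of the n_keep highest-scoring majority samples.
    keep = np.argsort(scores)[::-1][:n_keep]

    X_res = np.vstack([X_min, X_maj[keep]])
    y_res = np.concatenate([y_min, y_maj[keep]])
    return X_res, y_res
```

Calling `kde_undersample(X, y, minority_label=1, n_keep=20)` on a data set with 200 majority and 20 minority samples would return a balanced set of 40 samples. In practice the bandwidth would need tuning, and the pairwise distance matrix grows with the product of the two class sizes.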