
Studies On Cost-sensitive Regression Learning Of Small Dataset

Posted on: 2022-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: F F Xu
Full Text: PDF
GTID: 2518306731477624
Subject: Computer technology

Abstract/Summary:
In the field of machine learning, regression prediction is an important research direction, widely applied in enterprise operational decision-making, financial risk control, and other areas. Traditional regression learning minimizes prediction error under the assumption that all prediction costs are equal. In real scenarios, however, the costs of over-prediction and under-prediction are generally asymmetric, for example in demand forecasting and purchase-rate forecasting. This calls for cost-sensitive regression learning, whose goal is to minimize the cost of prediction errors rather than the error itself.

Many cost-sensitive regression learning tasks also face the small-data-set problem. Training a model directly on a small data set raises two common challenges: (1) the small sample size is insufficient to filter out secondary, less informative features; (2) because of the small sample size, the trained model cannot adequately represent the true data structure and therefore has high variance. Both problems degrade the performance of the prediction model. Existing studies have proposed many solutions to cost-insensitive regression learning on small data sets, including enlarging the sample, feature selection, and specialized learning methods. These solutions fall short when applied to cost-sensitive regression scenarios, and their benefit for regression prediction is limited.

To improve the prediction performance of cost-sensitive regression learning on small data sets, this thesis combines ensemble learning and feature selection and proposes two new methods, built on the cost-sensitive regression learning framework "Model Manipulation" and using a back-propagation neural network for regression prediction. The asymmetric cost loss functions used are the Linear-Linear cost (LLC) and the Quadratic-Quadratic cost (QQC). The main research work covers the following two aspects:

(1) This thesis proposes the Intra-cluster Product Favored Bagging algorithm (ICPFB), based on intra-cluster information. The algorithm uses the Bagging ensemble method to reduce prediction variance by combining multiple weak learners, and uses intra-cluster information to sample features with overlap and replacement. Experimental results show that the algorithm effectively reduces both the average daily forecast cost and the variance of the model, improving prediction performance. Relative to a traditional Bagging ensemble, with LLC and QQC the average daily cost of ICPFB on the test set is reduced by 6.3% and 3.7%, respectively.

(2) This thesis proposes the Intra-cluster Product Favored Feature Selection algorithm (ICPFFS). The algorithm uses intra-cluster information to perform a secondary screening of features, filtering out the important features while reducing the amount of data required for model training. Experiments show that, built on top of ICPFB, this method further improves learning performance, reducing both the average daily prediction cost and its variance. With LLC and QQC, the prediction cost of ICPFFS is reduced by 33.5% and 32.4% relative to the benchmark experiment, and by 7.9% and 10.5% relative to ICPFB, respectively. The algorithm also yields robust improvements under three base models: random forest, XGBoost, and neural network.
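To make the two building blocks above concrete, here is a minimal sketch of the asymmetric cost functions (LLC and QQC) and of Bagging with overlapping feature draws. The cost coefficients, the use of ordinary least squares as the weak learner (standing in for the thesis's BP neural network), and the uniform feature sampling are all illustrative assumptions; the thesis's cluster-based feature weighting is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def llc(y_true, y_pred, c_under=2.0, c_over=1.0):
    """Linear-Linear cost: linear penalties with a different slope on
    each side of zero error. Coefficients are illustrative."""
    err = np.asarray(y_pred) - np.asarray(y_true)
    return float(np.mean(np.where(err < 0, c_under * -err, c_over * err)))

def qqc(y_true, y_pred, c_under=2.0, c_over=1.0):
    """Quadratic-Quadratic cost: squared penalties, again asymmetric."""
    err = np.asarray(y_pred) - np.asarray(y_true)
    return float(np.mean(np.where(err < 0, c_under, c_over) * err ** 2))

def fit_feature_bagging(X, y, n_estimators=25, n_features=2):
    """Bagging with per-learner feature subsets drawn WITH replacement,
    so subsets overlap across learners; each weak learner is an
    ordinary least-squares fit with an intercept."""
    n, d = X.shape
    models = []
    for _ in range(n_estimators):
        rows = rng.integers(0, n, size=n)           # bootstrap row sample
        cols = rng.integers(0, d, size=n_features)  # overlapping feature draw
        A = np.column_stack([X[rows][:, cols], np.ones(n)])
        w, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        models.append((w, cols))
    return models

def predict_feature_bagging(models, X):
    """Average the weak learners' predictions."""
    preds = [np.column_stack([X[:, cols], np.ones(len(X))]) @ w
             for w, cols in models]
    return np.mean(preds, axis=0)
```

With `c_under > c_over`, both losses penalize under-prediction more heavily, so a model trained to minimize them is pushed toward slight over-prediction, which is the intended behavior when stock-outs cost more than surplus stock.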
Keywords/Search Tags:Small-Data-Set, Cost-Sensitive, Regression Learning, Feature Selection, Asymmetric Cost Loss Function