
The Optimal Problem Of Feature Selection In Linear Regression

Posted on: 2020-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: T Wu
GTID: 2480305732997799
Subject: Applied Statistics
Abstract/Summary:
Sales forecasting plays an important role in enterprise decision-making. As enterprises become more data-driven, the volume of enterprise data keeps growing, and extracting the information contained in sales data is a problem worth studying. Many models have been applied to sales forecasting, ranging from simple linear regression to more complex data mining algorithms such as ensemble learning and neural networks. In the era of big data, product attributes are high-dimensional, yet traditional feature engineering is carried out separately from model fitting. In this paper, we construct a new sales forecasting model based on linear regression that performs feature selection and linear regression simultaneously. The main idea is to select a small number of features from the many available and to fit a linear regression model on them so that it forecasts better. Improvements in integer optimization and in computing power now make it possible to solve mixed integer quadratic optimization problems faster. The proposed model is in essence a mixed integer quadratic optimization problem: it solves the optimal subset problem in linear regression, that is, selecting a limited number of features given n observations. Supplying an initial solution and some additional constraints to the mixed integer quadratic optimization problem speeds up its solution. We propose a DFO algorithm whose output serves as this initial solution, so that the linear regression model is fitted and the features are selected at the same time. The solution provided by our model (a) guarantees that a suboptimal solution is still obtained even if the algorithm is terminated early; (b) can be applied to linear regression with constraints; and (c) can be extended to objective functions with an absolute deviation loss or a loss with a regularization term.

We carry out numerical experiments on a large number of synthetic data sets and on real commodity sales data. We implement the model and compare it, in terms of accuracy, model interpretability, and other aspects, with Lasso regression, stepwise regression, and other feature selection methods for linear regression.
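
The optimal subset problem described in the abstract can be stated as minimizing the residual sum of squares ||y - Xb||^2 subject to b having at most k nonzero coefficients. The sketch below is an illustration only, not the thesis's implementation. It assumes NumPy, replaces the mixed integer quadratic solver with exhaustive enumeration (feasible only for a handful of features), and uses projected gradient descent with hard thresholding as a hypothetical stand-in for a first-order warm start; the abstract does not specify the DFO algorithm, so the function names, step size, and iteration count are all assumptions.

```python
import itertools
import numpy as np


def best_subset_brute_force(X, y, k):
    """Enumerate every support of size k and keep the one with lowest RSS.

    Stand-in for the mixed integer quadratic solver; cost grows as C(p, k).
    """
    n, p = X.shape
    best_support, best_coef, best_rss = None, None, np.inf
    for support in itertools.combinations(range(p), k):
        cols = list(support)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
        if rss < best_rss:
            best_support, best_coef, best_rss = support, coef, rss
    return best_support, best_coef, best_rss


def hard_threshold_warm_start(X, y, k, n_iter=200):
    """Hypothetical warm start: gradient step, then keep the k largest entries.

    This is an assumed heuristic, not the DFO algorithm of the thesis.
    """
    n, p = X.shape
    # 1 / L, where L is the Lipschitz constant of the gradient of 0.5*||y - Xb||^2
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        z = beta - step * grad
        keep = np.argsort(np.abs(z))[-k:]  # indices of the k largest |z_j|
        beta = np.zeros(p)
        beta[keep] = z[keep]
    support = tuple(sorted(int(j) for j in np.flatnonzero(beta)))
    return support, beta


# Small synthetic example: 3 of 8 features are truly active, no noise.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((60, 8))
beta_true = np.zeros(8)
beta_true[[1, 4, 6]] = [2.0, -3.0, 1.5]
y_demo = X_demo @ beta_true

exact_support, exact_coef, exact_rss = best_subset_brute_force(X_demo, y_demo, 3)
warm_support, warm_beta = hard_threshold_warm_start(X_demo, y_demo, 3)
print("exhaustive search support:", exact_support)
print("warm-start support:", warm_support)
```

In a real mixed integer solver, a k-sparse vector such as `warm_beta` would be passed in as the initial feasible solution. Because any feasible solution gives an upper bound on the optimal objective, stopping the solver early still returns a usable, possibly suboptimal, model.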
Keywords/Search Tags: Mixed integer quadratic optimization, DFO algorithm, Synthetic datasets, Sales forecast