Variable Selection And Forecasting Method For Complex Data

Posted on: 2020-10-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J X Che
Full Text: PDF
GTID: 1368330602450810
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Large data volume, complex data structure, high-dimensional attributes and complex time-varying behavior are the main characteristics of data sets in the era of big data. Faced with such complex data sets, how to select the informative variables effectively, how to select the key data effectively, and how to use the selected key data to infer future developments have become crucial research topics. In this thesis, the problem of variable selection and data forecasting for complex data is studied systematically through model building, algorithm design and theoretical analysis, and the resulting algorithms are applied to numerical simulation data sets and to several real open data sets from the engineering field. The concrete research results and innovations of this thesis are as follows:

1. The problem of variable selection in the linear regression model is studied for complex data characteristics such as high correlation. An efficient stochastic correlation coefficient algorithm is proposed based on ensemble learning and information theory. A variable selection ensemble method is then constructed, and the correlation metric analysis, convergence analysis and performance theorems for three kinds of variable selection are given. In the numerical simulation experiments, the proposed algorithm selects relevant variables, excludes irrelevant variables and controls redundant variables more effectively. A sample-size analysis and a real-case experiment are also carried out.

2. The problem of variable selection in the nonlinear regression model is studied for highly correlated, nonlinear and other complex data characteristics. Because the functional form of a nonlinear regression model is difficult to know beforehand, this thesis focuses on variable selection criteria that are independent of the regression equation. Based on entropy and mutual information theory, a novel maximum correlation minimum common redundancy criterion is proposed, and an efficient variable selection algorithm is built on this criterion. The algorithm can effectively handle variable selection without model assumptions. The correlation metric theorem and the performance analysis theorem of the algorithm are given. In the numerical simulation experiments, the algorithm is applied to nonlinear problems with redundant and highly correlated features, and it identifies the three types of variables. The algorithm is further applied to a real-case experiment on the Boston Housing data. Model comparison experiments verify the superiority and effectiveness of the model.

3. The forecasting method of support vector regression (SVR) is studied for complex data characteristics such as nonlinearity, large sample size and unbalanced data. In SVR modeling, the selection of candidate “support vector” data is closely related to the selection of model parameters, and together they directly determine the running efficiency of the SVR model. Based on the theory of SVR, the learning problem of SVR under large samples is modeled. Using statistical learning theory, information theory and a heuristic optimization algorithm, an improved support vector regression forecasting model is proposed. The algorithm effectively combines training data selection with model selection, and its convergence analysis theorem is also given. On this basis, to address the model selection problem of SVR, a new SVR model based on a sequential grid method is proposed using sequential analysis. In the numerical simulation experiments, the algorithm achieves good results, and it is further applied to a real power grid case. Model comparison experiments verify that the model can nest the optimal training data subset and model parameters.

In this thesis, we study the intrinsic association structure of complex data using statistical machine learning, establish data-driven variable selection and forecasting algorithms for both linear and nonlinear models using sampling techniques and ensemble learning, and apply them to several open data sets and to power grid management in the industrial field. The research results can be applied to learning problems without a prior model structure, and are expected to lay a theoretical foundation for the operation analysis of complex data.
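
As a rough illustration of model-free variable selection based on mutual information, the following Python sketch greedily picks variables by relevance to the target minus mean redundancy with the variables already chosen. It is a generic relevance-minus-redundancy score, not the thesis's maximum correlation minimum common redundancy criterion, whose exact form is not given in this abstract. The function names (`discretize`, `mutual_information`, `greedy_select`, `make_demo_data`), the equal-width binning and the synthetic data are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def discretize(values, bins=8):
    """Map continuous values to equal-width bin indices in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(a, b):
    """Plug-in estimate of I(A; B) in nats for two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def greedy_select(columns, target, k, bins=8):
    """Pick k column indices by relevance minus mean redundancy (assumed score)."""
    y = discretize(target, bins)
    xs = [discretize(col, bins) for col in columns]
    relevance = [mutual_information(x, y) for x in xs]
    selected = []
    k = min(k, len(xs))
    while len(selected) < k:
        best, best_score = None, -math.inf
        for j in range(len(xs)):
            if j in selected:
                continue
            # Mean mutual information with the variables chosen so far.
            redundancy = (
                sum(mutual_information(xs[j], xs[s]) for s in selected)
                / len(selected)
                if selected else 0.0
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


def make_demo_data(n=500, seed=0):
    """Synthetic data (hypothetical): x0, x1 relevant, x2 redundant, x3 irrelevant."""
    rng = random.Random(seed)
    x0 = [rng.uniform(-1, 1) for _ in range(n)]
    x1 = [rng.uniform(-1, 1) for _ in range(n)]
    x2 = [v + rng.gauss(0, 0.01) for v in x0]  # near-copy of x0
    x3 = [rng.uniform(-1, 1) for _ in range(n)]  # pure noise
    y = [math.sin(2 * a) + b for a, b in zip(x0, x1)]  # nonlinear in x0
    return [x0, x1, x2, x3], y


if __name__ == "__main__":
    cols, y = make_demo_data()
    print(greedy_select(cols, y, k=2))
```

On this synthetic data the score needs no regression equation: the near-copy `x2` is penalized through its high mutual information with `x0`, and the noise column `x3` has little relevance to begin with. The plug-in estimator is biased upward for small samples, so the bin count and sample size would need care on real data.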
Keywords/Search Tags: Forecasting methods, Support vector regression, Ensemble learning, Variable selection, Statistical learning, Complex data