Comparative Study On The Machine Learning Prediction Methods And The Statistical Modeling Prediction Methods

Posted on:2017-02-13

Degree:Master

Type:Thesis

Country:China

Candidate:H M Li

Full Text:PDF

GTID:2308330503473256

Subject:Probability theory and mathematical statistics

Abstract/Summary:

This thesis compares three machine learning methods with statistical modeling methods on multivariate time series data, longitudinal data, and data with multicollinearity.

The first study examines a multivariate time series dataset containing the highest and lowest temperatures of 16 cities around the world (32 variables in total). VARX is compared with five other methods, including three machine learning methods designed for cross-sectional data and OLS, using R. The comparison covers long-term to short-term forecasting, with each variable taken in turn as the dependent variable. Here, long-term forecasting corresponds to larger training sets and smaller testing sets in machine learning terminology. In most long-term forecasts, VARX is generally inferior to most of the machine learning methods, the exception being neural networks. However, for certain dependent variables, especially in short-term forecasting, VARX performs comparatively better, and in many cases even OLS performs very well. The choice of the best forecasting method must therefore depend on both which variable is the dependent variable and whether the forecast is long-term or short-term.

The second study compares the predictions of an artificial neural network with those of statistical modeling methods on a Parkinson's disease dataset and a diabetes dataset. On the Parkinson's disease dataset, a linear mixed-effects model (lme) and a neural network are fitted to 95 training sets of different sizes, producing 95 normalized mean squared errors of prediction; the neural network is far superior to the lme model at every training-set size. On the diabetes dataset, traditional parametric methods (ridge regression, lasso, and adaptive lasso) and the less traditional PLS method are compared with a neural network. Under 10-fold cross-validation, all four methods are much inferior to the neural network, although PLS is slightly better than the other parametric methods.

This thesis provides a new reference case for comparative studies of prediction methods and a useful reference for practitioners. All calculations were carried out in the statistical software R.
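The evaluation scheme named in the abstract, k-fold cross-validation scored by normalized mean squared error (NMSE), can be illustrated with a minimal sketch. This is not the thesis code: the thesis used R, while the sketch uses plain Python. The abstract does not define NMSE, so the definition here (squared prediction error divided by the squared error of predicting the test-set mean) is an assumption. The helper names `nmse`, `k_fold_indices`, `fit_simple_ols`, and `cv_nmse`, the one-predictor OLS model, and the toy data are all hypothetical.

```python
from statistics import mean


def nmse(y_true, y_pred):
    """NMSE (assumed definition): squared error relative to predicting the test mean."""
    y_bar = mean(y_true)
    num = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    den = sum((y - y_bar) ** 2 for y in y_true)
    return num / den


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; fold i holds indices i, i+k, i+2k, ...

    Deterministic interleaved folds are a simplification; real studies
    usually shuffle the indices first.
    """
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


def fit_simple_ols(x, y):
    """Fit y = a + b*x by least squares and return a prediction function."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda v: intercept + slope * v


def cv_nmse(x, y, fit, k=10):
    """Average the test-fold NMSE of the model returned by `fit` over k folds."""
    scores = []
    for train, test in k_fold_indices(len(x), k):
        model = fit([x[j] for j in train], [y[j] for j in train])
        preds = [model(x[j]) for j in test]
        scores.append(nmse([y[j] for j in test], preds))
    return mean(scores)


# Toy data (hypothetical): a straight line plus small alternating noise.
x = [float(i) for i in range(50)]
y = [2.0 * v + 1.0 + (0.5 if i % 2 else -0.5) for i, v in enumerate(x)]
score = cv_nmse(x, y, fit_simple_ols, k=10)
```

Under this definition, an NMSE near 0 means the model predicts far better than the test-set mean, and an NMSE near 1 means it does no better. Competing methods would be compared by passing different `fit` functions to `cv_nmse` on the same folds.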

Keywords/Search Tags:

statistical modeling methods, random forests, mboost, artificial neural network, cross validation, normalized mean squares errors

Related items

1	Optimization Modeling Methods Based On Analyzing Statistical Character Of Errors
2	The Algorithm Research Of Face Pose Estimation Based On Multi-layer Random Forests Classification
3	Predictive modeling of sugarbeet quality using vegetative index, statistical, and artificial neural network methods
4	Research On Block-regularized Cross-Validation Methods For Comparing Supervised Algorithms
5	Statistical Methods For Digital Forensics
6	Modeling large-scale cross effect in co-purchase incidence: Comparing artificial neural network techniques and multivariate probit modeling
7	The Research Of A Convergent Random Forests Algorithm For Faces Detection
8	Optimal use of regularization and cross-validation in neural network modeling
9	Application of artificial neural networks, repeated cross-validation and signal processing in chemometrics
10	Study On Artificial Neural Network Modeling For Dynamic Measurement Errors And Experiment Research