Comparative Study On The Machine Learning Prediction Methods And The Statistical Modeling Prediction Methods

Posted on: 2017-02-13   Degree: Master   Type: Thesis
Country: China   Candidate: H M Li
GTID: 2308330503473256   Subject: Probability theory and mathematical statistics
Abstract/Summary:
This thesis compares three machine learning methods with statistical modeling methods on multivariate time series data, longitudinal data, and data with multicollinearity. The multivariate time series dataset contains the highest and lowest temperatures of 16 cities around the world (32 variables in total); it is analyzed in R with VARX and five other methods, including three machine learning methods for cross-sectional data and OLS methods. The main comparison is between VARX and the other methods, from long-term to short-term forecasting, with each variable taken in turn as the dependent variable. For most long-term forecasts, VARX is generally inferior to the machine learning methods, with the exception of the neural network. Here, long-term forecasting corresponds, in machine learning terminology, to a larger training set and a smaller testing set. However, the results also show that for certain dependent variables, especially in short-term forecasting, VARX performs comparatively better, and in many cases even OLS performs very well. The best forecasting method therefore depends both on which variable is the dependent variable and on whether the forecast is long-term or short-term.
The thesis also compares the predictions of artificial neural networks with those of statistical modeling methods on a Parkinson's disease dataset and a diabetes dataset. On the first dataset, a linear mixed-effects model (lme model) and a neural network are each evaluated with 95 different training-set sizes, producing 95 normalized mean squared errors (NMSE) of prediction; the neural network is markedly superior to the lme model at every training-set size.
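The abstract ties "long-term" and "short-term" forecasting to the sizes of the training and testing sets and scores each split by normalized mean squared error, but it does not state the NMSE formula. The Python sketch below (the thesis itself used R) assumes the common definition, the sum of squared prediction errors divided by the sum of squares of the test values about their own mean, and assumes a chronological split in which the first k observations train and the rest test. The names `nmse`, `chronological_splits`, `nmse_by_train_size`, and the training-mean baseline are hypothetical illustrations, not the thesis's code or models.

```python
from statistics import fmean
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# Hypothetical forecaster signature: receives the training values and the
# number of steps to forecast, returns that many predictions.
Forecaster = Callable[[List[float], int], List[float]]


def nmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed NMSE: SSE divided by the total sum of squares of the actual
    test values about their own mean."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_y = fmean(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_y) ** 2 for a in actual)
    if sst == 0:
        raise ValueError("test values are constant; NMSE is undefined")
    return sse / sst


def chronological_splits(
    n: int, train_sizes: Sequence[int]
) -> Iterator[Tuple[List[int], List[int]]]:
    """For each size k, train on observations 0..k-1 and test on k..n-1."""
    for k in train_sizes:
        if not 1 <= k <= n - 2:
            raise ValueError(f"training size {k} must leave >= 2 test points")
        yield list(range(k)), list(range(k, n))


def nmse_by_train_size(
    y: Sequence[float], train_sizes: Sequence[int], forecast: Forecaster
) -> Dict[int, float]:
    """Forecast once per training size and record the test-set NMSE."""
    scores: Dict[int, float] = {}
    for train_idx, test_idx in chronological_splits(len(y), train_sizes):
        train = [y[i] for i in train_idx]
        test = [y[i] for i in test_idx]
        scores[len(train)] = nmse(test, forecast(train, len(test)))
    return scores


def train_mean_forecaster(train: List[float], steps: int) -> List[float]:
    """Hypothetical baseline: predict the training mean at every step."""
    return [fmean(train)] * steps


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0]
    # Larger training size = "long-term" in the abstract's terminology.
    print(nmse_by_train_size(series, [4, 5, 6], train_mean_forecaster))
```

Under this definition, an NMSE of exactly 1 is what predicting every test value by the test-set mean would score, so values below 1 indicate a forecaster that beats that benchmark. Passing a longer list of training sizes produces one NMSE per size, the kind of curve the Parkinson's disease comparison reports across 95 sizes, although for longitudinal data the split would likely be made by subject rather than purely by time.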
On the second dataset, the neural network is compared with traditional parametric methods (ridge regression, lasso, and adaptive lasso) and with the less traditional PLS method. Under 10-fold cross-validation, all four of these methods predict much worse than the neural network, although PLS is slightly better than the other parametric methods.
These comparisons offer a new reference for comparative studies of prediction methods and useful guidance for practitioners. All computations in this thesis were carried out with the statistical software R.
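The 10-fold cross-validation used in the diabetes comparison can be sketched as follows. This is a single-predictor illustration under stated assumptions: the fold assignment (one random shuffle dealt into 10 folds), the pooled held-out MSE score, and the helpers `make_ridge` and `fit_mean_only` are hypothetical. Ridge here is the closed-form one-variable version with an unpenalized intercept, not the multivariate R implementation the thesis would have used, and the demo data are synthetic.

```python
import random
from statistics import fmean
from typing import Callable, List, Sequence

# A "fitter" takes training xs and ys and returns a prediction function.
Predictor = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float]], Predictor]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 once, then deal the indices into k nearly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def make_ridge(lam: float) -> Fitter:
    """One-predictor ridge: minimizes sum((y - a - b*x)^2) + lam * b^2,
    giving b = Sxy / (Sxx + lam), a = mean(y) - b * mean(x). lam = 0 is OLS."""
    def fit(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
        mx, my = fmean(xs), fmean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        denom = sxx + lam
        b = sxy / denom if denom > 0 else 0.0
        a = my - b * mx
        return lambda x: a + b * x
    return fit


def fit_mean_only(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Baseline that ignores x and predicts the training mean of y."""
    m = fmean(ys)
    return lambda x: m


def cv_mse(xs: Sequence[float], ys: Sequence[float], fit: Fitter,
           k: int = 10, seed: int = 0) -> float:
    """Fit on k-1 folds, predict the held-out fold, and return the mean of
    all pooled held-out squared errors."""
    n = len(xs)
    sq_errors: List[float] = []
    for fold in k_fold_indices(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((ys[i] - model(xs[i])) ** 2 for i in fold)
    return fmean(sq_errors)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [float(i) for i in range(40)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 3.0) for x in xs]
    for name, fitter in [("OLS", make_ridge(0.0)),
                         ("ridge, lam=100", make_ridge(100.0)),
                         ("mean only", fit_mean_only)]:
        print(f"{name:15s} 10-fold CV MSE = {cv_mse(xs, ys, fitter):.3f}")
```

Every observation is predicted exactly once by a model that never saw it, which is why cross-validated error is a fairer basis for ranking methods than in-sample fit. Comparing several fitters on the same folds (same `seed`) keeps the comparison paired.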
Keywords/Search Tags: statistical modeling methods, random forests, mboost, artificial neural network, cross-validation, normalized mean squared errors