
Data Preprocessing And K-Means Clustering Based Support Vector Regression Model

Posted on: 2013-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: W G Zhao
GTID: 2248330371987458
Subject: Applied Mathematics

Abstract/Summary:
In production and everyday life, forecasting has great practical significance, and accuracy is its lifeblood. Improving forecasting accuracy has long been a focus of researchers. The usual approach is to make the prediction model fit the original series more closely. However, if the data themselves are flawed and do not correctly reflect the trend of the series, the model is still likely to forecast poorly, no matter how well it fits.

For this reason, this thesis tries to improve forecasting accuracy through data preprocessing. Before forecasting, it detects data jumps in the original series and then removes outliers or reduces noise. For the forecasting model, it introduces a new algorithm, K-means clustering based least squares support vector regression (denoted K-LSSVR), which rests on the observation that a training set with high internal similarity can be modelled more effectively. K-LSSVR first uses K-means clustering to divide the training set into several clusters according to the Euclidean distance between input vectors. It then trains a separate LSSVR model on each cluster. In the forecasting phase, K-LSSVR determines which cluster each input vector belongs to and uses that cluster's LSSVR model to make the prediction.

Three simulation studies show that K-LSSVR generally forecasts more accurately than LSSVR, especially when the data contain jumps or outliers. Moreover, preprocessing the data improves forecasting accuracy further still.
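The abstract does not say which detectors the thesis uses for the preprocessing step, and the EMD-based filtering named in the keywords is not reproduced here. The sketch below shows what "detect jumps, then remove outliers" can look like on a 1-D series. It substitutes two common robust techniques: a Hampel filter (rolling median plus MAD) for outliers, and a MAD threshold on first differences for jumps. The functions `hampel_filter` and `detect_jumps` and their parameters `window`, `n_sigmas` and `k` are hypothetical names chosen for illustration, not the author's method.

```python
import numpy as np


def hampel_filter(x, window=3, n_sigmas=3.0):
    """Replace outliers with the median of their local window (Hampel filter).

    A point counts as an outlier when it lies more than n_sigmas robust
    standard deviations (1.4826 * MAD) from the median of the up to
    2 * window + 1 points centred on it. Window and threshold are assumptions.
    """
    x = np.asarray(x, dtype=float)
    cleaned, outliers = x.copy(), []
    for i in range(len(x)):
        segment = x[max(0, i - window): i + window + 1]
        med = np.median(segment)
        scale = 1.4826 * np.median(np.abs(segment - med))
        if scale > 0 and abs(x[i] - med) > n_sigmas * scale:
            cleaned[i] = med  # replace the outlier by the local median
            outliers.append(i)
    return cleaned, outliers


def detect_jumps(x, k=5.0):
    """Return indices t where the step x[t] - x[t-1] is far outside the typical step size."""
    d = np.diff(np.asarray(x, dtype=float))
    med = np.median(d)
    scale = 1.4826 * np.median(np.abs(d - med))
    if scale == 0:
        # MAD is zero (at least half the steps equal the median step): no robust scale
        return []
    return [int(t) + 1 for t in np.flatnonzero(np.abs(d - med) > k * scale)]
```

A flagged jump index marks a level shift, which can be kept out of the outlier step or handled separately. The cleaned series is then what the forecasting model is trained on. MAD-based scales are used here because, unlike the standard deviation, a single extreme value barely inflates them.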
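The K-LSSVR procedure described in the abstract (cluster the input vectors with K-means, fit one LSSVR per cluster, route each new input to its cluster's model) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes an RBF kernel, hyperparameters `gamma` and `sigma` with arbitrary default values, a plain Lloyd's K-means with random initial centers, and lagged windows of the series as input vectors. The names `LSSVR`, `KLSSVR`, `kmeans`, `nearest_center` and `make_lagged` are hypothetical. The LSSVR part solves the standard LS-SVM dual linear system for the bias `b` and coefficients `alpha`.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||A[i] - B[j]||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


class LSSVR:
    """Least squares SVR fitted by solving the LS-SVM dual linear system (RBF kernel assumed)."""

    def __init__(self, gamma=100.0, sigma=1.0):
        self.gamma, self.sigma = gamma, sigma

    def fit(self, X, y):
        n = len(y)
        # [[0, 1^T], [1, K + I/gamma]] @ [b, alpha] = [0, y]
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(X, X, self.sigma) + np.eye(n) / self.gamma
        sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
        self.b, self.alpha, self.X_train = sol[0], sol[1:], X
        return self

    def predict(self, X):
        # f(x) = sum_i alpha_i K(x, x_i) + b
        return rbf_kernel(X, self.X_train, self.sigma) @ self.alpha + self.b


def nearest_center(X, centers):
    """Index of the closest center (Euclidean distance) for each row of X."""
    return np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with initial centers drawn at random from X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, nearest_center(X, centers)


class KLSSVR:
    """Cluster the inputs with K-means, then fit one LSSVR per non-empty cluster."""

    def __init__(self, k=2, **lssvr_params):
        self.k, self.lssvr_params = k, lssvr_params

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y, float)
        centers, labels = kmeans(X, self.k)
        keep = np.unique(labels)  # drop clusters that ended up empty
        self.centers = centers[keep]
        self.models = [LSSVR(**self.lssvr_params).fit(X[labels == j], y[labels == j])
                       for j in keep]
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        labels = nearest_center(X, self.centers)  # route each input to its cluster
        out = np.empty(len(X))
        for j, model in enumerate(self.models):
            mask = labels == j
            if np.any(mask):
                out[mask] = model.predict(X[mask])
        return out


def make_lagged(series, lags):
    """Turn a 1-D series into (X, y) pairs: X[t] = s[t:t+lags], y[t] = s[t+lags]."""
    s = np.asarray(series, float)
    X = np.array([s[i:i + lags] for i in range(len(s) - lags)])
    return X, s[lags:]


if __name__ == "__main__":
    t = np.arange(300)
    series = np.sin(0.1 * t) + (t >= 150) * 2.0  # hypothetical series with one level jump
    X, y = make_lagged(series, lags=4)
    Xtr, ytr, Xte, yte = X[:250], y[:250], X[250:], y[250:]
    for name, model in [("LSSVR", LSSVR()), ("K-LSSVR", KLSSVR(k=3))]:
        pred = model.fit(Xtr, ytr).predict(Xte)
        print(name, "test RMSE:", np.sqrt(np.mean((pred - yte) ** 2)))
```

Because each cluster gets its own small linear system, every LSSVR only has to fit inputs that are close to each other in Euclidean distance. This is the "high internal similarity" argument in the abstract. It also makes the systems smaller than the single system a global LSSVR would solve. The demo prints test errors for one hypothetical series only. It makes no claim about the thesis's results.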
Keywords/Search Tags: Data preprocessing, Least squares support vector regression, K-means clustering, Data jump, Outlier processing, EMD-based signal filtering