| The CSI 300 index is representative of the Chinese stock market and reflects trends in China’s economic landscape. Moreover, it is the first stock index to represent the total stock market in China, and is thus regarded as a core factor for private investors. Accurate CSI 300 index forecasts therefore play a crucial role for both private investors and government supervisors. This paper aims to build a combined forecasting model that involves analysis of the factors influencing the CSI 300 index, a noise cancellation algorithm, selection of input variables, and support vector regression (SVR). To handle the nonlinearity of financial time series, we use a support vector regression model, which has nonlinear mapping capability; to handle the noise in the time series, we use the ensemble empirical mode decomposition (EEMD) algorithm to extract and remove the noise. To address the one-sidedness of relying on only a few influencing variables, a sufficiently large set of variables is selected to forecast the CSI 300 index. A large number of financial variables are likely to contain the same information. To address this problem, we use the variance inflation factor (VIF) to diagnose redundant information and remove it. Furthermore, China’s stock market is susceptible to China’s policy and is thus extremely volatile. We use an outlier diagnosis algorithm based on attribute entry (ODAE) to remove the volatility and then smooth the data. Finally, to simplify the structure of the hybrid model and reduce its computing time, we also use an input-variable selection algorithm, the mean impact value (MIV) method, which suits the nonlinear SVR model. Together, these strategies are used to improve the forecasting accuracy for the CSI 300 index. With these insights, we build a rational hybrid forecasting model, named the ODAE-VIF-EEMD-MIV-SVR model, that matches the environment of China’s stock market. To examine the performance of the combined model, an experiment on real data is performed. 
The data range from April 16, 2010 to December 31, 2014; the training dataset covers April 16, 2010 to December 31, 2013, and the remaining data form the test dataset. The experiment shows that: (1) the ODAE-VIF-EEMD-MIV-SVR model performs best, with a mean absolute percentage error (MAPE), mean absolute error (MAE), normalized mean square error (NMSE), and index of agreement (IA) of 1.3860, 26.3832, 3.6667 × 10⁻⁴, and 0.9916, respectively, and the relative error of this model is a white noise series with a mean of 0; (2) the contributions of the individual strategies, in descending order, are VIF, EEMD, ODAE, and MIV; (3) the classification accuracy of the advance-decline direction derived from the forecasts is approximately 60%, and is thus useful for private investors. |
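
The four error measures named in the abstract, together with the advance-decline accuracy, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the formulas, so the code assumes MAPE is expressed in percent, NMSE is the squared error normalized by the spread of the actual series around its mean, IA follows Willmott's index of agreement, and an "advance" means a value above the previous actual value. All function names and the sample numbers are hypothetical.

```python
from statistics import fmean


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (assumed convention)."""
    return 100.0 * fmean(abs((a - f) / a) for a, f in zip(actual, forecast))


def mae(actual, forecast):
    """Mean absolute error, in index points."""
    return fmean(abs(a - f) for a, f in zip(actual, forecast))


def nmse(actual, forecast):
    """Squared error normalized by the actual series' spread (assumed definition)."""
    mean_a = fmean(actual)
    num = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    den = sum((a - mean_a) ** 2 for a in actual)
    return num / den


def index_of_agreement(actual, forecast):
    """Willmott's index of agreement (assumed definition); 1.0 is a perfect match."""
    mean_a = fmean(actual)
    num = sum((f - a) ** 2 for a, f in zip(actual, forecast))
    den = sum((abs(f - mean_a) + abs(a - mean_a)) ** 2
              for a, f in zip(actual, forecast))
    return 1.0 - num / den


def direction_accuracy(actual, forecast):
    """Share of steps where the forecast advance/decline matches the actual one.

    Both directions are measured against the previous actual value (assumption).
    """
    hits = 0
    for t in range(1, len(actual)):
        actual_up = actual[t] > actual[t - 1]
        forecast_up = forecast[t] > actual[t - 1]
        hits += actual_up == forecast_up
    return hits / (len(actual) - 1)


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the paper.
    actual = [100.0, 102.0, 101.0, 105.0]
    forecast = [101.0, 101.0, 102.0, 104.0]
    print("MAPE (%):", mape(actual, forecast))
    print("MAE:", mae(actual, forecast))
    print("NMSE:", nmse(actual, forecast))
    print("IA:", index_of_agreement(actual, forecast))
    print("Direction accuracy:", direction_accuracy(actual, forecast))
```

Under these assumed definitions, lower MAPE, MAE, and NMSE indicate a better fit, while IA and the direction accuracy are better when closer to 1.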