| In recent years, amid a global emphasis on passive investment, index investing has become an important trend in asset management. Investors want to hedge risk or profit through stock index trading, and regulators want the market to run smoothly; both require sound judgment of stock index trends. The CSI 300 Index is one of the most widely tracked indexes in the Chinese stock market. Analyzing and forecasting its trend helps the government understand market conditions in advance, avoid major risks in time, and promote the healthy development of the market; it can also help readers judge when to buy and sell index products. Commonly used models in stock forecasting include artificial neural networks (ANN), support vector machines (SVM) and random forests. With the development of deep learning, models such as CNN and LSTM have shown great advantages in computer vision and natural language processing and have gradually been applied to financial time series. At the same time, with the emergence of algorithms for automatic feature extraction, researchers are increasingly inclined to use deep learning models to study the stock market. However, whether they use the traditional financial forecasting models or the current deep learning models, most studies rely only on technical indicators or historical price data from the A-share market. They do not consider the impact of the macroeconomic environment and other stock markets on the A-share market, and so ignore sources of information that could improve stock index forecasting. This thesis selects indicators from a variety of sources as the initial features for predicting the stock index, including the basic characteristics of the stock index itself, technical indicators, A-share market 
indicators, economic indicators, commodity prices, and the returns of other major world stock indexes and their futures, to predict whether the CSI 300 Index will rise or fall on the next day. To verify the effectiveness of the expanded feature set, three machine learning models, namely logistic regression, multilayer perceptron and ANN, are used to predict the direction of the stock index. Before the machine learning models are applied, three dimensionality reduction techniques are applied to the original features so that better features can be extracted from them as model input. The impact of the different dimensionality reduction techniques on the prediction results is then compared to find the most suitable method. Furthermore, to measure the effectiveness of the proposed dimensionality reduction techniques and machine learning models for predicting the direction of the stock index, CNN and LSTM are used as benchmark models. The CNN model aggregates the multiple information sources used in this thesis and extracts features automatically to predict the direction of the stock index, while the LSTM model uses the price and volume data of the CSI 300 Index over the past 15 days to predict the direction of the next trading day. The results of combining dimensionality reduction with the three machine learning models are compared with those of the CNN and LSTM models. ANN and multilayer perceptron are found to predict better than logistic regression, and all three machine learning models predict better than CNN or LSTM. The CNN model does not show better predictive ability than the machine learning models, possibly because the correlation between features is weak and there is 
no obvious spatial relationship. The LSTM model predicts stock index prices well, but its ability to predict the direction of the stock index is poor. In addition, a comparison with the results in the reviewed literature shows that combining dimensionality reduction with machine learning models, as done in this thesis, achieves good predictive performance. Therefore, the idea proposed in this thesis, "improving stock index forecasting by using multiple information sources and expanding the feature set", is feasible. |
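
The prediction target and the 15-day input window described above can be illustrated with a minimal Python sketch. Only the 15-day lookback and the next-day up/down target come from the text. The function name `make_windows`, the `(close, volume)` record layout, and the rule that a flat day counts as "down" are illustrative assumptions, not details taken from the thesis.

```python
from typing import List, Sequence, Tuple

Bar = Tuple[float, float]  # (close, volume); hypothetical record layout


def make_windows(
    bars: Sequence[Bar], lookback: int = 15
) -> Tuple[List[List[Bar]], List[int]]:
    """Build (window, label) pairs for next-day direction prediction.

    Each window holds `lookback` consecutive days ending on day t.
    The label is 1 if the close on day t+1 is above the close on day t,
    otherwise 0 (assumption: a flat day is treated as "down").
    """
    windows: List[List[Bar]] = []
    labels: List[int] = []
    # t is the index of the last day inside the window; day t+1 must exist.
    for t in range(lookback - 1, len(bars) - 1):
        window = list(bars[t - lookback + 1 : t + 1])
        label = 1 if bars[t + 1][0] > bars[t][0] else 0
        windows.append(window)
        labels.append(label)
    return windows, labels


if __name__ == "__main__":
    # Synthetic data for illustration only, not real CSI 300 quotes.
    closes = [100.0 + ((-1) ** i) * 0.5 * i for i in range(20)]
    bars = [(c, 1000.0 + i) for i, c in enumerate(closes)]
    X, y = make_windows(bars, lookback=15)
    print(len(X), len(X[0]), y)
```

Each window uses only data up to day t, so the label for day t+1 never leaks into its own input. A sequence model such as an LSTM would consume the windows as they are, while a non-sequential classifier would need them flattened or replaced by per-day feature vectors.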