| With the rapid development of big data and artificial intelligence, and given the time-series nature and random fluctuations of stock data, traditional multi-factor stock selection models can no longer predict stock price trends well. Machine learning models, by contrast, handle nonlinear time series well, which makes them well suited to predicting stock price trends. Building on previous research, this paper proposes a multi-factor stock selection strategy that uses machine learning models (SVR and GA-LSTM) instead of traditional linear models to predict stock price trends. The innovation of this paper is twofold: it replaces traditional linear models with machine learning models, and it introduces a more comprehensive factor screening method to improve the accuracy of the multi-factor stock selection strategy. All constituents of the CSI 300 Index from 2016 to 2020 are studied; the first three years are used for factor screening and model training, and the last two years for empirical analysis. To maximize the coverage of candidate factors, eight categories of factors from the broadening factor library are selected (quality, foundation, sentiment, growth, risk, style, momentum, and technology), plus some factors calculated from financial data, for a total of 276 factors. After the factor data are preprocessed, 36 factors with some predictive power for future stock returns are screened out using the F-test, the mutual information method, and the random forest embedding method, forming a new factor pool. The new factor pool is then used to build three multi-factor stock selection models separately. First, to avoid the influence of multicollinearity among factors on the model results, principal component analysis is used to reduce the dimensionality of the factor data, and a multi-factor stock selection model is constructed on top of a linear regression model. Second, grid search and cross-validation are used to optimize the penalty coefficient C and the kernel coefficient gamma, giving an SVR multi-factor stock selection model with C=10 and gamma=0.1. Third, a real-coded genetic algorithm is used to optimize the number of LSTM layers, the number of fully connected layers, and the number of neurons in each layer, giving a GA-LSTM multi-factor stock selection model with 2 LSTM layers (72 neurons in the first and 112 in the second) and 1 fully connected layer with 45 neurons. After all parameter optimization and model construction are complete, the three models are backtested to select portfolios expected to earn higher future returns and thus obtain excess returns, and each backtest is evaluated comprehensively by maximum drawdown, annualized return, and Sharpe ratio. The results show that all three models earn excess returns over the benchmark; the nonlinear SVR model outperforms the traditional linear regression model, while the LSTM neural network model performs best and shows better predictive ability. |
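
The three backtest evaluation metrics named in the abstract (maximum drawdown, annualized return, and Sharpe ratio) can be illustrated with a short Python sketch. This is not the paper's backtesting code. The function names, the net-asset-value (NAV) list input, the default of 252 trading periods per year, and the default risk-free rate of zero are all assumptions made here for illustration.

```python
import math
from typing import Sequence


def max_drawdown(nav: Sequence[float]) -> float:
    """Largest peak-to-trough decline of a NAV series, as a fraction of the peak."""
    peak = nav[0]
    worst = 0.0
    for value in nav:
        peak = max(peak, value)                      # running maximum so far
        worst = max(worst, (peak - value) / peak)    # decline from that maximum
    return worst


def annualized_return(nav: Sequence[float], periods_per_year: int = 252) -> float:
    """Geometric annualized return implied by the first and last NAV values."""
    n_periods = len(nav) - 1
    total_growth = nav[-1] / nav[0]
    return total_growth ** (periods_per_year / n_periods) - 1.0


def sharpe_ratio(
    nav: Sequence[float],
    risk_free_rate: float = 0.0,      # annual rate; zero is an assumption
    periods_per_year: int = 252,      # assumed number of trading periods per year
) -> float:
    """Annualized Sharpe ratio computed from per-period simple returns."""
    returns = [nav[i] / nav[i - 1] - 1.0 for i in range(1, len(nav))]
    rf_per_period = risk_free_rate / periods_per_year
    excess = [r - rf_per_period for r in returns]
    mean = sum(excess) / len(excess)
    # Sample variance (n - 1 in the denominator) of the excess returns.
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    if variance == 0.0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)
```

Each function takes a strategy's NAV curve, so the same three calls could be applied to the linear regression, SVR, and GA-LSTM backtests and to the benchmark for a side-by-side comparison. Conventions differ across platforms (for example, arithmetic rather than geometric annualization), so the values need not match those of any particular backtesting tool.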