| Since the beginning of the 21st century, with the rapid development of big data and artificial intelligence, especially financial machine learning, stock selection models have become more scientific, quantitative and intelligent. The multi-factor stock selection model is the most widely used quantitative model in the current market; its advantage is that it can synthesize a large amount of information into a single stock selection result. Multi-factor stock selection involves two core issues: the selection of factors and the comprehensive scoring of stocks. Traditional factor selection generally relies on scoring methods, principal component analysis (PCA) and similar methods. However, these methods suffer from relatively strong subjectivity or from overfitting. In recent years, with the development of machine learning, the random forest (RF) algorithm has been widely used for factor classification, and it can effectively address the overfitting problem of principal component analysis. The RF algorithm can be used to classify stock factors and extract those with a greater impact on stock returns; these factors are then fed to a support vector machine (SVM) model for stock price prediction, from which stocks are screened. It is worthwhile to study whether combining these two methods for stock selection outperforms the SVM model without factor screening and the PCA-SVM model. Based on these considerations, this thesis conducts a multi-factor stock selection study based on the RF-SVM algorithm. The thesis follows the research line of literature review, theoretical analysis, research design and empirical study. First, the basic principles of RF-based dimensionality reduction and of SVM-based stock prediction and stock selection are analyzed; second, the basic process and steps of the RF-SVM stock selection strategy are introduced, including selection of the factor pool, data preprocessing, feature selection based on the RF
algorithm, establishment of the SVM prediction model, prediction evaluation and comparison of backtesting results; third, an empirical study is carried out with the CSI 300 constituent stocks as the initial stock pool and an initial capital of 1 million, using data from January 1, 2011 to December 31, 2014 as the training set and data from January 1, 2015 to December 31, 2019 as the validation set. The RF algorithm was used to evaluate the importance of the feature variables, and 22 factors were selected each time; the grid search method was used to tune the SVM algorithm, which was then applied to the selected feature variables for stock forecasting; 10 stocks were selected from those with higher predicted gains and better prediction accuracy, and positions were rebalanced at intervals of 30 days, 90 days and 180 days, respectively. The backtest results were compared with the market benchmark returns and with the backtest results over the same period of the SVM model without feature screening and the SVM model with PCA-based feature screening. In addition, the RF-SVM strategy was tested on the CSI 500 with different time intervals and different numbers of factors to further examine its advantages. The main findings are as follows. (1) Theoretically, in the selection of stock factors, the scoring method and principal component analysis suffer from relatively strong subjectivity or overfitting, whereas using the RF algorithm to classify stock factors avoids overfitting to a certain extent, suits high-dimensional data such as that of multi-factor stock selection, and has good generalization ability. The basic principle of the SVM algorithm can be summarized as finding the optimal separating hyperplane in the feature space, which maximizes the margin between different classes, and using this hyperplane as the basis of classification. (2) The empirical study shows that whether
the rebalancing interval is 30 days, 90 days or 180 days, the strategy achieves a higher return than the benchmark and therefore a positive excess return; as the rebalancing interval increases, the strategy return and excess return decrease in turn, indicating that the RF-SVM strategy is best suited to short-term operation. Compared with the PCA-SVM strategy and the SVM strategy without factor screening, on the same data and under the same operations, RF-based dimensionality reduction is the most effective and PCA-based reduction the second most effective. In terms of backtesting results, the RF-SVM strategy has higher returns and excess returns than the PCA-SVM strategy and the SVM strategy; moreover, when the RF-SVM strategy is applied to different stock pools with different numbers of factors and different backtesting periods, its backtesting performance remains good and it still achieves higher excess returns, which supports the superiority of this strategy from another angle. In terms of multi-factor stock selection, this thesis uses the RF-SVM strategy for stock selection, which is innovative in research methodology, research content and research perspective. The RF-SVM algorithm proposed in this thesis provides an effective tool for multi-factor quantitative stock selection and offers a new perspective for designing and innovating quantitative stock selection models in the future. |
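
The selection and evaluation arithmetic described in the abstract (pick the top 10 stocks by predicted gain at each rebalancing date, then compare the strategy's return with the benchmark) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names are hypothetical, equal weighting of the selected stocks is an assumption (the abstract does not state a weighting scheme), and the predicted gains are taken as given rather than produced by an RF-SVM model.

```python
from typing import Dict, List


def select_top_stocks(predicted_gain: Dict[str, float], n: int = 10) -> List[str]:
    """Return the n tickers with the highest predicted gain.

    Ties are broken alphabetically by ticker so the result is deterministic.
    """
    ranked = sorted(predicted_gain.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ticker for ticker, _ in ranked[:n]]


def period_return(holdings: List[str], realized: Dict[str, float]) -> float:
    """Realized return of one holding period, assuming equal weights (an assumption)."""
    return sum(realized[t] for t in holdings) / len(holdings)


def cumulative_return(period_returns: List[float]) -> float:
    """Compound a sequence of per-period returns into one cumulative return."""
    value = 1.0
    for r in period_returns:
        value *= 1.0 + r
    return value - 1.0


def excess_return(strategy: List[float], benchmark: List[float]) -> float:
    """Cumulative strategy return minus cumulative benchmark return."""
    return cumulative_return(strategy) - cumulative_return(benchmark)
```

In use, `select_top_stocks` would be called once per rebalancing date (every 30, 90 or 180 days in the abstract's setup), `period_return` would give the return of the resulting holdings until the next rebalancing date, and `excess_return` would compare the compounded series against the benchmark index over the same periods.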