Research On Multi-factor Stock Selection Model Based On Ensemble Learning

Posted on: 2024-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: S W Wang
GTID: 2568307100488874
Subject: Electronic information
Abstract/Summary:
With the development of big data and machine learning techniques, ensemble learning algorithms have been widely applied to multi-factor stock selection. Classification and regression are the two main approaches in this field, and ensemble algorithms can perform both tasks. Current research mainly compares different algorithms in quantitative investment, without explicitly examining whether applying the same algorithm as a classifier or as a regressor leads to different experimental results. This paper therefore explores how the choice between classification and regression affects investment models built with the same ensemble learning algorithm. A review of the relevant literature shows that the most widely used ensemble algorithms for multi-factor stock selection are the Bagging-type Random Forest algorithm and the Boosting-type LightGBM algorithm, so these two algorithms are selected as representatives for analysis. The main work and innovations of this paper are as follows:

(1) By analyzing factor effectiveness and correlation, five effective factors are constructed from the latest dataset, and all five pass the factor effectiveness test.

(2) Four investment models are constructed by applying the Random Forest algorithm and the LightGBM algorithm to both classification and regression, together with backtesting models that conform to the actual trading scenarios of the A-share stock market. All models use the same effective factor combination, dataset, and test set developed by the author. The backtesting results show that the regression models of both algorithms are generally superior to their corresponding binary classification models. The reason is that a binary classification model simply divides stocks into two categories and ignores the relationship between factors and returns at different levels, so much relevant information is lost and the backtesting results are unsatisfactory.

(3) The traditional binary classification investment models are optimized. By discretizing the data labels of the regression models to achieve multi-class classification of stock prices, two new investment models are constructed and evaluated with the same backtesting model. The backtesting results show that the improved models are superior not only to the original binary classification models but also to the original regression models. The reason is that the data labels of the investment models are returns, and subtle differences between returns are not important for stock selection. During training, an ordinary regression model learns a lot of unnecessary noise because its labels are continuous. A model with discretized labels avoids fitting this meaningless noise and instead learns the practically significant relationship between the data and the labels, which yields better stock selection performance.

(4) This study verifies that the LightGBM algorithm remains superior on the latest dataset. It also demonstrates that when the same ensemble learning algorithm is applied to multi-factor stock selection, its regression models are generally superior to their corresponding binary classification models, and that better investment models can be obtained by discretizing the data labels.
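The difference between the binary labels in (2) and the discretized labels in (3) can be illustrated with a minimal sketch. The abstract does not state how the labels were actually built, so everything here is an assumption made for illustration: the binary split at the cross-sectional median, the equal-frequency binning, the choice of five bins, the function names `binary_labels` and `quantile_labels`, and the made-up return values.

```python
from statistics import median


def binary_labels(returns):
    """Label 1 if a stock's return is above the cross-sectional median, else 0.

    Assumption: the abstract does not say where the binary cut-off lies.
    """
    cutoff = median(returns)
    return [1 if r > cutoff else 0 for r in returns]


def quantile_labels(returns, n_bins=5):
    """Assign each return to one of n_bins equal-frequency classes (0 = lowest).

    Assumption: equal-frequency binning is one possible discretization,
    not necessarily the one used in the thesis.
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    # Rank stocks by return; ties keep their original relative order.
    order = sorted(range(len(returns)), key=lambda i: returns[i])
    labels = [0] * len(returns)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // len(returns)
    return labels


# Hypothetical next-period returns for ten stocks (made-up numbers).
example_returns = [0.031, -0.012, 0.004, 0.058, -0.027,
                   0.019, -0.003, 0.011, 0.046, -0.040]

binary = binary_labels(example_returns)        # two coarse classes
multiclass = quantile_labels(example_returns)  # five ordered classes
```

With binary labels, returns of 0.011 and 0.058 fall into the same class. With five bins they are separated, while small differences such as 0.004 versus 0.011 are still merged into one class, which is the kind of noise suppression the abstract attributes to label discretization.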
Keywords/Search Tags: Ensemble Learning, Classification and Regression, Multi-factor Stock Selection