
Research On Stock Selection Model Based On Ensemble Learning

Posted on: 2021-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Chen
GTID: 2518306107454064
Subject: Master of Finance
Abstract/Summary:
Because machine learning models perform well at mining information and capturing nonlinear relationships, combining them with multi-factor stock selection models is attracting growing attention in quantitative stock selection, and the approach has become widely used as hardware has improved. Ensemble learning is one family of machine learning algorithms. By repeatedly training many weak learners and combining them into a strong learner, it can greatly improve final model performance, and it offers high accuracy and stability in classification tasks. In this paper, the goal of stock selection is set to distinguishing rising stocks from falling ones, which turns stock selection into a classification problem. Two quantitative stock selection models are constructed by combining the multi-factor stock selection model with the Adaboost classification algorithm from the Boosting family and with the Random Forest algorithm from the Bagging family, and the results of the two ensemble learning algorithms are then compared.

Constructing a quantitative stock selection model first requires building a factor library and selecting effective factors. Most factor data used for training today is based on daily-frequency data, and the growing mining and recombination of similar data has led to a decline in factor performance in market applications. In this paper, factors are therefore constructed from high-frequency data in order to obtain valid information. The factor library is built from momentum-type factors, volatility-type factors, other technical indicators and some indicators from Alpha101. Using a ranking and scoring method, 20 factors are selected from the library. Based on the results of a correlation test, principal component analysis is then used to extract 11 principal components as the final effective factors.

Model performance and stability are measured by the mean and the variance of the AUC sequence, respectively. The effectiveness of the candidate base learners is tested before they are applied to the ensemble learning algorithms, and the results show that SVM performs better than the logistic regression and decision tree models. However, SVM and logistic regression are not used in the ensemble learning algorithms because of their huge training-time cost.

After testing and backtesting the ensemble learning models, the final results show that the Adaboost model is more profitable, with an annualized return of 18.6%, higher than the 17.4% of the Random Forest model. The Random Forest model is more stable, outperforming Adaboost in terms of information ratio and maximum drawdown.
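The evaluation criterion described above (mean of the AUC sequence for performance, variance for stability) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function names `auc` and `summarize_auc`, the toy labels and scores, and the choice of population variance are hypothetical, since the abstract does not say how the AUC sequence is formed or which variance estimator the thesis uses.

```python
from statistics import mean, pvariance


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen rising stock (label 1)
    receives a higher score than a randomly chosen falling stock (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_auc(periods):
    """Compute one AUC per evaluation period, then the mean (performance)
    and the population variance (stability) of that sequence."""
    aucs = [auc(labels, scores) for labels, scores in periods]
    return aucs, mean(aucs), pvariance(aucs)


# Hypothetical toy data: (rise/fall labels, model scores) for three test periods.
periods = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    ([1, 0, 1, 0], [0.6, 0.7, 0.8, 0.1]),
    ([1, 1, 0, 0], [0.5, 0.3, 0.5, 0.1]),
]

aucs, auc_mean, auc_var = summarize_auc(periods)
print(aucs, auc_mean, auc_var)
```

Under this reading, a model with a higher mean AUC ranks rising stocks above falling ones more reliably on average, while a lower variance means its ranking quality fluctuates less from period to period.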
Keywords/Search Tags: Multi-factor stock selection model, Adaboost, Random Forest, High-frequency data