A Two-stage Stock Selection Study Based On WHSBoost Classification Model And TALSTM Prediction Model

Posted on: 2020-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: X F Liu
GTID: 2439330575994919
Subject: Statistics
Abstract/Summary:
With the continuous development of the stock market, investing in stocks has gradually become one of the main ways people manage their finances. Efficiently selecting high-quality stocks with investment potential and low risk has therefore become an important problem for investors. To maximize investment utility and build a reasonable and efficient stock selection model, this paper combines analysis of the long-term intrinsic value changes and the short-term price fluctuations of stocks, and conducts a two-stage stock selection study based on machine learning and deep learning methods. The two stages are: (1) constructing a pool of high-quality stocks with long-term value through stock classification, and (2) predicting the short-term price of each stock in this pool and screening the pool again.

In the first stage, this paper proposes a new ensemble algorithm based on hybrid sampling, the Weighted-Hybrid-Sampling-Boost algorithm (WHSBoost). WHSBoost serves as the value stock classification model: it addresses the data imbalance problem in value stock classification and is used to construct the long-term value stock pool. As model features, we choose 45 financial indicators covering profitability, debt-paying ability, operating ability, development ability, cash-flow ability, and per-share indicators. Classification labels are derived from earnings per share. We compare the classification performance of the SMOTE, SMOTEBoost, RUSBoost, HSBoost, and WHSBoost algorithms on three base classifiers: support vector machine (SVM), decision tree (DT), and naive Bayes (NB). Based on this comparison, WHSBoost-DT is chosen as the value stock classification model. It achieves an accuracy of 86.8% and an AUC of 0.927, clearly higher than the other methods. Finally, applying WHSBoost-DT to 2018 financial data yields a 2019 value stock pool of 387 stocks.

In the second stage, to predict the future price of each stock in the pool, we propose a new LSTM model based on a trend attention mechanism, the TALSTM model. Its features consist of endogenous and exogenous variables. The endogenous variables are the historical closing, opening, high, and low prices of the stock being forecast; the exogenous variables are the closing prices of the stock index and of related stocks identified with a Copula function. TALSTM first uses an attention mechanism to reconstruct (reweight) the different features, and then uses a trend attention mechanism, based on a stock-trend adaptive function, to reconstruct the different time steps. Empirical analysis shows that TALSTM outperforms SVM, KNN, LSTM, Attention-LSTM, and other time-series prediction models across different stocks and evaluation criteria. Finally, we use TALSTM to further screen the value stock pool and obtain the top 10 high-quality stocks with the best gains on April 11, 2019.

In the WHSBoost classification model, sample weights are introduced into the data-sampling algorithm for the first time. By incorporating sample weights into the resampling step inside the ensemble algorithm, WHSBoost improves on existing ensemble algorithms for imbalanced data and raises their classification performance. In the TALSTM prediction model, this paper proposes a trend attention mechanism for financial time series and uses the attention and trend attention mechanisms to encode and reconstruct the different features and time steps. The two-stage stock selection approach takes into account investors' needs for both long-term and short-term investment, and therefore has practical significance.
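The abstract does not spell out how WHSBoost feeds sample weights into resampling. The sketch below is one hypothetical reading, assuming binary labels with 1 as the minority (value-stock) class: majority samples are kept by a weighted draw without replacement (Efraimidis-Spirakis keys), and minority seeds for SMOTE-style interpolation are drawn in proportion to their current boosting weights, so currently hard samples shape the resampled set. The name `weighted_hybrid_sample`, the per-class target (midpoint of the two class sizes), and `k=5` neighbours are illustrative assumptions, not the thesis's settings.

```python
import random


def weighted_hybrid_sample(X, y, w, k=5, seed=0):
    """Hypothetical weighted hybrid sampling (labels: 1 = minority, 0 = majority).

    Returns a class-balanced set with `target` samples per class, where
    `target` is the midpoint of the two original class sizes (an assumption).
    Boosting weights `w` (all > 0) bias both the undersampling and the
    choice of SMOTE seeds toward currently hard samples.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    target = (len(minority) + len(majority)) // 2

    # Undersample the majority class: weighted draw without replacement using
    # Efraimidis-Spirakis keys u ** (1 / w), keeping the `target` largest keys.
    kept_major = sorted(majority,
                        key=lambda i: rng.random() ** (1.0 / w[i]),
                        reverse=True)[:target]

    new_X = [list(X[i]) for i in minority + kept_major]
    new_y = [1] * len(minority) + [0] * len(kept_major)

    # Oversample the minority class with SMOTE-style interpolation; seeds are
    # drawn with probability proportional to their boosting weight.
    minority_w = [w[i] for i in minority]
    for _ in range(target - len(minority)):
        s = rng.choices(minority, weights=minority_w)[0]
        neighbours = sorted(
            (j for j in minority if j != s),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[s], X[j])),
        )[:k]
        nb = rng.choice(neighbours) if neighbours else s
        gap = rng.random()
        new_X.append([a + gap * (b - a) for a, b in zip(X[s], X[nb])])
        new_y.append(1)
    return new_X, new_y
```

In a boosting round, the base classifier (for example a decision tree) would be trained on the returned set, while the error and the weight update are computed on the original, unsampled data, as RUSBoost does.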
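Boosting ensembles such as SMOTEBoost, RUSBoost, and WHSBoost reweight the training samples after each round so that misclassified samples count more in the next one. The sketch below shows the standard discrete AdaBoost update and the final alpha-weighted vote, assuming 0/1 labels. Whether WHSBoost uses this exact variant (rather than, for example, AdaBoost.M2, which SMOTEBoost and RUSBoost were built on) is an assumption, and the function names are hypothetical.

```python
import math


def adaboost_update(w, y_true, y_pred, eps=1e-10):
    """One discrete-AdaBoost round: returns (alpha, normalized new weights)."""
    err = sum(wi for wi, t, p in zip(w, y_true, y_pred) if t != p) / sum(w)
    err = min(max(err, eps), 1 - eps)  # keep the log finite
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples are scaled by e^alpha, correct ones by e^-alpha.
    new_w = [wi * math.exp(alpha if t != p else -alpha)
             for wi, t, p in zip(w, y_true, y_pred)]
    z = sum(new_w)
    return alpha, [wi / z for wi in new_w]


def weighted_vote(alphas, round_preds):
    """Combine per-round 0/1 predictions by an alpha-weighted vote."""
    out = []
    for i in range(len(round_preds[0])):
        score = sum(a * (1 if preds[i] == 1 else -1)
                    for a, preds in zip(alphas, round_preds))
        out.append(1 if score > 0 else 0)
    return out
```

With four equally weighted samples and one mistake, the error is 0.25, alpha is ln(3)/2, and the misclassified sample ends up carrying half of the total weight.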
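The WHSBoost-DT result is reported as accuracy 86.8% and AUC 0.927. AUC is less sensitive to class imbalance than accuracy because it equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. A minimal pairwise sketch of that definition (quadratic in the number of samples, so only for illustration; ties count one half):

```python
def auc(y_true, scores):
    """ROC AUC via the pairwise (Mann-Whitney) definition; labels are 0/1."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75.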
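The abstract says the related stocks used as exogenous variables are chosen with a Copula function but does not name the copula family or the selection rule. One common approach, shown here as an assumption, measures rank dependence with Kendall's tau, which depends only on the copula and not on the marginal distributions; for a Gaussian copula the correlation parameter follows as rho = sin(pi * tau / 2). The rule in the hypothetical `top_related` (keep the m candidates with the largest |tau| against the target's daily returns) is an illustrative stand-in.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length series (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


def gaussian_copula_rho(tau):
    """Gaussian-copula correlation implied by Kendall's tau."""
    return math.sin(math.pi * tau / 2)


def top_related(target, candidates, m=3):
    """Hypothetical rule: the m candidate series with the largest |tau| to target."""
    scored = {name: kendall_tau(target, s) for name, s in candidates.items()}
    return sorted(scored, key=lambda name: abs(scored[name]), reverse=True)[:m]
```

Working on returns rather than raw prices avoids the spurious dependence that two trending price series tend to show.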
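TALSTM applies attention twice: first across input features, then across time steps with a trend-based adjustment. The abstract does not define the stock-trend adaptive function, so the sketch below is purely hypothetical: each time step whose price move agrees with the window's overall direction receives a bonus `lam` before the softmax, and the context vector is the attention-weighted sum of the LSTM hidden states. In a trained model the raw `scores` and `feature_scores` would come from learned layers; here they are plain inputs.

```python
import math


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    z = sum(e)
    return [x / z for x in e]


def feature_attention(x_t, feature_scores):
    """Reweight one time step's input features by softmax attention."""
    a = softmax(feature_scores)
    return [ai * xi for ai, xi in zip(a, x_t)]


def trend_attention(hidden, closes, scores, lam=1.0):
    """Hypothetical trend-biased temporal attention over T hidden states.

    hidden: T hidden-state vectors; closes: T + 1 closing prices, so step t
    covers the move closes[t] -> closes[t + 1]; scores: T raw attention scores.
    Returns (attention weights, context vector).
    """
    overall = 1.0 if closes[-1] >= closes[0] else -1.0
    # Bonus for steps that move in the same direction as the whole window.
    bias = [lam if (closes[t + 1] - closes[t]) * overall > 0 else 0.0
            for t in range(len(scores))]
    weights = softmax([s + b for s, b in zip(scores, bias)])
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, context
```

With equal raw scores and closes `[1, 2, 3, 2, 3]`, the one down-move step (against the upward window trend) receives the smallest weight.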
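The final screening step picks the 10 stocks with the best gains from the value stock pool. Assuming "gain" means the predicted return relative to the last observed close (the abstract does not define it), the re-screening reduces to a sort. `top_k_by_predicted_gain` and its dictionary inputs keyed by stock code are hypothetical.

```python
def top_k_by_predicted_gain(last_close, predicted_close, k=10):
    """Rank stocks by predicted return (pred - last) / last; return the top k codes."""
    gains = {code: (predicted_close[code] - last_close[code]) / last_close[code]
             for code in predicted_close}
    return sorted(gains, key=gains.get, reverse=True)[:k]
```

For example, with last closes of 10, 20, and 5 and predicted closes of 11, 21, and 6 for stocks A, B, and C, the predicted gains are 10%, 5%, and 20%, so the ranking is C, A, B.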
Keywords/Search Tags: Imbalanced data classification, Ensemble algorithm, Trend attention mechanism, Long Short-Term Memory