Q-learning Based On Semi-Markov Process And Its Application In Quantitative Investment

Posted on:2023-04-29

Degree:Master

Type:Thesis

Country:China

Candidate:Q Zhou

Full Text:PDF

GTID:2569306623495944

Subject:Applied statistics

Abstract/Summary:

In recent years, programmatic investment management using reinforcement learning has become a popular research direction in financial investment. In this paper, we combine semi-Markov process (SMP) theory with reinforcement learning to construct a programmatic stock trading model that can guide investors' investment practice.

First, the paper describes the theoretical basis of the model: Q-learning for Markov decision processes (MDPs), Q-learning for SMPs, and related methods including the K-means algorithm, which is used to construct discrete environment states. In particular, compared with MDP-based Q-learning, SMP-based Q-learning considers not only the current state of the environment but also the residence time in that state.

Second, the key steps in constructing the SMP-based Q-learning model are the settings of the environment states, the actions, and the rewards. Twenty new-energy stocks are randomly selected for training and backtesting, and the results are compared with an MDP-based Q-learning model and a buy-and-hold strategy, using cumulative return, annualized return, and Sharpe ratio to measure performance. The empirical results show that when K-means clusters the data into 3 classes, both Q-learning models achieve robust returns on individual stocks and are much more risk-resistant than the buy-and-hold strategy. The average cumulative return (116.54%), average annualized return (30.17%), and average Sharpe ratio (66.65%) of the SMP-based Q-learning model exceed those of the MDP-based Q-learning model (average cumulative return 47.62%, average annualized return 14.37%, and average Sharpe ratio 42.25%), indicating that the SMP-based model achieves higher returns for the same level of risk. The same conclusion holds when K-means uses 6 and 9 classes; with 9 classes, the SMP-based Q-learning model attains its largest average cumulative return, average annualized return, and average Sharpe ratio, making this the most effective setting.

Finally, to further improve the cumulative return of the SMP-based Q-learning model, features are screened from 10 candidates spanning trading, financial, technical, and macroeconomic indicators, based on the principle of maximizing cumulative return and on the ranking of each feature combination. The optimal combination is V (daily volume change rate) among the trading indicators, PE (price-earnings ratio) among the financial indicators, MACD (moving average convergence divergence) and ADO (accumulation/distribution oscillator) among the technical indicators, and IBO007 (an interbank interest rate) among the macroeconomic indicators. The empirical results show that the model's returns and stability improve with this five-feature combination. The SMP-based Q-learning model constructed in this paper therefore offers some reference value for institutional and individual investors developing programmatic investment management schemes.
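The abstract says K-means is used to turn continuous market features into discrete environment states for tabular Q-learning. The sketch below illustrates that idea under stated assumptions: each trading day is described by a numeric feature vector, and the state is simply the index of the nearest cluster centroid. The function names (kmeans, nearest, discretize) and the plain Lloyd's-algorithm implementation are illustrative choices, not the thesis's code; in the thesis's experiments k would be 3, 6, or 9.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[List[float]], x: Vector) -> int:
    # Index of the closest centroid; this index serves as the discrete state.
    return min(range(len(centroids)), key=lambda j: _sq_dist(centroids[j], x))


def kmeans(data: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's algorithm (illustrative); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for x in data:
            clusters[nearest(centroids, x)].append(x)
        updated = []
        for j, members in enumerate(clusters):
            if members:
                # New centroid = coordinate-wise mean of its assigned points.
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments no longer change
        centroids = updated
    return centroids


def discretize(data: List[Vector], k: int) -> Tuple[List[List[float]], List[int]]:
    # Fit centroids, then map every observation to its state index.
    centroids = kmeans(data, k)
    return centroids, [nearest(centroids, x) for x in data]
```

One design point worth keeping in mind: to avoid look-ahead bias, centroids would normally be fitted on the training window only, and backtest days would then be mapped to states with nearest() rather than by refitting.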
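The abstract contrasts MDP-based Q-learning with SMP-based Q-learning, which also takes the residence time in a state into account. The abstract does not give the update rule. One standard way a semi-Markov formulation uses residence time is SMDP Q-learning: the rewards earned while the process stays in a state are accumulated with discounting, and the bootstrap term is discounted by gamma raised to the residence time tau instead of by gamma alone. The sketch below assumes that formulation (the thesis may instead, for example, encode residence time in the state). The action set ("buy", "hold", "sell") and all function names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import groupby
from typing import DefaultDict, Hashable, List, Sequence, Tuple

# Hypothetical action set; the abstract does not list the actions used.
ACTIONS = ("buy", "hold", "sell")
QTable = DefaultDict[Tuple[Hashable, str], float]


def sojourns(states: Sequence[Hashable]) -> List[Tuple[Hashable, int]]:
    """Collapse a daily state sequence into (state, residence time) runs."""
    return [(s, len(list(run))) for s, run in groupby(states)]


def choose_action(q: QTable, s: Hashable, eps: float, rng: random.Random) -> str:
    # Epsilon-greedy exploration over the hypothetical action set.
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(s, a)])


def q_update_mdp(q: QTable, s: Hashable, a: str, r: float, s_next: Hashable,
                 alpha: float = 0.1, gamma: float = 0.95) -> None:
    # Standard one-step Q-learning: every transition lasts one period.
    target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def q_update_smdp(q: QTable, s: Hashable, a: str, rewards: Sequence[float],
                  s_next: Hashable, alpha: float = 0.1, gamma: float = 0.95) -> None:
    """SMDP-style update: `rewards` are the per-period rewards earned while
    the process stayed in s, so tau = len(rewards) is the residence time."""
    tau = len(rewards)
    # Discounted reward accumulated over the sojourn.
    r_tau = sum(gamma ** k * r for k, r in enumerate(rewards))
    # The bootstrap term is discounted by gamma**tau rather than gamma.
    target = r_tau + gamma ** tau * max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])
```

With a Q-table created as defaultdict(float), a residence time of one period (rewards of length 1) makes q_update_smdp reduce exactly to q_update_mdp, which is a quick sanity check on the pair.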
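The models are compared by cumulative return, annualized return, and Sharpe ratio, but the abstract does not state the exact definitions. The sketch below uses common conventions, all of which are assumptions: daily simple returns that compound, 252 trading days per year, the sample standard deviation, and a risk-free rate of zero unless one is supplied. The abstract quotes Sharpe ratios as percentages, so a figure such as 66.65% would presumably correspond to 0.6665 in this function's units.

```python
import math
import statistics
from typing import Sequence

TRADING_DAYS = 252  # assumption: trading days per year


def cumulative_return(daily_returns: Sequence[float]) -> float:
    # Compound the daily simple returns: prod(1 + r) - 1.
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def annualized_return(daily_returns: Sequence[float],
                      periods_per_year: int = TRADING_DAYS) -> float:
    # Geometric annualization of the cumulative return.
    n = len(daily_returns)
    return (1.0 + cumulative_return(daily_returns)) ** (periods_per_year / n) - 1.0


def sharpe_ratio(daily_returns: Sequence[float], risk_free_annual: float = 0.0,
                 periods_per_year: int = TRADING_DAYS) -> float:
    # Annualized Sharpe ratio of per-period excess returns.
    rf = risk_free_annual / periods_per_year  # simple per-period risk-free rate
    excess = [r - rf for r in daily_returns]
    sd = statistics.stdev(excess)  # needs at least two observations
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for zero-volatility returns")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)
```

A strategy's daily returns would come from the backtest (for example, the position held under each Q-learning action multiplied by the stock's daily return), and the same three functions can then be applied to the buy-and-hold benchmark.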
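The abstract says features are screened by maximizing cumulative return and ranking the feature combinations. It does not say how combinations are enumerated. The sketch below assumes an exhaustive search, which is feasible here: 10 candidate features give 2^10 - 1 = 1023 non-empty subsets. The evaluate callback is a hypothetical placeholder for a full train-and-backtest run that returns a cumulative return, and rank_feature_sets is an illustrative name.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple

Combo = Tuple[str, ...]


def rank_feature_sets(
    features: Sequence[str],
    evaluate: Callable[[Combo], float],
    min_size: int = 1,
    max_size: Optional[int] = None,
) -> List[Tuple[float, Combo]]:
    """Score every feature subset with `evaluate` and sort best-first.

    `evaluate` is a hypothetical callback that would train the model on the
    given features and return its backtest cumulative return.
    """
    upper = len(features) if max_size is None else max_size
    scored: List[Tuple[float, Combo]] = []
    for size in range(min_size, upper + 1):
        for combo in combinations(features, size):
            scored.append((evaluate(combo), combo))
    # Highest cumulative return first; ties keep enumeration order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

Because every subset needs its own training and backtest, the cost grows exponentially with the number of candidates. The max_size argument caps the subset size when a full search would be too slow.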

Keywords/Search Tags:

Reinforcement learning, Quantitative investment, Semi-Markov process, Q-learning, K-means

Related items

1	Dynamic Pricing In Electronic Retail Markets By Reinforcement Learning
2	Research On Stock Portfolio Strategy Based On Supervised Learning And Deep Reinforcement Learning
3	Research On Reinforcement Learning Based Order Acceptance Model In Make-to-Order Enterprises
4	An Empirical Research On The Investment Strategy Of Stock Market Based On Deep Reinforcement Learning Model
5	Using Reinforcement Learning To Study The Features Of The Participants’ Behavior In Wholesale Power Market
6	Research On Lots Optimization Of Online Flower Sequential Auction Based On Reinforcement Learning
7	Heuristic Exercise Recommendation Model Based On Deep Reinforcement Learning
8	Model And Simulation Of Stock Market Investor Behavior Evolution Based On Complex Network
9	Research On Vehicle Routing Problem Algorithm Based On Deep Reinforcement Learning
10	Research On The Method Of Deep Learning Based On Semi-supervised And Its Applications In High Frequency Futures Trading