Value-based methods and policy-based methods are the two main families of deep reinforcement learning applied to quantitative trading strategies. Deep Q-learning, the representative value-based method, earns good profits in monotonic market environments but loses heavily when trends change. Deep recurrent reinforcement learning, the representative policy-based method, performs much better in fluctuating markets; however, the need to discretize its outputs when making decisions and the lack of a value function to correct the direction of parameter updates during training limit the model capacity and hence reduce profits. To earn higher profits, we show how to apply deep actor-critic methods to quantitative trading strategies, focusing on increasing model capacity, improving adaptability to new trends, and accelerating convergence.

To achieve these goals, we propose a quantitative trading strategy model based on deep policy gradient methods, called deep actor critic trading (DACT). First, we propose DACT with state value (DACT-SV), which applies deep actor-critic methods to quantitative trading to improve model capacity and adaptability. Second, we propose DACT with Q value (DACT-QV), which replaces the state value network with a Q value network. To generalize better, we share the LSTM network that extracts the features of the financial environment; to further improve adaptability, we perform internal bagging on the Q network and the policy network; and to speed up convergence, we adopt a parallel exploration mechanism. Finally, we verify the effectiveness of DACT by comparing it with deep Q trading (DQT) and deep recurrent reinforcement trading (DRRT) on the stock index data SSE 50, CSI 300, and CSI 500.

The main innovations and contributions of our work are as follows:

1) Implementation and improvement of DACT-SV. We apply the deep actor-critic method to trading problems. For regularization, a shared LSTM network is adopted for feature extraction (an illustrative sketch of this architecture is given after the list). Experimental results show that the daily average profit of vanilla DACT-SV on CSI 300 from 2013 to 2018 is 1.61 points; the version using separate LSTMs earns 0.34 points more, and the version using a shared LSTM earns a further 0.33 points.

2) Implementation and improvement of DACT-QV. We replace the state value network in DACT-SV with a Q value network; parallel exploration is used to accelerate training, and voting-based bagging is used to improve adaptability. Experimental results show that the daily average profit of DACT-QV trained for 20 epochs per round is 2.14 points, which is comparable to DACT-SV trained for 100 epochs per round while taking only a quarter of the time.

3) Comparison with DQT and DRRT. Experimental results show that the daily average profit of DACT on CSI 300 from 2005 to 2018 is 2.67 points, 1.46 points more than DQT and 1.02 points more than DRRT; on SSE 50 from 2004 to 2018 it is 2.28 points, 1.17 points more than DQT and 0.56 points more than DRRT; and on CSI 500 from 2007 to 2018 it is 5.38 points, 3.5 points more than DQT and 1.6 points more than DRRT.
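The sketch below illustrates the kind of shared-LSTM actor-critic network described for DACT-SV: one LSTM extracts features from a window of market observations, and separate heads produce the trading action distribution (policy) and the state value (critic). It is a minimal assumption-laden sketch, not the paper's implementation; the layer sizes, the 30-day observation window, the five input indicators, and the three-way action space (long, neutral, short) are all illustrative choices of ours.

```python
import torch
import torch.nn as nn


class SharedLSTMActorCritic(nn.Module):
    """Illustrative DACT-SV-style network (not the authors' code).

    A shared LSTM extracts features from a window of market observations;
    a policy head outputs trading-action probabilities and a value head
    outputs the state value used to correct the policy update direction.
    All sizes and the action space are assumptions for demonstration.
    """

    def __init__(self, n_features: int, hidden_size: int = 64, n_actions: int = 3):
        super().__init__()
        # Shared feature extractor over the price/indicator time series.
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        # Actor head: probabilities over trading positions (e.g. long/neutral/short).
        self.policy_head = nn.Linear(hidden_size, n_actions)
        # Critic head: scalar state value.
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, obs_window: torch.Tensor):
        # obs_window: (batch, window_length, n_features)
        features, _ = self.lstm(obs_window)
        last = features[:, -1, :]                  # features at the final time step
        action_probs = torch.softmax(self.policy_head(last), dim=-1)
        state_value = self.value_head(last).squeeze(-1)
        return action_probs, state_value


if __name__ == "__main__":
    net = SharedLSTMActorCritic(n_features=5)
    window = torch.randn(8, 30, 5)                 # 8 samples, 30-day window, 5 indicators
    probs, values = net(window)
    print(probs.shape, values.shape)               # torch.Size([8, 3]) torch.Size([8])
```

For DACT-QV, one would analogously replace the scalar value head with a head estimating a Q value per action, with the Q network and policy network still sharing the LSTM features; that variant is not shown here.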