
Research On Portfolio Management Based On Generative Adversarial Networks And Policy Gradient

Posted on: 2022-02-26  Degree: Master  Type: Thesis
Country: China  Candidate: K Wang  Full Text: PDF
GTID: 2518306551470654  Subject: Master of Engineering
Abstract/Summary:
Deep reinforcement learning for portfolio management is an active research direction, but previous methods share two common problems. First, the episode data used to train reinforcement learning agents is drawn from historical data. Because historical data is limited, the state space of the environment is restricted, which makes it difficult for agents to explore the environment fully. Second, previous studies usually model portfolio management as a standard Markov Decision Process and use only the most recent time window of data to guide trading decisions, which ignores the long-term dependence of trading decisions on data from earlier time steps. To solve these problems, the main work of this paper is as follows.

Firstly, to address the problem that the state space of the environment is constrained by historical data, this paper proposes a Financial Data Augmentation Generative Adversarial Network (FDA-GAN). FDA-GAN uses WGAN-GP as its base architecture, uses temporal convolutional networks as the generator and the discriminator, and uses the Lambert W × F_X framework to Gaussianize the data. Experimental results show that FDA-GAN generates higher-quality episode data than several benchmark generative models.

Secondly, to address the problem that trading decisions may depend on data from much earlier time steps, this paper models portfolio management as a Partially Observable Markov Decision Process (POMDP) and defines the observation space, state space, action space, and reward function of the portfolio management problem under this model. It then proposes a portfolio management model based on a Recurrent Policy Network Policy Gradient algorithm (RPN-PG), in which the Recurrent Policy Network is a concrete realization of the POMDP model.

Thirdly, this paper combines FDA-GAN and RPN-PG into a Data augmentation Recurrent Policy Gradient (DRPG) model for portfolio management. The DRPG model is evaluated in experiments on two portfolios. The results show that both data augmentation and POMDP modeling improve portfolio return. Compared with the PG model, the DRPG model increases the Annualized Return of the two portfolios by 8.33% and 11.23%, respectively. The Sharpe ratio and Sortino ratio also improve to varying degrees, and data augmentation reduces the Maximum Drawdown. These results support the effectiveness of the proposed DRPG model in portfolio management.
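
The four evaluation metrics named above (Annualized Return, Sharpe ratio, Sortino ratio, Maximum Drawdown) can be illustrated with a minimal sketch using their common textbook definitions. The abstract does not state the exact formulas used in the thesis, so the function names, the 252 trading periods per year, and the zero risk-free and target rates below are all assumptions made for illustration.

```python
import math

# Illustrative helpers, not the thesis's code. Inputs are simple per-period
# returns, e.g. 0.01 for +1%. periods_per_year=252 is an assumed convention.

def annualized_return(returns, periods_per_year=252):
    """Geometric annualized return of a sequence of per-period returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (periods_per_year / len(returns)) - 1.0

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    if var == 0.0:
        return math.nan  # undefined when returns do not vary
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)

def sortino_ratio(returns, target=0.0, periods_per_year=252):
    """Like the Sharpe ratio, but penalizes only returns below the target."""
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    downside = math.sqrt(sum(min(x, 0.0) ** 2 for x in excess) / len(excess))
    if downside == 0.0:
        return math.nan  # undefined when no return falls below the target
    return mean / downside * math.sqrt(periods_per_year)

def max_drawdown(returns):
    """Largest peak-to-trough loss of cumulative wealth, as a fraction of the peak."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst
```

Under these definitions, a higher Annualized Return, Sharpe ratio, and Sortino ratio and a lower Maximum Drawdown all indicate better performance, which matches the direction of the improvements reported above.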
Keywords/Search Tags:Portfolio Management, Generative Adversarial Networks, Policy Gradient, Data Augmentation, Partially Observable Markov Decision Process