Research On Portfolio Management Based On Generative Adversarial Networks And Policy Gradient

Posted on:2022-02-26

Degree:Master

Type:Thesis

Country:China

Candidate:K Wang

GTID:2518306551470654

Subject:Master of Engineering

Abstract/Summary:

Deep reinforcement learning for portfolio management is an active research direction, but previous methods share two common problems. First, the episode data used to train reinforcement learning agents are drawn from historical data. Because historical data are limited, the state space of the environment is restricted, which makes it difficult for agents to explore the environment fully. Second, previous studies usually model portfolio management as a standard Markov Decision Process and use only the most recent time window of data to guide trading decisions, which ignores the long-term dependence of trading decisions on data from earlier time steps. To solve these problems, the main work of this paper is as follows.

Firstly, to address the problem that the state space of the environment is constrained by historical data, this paper proposes a Financial Data Augmentation Generative Adversarial Network (FDA-GAN), which uses WGAN-GP as its base architecture, uses temporal convolutional networks as the generator and discriminator, and applies the Lambert W × F_X framework to Gaussianize the data. Compared with several benchmark generative models, experimental results show that FDA-GAN generates higher-quality episode data.

Secondly, to address the problem that trading decisions may depend on data from much earlier time steps, this paper models portfolio management as a Partially Observable Markov Decision Process (POMDP) and defines the observation space, state space, action space, and reward function of the portfolio management problem under this model. It then proposes a portfolio management model based on a Recurrent Policy Network Policy Gradient algorithm (RPN-PG), in which the Recurrent Policy Network is a concrete realization of the POMDP modeling.

Thirdly, this paper combines FDA-GAN and RPN-PG into a Data augmentation Recurrent Policy Gradient (DRPG) model for portfolio management. The DRPG model is evaluated in experiments on two portfolios. The experimental results show that both Data Augmentation and POMDP modeling can improve the return of the portfolio. Compared with the PG model, the DRPG model increases the Annualized Return of the two portfolios by 8.33% and 11.23%, respectively. The Sharpe ratio and Sortino ratio also improve to varying degrees, and Data Augmentation reduces the Maximum Drawdown. These results verify the effectiveness of the DRPG model proposed in this paper for portfolio management.
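The four evaluation metrics named in the abstract (Annualized Return, Sharpe ratio, Sortino ratio, Maximum Drawdown) can be illustrated with a minimal sketch. The abstract does not give the definitions used in the thesis, so the conventions below are assumptions: per-period simple returns, 252 periods per year, a zero risk-free rate and zero downside target by default, and the sample standard deviation for the Sharpe ratio. The function names are hypothetical.

```python
import math


def annualized_return(returns, periods_per_year=252):
    """Geometric annualized return from per-period simple returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (periods_per_year / len(returns)) - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized mean excess return divided by its sample standard deviation.

    Assumes at least two returns with non-zero variance.
    """
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def sortino_ratio(returns, target=0.0, periods_per_year=252):
    """Like the Sharpe ratio, but penalizes only returns below the target.

    Assumes at least one return falls below the target.
    """
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    downside = math.sqrt(sum(min(x, 0.0) ** 2 for x in excess) / len(excess))
    return mean / downside * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the cumulative portfolio value, as a fraction."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = max(worst, 1.0 - value / peak)
    return worst
```

Under these conventions, a higher Annualized Return, Sharpe ratio, and Sortino ratio and a lower Maximum Drawdown indicate better performance, which is the direction of improvement the abstract reports for DRPG over PG.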

Keywords/Search Tags:

Portfolio Management, Generative Adversarial Networks, Policy Gradient, Data Augmentation, Partially Observable Markov Decision Process

Related items

1	Heuristic Learning Model Based On Partially Observable Markov Decision Process
2	Increasing scalability in algorithms for centralized and decentralized partially observable Markov decision processes: Efficient decision-making and coordination in uncertain environments
3	Deep Value Iteration Network For Partially Observable Markov Decision Process
4	Research On Optimization Of Service Composition Based On Partially Observable Environment
5	The Design And Implementation Of Point-based POMDP Policy Iteration Algorithm
6	Markov Theory Based Planning And Sensing Under Uncertainty
7	Research On Path Planning Based On Markov Decision Process For AUV
8	Theories, Algorithms And Applications Of Policy Gradient Reinforcement Learning
9	Energy-Efficient Transmission Strategy For Wireless Sensor Networks
10	Hierarchical learning and planning in partially observable Markov decision processes