
Research On Mean-Variance Portfolio Selection Problems Based On Exploratory Entropy Regularization Framework

Posted on: 2024-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Wei
Full Text: PDF
GTID: 2568306923472974
Subject: Control Science and Engineering
Abstract/Summary:
In recent years, with the continued maturation of computer technology, intelligent tools have been widely adopted in industry, the military, finance, and other fields. The application of reinforcement learning, one of the most active areas of artificial intelligence, to control problems has gradually become a research hotspot for scholars at home and abroad. For example, Wang et al. [1] first established an exploratory entropy regularization framework by combining reinforcement learning with stochastic control theory, and used it to solve stochastic control problems. The portfolio selection problem is a common practical application in stochastic control; its main purpose is to select the optimal investment strategy, within the investor's risk tolerance, that achieves the investor's expected return. However, for portfolio selection problems based on the mean-variance criterion, the variance of terminal wealth is not linear in the expectation, so most well-known reinforcement learning methods cannot be applied directly. To overcome this difficulty, Wang and Zhou [2] used the framework of [1] to study mean-variance portfolio selection problems with a fixed investment horizon in a reinforcement learning setting. However, due to unexpected events, investors may withdraw from investment activities before the terminal time; that is, an investor's exit time is random. It is therefore of great practical significance to further study the portfolio selection problem with random exit time by combining stochastic control theory and reinforcement learning. On the other hand, the mean-variance criterion introduces a term that is nonlinear in the expectation into the objective function, so the optimal strategy may be time-inconsistent: a strategy that is optimal at the initial moment may no longer be optimal later. Most investors, however, prefer a strategy that is optimal at all times, so it is necessary to obtain a time-consistent strategy for mean-variance portfolio selection problems in a reinforcement learning setting.

Therefore, this thesis first studies the random-time-horizon mean-variance portfolio selection problem and its pre-commitment strategy under the exploratory entropy regularization framework; second, it studies the time-consistent equilibrium strategy for the mean-variance problem under the exploratory framework. The main research contents are as follows:

(1) This thesis studies a mean-variance portfolio selection problem with random exit time, and its pre-commitment strategy, under the exploratory entropy regularization framework, in which investors may exit randomly and aim to maximize the expected return and minimize the variance of wealth at exit. First, by constructing an auxiliary problem corresponding to the exploratory mean-variance portfolio selection problem, the multi-objective optimization problem is transformed into a single-objective one. Second, using the probability distribution of the exit time, the random-time-horizon problem is transformed into one with a deterministic time horizon. The optimal strategy under the pre-commitment assumption is then obtained via the dynamic programming principle. Finally, the solvability equivalence between the exploratory random-time-horizon problem and the classical problem is analyzed, and the effectiveness of the established exploratory regularization framework for the random-time-horizon mean-variance problem is illustrated, laying a theoretical foundation for applying reinforcement learning to this class of random-time-horizon mean-variance problems.

(2) This thesis studies a mean-variance portfolio selection problem and its time-consistent equilibrium strategy under the exploratory entropy regularization framework. Since portfolio selection based on the mean-variance criterion exhibits time inconsistency, the classical Bellman optimality principle no longer applies. First, extended HJB equations are given for the cases of constant and state-dependent risk preference, respectively. Second, the corresponding time-consistent equilibrium strategies for the exploratory problems are derived by means of the extended HJB equations and the Lagrange multiplier method. Finally, the solvability equivalence between the exploratory mean-variance portfolio selection problem and the classical problem is analyzed, and the effectiveness of the established exploratory entropy regularization framework for the time-inconsistent mean-variance problem is demonstrated, laying a theoretical foundation for applying reinforcement learning to this class of time-inconsistent mean-variance problems.
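To make the type of problem described above concrete, the following is a schematic sketch of an entropy-regularized mean-variance objective in the spirit of Wang and Zhou [2]; the notation is generic and illustrative, not taken from the thesis itself. Here $X_t^{\pi}$ is the exploratory wealth process under a distributional (randomized) control $\pi$, $w$ is the Lagrange multiplier introduced by the auxiliary problem to handle the mean constraint $\mathbb{E}[X_T^{\pi}] = z$, and $\lambda > 0$ is a temperature parameter weighting the entropy of $\pi_t$:

```latex
% Schematic entropy-regularized mean-variance objective (illustrative notation):
% the variance criterion is reduced to a quadratic cost via the Lagrange
% multiplier w, and the term with \lambda penalizes negative entropy of \pi_t,
% thereby rewarding exploration.
\begin{equation}
  \min_{\pi}\;
  \mathbb{E}\!\left[\,(X_T^{\pi} - w)^2
    + \lambda \int_0^T \!\!\int_{\mathbb{R}} \pi_t(u)\,\ln \pi_t(u)\,\mathrm{d}u\,\mathrm{d}t
  \right] - (w - z)^2
\end{equation}
```

In the fixed-horizon case this single-objective formulation admits dynamic programming; the thesis's first contribution extends this pattern to a random exit time, and its second replaces the pre-commitment solution with a time-consistent equilibrium via extended HJB equations.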
Keywords/Search Tags: Entropy regularization, mean-variance, portfolio selection problem, random time horizon, time-inconsistent