With the development of China's financial and securities markets, the total assets under management of the asset management industry have been growing steadily. Building an efficient quantitative financial model in a scientific and accurate way is a concern of almost every investment institution. Since portfolio management can be formulated as a Markov decision process, reinforcement learning can be used to make trading decisions. However, a model that relies on reinforcement learning alone is sensitive to the noise in financial data, which weakens its learning ability. In addition, correlation information between assets is very important in portfolio management and can improve the performance of a reinforcement learning portfolio model. This paper therefore makes three improvements: it uses a denoising autoencoder to remove noise from financial data, improves the policy distribution of the reinforcement learning agent, and constructs a two-stream network to learn correlation features between assets.

First, Gaussian noise is added to the normalized stock data, a convolutional network is used as the encoder, and the encoder output is taken as the denoised result. Experiments show the effectiveness of this module: after adding the denoising autoencoder, the cumulative return of the policy gradient algorithm increases from 82.0% to 136.3%.

Second, the portfolio framework in this paper uses a reinforcement learning algorithm with a stochastic policy, and the policy distribution is changed to a Gaussian mixture distribution, which fits the actual policy distribution more closely and increases the agent's ability to explore. Experiments show the effectiveness of this module: after improving the policy distribution, the cumulative return increases from 82.0% to 105.2%.

Third, a two-stream policy network is constructed with three components: an asset price sequence information network, an asset correlation information network and a decision network. The state features are fed into the asset price sequence information network and the asset correlation information network separately, to learn per-asset sequence features and portfolio-level correlation features between assets. The features from the two streams are then superimposed and passed to the decision network for learning. An ablation study shows the effectiveness of this network structure: with the improvement, the cumulative return of the actor-critic algorithm increases from 123.7% to 155.9%.

Finally, in backtesting on the Chinese market, the improved reinforcement learning portfolio model achieves a cumulative return of 186.6%, an annualized return of 98.7%, a Sharpe ratio of 0.656 and a maximum drawdown of 0.201. Compared with the other methods, the model performs better on every metric except maximum drawdown, where its value is higher than that of the Yu strategy.
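The denoising step described above (normalize, add Gaussian noise, encode with a convolutional network) can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions: the window is normalized by dividing by its last price (the abstract only says the data are normalized), a single 1-D convolution with ReLU stands in for the paper's convolutional encoder, and the noise level `sigma`, kernel and bias are hypothetical. The decoder and the training loop, which would minimize reconstruction error against the clean input, are omitted. All function names are hypothetical.

```python
import random


def normalize_window(prices):
    """Divide a price window by its most recent price, so the last value becomes 1.0.

    Assumed normalization: the abstract only says the stock data are normalized.
    """
    last = prices[-1]
    return [p / last for p in prices]


def add_gaussian_noise(xs, sigma, rng):
    """Corruption step of a denoising autoencoder: add i.i.d. N(0, sigma^2) noise."""
    return [x + rng.gauss(0.0, sigma) for x in xs]


def conv1d_relu(xs, kernel, bias):
    """One 'valid' 1-D convolution (cross-correlation, as in deep-learning
    libraries) followed by ReLU; a stand-in for the convolutional encoder."""
    k = len(kernel)
    out = []
    for i in range(len(xs) - k + 1):
        s = sum(kernel[j] * xs[i + j] for j in range(k)) + bias
        out.append(max(0.0, s))
    return out


def encode(prices, kernel, bias, sigma=0.0, rng=None):
    """Normalize, optionally corrupt (sigma > 0, typically only during training),
    then encode; the encoder output serves as the denoised feature vector."""
    x = normalize_window(prices)
    if sigma > 0.0:
        x = add_gaussian_noise(x, sigma, rng or random.Random(0))
    return conv1d_relu(x, kernel, bias)


# Hypothetical example: a 2-tap "difference" kernel over a 4-day price window.
print(encode([10.0, 12.0, 8.0, 16.0], kernel=[-1.0, 1.0], bias=0.0))  # [0.125, 0.0, 0.5]
```

In a trained model the kernel weights would be learned so that the encoder output supports reconstructing the clean window from its noisy version; here they are fixed by hand only to make the data flow visible.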
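A Gaussian mixture policy of the kind described above can be sketched in plain Python: sampling an action, evaluating its log-probability, and turning it into portfolio weights. The sketch assumes diagonal-covariance components over an action vector with one entry per asset and a softmax mapping from actions to long-only weights; the number of components, this parameterization and the softmax mapping are assumptions, not details given in the abstract. In an actor network the mixture weights, means and standard deviations would be network outputs; here they are passed in directly. Function names are hypothetical.

```python
import math
import random


def gmm_sample(mix, means, stds, rng):
    """Sample an action vector from a diagonal Gaussian mixture:
    choose component k with probability mix[k], then draw each dimension."""
    k = rng.choices(range(len(mix)), weights=mix)[0]
    return [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]


def gmm_log_prob(action, mix, means, stds):
    """log pi(a) = log sum_k mix_k * prod_d N(a_d; mu_kd, sigma_kd^2),
    evaluated with the log-sum-exp trick for numerical stability."""
    log_terms = []
    for w, mu, sd in zip(mix, means, stds):
        lp = math.log(w)
        for a, m, s in zip(action, mu, sd):
            lp += -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        log_terms.append(lp)
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def to_portfolio_weights(action):
    """Map an unconstrained action to long-only weights summing to 1 (softmax; assumed)."""
    top = max(action)
    exps = [math.exp(a - top) for a in action]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-component mixture over three assets.
rng = random.Random(7)
mix = [0.6, 0.4]
means = [[0.5, 0.0, -0.5], [-0.5, 0.0, 0.5]]
stds = [[0.2, 0.2, 0.2], [0.3, 0.3, 0.3]]
a = gmm_sample(mix, means, stds, rng)
print(to_portfolio_weights(a), gmm_log_prob(a, mix, means, stds))
```

Because the mixture can place probability mass on several distinct allocations at once, it is one way to read the abstract's claim of increased exploration; `gmm_log_prob` is the quantity a policy-gradient update would differentiate with respect to the network parameters.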
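The two-stream idea can be illustrated with a hand-crafted stand-in for the correlation stream. In the paper both streams are learned networks; this sketch assumes, purely for illustration, that the correlation stream receives a Pearson correlation matrix of log returns, and it reads "superimposed" as per-asset concatenation of the two feature sets (element-wise addition would be another reading). The sequence features are placeholders, and all helper names are hypothetical.

```python
import math


def log_returns(prices):
    """Per-period log returns ln(p_t / p_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Sample Pearson correlation; returns 0.0 when a variance is exactly zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlation_features(price_series):
    """Correlation-stream input (assumed): row i holds asset i's correlation with every asset."""
    rets = [log_returns(p) for p in price_series]
    return [[pearson(a, b) for b in rets] for a in rets]


def fuse(sequence_features, correlation_rows):
    """Combine the two streams per asset by concatenation before the decision network."""
    return [list(s) + list(c) for s, c in zip(sequence_features, correlation_rows)]


# Hypothetical prices for three assets; the sequence features are placeholders.
prices = [[100.0, 110.0, 99.0, 120.0], [50.0, 55.0, 49.5, 60.0], [20.0, 19.0, 21.0, 18.0]]
corr = correlation_features(prices)
seq = [[0.1], [0.2], [0.3]]
print(fuse(seq, corr))
```

Keeping the streams separate until the fusion step lets each one specialize: one sees only an asset's own history, the other only cross-asset structure, and the decision network receives both.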
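The backtest metrics quoted above (cumulative return, annualized return, Sharpe ratio, maximum drawdown) can be computed from a series of portfolio values as sketched below. The conventions are assumptions, since the abstract does not state them: simple per-period returns, annualization by compounding with a hypothetical `periods_per_year` (252 trading days by default), a per-period Sharpe ratio with a zero per-period risk-free rate by default, and maximum drawdown as the largest fall from the running peak, expressed as a fraction of that peak.

```python
import math


def backtest_metrics(values, periods_per_year=252, risk_free=0.0):
    """Summary statistics for a series of portfolio values (one value per period)."""
    rets = [b / a - 1.0 for a, b in zip(values, values[1:])]  # simple per-period returns
    cumulative = values[-1] / values[0] - 1.0
    years = len(rets) / periods_per_year
    annual = (1.0 + cumulative) ** (1.0 / years) - 1.0  # compounded annual growth rate
    excess = [r - risk_free for r in rets]  # risk_free is a per-period rate
    mean = sum(excess) / len(excess)
    std = math.sqrt(sum((r - mean) ** 2 for r in excess) / (len(excess) - 1))
    sharpe = mean / std if std > 0.0 else float("nan")  # per-period, not annualized
    peak, max_dd = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        max_dd = max(max_dd, (peak - v) / peak)  # fall from the running peak
    return {"cumulative": cumulative, "annual": annual, "sharpe": sharpe, "max_drawdown": max_dd}


# Hypothetical 4-point value series, treated as one year of 3 periods.
print(backtest_metrics([1.0, 1.1, 0.99, 1.2], periods_per_year=3))
```

Multiplying the per-period Sharpe ratio by `math.sqrt(periods_per_year)` gives the annualized figure that is often reported instead; the abstract does not say which convention produced its 0.656.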