
Research on State Space Analysis Method for Deep Reinforcement Learning

Posted on: 2024-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
Full Text: PDF
GTID: 2568307064986089
Subject: Computer Science and Technology
Abstract:
Deep reinforcement learning methods have made significant progress on continuous decision-making tasks and are widely applied in areas such as intelligent healthcare, autonomous driving, and military intelligence. However, the low training efficiency of agents makes it difficult to apply deep reinforcement learning to complex tasks. The key reasons are that the information collected by the agent is not fully exploited and that exploration of the state space is incomplete, so the agent struggles to learn an effective policy within a short time. To address these issues, this thesis analyzes the state space problem in deep reinforcement learning from two perspectives: (1) state representation learning: how can the model network learn effective temporal representation vectors and thereby improve its ability to extract state features? (2) state space exploration: complex environment rules and a high-dimensional state space increase the difficulty of discovering new strategies; how can effective relationships among state features be constructed to simplify the search of the state space? The main contributions of this thesis are as follows:

(1) A time-domain state representation learning method based on the Fourier transform is proposed. The state observation trajectories collected in deep reinforcement learning are vector signals with clear temporal structure. Current methods typically use a neural network to aggregate and encode multi-step feature vectors into a single hidden representation. However, neural networks are limited when processing such temporal representations: when the state space is large and the state variables are continuous, the aggregated hidden vector may lose feature information about the current time step. To address this, the thesis proposes a multi-frequency time-domain state representation method based on the Fourier transform, which distinguishes historical sequence features from immediate state features and outputs effective time-domain state representations. The method observes that the time-domain sequence contains two main signals, historical information and immediate information, and extracts state signals in different frequency ranges through filtering based on the fast Fourier transform. A mask generation mechanism is then designed to assign weights to the signals in different frequency bands and produce a new state representation. Feeding these clearly separated multi-frequency signals to the network improves its learning efficiency. Experimental results show that preprocessing the time-domain state signal with Fourier analysis extracts representation vectors for different frequency bands, improves the representation learning ability of neural networks, and helps agents obtain higher scores across different task environments.
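To make the frequency-band idea concrete, here is a minimal sketch, not the thesis implementation: it assumes the trajectory is stored as a (T, d) NumPy array, uses a hypothetical cutoff_ratio hyperparameter to split the FFT spectrum into a low-frequency "historical" band and a high-frequency "immediate" band, and stands in a plain sigmoid-gated parameter vector for the learned mask generation network.

```python
import numpy as np

def decompose_trajectory(states, cutoff_ratio=0.25):
    """Split a (T, d) state trajectory into low- and high-frequency parts.

    Low frequencies capture slow "historical" trends; high frequencies
    capture fast "immediate" changes. cutoff_ratio is an illustrative
    hyperparameter, not a value from the thesis.
    """
    T = states.shape[0]
    spectrum = np.fft.rfft(states, axis=0)      # (T//2 + 1, d), complex
    cutoff = max(1, int(cutoff_ratio * spectrum.shape[0]))

    low = spectrum.copy()
    low[cutoff:] = 0.0                          # keep only slow components
    high = spectrum - low                       # remainder: fast components

    hist = np.fft.irfft(low, n=T, axis=0)       # historical signal
    inst = np.fft.irfft(high, n=T, axis=0)      # immediate signal
    return hist, inst

def masked_representation(hist, inst, w):
    """Blend the two bands with a per-dimension gate in [0, 1].

    The thesis uses a learned mask generation network; here w is a plain
    parameter vector passed through a sigmoid for illustration.
    """
    m = 1.0 / (1.0 + np.exp(-w))                # sigmoid gate
    return m * hist[-1] + (1.0 - m) * inst[-1]  # current-step representation

# toy usage: 64 steps of an 8-dimensional observation (random walk)
traj = np.random.randn(64, 8).cumsum(axis=0)
hist, inst = decompose_trajectory(traj)
z = masked_representation(hist, inst, w=np.zeros(8))
```

In a full agent, z would replace the aggregated hidden vector as the input to the policy and value networks, and w would be produced by the mask generation network rather than fixed.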
(2) A heuristic state space exploration method based on a stage state graph is proposed. Deep reinforcement learning algorithms usually need to evaluate the value of different states and optimize the agent's policy model through a value function. However, because the probability distribution of state transitions across the state space is uneven, some states are rarely sampled by the agent, and inaccurate value estimates for these states degrade the policy's performance. To address this, the thesis proposes a heuristic state space exploration method based on a stage state graph, which builds a state graph data structure from statistics of the agent's samples and guides the agent's exploration heuristically through path planning. First, a graph structure is maintained from sampled signals, and a target vector representation is generated according to the exploration direction planned on the graph. Second, a hyper-network is designed to generate diverse sub-policies conditioned on the target vector. Finally, the coverage rate of different algorithms in a complex maze environment and the average distance to the goal are used to analyze the differences between algorithms in detail. Experimental results show that using the state graph to guide exploration effectively improves exploration efficiency and reduces the impact of inaccurate value estimates on policy performance.
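A minimal sketch of the graph bookkeeping and goal planning is shown below. It assumes continuous states are discretized into grid cells; the StageStateGraph class name, the cell_size parameter, and the choice of BFS toward the least-visited cell are illustrative assumptions, and the hyper-network that turns the goal vector into sub-policies is omitted.

```python
import numpy as np
from collections import defaultdict

class StageStateGraph:
    """Sketch of a state graph built from sampled transitions."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.visits = defaultdict(int)   # cell -> visit count
        self.edges = defaultdict(set)    # cell -> observed successor cells

    def _cell(self, state):
        # discretize a continuous state into a grid cell (graph node)
        return tuple((np.asarray(state) // self.cell_size).astype(int))

    def add_transition(self, s, s_next):
        a, b = self._cell(s), self._cell(s_next)
        self.visits[a] += 1
        self.edges[a].add(b)

    def plan_goal(self, s):
        """BFS over the observed graph from the current cell, pick the
        least-visited reachable cell as the exploration target, and
        return the first cell on the path toward it as a goal vector."""
        start = self._cell(s)
        parent, frontier, target = {start: None}, [start], start
        while frontier:
            nxt = []
            for node in frontier:
                for nb in self.edges[node]:
                    if nb not in parent:
                        parent[nb] = node
                        nxt.append(nb)
                        if self.visits[nb] < self.visits[target]:
                            target = nb
            frontier = nxt
        step = target                    # walk back to the first step
        while parent[step] is not None and parent[step] != start:
            step = parent[step]
        return np.asarray(step, dtype=float) * self.cell_size

# toy usage: record two transitions, then ask for an exploration goal
g = StageStateGraph(cell_size=0.5)
g.add_transition([0.1, 0.2], [0.6, 0.2])
g.add_transition([0.6, 0.2], [1.1, 0.7])
goal = g.plan_goal([0.1, 0.2])           # points toward the unvisited cell
```

The returned goal vector would then condition the hyper-network so that the generated sub-policy drives the agent toward under-sampled regions of the state space.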
Keywords: Deep reinforcement learning, Temporal representation learning, State space exploration, Fourier transform, State graph construction