| Reinforcement learning, a branch of machine learning, trains agents to interact with an environment and learn the policy that maximizes cumulative reward. Deep learning offers powerful function approximation and representation learning, which help reinforcement learning make decisions in high-dimensional, complex scenarios. Deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning to achieve end-to-end intelligent decision making, and it has been widely applied to problems such as recommendation systems, resource management, robot control, and game AI. Model-free algorithms based on online policies are one of the main classes of deep reinforcement learning: the agent collects data through real-time interaction with the environment and learns a policy directly from that data. The quality of the data the agent obtains during interaction determines the quality of the policy it can learn, and that data quality in turn depends on how the agent explores the environment. Only sufficient exploration yields good policies, but excessive exploration slows training. To improve exploration in model-free deep reinforcement learning algorithms based on online policies, so that agents can learn better policies, this paper first proposes a fuzzy DQN algorithm based on stable exploration. The algorithm introduces a parametric noise network that adds noise in the parameter space to increase the agent's exploration ability. At the same time, drawing on fuzzy theory, it processes the state-action values with a fuzzy neural network, which makes the agent's exploration of the environment more stable and lets training converge quickly. Furthermore, a fuzzy DQN algorithm based on feature fusion is proposed. While the parametric noise network extracts high-dimensional features of the state, a fuzzy neural network is introduced to extract low-dimensional features, which makes feature extraction more complete and thereby yields higher rewards and better policies. Finally, an intelligent virtual environment is designed and built to verify the exploration ability of the two algorithms in different state-action spaces. |
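
The idea of exploring by adding noise to network parameters, rather than to the chosen actions, can be illustrated with a small sketch. The abstract does not specify the form of the noise or the layer design, so everything below is an assumption made for illustration: the class `NoisyLinear`, its `sigma_init` parameter, the independent Gaussian noise, and the helper `greedy_action` are hypothetical names, not the authors' implementation. The fuzzy neural network component is not shown.

```python
import math
import random


class NoisyLinear:
    """Linear layer with Gaussian noise on its parameters (illustrative only).

    Assumption: effective weight = mu + sigma * eps, where eps is standard
    Gaussian noise redrawn by resample_noise(). In a real agent, mu and
    sigma would both be trained by gradient descent; no training is done here.
    """

    def __init__(self, n_in, n_out, sigma_init=0.5, seed=0):
        self.rng = random.Random(seed)
        self.n_in, self.n_out = n_in, n_out
        bound = 1.0 / math.sqrt(n_in)
        # Mean parameters, initialised uniformly in [-bound, bound].
        self.w_mu = [[self.rng.uniform(-bound, bound) for _ in range(n_in)]
                     for _ in range(n_out)]
        self.b_mu = [0.0] * n_out
        # Noise scales, one per weight and one per bias.
        self.w_sigma = [[sigma_init * bound] * n_in for _ in range(n_out)]
        self.b_sigma = [sigma_init * bound] * n_out
        self.resample_noise()

    def resample_noise(self):
        """Draw fresh standard-Gaussian noise for every weight and bias."""
        self.w_eps = [[self.rng.gauss(0.0, 1.0) for _ in range(self.n_in)]
                      for _ in range(self.n_out)]
        self.b_eps = [self.rng.gauss(0.0, 1.0) for _ in range(self.n_out)]

    def forward(self, x, noisy=True):
        """Return the layer output; with noisy=False only the means are used."""
        scale = 1.0 if noisy else 0.0
        out = []
        for j in range(self.n_out):
            total = self.b_mu[j] + scale * self.b_sigma[j] * self.b_eps[j]
            for i in range(self.n_in):
                w = self.w_mu[j][i] + scale * self.w_sigma[j][i] * self.w_eps[j][i]
                total += w * x[i]
            out.append(total)
        return out


def greedy_action(q_values):
    """Index of the largest Q-value; exploration comes from the noisy weights."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Usage: the same state can yield different Q-values after each noise resample.
layer = NoisyLinear(n_in=4, n_out=3)
state = [0.1, -0.2, 0.3, 0.0]
q_first = layer.forward(state)
layer.resample_noise()
q_second = layer.forward(state)
action = greedy_action(q_second)
```

In this sketch the agent always acts greedily, and the variation in behaviour comes entirely from the resampled parameter noise. Calling `forward(state, noisy=False)` gives the noise-free output, which is the natural choice at evaluation time.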