
Mix-up Consistent Cross Representations For Data-Efficient Reinforcement Learning

Posted on: 2024-03-22    Degree: Master    Type: Thesis
Country: China    Candidate: S Y Liu    Full Text: PDF
GTID: 2568307052495854    Subject: Electronic information
Abstract/Summary:
With the development of deep neural networks, deep reinforcement learning has achieved remarkable performance on sequential decision problems, and in particular has made significant progress on a range of challenging decision tasks such as Go, robot control, and even real-time strategy games such as Dota 2 and StarCraft. Deep reinforcement learning excels at solving tasks for which large amounts of data can be collected through almost unlimited interaction with the environment; however, when interaction data is limited, it is often difficult to extract task-relevant semantic information. Constructing efficient and stable reinforcement learning algorithms for sequential decision tasks with limited state observations is therefore a challenge. To address this problem, this thesis proposes mix-up consistent cross representations as a self-supervised auxiliary task to improve data efficiency and encourage the prediction of future features. The main contributions of this thesis are as follows:

1. Additional projection heads are introduced to compute a contrastive loss between low- and high-dimensional representations of different state observations. The lower and higher network layers extract more informative and more invariant representations, respectively, and the contrastive loss computed between corresponding layers is replaced with a cross-contrastive loss that strengthens the mutual information between states, thereby improving data efficiency.

2. A mix-up strategy is used to generate intermediate samples, together with correspondingly designed reward information. Mixed samples are constructed from the states of adjacent time steps, and the supervised signal is designed as the element-wise maximum of the features extracted from the original samples at the corresponding positions. Maximizing the similarity between the two increases data diversity and improves the smoothness of the representation predictions at nearby time steps.

3. Algorithms are designed to reuse pre-trained encoders on the current task and to compare how well they generalize to tasks unseen in the environment. Based on the MuJoCo environment, different ways of using pre-trained encoders are designed by saving encoder weights at certain training steps and comparing the effects they produce in different environments.

Experimental results show that the method proposed in this thesis achieves competitive results compared to state-of-the-art methods on complex control tasks in the DeepMind Control Suite, and significantly improves the ability of pre-trained encoders to generalize to unseen tasks.
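The mix-up contribution described above can be illustrated with a minimal sketch. This is not the thesis's actual implementation: the helper names (`mixup`, `consistency_target`, `cosine_similarity`), the Beta-distributed mixing coefficient, and the cosine-similarity objective are assumptions for illustration only; observations and features are plain Python lists standing in for tensors, and the encoder is omitted.

```python
import random

def mixup(obs_t, obs_t1, alpha=0.5):
    """Mix two adjacent-time-step observations with a Beta-sampled coefficient.

    Returns the intermediate sample and the mixing weight lam in (0, 1).
    """
    lam = random.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(obs_t, obs_t1)]
    return mixed, lam

def consistency_target(feat_t, feat_t1):
    """Supervised signal: element-wise maximum of the two original features."""
    return [max(a, b) for a, b in zip(feat_t, feat_t1)]

def cosine_similarity(u, v):
    """Similarity to be maximized between mixed-sample features and the target."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v + 1e-8)

# Toy usage: two adjacent-step feature vectors (encoder omitted).
feat_t, feat_t1 = [1.0, 3.0, 0.5], [2.0, 1.0, 0.5]
mixed, lam = mixup(feat_t, feat_t1)
target = consistency_target(feat_t, feat_t1)   # [2.0, 3.0, 0.5]
loss = 1.0 - cosine_similarity(mixed, target)  # auxiliary loss to minimize
```

Maximizing the similarity between the features of the mixed sample and the element-wise-maximum target is what, per the abstract, increases data diversity and smooths representation predictions at nearby time steps.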
Keywords/Search Tags: reinforcement learning, self-supervised learning, mutual information, smoothness