
Mix-up Consistent Cross Representations For Data-Efficient Reinforcement Learning

Posted on: 2024-03-22    Degree: Master    Type: Thesis
Country: China    Candidate: S Y Liu    Full Text: PDF
GTID: 2568307052495854    Subject: Electronic information
Abstract/Summary:
With the development of deep neural networks, deep reinforcement learning has achieved remarkable performance on sequential decision problems, and in particular has made significant progress on a range of challenging decision tasks such as Go, robot control, and even real-time strategy games such as Dota 2 and StarCraft. Deep reinforcement learning excels at solving tasks for which large amounts of data can be collected through almost unlimited interaction with the environment; however, when interaction data is limited, it is often difficult to extract task-relevant semantic information. Constructing efficient and stable reinforcement learning algorithms for sequential decision tasks with limited state observations is therefore a challenge. To address this problem, this thesis proposes mix-up consistent cross representations as a self-supervised auxiliary task to improve data efficiency and encourage the prediction of future features. The main contributions of this thesis are as follows:

1. Additional projection heads are introduced to compute a contrastive loss between low- and high-dimensional representations of different state observations. The lower and higher network layers extract more informative and more invariant representations, respectively, and the contrastive loss computed between corresponding layers is replaced with a cross-contrastive loss that strengthens the mutual information between states, thereby improving data efficiency.

2. A mix-up strategy is used to generate intermediate samples, together with correspondingly designed reward information. Mixed samples are constructed from the states of adjacent time steps, and the supervised signal is designed as the element-wise maximum of the features extracted from the original samples at the corresponding positions. Maximizing the similarity between the two increases data diversity and improves the smoothness of the representation predictions at nearby time steps.

3. Algorithms are designed to reuse pre-trained encoders on the current task and to compare how well they generalize to tasks unseen in the environment. Based on the MuJoCo environment, different ways of using pre-trained encoders are designed by saving encoder weights at certain training steps and comparing the effects they produce in different environments.

Experimental results show that the method proposed in this thesis achieves competitive results compared to state-of-the-art methods on complex control tasks in the DeepMind Control Suite, and significantly improves the ability of pre-trained encoders to generalize to unseen tasks.
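The mix-up contribution described above can be illustrated with a minimal sketch. This is not the thesis's actual implementation: the helper names (`mixup`, `consistency_target`, `cosine_similarity`), the Beta-distributed mixing coefficient, and the cosine-similarity objective are assumptions for illustration only; observations and features are plain Python lists standing in for tensors, and the encoder is omitted.

```python
import random

def mixup(obs_t, obs_t1, alpha=0.5):
    """Mix two adjacent-time-step observations with a Beta-sampled coefficient.

    Returns the intermediate sample and the mixing weight lam in (0, 1).
    """
    lam = random.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(obs_t, obs_t1)]
    return mixed, lam

def consistency_target(feat_t, feat_t1):
    """Supervised signal: element-wise maximum of the two original features."""
    return [max(a, b) for a, b in zip(feat_t, feat_t1)]

def cosine_similarity(u, v):
    """Similarity to be maximized between mixed-sample features and the target."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v + 1e-8)

# Toy usage: two adjacent-step feature vectors (encoder omitted).
feat_t, feat_t1 = [1.0, 3.0, 0.5], [2.0, 1.0, 0.5]
mixed, lam = mixup(feat_t, feat_t1)
target = consistency_target(feat_t, feat_t1)   # [2.0, 3.0, 0.5]
loss = 1.0 - cosine_similarity(mixed, target)  # auxiliary loss to minimize
```

Maximizing the similarity between the features of the mixed sample and the element-wise-maximum target is what, per the abstract, increases data diversity and smooths representation predictions at nearby time steps.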
Keywords/Search Tags: reinforcement learning, self-supervised learning, mutual information, smoothness