Youth education shapes the future of the country, yet it faces a severe problem of educational involution, and both the government and academia are searching for a solution. Under the "double reduction" policy, educational actors, who influence one another and are locked in involuted competition, are still looking for a way out. Taking a game-theoretic perspective, this paper uses reinforcement learning to study the evolution, causes, and solutions of educational involution. Specifically, it makes three contributions.

First, it explores the evolutionary laws and causes of educational involution. It constructs a two-player education game, analyzes the equilibria of the repeated game theoretically, and uses the folk theorem to delimit the space in which strategies can be improved. It then runs reinforcement learning simulations to examine how strategies and payoffs evolve. Over 10,000 simulated rounds, the evolution passes through four periods. In the gestation period, the frequency of "focus on score" rises slowly and stays below 50%, payoffs do not fall significantly, and involution has not yet formed. In the formative period, the frequency of "focus on score" rises rapidly, the player with the higher frequency earns a significantly lower payoff, and involution takes shape. In the deepening period, the payoff of the player who entered involution first falls to its lowest point, while the player who entered later overtakes them, with both frequency and payoff rising slowly. In the stalemate period, the frequency of "focus on score" converges to between 70% and 82%, and involution reaches a deadlock. These results indicate that involution forms when both players choose "focus on score" too often (above 50%), which significantly lowers the average payoff.

Second, it explores how to resolve group educational involution in the context of "double reduction". It quantifies the "double reduction" policy as a set of rewards and punishments for educational actors, constructs an improved education game model, and analyzes the equilibria of its repeated game theoretically. Starting from the deadlocked education game, it runs reinforcement learning simulations to obtain the optimal strategy. Finally, it examines the reliability of this optimal strategy in a group education game played on a small-world network. The experiments show that, under the "double reduction" policy, a win-win outcome in education can be achieved if educational actors choose "focus on happiness" with a frequency of 56% to 58%.

This research provides a theoretical reference for analyzing and breaking the phenomenon of educational involution, can guide educational actors toward the optimal strategy, and helps promote the rational and healthy development of education.
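The abstract does not specify the reinforcement learning setup of the two-player game, so the following is only a minimal sketch under stated assumptions. It pits two epsilon-greedy stateless Q-learners against each other for 10,000 rounds and records how often each chooses "focus on score". Everything beyond the two strategy names and the round count is hypothetical: the prisoner's-dilemma-style payoff table `PAYOFF`, the learning rate `alpha`, the exploration rate `epsilon`, and the `reward`/`penalty` terms used as a stand-in for quantifying the "double reduction" policy. It is not the paper's model and will not reproduce the four periods or the percentages reported above.

```python
import random

# HYPOTHETICAL payoff table (not from the paper): a prisoner's-dilemma-style 2x2 game.
# Action 0 = "focus on score" (S), action 1 = "focus on happiness" (H).
# PAYOFF[(my_action, other_action)] -> my payoff.
SCORE, HAPPY = 0, 1
PAYOFF = {
    (HAPPY, HAPPY): 3.0,
    (SCORE, HAPPY): 5.0,
    (HAPPY, SCORE): 0.0,
    (SCORE, SCORE): 1.0,
}


def payoff(me, other, reward=0.0, penalty=0.0):
    """Base payoff plus a HYPOTHETICAL policy term: +reward for H, -penalty for S."""
    base = PAYOFF[(me, other)]
    return base + reward if me == HAPPY else base - penalty


class StatelessQLearner:
    """Epsilon-greedy learner with one Q-value per action and no state (discount 0)."""

    def __init__(self, rng, alpha=0.1, epsilon=0.1):
        self.q = [0.0, 0.0]
        self.rng = rng
        self.alpha = alpha      # hypothetical learning rate
        self.epsilon = epsilon  # hypothetical exploration rate

    def act(self):
        # Explore with probability epsilon, and break exact ties at random.
        if self.rng.random() < self.epsilon or self.q[0] == self.q[1]:
            return self.rng.randrange(2)
        return SCORE if self.q[SCORE] > self.q[HAPPY] else HAPPY

    def update(self, action, r):
        # Move the chosen action's value toward the observed payoff (no bootstrapping).
        self.q[action] += self.alpha * (r - self.q[action])


def simulate(rounds=10_000, reward=0.0, penalty=0.0, seed=0):
    """Play the repeated game; return (player_a_actions, player_b_actions)."""
    rng = random.Random(seed)
    a, b = StatelessQLearner(rng), StatelessQLearner(rng)
    history = ([], [])
    for _ in range(rounds):
        x, y = a.act(), b.act()
        a.update(x, payoff(x, y, reward, penalty))
        b.update(y, payoff(y, x, reward, penalty))
        history[0].append(x)
        history[1].append(y)
    return history


def score_frequency(actions):
    """Fraction of rounds in which "focus on score" was chosen."""
    return sum(1 for act in actions if act == SCORE) / len(actions)


if __name__ == "__main__":
    for label, kw in [("baseline", {}), ("with policy", {"reward": 3.0, "penalty": 3.0})]:
        h = simulate(**kw)
        late = [score_frequency(p[-1000:]) for p in h]  # last 1,000 rounds only
        print(label, [round(f, 3) for f in late])
```

Comparing the late-round "focus on score" frequency with and without the hypothetical `reward`/`penalty` shows how a reward-and-punishment term can change the action the learners settle on. With the illustrative values 3.0/3.0, "focus on happiness" becomes strictly dominant in this toy table, so the learners drift toward it apart from exploration.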
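The abstract also does not say how the small-world network is built or how a strategy frequency is evaluated on it. One common small-world construction is the Watts-Strogatz model, and the sketch below assumes it: a ring lattice of `n` nodes, each joined to its `k` nearest neighbours, with each lattice edge rewired with probability `beta`. Each node then plays "focus on happiness" with probability `q_happy` against all of its neighbours. The payoff table, the `reward`/`penalty` policy term, and the parameter values are hypothetical illustrations, not the paper's settings, and the sketch makes no claim to reproduce the 56% to 58% result.

```python
import random

# HYPOTHETICAL payoffs (illustrative PD-style table, not the paper's):
# action 0 = "focus on score", action 1 = "focus on happiness".
SCORE, HAPPY = 0, 1
PAYOFF = {(HAPPY, HAPPY): 3.0, (SCORE, HAPPY): 5.0,
          (HAPPY, SCORE): 0.0, (SCORE, SCORE): 1.0}


def watts_strogatz(n, k, beta, rng):
    """Small-world graph as an adjacency dict: ring of n nodes, each joined to its
    k nearest neighbours, then each lattice edge rewired with probability beta."""
    if k % 2 or not 0 < k < n:
        raise ValueError("k must be even and 0 < k < n")
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    for j in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + j) % n
            if v in adj[u] and rng.random() < beta:
                # New endpoint must create neither a self-loop nor a duplicate edge.
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].remove(v)
                    adj[v].remove(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


def mean_payoff(adj, q_happy, rng, reward=0.0, penalty=0.0):
    """Each node independently plays H with probability q_happy, plays the 2x2 game
    with every neighbour, and sums its payoffs (HYPOTHETICAL policy term:
    +reward for H, -penalty for S). Returns the mean total payoff per node."""
    acts = {u: HAPPY if rng.random() < q_happy else SCORE for u in adj}

    def pay(me, other):
        base = PAYOFF[(me, other)]
        return base + reward if me == HAPPY else base - penalty

    totals = [sum(pay(acts[u], acts[v]) for v in adj[u]) for u in adj]
    return sum(totals) / len(totals)


if __name__ == "__main__":
    rng = random.Random(0)
    graph = watts_strogatz(n=100, k=4, beta=0.1, rng=rng)
    for q in (0.56, 0.57, 0.58):
        print(q, round(mean_payoff(graph, q, rng, reward=1.0, penalty=1.0), 3))
```

Rewiring keeps the edge count at `n * k / 2`, so differences between runs come from the shortcut structure and the sampled actions rather than from a change in average degree. Averaging `mean_payoff` over many seeds would be needed before comparing nearby values of `q_happy`.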