RoboCup (the Robot World Cup Soccer Games and Conferences) is an international competition covering artificial intelligence, multi-agent collaboration, and robotics. Using the game of soccer as its testbed, it poses a complex, real-time, multi-agent decision-making problem intended to promote research and development in artificial intelligence and intelligent robots. The RoboCup Soccer Simulation 2D league is a sub-project of the RoboCup competition that focuses mainly on research into high-level decision-making, that is, using pitch state information to design effective team strategies and player actions. In recent years, a large number of machine learning and deep learning algorithms have been applied to research in this field.

Offensive movement is a core behavior in high-level decision-making: a player performs movement whenever it cannot kick the ball. In the open-source RoboCup 2D base code, the movement target point is determined by the assessment of the game situation, by the tactics and player roles, and by the formation. Within the formation, a player's target point depends only on the position of the ball. This movement model maintains the integrity of the team's formation and guarantees a basic level of team performance, but it leaves offensive movement without a clear goal and cannot adapt to a complex, changing pitch environment. A player's movement should take into account not only the position of the ball but also real-time pitch information. Based on the characteristics of offensive movement, this thesis uses a learning-to-rank algorithm to improve and optimize attacking movement. The main contributions are as follows:

(1) To address the problem that the movement target point depends only on the ball position and therefore cannot adapt to a complex and unpredictable pitch environment, an attacking movement method based on learning to rank is designed and implemented on top of the original formation model, taking into account the characteristics of player movement in RoboCup 2D. The method adds more real-time pitch features to the selection of movement target points, so that offensive movement adapts better to the real-time pitch environment. It improves the team's winning rate and the average number of goals scored per game, reduces the average number of goals conceded per game, and strengthens the team's offensive ability.

(2) When the objective of an attacking movement differs, the key state features should differ as well. Because the decision cycle of offensive movement is long and its goals are unclear, the movement strategy becomes discontinuous and falls out of step with teammates' attacking rhythm. In this thesis, the attacking strategy is optimized through pitch modeling and data mining of game log files. The line x = 25 on the pitch is taken as the dividing line of the team's offensive strategy, and a separate movement model is trained for each side of it. Analysis of the experimental results demonstrates the effectiveness of this strategy optimization.

The method has been implemented in the simulation game, and the results show that the learning-to-rank algorithm is effective for attacking movement. Learning-to-rank algorithms are commonly used in fields such as search and recommendation. This study demonstrates that, with appropriate feature extraction and model design, learning to rank can also be used in complex real-time decision-making systems, which reflects its broader applicability.
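The idea of ranking candidate movement target points, with a separate model on each side of the x = 25 line, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not name the ranking algorithm or the features, so the pairwise linear ranker, the two feature names (openness and support), the toy training data, and the choice to switch models on the ball's x coordinate are all assumptions made for illustration.

```python
SPLIT_X = 25.0  # dividing line on the pitch x-axis, as stated in the abstract


def score(weights, features):
    """Linear ranking score of one candidate target point."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise(groups, n_features, epochs=100, lr=0.1, margin=1.0):
    """Pairwise perceptron-style ranker (an assumed stand-in algorithm).

    groups: one list per decision cycle, each holding
            (feature_tuple, relevance) pairs for the candidates of that cycle.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for group in groups:
            for feats_a, rel_a in group:
                for feats_b, rel_b in group:
                    if rel_a <= rel_b:
                        continue  # only pairs where a should outrank b
                    gap = score(weights, feats_a) - score(weights, feats_b)
                    if gap < margin:
                        # push the better candidate's score above the worse one
                        weights = [w + lr * (fa - fb)
                                   for w, fa, fb in zip(weights, feats_a, feats_b)]
    return weights


def select_target(models, ball_x, candidates):
    """Pick the top-ranked candidate using the model for the current zone."""
    # Assumption: the zone is decided by the ball's x coordinate.
    zone = "front" if ball_x >= SPLIT_X else "back"
    best = max(candidates, key=lambda c: score(models[zone], c["features"]))
    return best["point"]


# Hypothetical features: (openness, support), each scaled to [0, 1].
front_logs = [
    [((0.9, 0.1), 2), ((0.5, 0.5), 1), ((0.2, 0.8), 0)],
    [((0.8, 0.3), 1), ((0.3, 0.9), 0)],
]
back_logs = [
    [((0.2, 0.9), 1), ((0.7, 0.1), 0)],
]
models = {
    "front": train_pairwise(front_logs, n_features=2),
    "back": train_pairwise(back_logs, n_features=2),
}
candidates = [
    {"point": (40.0, 10.0), "features": (0.9, 0.2)},
    {"point": (35.0, -5.0), "features": (0.3, 0.7)},
]
print(select_target(models, ball_x=30.0, candidates=candidates))
print(select_target(models, ball_x=10.0, candidates=candidates))
```

In this toy setup the two zone models learn opposite feature preferences, so the same pair of candidates is ranked differently depending on which side of the dividing line the ball is on, which mirrors the motivation for training the movement models separately.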