With the rapid development of artificial intelligence and the fast iteration of computer chip performance, deploying high-level autonomous driving applications has become feasible. Trajectory prediction, a crucial component of autonomous driving, predicts the future trajectories of surrounding agents, thereby reducing uncertainty in the environment and improving the ability to anticipate future risks. It is therefore of far-reaching significance for ensuring the safety, stability, and efficiency of autonomous vehicles. In complex and changeable driving scenarios, the future trajectory of an agent generally admits multiple possible motion modes. To model this uncertainty effectively and anticipate future risks comprehensively, multi-modal trajectory prediction algorithms for autonomous driving are urgently needed.

The topic of this thesis stems from the Beijing Natural Science Foundation project "Research on Resource Allocation Algorithms for Internet of Vehicles Based on Video Content Understanding Driven by Dynamic Spatio-temporal Data". It focuses on the task of multi-modal trajectory prediction driven by dynamic spatio-temporal data in autonomous driving. To address the problems of existing methods on this task, it studies first single-agent and then multi-agent multi-modal trajectory prediction algorithms. The main work of this thesis is as follows:

1. For single-agent multi-modal trajectory prediction, existing methods give insufficient consideration to the close correlation between vehicle trajectories and lane topology. To address this, a multi-modal vehicle trajectory prediction algorithm based on joint learning of vehicle and lane information is proposed. To exploit the strong correlation between adjacent vehicles and their corresponding lanes, an information fusion module based on spatial attention jointly represents the information of adjacent interacting vehicles and the target's candidate lanes. This forms a set of semantic instance-level lanes that reflect road occupancy and serve as scene context for predicting the target vehicle's trajectory. Given the strong dependence of the target vehicle on its reference lane, an autoregressive prediction module based on recursive lane attention is proposed. Attending to the local area of the reference lane mines more detailed lane topology information, which enables accurate and reasonable prediction of multi-modal trajectories. On the real-world autonomous driving dataset Argoverse, the proposed algorithm outperforms the baseline method on both prediction error and reliability metrics. Even in difficult, low-probability scenarios, the algorithm still predicts accurate and reasonable multi-modal trajectories.

2. For multi-agent multi-modal trajectory prediction, existing methods suffer from several key problems: sensitivity to the input perspective, insufficient consideration of heterogeneity, and cumbersome, complicated post-processing. To address these, a multi-agent joint multi-modal trajectory prediction algorithm based on a heterogeneous graph Transformer is proposed. To handle the sensitivity to input perspective and the heterogeneity of agents, a heterogeneous graph Transformer-based marginal multi-modal trajectory encoder is proposed. The encoder builds a global heterogeneous graph with a unified perspective from the scene view, introduces the relative spatial information brought by the perspective conversion into the edge features, and accounts for the heterogeneity of nodes and interaction edges through a gating mechanism. The heterogeneous graph Transformer then performs message passing to model global interaction and accurately predicts marginal multi-modal trajectories for each agent. To address the cumbersome and complicated post-processing, a joint multi-modal trajectory decoder is proposed. Based on the marginal prediction results, it aggregates and reorganizes the trajectory features in a learnable scene-level feature space. It can thus replace the post-processing and realize end-to-end multi-agent joint multi-modal trajectory prediction, which ensures the mutual consistency of the joint multi-modal trajectories in the scene. Experiments on two datasets, INTERACTION and Argoverse, verify the advancement and effectiveness of the overall algorithm and of each module through quantitative metrics and visual analysis, and show that the algorithm predicts coordinated and consistent joint multi-modal trajectories in dense interaction scenarios.
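To make the idea of carrying relative spatial information in edge features more concrete, the following is a minimal sketch, not the thesis's actual implementation. It assumes each agent is summarized by a 2D position and a heading in a shared scene frame, and that an edge feature is the sender's pose expressed in the receiver's local frame. The names `AgentPose`, `EdgeFeature`, and `relative_edge_feature` are hypothetical and chosen only for illustration.

```python
import math
from typing import NamedTuple


class AgentPose(NamedTuple):
    """Hypothetical agent state in the shared scene frame."""
    x: float        # metres
    y: float        # metres
    heading: float  # radians


class EdgeFeature(NamedTuple):
    """Hypothetical edge feature: sender pose seen from the receiver."""
    dx: float        # offset along the receiver's heading
    dy: float        # offset to the receiver's left
    dheading: float  # heading difference, wrapped to [-pi, pi)


def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def relative_edge_feature(src: AgentPose, dst: AgentPose) -> EdgeFeature:
    """Express the pose of `src` (sender) in the local frame of `dst` (receiver)."""
    # Offset from receiver to sender, still in the scene frame.
    ox, oy = src.x - dst.x, src.y - dst.y
    # Rotate the offset by -dst.heading to move into the receiver's frame.
    c, s = math.cos(dst.heading), math.sin(dst.heading)
    dx = c * ox + s * oy
    dy = -s * ox + c * oy
    return EdgeFeature(dx, dy, wrap_angle(src.heading - dst.heading))


if __name__ == "__main__":
    receiver = AgentPose(1.0, 1.0, math.pi / 2)
    sender = AgentPose(1.0, 3.0, math.pi / 2)
    print(relative_edge_feature(sender, receiver))
```

Under these assumptions the edge feature depends only on the relative geometry of the two agents, so it stays the same if the whole scene is translated or rotated. This is one simple way in which a unified scene-level graph can remain insensitive to the choice of input perspective.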