To execute diverse tasks in partially observable environments, an unmanned platform needs to utilize multiple sensors to collect multimodal data, such as environmental information and user language descriptions, and must continuously sense, update, and understand the world state. Traditional planning domain knowledge representation and description languages struggle to model the environment accurately, and classical planning methods cannot learn from previous solving experience. Consequently, such planning methods are hard to generalize to heterogeneous environments or multiple types of tasks. Data-driven, learning-based planning approaches represent multimodal information with low-dimensional dense vectors and adopt machine learning algorithms to mine planning models from large-scale task datasets; in general, they offer a degree of generalization and high solution efficiency.

In this paper, we focus on intelligent planning algorithms for unmanned platforms driven by multimodal data in realistic environments. Under map-free, partially observable conditions, we target intelligent planning of the unmanned platform driven by visual and language data. To solve this problem, we propose a knowledge-enabled task decomposition algorithm and construct a multimodal intelligent planner based on a hierarchical strategy. Finally, we train and validate our approach in an embodied simulation environment. The main work and contributions of this paper are summarized as follows:

(1) Describing and analyzing the multimodal data-driven intelligent planning problem and proposing solutions. We formally describe the intelligent planning problem driven by visual and language data, analyze key challenges such as long-trajectory combinatorial planning, partially observable environments, and language ambiguity, and then propose an intelligent task decomposition and dynamic planning algorithm based on hierarchical strategies to
deal with the above problems. Finally, we analyze and introduce the visual and language data processing algorithms used in the experiments.

(2) Proposing a task decomposition algorithm based on a pre-trained language model and a domain knowledge graph. To understand natural language instructions under vision-free conditions, we build an environment- and task-centered domain knowledge graph and design an embedding algorithm that fuses the knowledge graph into the pre-trained language model to improve its semantic cognitive ability; we then construct a sequence-to-sequence model that transforms task instructions into subtask sequences. Experimental results show that, compared with existing models, the proposed algorithm improves task decomposition accuracy by about 5%.

(3) Constructing an intelligent planner based on the Transformer and a hierarchical strategy, and achieving autonomous movement and interactive manipulation in an embodied AI simulator. Under map-free, partially observable conditions, we design a subtask selector that performs hierarchical planning over subtask sequences, utilize pre-trained models to encode natural language instructions and historical visual images, adopt a multi-layer Transformer to fuse the multimodal data, and finally construct a decision inference network to predict primitive actions and the corresponding object interaction masks. We then evaluate planning effectiveness in the embodied simulator. Experimental results show that the proposed multimodal data-driven intelligent planning algorithm for unmanned platforms can effectively complete combined navigation and manipulation tasks and initially possesses the ability of autonomous planning.
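To make the input/output contract of the task decomposition in contribution (2) concrete, the following is a minimal, rule-based sketch. All names (the triples in the toy knowledge graph, the subtask labels such as GotoLocation and PickupObject) are illustrative assumptions; the actual system uses a sequence-to-sequence model fused with knowledge-graph embeddings rather than keyword rules.

```python
# Toy domain knowledge graph as (entity, relation) -> value triples.
# It supplies facts the instruction leaves implicit, e.g. where an
# object is typically stored. Contents are illustrative only.
KNOWLEDGE_GRAPH = {
    ("apple", "storedIn"): "fridge",
    ("mug", "storedIn"): "cabinet",
    ("apple", "coolableBy"): "fridge",
}

def decompose(instruction: str) -> list:
    """Map a natural-language task instruction to a subtask sequence.

    Stand-in for the knowledge-fused seq2seq model: keyword rules
    demonstrate only the shape of the mapping.
    """
    words = instruction.lower().split()
    # Ground the target object via the knowledge graph.
    target = next(w for w in words if (w, "storedIn") in KNOWLEDGE_GRAPH)
    source = KNOWLEDGE_GRAPH[(target, "storedIn")]
    subtasks = [
        ("GotoLocation", source),
        ("PickupObject", target),
    ]
    # Implicit requirement resolved through the graph: "chilled"
    # implies an intermediate cooling subtask.
    if "chilled" in words or "cold" in words:
        subtasks.append(("CoolObject", KNOWLEDGE_GRAPH[(target, "coolableBy")]))
    dest = words[-1].strip(".")
    subtasks += [("GotoLocation", dest), ("PutObject", dest)]
    return subtasks

plan = decompose("put a chilled apple on the table")
# -> [('GotoLocation', 'fridge'), ('PickupObject', 'apple'),
#     ('CoolObject', 'fridge'), ('GotoLocation', 'table'),
#     ('PutObject', 'table')]
```

The point of the sketch is the interface: a flat language instruction goes in, and an ordered sequence of (subtask, argument) pairs comes out, with the knowledge graph filling in details the instruction never states explicitly.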
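The hierarchical control structure in contribution (3) can likewise be sketched as a two-level loop: a subtask selector walks through the decomposed subtask sequence, while a low-level policy emits primitive actions until the current subtask succeeds. The policy below is a deliberately trivial stub; in the actual planner it is the multi-layer Transformer decision network that fuses the encoded instruction with historical visual images and additionally predicts an object interaction mask.

```python
def hierarchical_plan(subtasks, low_level_policy, max_steps=50):
    """Execute subtasks in order; return the primitive-action trace.

    subtasks: ordered list of (subtask_type, argument) pairs.
    low_level_policy: callable(subtask, trace) -> (action, done).
    """
    trace = []
    for subtask in subtasks:
        done = False
        for _ in range(max_steps):
            # The subtask selector holds the current subtask fixed;
            # the low-level policy decides the next primitive action.
            action, done = low_level_policy(subtask, trace)
            trace.append(action)
            if done:
                break
        if not done:
            raise RuntimeError("subtask failed: %r" % (subtask,))
    return trace

def toy_policy(subtask, trace):
    # Stub: one primitive action per subtask, immediately successful.
    kind, arg = subtask
    return ("%s:%s" % (kind, arg), True)

actions = hierarchical_plan(
    [("GotoLocation", "fridge"), ("PickupObject", "apple")], toy_policy
)
# -> ['GotoLocation:fridge', 'PickupObject:apple']
```

The design point is the separation of concerns: the selector only advances when a subtask reports success, so the low-level policy can operate over short horizons even when the overall task trajectory is long, which is precisely what makes the long-trajectory combinatorial planning problem tractable under partial observability.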