Briefly describing a matter has long been considered a task that only humans can perform, but recent advances in deep learning have made computer-generated descriptions possible. Cross-modal description adds a new challenge to natural-language abstraction: we must not only abstract information from non-natural-language input, but also decode the abstracted features into descriptive text. In this paper, we investigate two types of cross-modal task: tasks across the programming-language/natural-language boundary, and tasks across visual-linguistic modalities. Our main contributions are as follows:

(1) We define a new task, code-related error question generation, to measure the performance of programming-language understanding methods. The task requires understanding error messages, programming-language fragments, and natural language, and requires these parts to interact in order to generate reasonable questions. We collected and organized data from over 200,000 error messages and questions, and made the dataset publicly available for further exploration. We designed a pre-trained Transformer model, CMPPN, for this task; the experimental results show the effectiveness of deep learning on this task, and show that performance on code-error question generation can be further improved by pre-training and a copy mechanism.

(2) We then improve code feature extraction with an unsupervised contrastive learning approach, which we further test on the code summarization and code repair tasks. Unlike traditional contrastive learning, which only augments raw data to obtain training samples, we extract natural-language descriptions of code and use them as queries to find positive matches for the code in the training set, thereby tying the design intent of the code to its feature representation. We also mitigate the unbalanced feature distribution of negative samples in traditional
contrastive learning by using the MoCo method. We train on the publicly available CodeSearchNet dataset and, to verify the effectiveness of the method, evaluate on the code summarization and code error correction tasks of the CodeXGLUE benchmark, obtaining results that outperform existing code pre-training methods.

(3) In another broad cross-modal domain, the visual-linguistic domain, RNN networks are still the dominant approach. We hope that by introducing the Transformer, the network can be pre-trained on natural-language corpora to improve the performance of our method. We apply n-gram decoder generation, originally used for summary generation, to 3D object caption generation. By adding MultiStream Attention to the decoder, the optimized decoder not only predicts the output of the next step but also constrains more future steps to be correct. We demonstrate the effectiveness of our approach through experiments on a 3D object description dataset.
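The description-as-query contrastive objective in contribution (2) can be sketched as a MoCo-style InfoNCE loss: the natural-language description embedding acts as the query, its matched code embedding is the positive key, and a queue of momentum-encoded code embeddings supplies the negatives. This is a minimal illustrative sketch, not the paper's implementation; all names, shapes, and the temperature value are assumptions.

```python
import numpy as np

def info_nce_loss(query, positive_key, negative_queue, temperature=0.07):
    """MoCo-style InfoNCE sketch (names illustrative, not from the paper).

    query:          (B, D) natural-language description embeddings (queries)
    positive_key:   (B, D) matched code embeddings (one positive per query)
    negative_queue: (K, D) momentum-encoded code embeddings from the shared queue
    """
    def l2norm(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    q, k, neg = l2norm(query), l2norm(positive_key), l2norm(negative_queue)
    l_pos = np.sum(q * k, axis=1, keepdims=True)   # (B, 1) positive logits
    l_neg = q @ neg.T                              # (B, K) logits vs. the queue
    logits = np.concatenate([l_pos, l_neg], axis=1) / temperature
    # Cross-entropy with the positive always at index 0.
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[:, 0].mean()
```

Because the negatives come from a slowly updated queue rather than the current batch alone, the negative distribution stays large and stable, which is the property MoCo contributes here.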
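One way to read the multi-stream decoder objective in contribution (3) is as a joint loss in which stream i is trained to predict the token i steps ahead, so the decoder is penalized not only for the next token but for a short window of future tokens. The following is a toy sketch under that assumption; the function name, shapes, and the uniform averaging over streams are all hypothetical, not taken from the paper.

```python
import numpy as np

def multistream_loss(stream_logits, targets):
    """Toy n-gram future-prediction loss (illustrative assumption, not the paper's code).

    stream_logits: list of n arrays, each (T, V); stream i at position t
                   is trained against the token at position t + i.
    targets:       (T,) integer token ids; positions past the end are ignored.
    """
    total, count = 0.0, 0
    for i, logits in enumerate(stream_logits):
        valid = len(targets) - i          # positions that have a target i steps ahead
        if valid <= 0:
            continue
        lg = logits[:valid]
        lg = lg - lg.max(axis=1, keepdims=True)                    # stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))  # log-softmax
        total += -logp[np.arange(valid), targets[i:]].mean()       # CE for stream i
        count += 1
    return total / count
```

At inference time only the next-token stream is needed; the extra streams act as an auxiliary training signal that encourages the hidden states to carry information about upcoming tokens.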