| In recent years, with growing computing power and rapidly increasing data volumes, research on deep learning algorithms has made great progress, with remarkable results in computer vision, natural language processing, speech recognition, and multimodal learning. These techniques are now widely applied across society. Real application scenarios, however, often involve more than one data modality, so techniques for relating multiple modalities are receiving increasing attention. Among them, the multimodal task of image description (image captioning) transforms an image into text that describes its content, which involves feature representation, feature fusion, feature alignment, and cross-modal transformation. When an image is converted into descriptive text, the sentence structure and the semantic content of the text must be generated simultaneously, which is difficult. In addition, beyond the accuracy of the description, the controllability of the generated text remains hard to achieve. For the image description task, this paper studies three problems that arise when sentence pattern and semantics are generated synchronously: redundant sentence structure, semantic deviation, and limited controllability of the generated text. The experimental models in this paper use an encoder-decoder framework and integrate a self-attention mechanism. The main work is divided into two parts: (1) To address the complexity of generating sentence pattern and semantics synchronously, a visual-guided attention method based on a re-editing framework is proposed. The method treats the descriptive text generated by a previous model as noisy prior knowledge and uses visual-guided attention to model the mapping between that text and image region features. The core idea of the editing framework is to decide whether the existing text needs correction rather than generating a description from scratch, which reduces what the model has to learn and allows it to correct redundant sentence structure and entity-level semantic deviation. (2) To address subject drift in the described content and the controllability of the generated text, an approach based on global information flow is proposed. The method fuses the description text generated so far into a global vector that is updated dynamically at each iteration, uses this dynamic global feature to keep the description moving correctly across image regions, and controls which image subject the generated text focuses on. Experiments show that this method can effectively improve the accuracy of text generation. |