| In recent years, with the continuous development of the Internet, Internet products have become an indispensable part of people’s lives. With the rise of deep learning, people increasingly hope to "communicate" with machines: to have machines simulate the way people think and take over simple or repetitive work, realizing intelligent human-machine interaction. This paper combines computer vision and natural language processing to address the problem of automatically generating sentences that describe images, that is, how to let machines understand images. This is a particularly attractive interdisciplinary research field in both academia and industry. The image caption task is to translate the content of an image, including its objects, its scene, and the motion and positional relationships of the objects, into natural language that follows a specific sentence structure and grammar rules; the ultimate goal is to "translate" a picture into a human-readable textual description. The main work of this paper is as follows. First, it proposes a joint model based on a convolutional neural network and a recurrent neural network for handling images and text together. For the image part, the paper uses a convolutional neural network to process the image content and extract salient features such as color, texture, and outline. For the text part, it uses a long short-term memory network to process the words, capturing the coherence between preceding and following sentences, contextual semantics, and the consistency of contextual sentiment. The image caption problem is then solved by concatenating the two models, and the performance of the model is verified on third-party data. Second, an improved algorithm based on the attention mechanism is proposed. The way the human eye "sees" things is imitated and built into the model, where it assists in generating key words and linking them to their context. Comparative experiments against the previously proposed method show that the scheme with the attention mechanism performs better on the image description task than the one without it. Finally, since this paper studies image captioning, it is natural to extend the research to Chinese. The paper analyzes word segmentation in Chinese natural language processing and proposes an improved dictionary segmentation algorithm, based on the maximum entropy model, to perform Chinese word segmentation. Based on the Word2Vec algorithm proposed by Google, it also proposes an algorithm suitable for embedding Chinese words. A unique Chinese image caption dataset is then used to verify the effect of the algorithm, and Chinese image captioning is implemented. |
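
The attention step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how attention scores are computed, so the dot-product scoring below is an assumption, and the names `soft_attention`, `region_features`, and `hidden_state` are hypothetical. The sketch only shows the general idea: at each decoding step, the image regions are weighted by their relevance to the current decoder state, and the weighted sum is used as context for the next word.

```python
import math


def soft_attention(region_features, hidden_state):
    """Return (weights, context) for a single decoding step.

    region_features: list of feature vectors, one per image region
                     (for example, taken from a CNN feature map).
    hidden_state:    the decoder's current hidden state, same dimension.
    """
    # Assumed scoring function: dot product between each region and the state.
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]

    # Numerically stable softmax turns scores into weights that sum to 1.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the region features.
    dim = len(region_features[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three regions with two-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = soft_attention(regions, hidden)
```

In this toy example the first and third regions align with the hidden state and receive equal, larger weights than the second region. A full captioning model would typically feed `context` into the LSTM together with the previous word embedding.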