| In recent years, image title generation has received extensive attention from both the research community and industry. However, existing work describes image content from the object position relationships and scene information extracted from the image, and does not make good use of the contextual relationships between the words in the generated title. In addition, Chinese is more flexible than English in parts of speech and sentence organization, so some object attributes are lost and the description sentences are incomplete when Chinese titles are generated. To address these two problems, this thesis designs and implements a bidirectional assisted LSTM decoder and a multimodal attention fusion mechanism. Deep learning is used to build a Chinese image title generation model based on bidirectional LSTM: a pre-trained recurrent neural network serves as the image feature encoder, and the bidirectional LSTM serves as the Chinese title decoder. The first innovation of this thesis is to add assistance networks to the bidirectional LSTM, so that the decoder consists of a forward LSTM, a forward assistance network, a reverse LSTM, and a reverse assistance network. In this way, the state of each LSTM can be predicted from the hidden states of the LSTMs in the forward and reverse directions. The second innovation is the multimodal fusion of visual attention and text attention. The bidirectional assisted LSTM first pre-generates forward and reverse pre-titles, and the Chinese words in the pre-titles are converted into word embedding vectors, which serve as the pre-generated text features. These text features are fused with the image features into multimodal features, which are injected into the LSTM as attention for a second decoding pass that generates the final Chinese image title. Two comparative experiments are conducted on the official Chinese data set of the "AI Challenger" challenge to verify the advantages and disadvantages of the model. The experimental results show that the recognition accuracy of the model is 98%, and that its BLEU-1, METEOR, ROUGE-L, and CIDEr scores are 75.7, 27.4, 56.3, and 109.4, respectively. The model alleviates the loss of object attributes and the incomplete description sentences that occur when Chinese titles are generated. |