With the emergence of high-performance computing hardware, deep learning technology has developed rapidly. The increasingly interconnected world has produced a torrent of data. Faced with an exponentially growing volume of image data, enabling computers to understand image content efficiently in place of humans has become a research hotspot. Image captioning detects and identifies image content: it must not only perceive the type of scene but also identify object attributes and their relations, and its ultimate goal is to generate natural language that reasonably describes the image content. As a key task at the intersection of natural language processing and computer vision, image captioning has both research value and practical value, and deep learning has become the main approach to solving it. Although research on image captioning has achieved repeated breakthroughs, the generated sentences still miss image details and deviate from human understanding. This thesis studies image captioning models based on deep learning; the specific work is as follows:

1. An image captioning model based on adaptive recalibration of attention features is proposed. On top of an attention mechanism that fuses image features, a channel activation layer is constructed to capture channel-wise dependencies and adaptively recalibrate the attention features, which boosts the representational power of the features and ultimately improves the quality of the sentences generated by the long short-term memory network (see the recalibration sketch below). Comparison experiments were conducted on three standard data sets: MS COCO, Flickr8k and Flickr30k. The results show that the proposed model achieves BLEU_1, BLEU_2, BLEU_3, BLEU_4, METEOR and CIDEr scores of 69.4%, 52.3%, 38.6%, 28.5%, 23.3% and 83.6% on MS COCO, outperforming traditional neural-network image captioning models and generating more accurate captions.

2. Building on the existing work, a Chinese-oriented image captioning task is realized and the image captioning model is further optimized. To address the shortcoming that a long short-term memory network only considers preceding context, a bidirectional long short-term memory network is proposed as the language generation network of the captioning model; it considers preceding and following context simultaneously and improves the generated caption sentences (see the decoder sketch below). Meanwhile, at the stage of building Chinese vocabularies of different sizes, a word-segmentation speed-up method is proposed: Cython is used to reimplement the three core algorithms of the word segmentation tool to accelerate segmentation. Comparison experiments were conducted on the ICC data set. The results show that the word-segmentation acceleration improves segmentation speed by 63.9%, that the captioning model with a vocabulary size of 8000 performs best, and that using a bidirectional long short-term memory network improves the performance of the image captioning model.
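
The channel activation layer in contribution 1 is only described at a high level above. The following is a minimal sketch, assuming a squeeze-and-excitation-style gating over the channels of the attention-weighted CNN feature map; the class name `ChannelRecalibration`, the reduction ratio, and the tensor shapes are illustrative assumptions, not the thesis's exact design.

```python
import torch
import torch.nn as nn

class ChannelRecalibration(nn.Module):
    """Hypothetical channel activation layer: squeeze-and-excitation-style
    gating that rescales each channel of an attended feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),   # per-channel weights in (0, 1)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels, H, W) attention-weighted CNN features
        squeezed = feats.mean(dim=(2, 3))       # global average pool -> (batch, channels)
        weights = self.gate(squeezed)           # capture channel-wise dependencies
        return feats * weights.unsqueeze(-1).unsqueeze(-1)  # recalibrated features
```

In an attention-based captioning pipeline of this kind, the recalibrated features would then be fed to the LSTM decoder at each time step in place of the raw attended features.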
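Contribution 2 replaces the unidirectional decoder with a bidirectional LSTM so that each position can use both preceding and following context. Below is a minimal training-time sketch with teacher forcing over the reference caption; the module names, the way image features are injected, and the dimensions are assumptions for illustration, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class BiLSTMCaptioner(nn.Module):
    """Hypothetical bidirectional LSTM language network for image captioning.
    At training time it reads the embedded reference caption in both
    directions and predicts a word distribution at every position."""

    def __init__(self, vocab_size: int, embed_dim: int = 256,
                 hidden_dim: int = 512, feat_dim: int = 2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.img_proj = nn.Linear(feat_dim, embed_dim)      # inject image features
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)    # forward + backward states

    def forward(self, img_feats: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # img_feats: (batch, feat_dim), captions: (batch, seq_len) token ids
        img = self.img_proj(img_feats).unsqueeze(1)          # (batch, 1, embed_dim)
        tokens = self.embed(captions)                        # (batch, seq_len, embed_dim)
        inputs = torch.cat([img, tokens], dim=1)             # prepend image as first "token"
        states, _ = self.bilstm(inputs)                      # (batch, seq_len+1, 2*hidden)
        return self.out(states[:, 1:, :])                    # logits per caption position
```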
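The BLEU_1 through BLEU_4 figures quoted above come from the standard corpus-level captioning evaluation toolkit; as a rough illustration of how such n-gram precision scores are computed for a single caption, NLTK's sentence-level BLEU can be used (METEOR and CIDEr require dedicated evaluation code and are not shown). The tokenized example captions are invented for illustration.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [["a", "dog", "runs", "on", "the", "grass"]]   # reference caption(s), tokenized
hypothesis = ["a", "dog", "is", "running", "on", "grass"]   # generated caption, tokenized

smooth = SmoothingFunction().method1
for n in range(1, 5):
    # BLEU_n uses uniform weights over the 1..n-gram precisions
    weights = tuple(1.0 / n for _ in range(n))
    score = sentence_bleu(references, hypothesis, weights=weights,
                          smoothing_function=smooth)
    print(f"BLEU_{n}: {score:.3f}")
```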