The Research And Implementation Of Image Caption System Based On Deep Learning

Posted on: 2020-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: J T Lv
Full Text: PDF
GTID: 2428330590978635
Subject: Integrated circuit engineering
Abstract/Summary:
With the continuous growth of Internet storage capacity and the increasing popularity of smart devices, more and more people take pictures to record their lives. Every day, thousands of images are captured by devices such as smartphones and personal computers and shared among millions of users on the Internet, leading to explosive growth in image data. To organize the huge volume of image resources on the Internet, it is of great significance to give computers the ability to understand and label images automatically. Website administrators could then classify and manage images easily, while users could retrieve the images they need faster and more accurately. Traditional image understanding focuses on low-level visual features such as colors, textures, and shapes. In recent years, however, the dramatic increase in computing power has driven rapid development in deep learning, and online data that combines visual information with natural language can meet the data requirements of deep neural networks. As a result, automatic image caption generation based on deep learning has become a leading technology in image understanding.

This thesis studies automatic image caption generation and designs an end-to-end image captioning model. The model, Feature Pyramid Networks-Neural Image Caption (F-NIC), is based on semantic image features extracted by deep convolutional neural networks and is built in the following steps: image features are first extracted with both a depthwise separable convolutional neural network and a standard convolutional neural network; a Feature Pyramid Network is then applied for feature fusion; finally, the semantic features are converted to natural language by a long short-term memory (LSTM) network. Test results on public datasets show that feature fusion enhances the expressive power of the model. On this basis, we introduce an attention mechanism to improve the image features and redesign the loss function based on reinforcement learning, yielding an improved model, FAR-NIC. We also implement the FAR-NIC model on an embedded system to build a FAR-NIC system. According to test results on public datasets, the FAR-NIC system achieves much higher BLEU, ROUGE, and CIDEr scores than other models, including Soft-Attention, Hard-Attention, SCA-CNN, and SCST, which indicates that the captions generated by the FAR-NIC model are more detailed and accurate.
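
The attention step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which attention variant the thesis uses, so the dot-product scoring below is an assumption, and the names `soft_attention`, `region_features`, and `hidden_state` are hypothetical. The sketch only shows the general idea of soft attention: score each image region against the decoder state, normalize the scores with a softmax, and take the weighted sum of region features as the context passed to the caption decoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(region_features, hidden_state):
    """Return (weights, context) for dot-product soft attention.

    region_features: list of feature vectors, one per image region
                     (hypothetical stand-in for fused CNN features).
    hidden_state:    decoder state vector of the same dimension
                     (hypothetical stand-in for the LSTM hidden state).
    """
    # Assumption: relevance is a plain dot product; the thesis may use
    # a learned scoring function instead.
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(hidden_state)
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]
    weights, context = soft_attention(regions, [10.0, 0.0])
    print(weights, context)
```

With a decoder state that aligns strongly with the first region, almost all of the attention weight falls on that region, so the context vector is close to that region's features.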
Keywords/Search Tags: Deep Learning, neural networks, semantic feature, image caption