
Image Scene Understanding Based On Deep Learning Fusion Model

Posted on: 2020-11-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Liao | Full Text: PDF
GTID: 2438330620955592 | Subject: Communication and Information System
Abstract/Summary:
With the rapid development of modern internet technology, and especially the popularity of electronic devices such as smartphones, the amount of information people receive daily is growing exponentially, and most of that perceived information comes through human vision. As one of the carriers of visual information, images contain a great deal of important information. Image description technology, one of the important research directions of deep learning, spans two fields, Computer Vision and Natural Language Processing, and has performed well in image recognition and unmanned driving. Given that deep learning far outperforms traditional techniques, this thesis builds image scene description models on deep learning and carries out the corresponding research. The specific research contents are as follows:

(1) To address the insufficient expressive ability of Convolutional Neural Networks and the lack of information guidance in Long Short-Term Memory networks, an image scene description model based on RBM and gLSTM is designed. The weight matrix is one of the important factors that determine the feature extraction ability of a deep neural network. This thesis analyzes the unsupervised learning and short back-propagation distance that characterize the training of Restricted Boltzmann Machines, which make their weight matrices fit the training samples more closely. On this basis, a Restricted Boltzmann Machine is used to train and initialize the weight matrix of the Convolutional Neural Network, effectively improving the network's ability to express features. In addition, because a traditional Long Short-Term Memory network is guided by image features only at the first step, three different kinds of semantic information are studied and fed into the Long Short-Term Memory network at every time step while it generates the description sentence, thereby improving the accuracy of the description sentences. Finally, experimental simulations were performed on the Flickr8k and Flickr30k datasets. The experimental results show that the model significantly improves the accuracy, recall and coherence of the description sentences. The BLEU and METEOR scores are 2.2 points higher than those of models of the same type. In addition, the accuracy of the Convolutional Neural Network stabilizes at 93%, its convergence is faster and smoother, and the Convolutional Neural Network optimized by the Restricted Boltzmann Machine performs better overall.

(2) The relationship between intermediate semantic features and description sentences is studied, and an image scene description model based on PCA and Attention is designed. The influence of the quality of the intermediate semantic features on the accuracy of the description sentences is analyzed, and principal component analysis is adopted to reduce the feature dimension. Using a feature projection space, the projection of the image features is calculated to improve the feature contrast. Next, drawing on the attention mechanism of the human brain, a soft attention mechanism is added to the model framework. Through weighted summation, the intermediate semantic features most relevant to the current description sentence are computed, so that the model can ignore irrelevant information in the intermediate semantic features and attend to the key information. Finally, the experimental results show that the model's BLEU and METEOR scores are 1.42 points and 1.61 points higher, respectively, than those of the other models, which verifies the validity of the model. Furthermore, after equalization the image gray levels are evenly distributed between 0 and 250, and the PCA reconstruction error is about 0.05×10⁻⁷, which indirectly shows that the model retains the image information while reducing the feature dimension and improving the contrast. In addition, attention visualization verified that the model focuses its attention on specific areas of the image.
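
The guidance idea in part (1), where semantic information reaches the Long Short-Term Memory network at every time step rather than only at the start, can be sketched as a single guided LSTM step. This is a minimal pure-Python illustration and not the thesis's implementation: the function name guided_lstm_step, the parameter layout, and the choice to add a guidance term to every gate are assumptions made for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

def guided_lstm_step(x, h_prev, c_prev, g, params):
    """One LSTM step with an extra guidance vector g (hypothetical layout).

    params maps each gate name ("i", "f", "o", "u") to a tuple
    (W_x, W_h, W_g, b). The W_g @ g term is the only difference from a
    plain LSTM step; it is added to every gate at every time step.
    """
    pre = {}
    for gate in ("i", "f", "o", "u"):
        w_x, w_h, w_g, b = params[gate]
        a_x, a_h, a_g = matvec(w_x, x), matvec(w_h, h_prev), matvec(w_g, g)
        pre[gate] = [p + q + r + s for p, q, r, s in zip(a_x, a_h, a_g, b)]
    i = [sigmoid(v) for v in pre["i"]]    # input gate
    f = [sigmoid(v) for v in pre["f"]]    # forget gate
    o = [sigmoid(v) for v in pre["o"]]    # output gate
    u = [math.tanh(v) for v in pre["u"]]  # candidate cell update
    c = [fk * ck + ik * uk for fk, ck, ik, uk in zip(f, c_prev, i, u)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Toy one-dimensional parameters, invented for the demonstration.
toy_gate = ([[0.5]], [[0.5]], [[1.0]], [0.0])
toy_params = {name: toy_gate for name in ("i", "f", "o", "u")}

h_plain, _ = guided_lstm_step([1.0], [0.0], [0.0], [0.0], toy_params)
h_guided, _ = guided_lstm_step([1.0], [0.0], [0.0], [1.0], toy_params)
print(h_plain, h_guided)
```

With the guidance vector set to zero the step reduces to an ordinary LSTM step, so the difference between the two printed hidden states comes entirely from the guidance term.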
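
The dimension reduction and reconstruction error mentioned in part (2) can be illustrated with a small principal component analysis in plain Python. This is a sketch under stated assumptions, not the thesis's pipeline: the helper names are hypothetical, power iteration is used only to keep the example free of dependencies, and the toy data is invented.

```python
import math

def center(rows):
    """Subtract the per-column mean; return (centered_rows, means)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_components(cov, k, iters=200):
    """Leading k eigenvectors via power iteration with deflation.

    Assumes the k leading eigenvalues are nonzero and that the fixed
    start vector is not orthogonal to the eigenvectors being sought.
    """
    d = len(cov)
    mat = [row[:] for row in cov]
    comps = []
    for _comp in range(k):
        v = [float(j + 1) for j in range(d)]
        for _step in range(iters):
            w = matvec(mat, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        lam = sum(a * b for a, b in zip(v, matvec(mat, v)))
        comps.append(v)
        # Deflate: remove the component just found from the matrix.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return comps

def matvec(mat, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in mat]

def pca_reconstruction_error(rows, k):
    """Project rows onto k components, reconstruct, return the mean squared error."""
    centered, means = center(rows)
    comps = top_components(covariance(centered), k)
    codes = [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in centered]
    recon = [[means[j] + sum(z * c[j] for z, c in zip(code, comps))
              for j in range(len(means))] for code in codes]
    total = sum((a - b) ** 2 for r, q in zip(rows, recon) for a, b in zip(r, q))
    return total / (len(rows) * len(rows[0]))

# Invented 3-D points that lie on one line, so a single component suffices.
toy_rows = [[5.0 + t, -1.0 + 2.0 * t, 2.0 * t] for t in (-2, -1, 0, 1, 2)]
print(pca_reconstruction_error(toy_rows, k=1))
```

A reconstruction error close to zero, as in the figure quoted above, indicates that the retained components carry nearly all of the information in the original features.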
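
The weighted summation behind the soft attention mechanism in part (2) can be sketched as follows. This is a minimal illustration under assumptions: the abstract does not say how relevance is scored, so a dot product between each feature vector and a query vector stands in for it, and the names soft_attention, features and query are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(features, query):
    """Return (context, weights) for a set of intermediate feature vectors.

    features: list of feature vectors, one per image region.
    query:    vector representing the current decoding state.
    Scoring by dot product is an assumption made for this sketch.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Weighted sum: relevant features dominate, irrelevant ones fade out.
    context = [sum(w * feat[j] for w, feat in zip(weights, features))
               for j in range(dim)]
    return context, weights

# Invented toy input: three region features and one query.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [2.0, 0.0]
toy_context, toy_weights = soft_attention(toy_features, toy_query)
print(toy_weights, toy_context)
```

Because the weights are positive and sum to one, every feature contributes to the context vector, but features that score low against the query receive correspondingly little weight.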
Keywords/Search Tags: Image Description, Semantic Guidance, Principal Component Analysis, Attention