
Research On Image Semantic Understanding Based On Attention Mechanism

Posted on: 2022-12-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Su
GTID: 1488306779482744
Subject: Automation Technology
Abstract/Summary:
As an important channel of human visual perception, images are a main source of massive data in the era of artificial intelligence. How to understand massive image data intelligently is therefore a central problem in the field of image understanding. In recent years, deep learning has achieved remarkable results in image understanding tasks such as image classification and image description, which has further promoted the development of image semantic understanding. However, understanding images with complex semantics, as in hierarchical image classification and visual storytelling, still leaves considerable room for exploration, and doing so accurately and effectively has far-reaching theoretical significance and broad application prospects. In hierarchical image classification, the strict hierarchical relationships between categories and the high similarity between fine-grained categories make it difficult for computers to identify the hierarchical categories accurately. Visual storytelling, which aims to generate coherent and expressive stories for image sequences, requires not only recognizing complex scenes and the dependencies among images but also understanding abstract semantics, which poses a greater challenge to current image understanding technology. To address these problems, this thesis investigates hierarchical image classification and visual storytelling through attention mechanisms grounded in deep learning theory and methods, from three aspects: dual attention, local-feature attention with global semantics, and hierarchical attention. The main contributions of this thesis are as follows.

(1) A hierarchical image classification model is proposed based on a dual-attention mechanism. Since most existing hierarchical image classification methods are designed for fixed-level recognition, this thesis constructs a general recognition model, DACL (dual-attention CNN-LSTM), based on CNN-LSTM. It introduces dual-attention modules in the spatial feature dimension and the spatial semantic dimension to solve both fixed-level and variable-level hierarchical classification problems. The spatial feature attention mechanism learns more discriminative fine-grained features corresponding to different categories, and the spatial semantic attention mechanism models the correlation between categories. Together they strengthen the model's ability to discriminate key information and improve its generalization. The algorithm is applied to CIFAR10, CIFAR100, and a design patent image dataset, and its performance is compared with that of existing methods. The experimental results demonstrate that the proposed DACL method outperforms other existing hierarchical image classification methods in terms of both precision and accuracy.

(2) A visual storytelling method is proposed that incorporates a local feature attention mechanism and global context semantics. The method adopts an end-to-end parallel Long Short-Term Memory (LSTM) module to implement visual storytelling. Traditional visual storytelling methods, by contrast, mainly employ serial LSTM modules, which leads to too many network parameters, heavy computation, and excessive consumption of network resources; the proposed method overcomes these drawbacks. The thesis uses the information of the image sequence as the global image feature and combines global context semantics with a local feature attention mechanism. The information of a single image is used as the local feature, and an attention mechanism is introduced to obtain the image feature attention map corresponding to the text, which builds the associations among images and between image and text. Because the images of a sequence are input separately, the traditional LSTM approach focuses only on the relationship between a single image and its text and ignores the associations between the images in the sequence. The proposed method effectively remedies this deficiency. The method was implemented on two public image datasets (DII and SIS) and achieved superior experimental results.

(3) A visual storytelling algorithm is proposed based on a hierarchical attention mechanism. The thesis uses the rich semantic extraction capability of the BERT model to build a two-layer LSTM model and introduces sentence-level and word-level attention mechanisms to generate story descriptions of image sequences. At the bottom layer, the model captures sentence-level semantics: it focuses on the mapping between each image and the semantics of its corresponding sentence, as well as on the associations between images and between sentences, and extracts the high-level topic information of each image. At the second layer, the model captures word-level semantics conditioned on the high-level topic: it focuses on the mapping between each image and each word in the sentence and learns the image feature information corresponding to each word. Traditional visual storytelling methods generate sentences with many syntactic problems and over-simplified expressions, and the proposed method effectively overcomes these shortcomings. The experimental results demonstrate that the proposed model outperforms most existing methods on the automatic BLEU and CIDEr metrics as well as on various metrics in human assessment.

In summary, this thesis aims to solve several key problems in image understanding by using attention mechanisms combined with recent deep learning theory and methods. It includes innovative research at the intersection of computer vision (CV) and natural language processing (NLP), which has important research significance for solving real-world application problems.
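
The attention mechanisms described in this abstract share a common core: scoring a set of image regions against a query, normalizing the scores, and forming a weighted sum. The following is a minimal sketch of that generic idea using dot-product scoring. It is not the thesis's DACL or storytelling model; the function names `softmax` and `spatial_attention`, the dot-product scoring rule, and the toy inputs are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(region_features, query):
    """Weight image regions by dot-product relevance to a query (illustrative).

    region_features: list of feature vectors, one per spatial location
                     (for example, cells of a CNN feature map).
    query:           vector of the same length (for example, an LSTM hidden state).
    Returns (weights, context), where context is the weighted sum of regions.
    """
    # One relevance score per region.
    scores = [sum(f * q for f, q in zip(region, query)) for region in region_features]
    # Normalize scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    dim = len(query)
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example (hypothetical data): three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = spatial_attention(regions, query)
```

In this toy case the first and third regions align with the query equally and receive equal, larger weights than the second region. A learned model would typically replace the raw dot product with trainable projections.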
Keywords/Search Tags:Hierarchical image classification, Visual storytelling, Attention mechanism, CNN-LSTM