
Image Semantic Understanding Introducing Word Embedding And Attention Augmentation Mechanisms

Posted on: 2022-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: B Dong
Full Text: PDF
GTID: 2518306326490804
Subject: Communication and Information System
Abstract/Summary:
Image semantic understanding is a technology that integrates computer vision and natural language processing. It converts an image into text statements that describe the image's content, and it has a wide range of applications in image retrieval, visual assistance, human-computer interaction, and other fields. Unlike image classification, image semantic understanding must not only recognize the objects in an image but also recognize their actions and attributes, understand the relationships between them, and then generate a reasonable semantic description. Traditional techniques generate descriptive statements from templates or by retrieval. They are inflexible, require substantial manpower and material resources, and produce low-quality text. With the development of deep learning in recent years, image semantic understanding algorithms based on the encoder-decoder structure have emerged one after another and achieved good results. However, existing methods still have problems: for example, the generated description is not comprehensive, the words within a sentence are only weakly related to one another, and the logic is poor. Therefore, this thesis proposes an image semantic understanding algorithm that introduces word vectors and an attention enhancement mechanism. The main research contents and innovations are as follows:

(1) To address the problem that existing image semantic understanding algorithms do not fully utilize local image information and semantic information, an image semantic understanding algorithm with word vectors and a dual attention mechanism is proposed. The algorithm adopts an encoder-decoder structure. In the encoding part, the spatial features of the image are extracted using the ResNet-50 network. In the decoding part, an attention mechanism is added at both the input and the output of the long short-term memory (LSTM) network, and word vectors representing semantic information are introduced. Together, these implement the transformation from image features to image semantics.

(2) To address the problem that the algorithm with word vectors and dual attention produces semantic descriptions that are not rich and does not make full use of the memory information carried by the LSTM network, an image semantic understanding algorithm based on feature fusion and an attention enhancement mechanism is proposed on the basis of that algorithm. In the encoding part, the image features extracted by the ResNet-50 network are fused, and the fused feature, rather than a single feature, is input to the decoding network. The fused features represent the information contained in the image more comprehensively. In the decoding part, the hidden state of the LSTM network at time t-1 and the hidden state at time t are fused and input to the output-stage attention mechanism in place of the hidden state at time t alone. In addition, the attention mechanisms in the input and output stages take different fused image features as their input feature vectors, so that the feature information of the image is fully utilized.

In conclusion, this thesis studies and improves the encoder-decoder image semantic understanding algorithm, proposing improvements that address the problems existing in traditional image semantic understanding algorithms. The experimental results show that the proposed algorithm identifies the objects and scene information in an image more comprehensively and generates more accurate and fluent semantic descriptions, which demonstrates the effectiveness of the algorithm.
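The output-stage attention described in (2) can be illustrated with a small sketch. It is not the thesis's implementation: the abstract names neither the fusion operator nor the attention scoring function, so the element-wise mean in `fuse_hidden` and the dot-product scoring in `attend` are assumptions chosen for brevity, and all function names and the toy data are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_hidden(h_prev, h_curr):
    """Assumed fusion: element-wise mean of the hidden states at t-1 and t.
    The abstract does not say which operator the thesis uses."""
    return [(a + b) / 2.0 for a, b in zip(h_prev, h_curr)]

def attend(regions, query):
    """Soft attention: score each region against the query (dot product,
    an assumption), normalize with softmax, and return the weights
    together with the weighted sum of the region features."""
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) for r in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions))
               for d in range(dim)]
    return weights, context

# Toy data: three image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_prev = [0.0, 2.0]   # hidden state at time t-1
h_curr = [2.0, 0.0]   # hidden state at time t

# The fused state, not h_curr alone, is used as the attention query.
query = fuse_hidden(h_prev, h_curr)
weights, context = attend(regions, query)
```

In this toy case the fused query matches the third region best, so that region receives the largest attention weight. A trained model would use learned projections in place of the raw dot product.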
Keywords/Search Tags: image semantic understanding, feature fusion, attention enhancement mechanism, word vector, long short-term memory network