Image Understanding Based On Attention Mechanism

Posted on: 2019-04-11  Degree: Doctor  Type: Dissertation
Country: China  Candidate: C Guo
GTID: 1368330542998005  Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of networks and the popularity of cameras and mobile devices, the number of photos has grown explosively. People are taking more photos than ever before: they record their lives, express their opinions, and share their experiences through photos. It was reported that about 1.2 trillion photos were taken in 2017 [1] and that about 4.7 trillion photos were in storage. Some of these photos are stored on users' local devices, while others are uploaded to social networks for sharing. Photos are now part of daily life. Users typically organize their photos in several ways: 1) single photos, stored independently, whose content is very rich, such as objects, scenes and portraits; 2) event-based photo albums, stored in folders, which record events in users' daily lives, such as birthday parties and hiking; 3) object-based photo albums, also stored in folders, whose photos usually depict one specific kind of object, such as flowers or dogs; 4) face-based photos, which contain faces, where users want to know who appears in each photo. Organizing such a huge number of photos is challenging.

Since 2012, convolutional neural networks (CNNs) have achieved outstanding performance in object and scene recognition, and models trained with CNNs show promising transferability. Built on these models, attention networks can achieve better performance than the base models alone. The idea of attention comes from human recognition: to describe objects or scenes in detail, we pay more attention to the parts of objects that show distinctive features, and then draw conclusions. In machine learning, this is reflected by giving important information higher weights than unimportant information.

In this thesis, we apply the idea of attention to image classification and introduce different models for different image classification tasks. A single photo usually contains several objects, so recognizing such photos should be treated as multi-label rather than single-label image classification. We build an attention network that learns a mask for each label, uses these masks to extract detailed features for each label, and at the same time learns the relations between labels. For event-based albums, we recognize the event depicted in each album. Since photographers have many different shooting styles and one photo usually describes only part of an event, we introduce an attention model to learn the importance of each photo, and then build the album feature as the weighted average of the photos' features. With the help of multiple features and a hierarchical structure, we make the final predictions. For object-based albums, the photos belong to the same category; for example, a bird album contains various kinds of birds. This is a fine-grained image classification task, and we use an attention network to find detailed features and make the final predictions. For face detection, we incorporate the idea of attention into the traditional cascade face detection algorithm and introduce a tree-based cascade structure to detect faces in multiple views with a single model.
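The album-level step described above (learning an importance for each photo, then averaging the photo features with those weights) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's model: each photo is assumed to already have a CNN feature vector, the importance score is assumed to be a dot product with a single scoring vector `w` (a hypothetical stand-in for the learned attention model), and the scores are normalized with a softmax so the weights sum to one.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def album_feature(photo_feats: List[List[float]],
                  w: List[float]) -> Tuple[List[float], List[float]]:
    """Attention-weighted average of per-photo features.

    photo_feats: one feature vector per photo (assumed to come from a CNN).
    w: hypothetical scoring vector standing in for the learned attention model.
    Returns (album_feature, importance_weights).
    """
    # One importance score per photo, normalized to weights that sum to 1.
    weights = softmax([dot(f, w) for f in photo_feats])
    dim = len(photo_feats[0])
    # Weighted average over photos, computed dimension by dimension.
    pooled = [sum(a * f[d] for a, f in zip(weights, photo_feats))
              for d in range(dim)]
    return pooled, weights


# Usage: three photos with toy 2-d features.
photos = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feat, weights = album_feature(photos, w=[2.0, 0.0])
print(weights)
print(feat)
```

Here the first and third photos score equally high and the second scores low, so the album feature is dominated by the first and third photos. The multiple features and hierarchical structure used for the final event prediction are not modeled in this sketch.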
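For single photos, the abstract describes learning one mask per label and using it to extract label-specific features. A minimal sketch of that idea follows, again under assumptions rather than as the thesis's architecture: the CNN feature map is flattened into a list of spatial positions, each label's mask is a softmax over positions of a dot-product score with a hypothetical per-label vector in `attn`, and each label receives an independent sigmoid probability from a hypothetical linear classifier in `clf` and `bias`, so several labels can be present at once. The learning of relations between labels, which the thesis also performs, is not modeled.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Branch on the sign to avoid overflow in math.exp.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def label_scores(positions: List[List[float]],
                 attn: Dict[str, List[float]],
                 clf: Dict[str, List[float]],
                 bias: Dict[str, float]
                 ) -> Tuple[Dict[str, float], Dict[str, List[float]]]:
    """Per-label spatial attention for multi-label classification.

    positions: feature vectors at each location of a flattened H x W map.
    attn, clf, bias: hypothetical per-label parameters that a real model
    would learn. Returns (probabilities, masks), both keyed by label.
    """
    probs: Dict[str, float] = {}
    masks: Dict[str, List[float]] = {}
    dim = len(positions[0])
    for label, a in attn.items():
        # Mask over spatial positions for this label; it sums to 1.
        mask = softmax([dot(p, a) for p in positions])
        # Label-specific feature: mask-weighted sum of position features.
        feat = [sum(m * p[d] for m, p in zip(mask, positions))
                for d in range(dim)]
        # Independent sigmoid per label, not a softmax across labels.
        probs[label] = sigmoid(dot(clf[label], feat) + bias[label])
        masks[label] = mask
    return probs, masks


# Usage on a toy 2x2 map flattened to 4 positions with 2-d features.
positions = [[3.0, 0.0], [0.0, 3.0], [0.0, 0.0], [0.0, 0.0]]
attn = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
clf = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
bias = {"dog": -1.0, "cat": -1.0}
probs, masks = label_scores(positions, attn, clf, bias)
print(probs)
```

In the toy map, the "dog" mask peaks at the first position and the "cat" mask at the second, and both labels receive probabilities above 0.5. Using independent sigmoids is what allows this; a single softmax over the two labels could not give both labels more than half the probability mass.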
Keywords/Search Tags: deep learning, convolutional neural network, attention mechanism