
Research On Image Annotation Based On Deep Neural Networks

Posted on: 2020-03-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Du
Full Text: PDF
GTID: 1368330623458172
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of online platforms, image annotation has become a fundamental service for tasks such as image retrieval, human-machine dialogue, and visual assistance. Manual annotation yields accurate labels but is time-consuming and labor-intensive, and it cannot scale to very large image collections. Automatic image annotation methods have therefore been proposed, and in recent years the state-of-the-art methods are mostly based on deep neural networks. However, due to the 'semantic gap' between low-level visual features and semantic understanding, further effort is required to improve image annotation, e.g. exploiting auxiliary image information, improving semantic annotation results, and accelerating deep image annotation methods. This dissertation explores deep neural network based image annotation and proposes social image re-annotation methods based on implicit feature learning, as well as image annotation methods based on semantic understanding and captioning. Building on explorations of the image annotation task from several perspectives, a number of deep neural network based methods are proposed. Finally, a CNN training framework is proposed to accelerate the training of deep neural networks on distributed GPUs. Specifically, the main contributions of this dissertation are as follows:

1. An image re-annotation method based on noise estimation is proposed to alleviate the impact of noise in social tags: a Cauchy distribution is used to model the tag noise, which improves the latent features learned by matrix factorization. Compared with other noise distribution hypotheses, the Cauchy distribution is robust to many kinds of noise and is better suited to modeling the tagging noise of social images, so it yields better latent features. The proposed Cauchy Matrix Factorization, i.e. matrix factorization under a Cauchy noise hypothesis, can therefore better exploit the useful information in social tags for re-annotation. Experiments on the MIRFlickr and NUS-WIDE datasets demonstrate that the latent features learned via Cauchy Matrix Factorization produce good image re-annotations and serve image retrieval tasks well, verifying that the Cauchy distribution is a sound probabilistic hypothesis for social tag noise.

2. An image re-annotation method based on modeling the dimension correlations of latent features is proposed, improving latent-feature methods by modeling the correlations between feature dimensions. The method applies an outer product to the latent features to explicitly model the pairwise correlations between their dimensions, obtaining a 2-D interaction map. Over this map, a stack of convolutional layers extracts high-order correlations among latent feature dimensions layer by layer, and the final re-annotation prediction is built over all these correlations. Experimental results verify that the method makes more effective use of the latent features and improves re-annotation performance.

3. An image annotation method based on a multi-modal bidirectional recurrent neural network is proposed to improve the captioning results of existing methods by integrating context. The method uses a convolutional neural network to obtain semantic image features, word vectors to represent input words, and a bidirectional recurrent neural network to generate sequential context features. These features are fused through a multi-modal layer into a comprehensive multi-modal feature, from which the next words are generated sequentially. To explore the impact of the multi-modal features on annotation results, three instantiations of the multi-modal layer are devised. Experiments on the Flickr30K and MSCOCO datasets verify that the improved multi-modal feature fusion has a positive impact on image captioning, and that the method generates grammatical sentences with appropriate semantic content.

4. An image annotation method based on a large-scale corpus is proposed to make a simple, non-parametric image annotation method practical at scale by reducing the time and space complexity of the existing method. The key is to preprocess the images via hash encoding: all images in the corpus are converted to hash codes, which both compresses image storage and speeds up caption matching. Comparing the hash code of a query image with those of the corpus images quickly yields candidate images similar to the query; the consensus caption among the candidates' captions is then selected as the match. Experimental results show that, compared with the existing method, the proposed method improves time efficiency by tens of times and space efficiency by hundreds of times without degrading captioning performance.

5. An acceleration framework is proposed to reduce the training time of image annotation models (i.e. convolutional neural networks) on distributed GPUs by breaking the data-synchronization constraint on GPU computing through hybrid parallelism and an alternate-worker strategy. Hybrid parallelism partitions and deploys the convolutional neural network by a simple rule, converting a single-machine model into a distributed GPU-parallel model with much less data to synchronize between servers than classic data parallelism. The alternate strategy deploys multiple workers on each GPU and runs them alternately, which reduces GPU idle time, so the overall framework achieves high computational efficiency. Experiments in a common distributed environment (Gigabit network, two servers, four GPUs) show that training AlexNet on the ImageNet dataset with this framework achieves 3.07 times the training efficiency of Caffe.
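The Cauchy noise idea in contribution 1 can be sketched as follows. This is a minimal, hypothetical NumPy illustration (toy matrix sizes, made-up hyperparameters; the dissertation's actual model and optimizer are not specified here): matrix factorization where squared error is replaced by the heavy-tailed Cauchy loss log(1 + (e/γ)²), making the factors less sensitive to grossly wrong tags.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy image-tag matrix: 1 = tag attached (possibly noisy), 0 = absent.
R = rng.integers(0, 2, size=(8, 6)).astype(float)

k, gamma, lr, reg = 4, 1.0, 0.05, 0.01        # hypothetical hyperparameters
U = 0.1 * rng.standard_normal((R.shape[0], k))  # image latent features
V = 0.1 * rng.standard_normal((R.shape[1], k))  # tag latent features

def cauchy_loss(R, U, V, gamma):
    E = R - U @ V.T
    return float(np.log1p((E / gamma) ** 2).sum())

loss0 = cauchy_loss(R, U, V, gamma)
for _ in range(200):
    E = R - U @ V.T
    # d/dE log(1 + (E/gamma)^2) = 2E / (gamma^2 + E^2): bounded, so large
    # residuals (likely noise) get a small gradient, unlike squared loss.
    W = 2 * E / (gamma ** 2 + E ** 2)
    U += lr * (W @ V - reg * U)                 # gradient descent step
    V += lr * (W.T @ U - reg * V)
loss1 = cauchy_loss(R, U, V, gamma)

scores = U @ V.T   # re-annotation scores: higher = tag more likely relevant
```

The bounded gradient is the whole point of the Cauchy hypothesis: a single wildly wrong social tag cannot dominate the update the way it would under a Gaussian (squared-error) noise model.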
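The outer-product interaction map of contribution 2 can be sketched in a few lines. This is a minimal stand-in (hypothetical feature sizes, random weights, a naive single-channel convolution instead of a trained conv stack) just to show the data flow: latent vectors → pairwise-product map → stacked convolutions → prediction.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 8
u = rng.standard_normal(k)   # image latent feature (hypothetical)
v = rng.standard_normal(k)   # tag latent feature (hypothetical)

# Outer product: every pairwise product of dimensions -> k x k interaction map.
interaction_map = np.outer(u, v)

def conv2d_valid(x, w):
    """Naive single-channel 'valid' 2-D convolution + ReLU (conv-layer stand-in)."""
    out = np.array([[(x[i:i + w.shape[0], j:j + w.shape[1]] * w).sum()
                     for j in range(x.shape[1] - w.shape[1] + 1)]
                    for i in range(x.shape[0] - w.shape[0] + 1)])
    return np.maximum(out, 0.0)

# A stack of conv layers extracts higher-order dimension correlations layer by layer.
w1, w2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
feat = conv2d_valid(conv2d_valid(interaction_map, w1), w2)   # (k-4) x (k-4)
score = float(feat.sum())    # toy prediction head over all correlations
```

Each conv layer mixes neighboring entries of the map, so deeper layers represent correlations among groups of dimensions rather than single pairs.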
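Contribution 3 mentions three instantiations of the multi-modal layer without specifying them; a common trio in multi-modal captioning is additive, element-wise multiplicative, and concatenation-plus-projection fusion. The sketch below is a hypothetical illustration of those three options (random weights, made-up dimensions), not the dissertation's exact layers.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16
img = rng.standard_normal(d)   # CNN image feature (hypothetical size)
ctx = rng.standard_normal(d)   # BiRNN context feature (hypothetical size)

Wi = rng.standard_normal((d, d))       # image projection
Wc = rng.standard_normal((d, d))       # context projection
Wcat = rng.standard_normal((d, 2 * d)) # projection for concatenated input

def fuse_add(img, ctx):      # instantiation 1: additive fusion
    return np.tanh(Wi @ img + Wc @ ctx)

def fuse_mul(img, ctx):      # instantiation 2: element-wise multiplicative fusion
    return np.tanh((Wi @ img) * (Wc @ ctx))

def fuse_concat(img, ctx):   # instantiation 3: concatenation + projection
    return np.tanh(Wcat @ np.concatenate([img, ctx]))

m_add, m_mul, m_cat = fuse_add(img, ctx), fuse_mul(img, ctx), fuse_concat(img, ctx)
# Each multi-modal feature would feed the next-word predictor in the captioner.
```

All three produce a feature of the same size, so the rest of the captioning pipeline stays unchanged while the fusion behavior is compared.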
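The hash-based candidate retrieval of contribution 4 can be sketched with random-hyperplane hashing. This is a hypothetical toy (synthetic features, placeholder captions, and the consensus step simplified to "take the closest candidate's caption"); the dissertation's actual hash function and consensus selection are not specified here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical corpus: a real-valued feature and a caption per image.
feats = rng.standard_normal((1000, 64))
captions = [f"caption {i}" for i in range(1000)]

# Random-hyperplane hashing: sign of projections -> compact 32-bit binary codes,
# which compresses storage and makes comparison a cheap Hamming distance.
planes = rng.standard_normal((64, 32))
def hash_code(x):
    return (x @ planes > 0).astype(np.uint8)

codes = hash_code(feats)

# Query: a near-duplicate of image 0, so image 0 should be retrieved.
query = feats[0] + 0.05 * rng.standard_normal(64)
qcode = hash_code(query)

dists = (codes != qcode).sum(axis=1)       # Hamming distances to all codes
candidates = np.argsort(dists)[:5]         # top-5 candidate images
# Simplified consensus: caption of the closest candidate.
matched_caption = captions[int(candidates[0])]
```

The space saving comes from storing 32 bits per image instead of 64 floats; the time saving comes from Hamming comparison being a bitwise operation rather than a float distance.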
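The communication argument behind contribution 5 is simple arithmetic, sketched below with hypothetical layer sizes (not AlexNet's real counts): data parallelism must synchronize gradients for every parameter on every extra worker, whereas a hybrid split that keeps the large fully connected layers model-parallel only exchanges the small convolutional gradients plus the activations crossing the partition boundary.

```python
# Hypothetical sizes, for illustration only.
conv_params = 4_000_000        # assumed convolutional parameter count
fc_params = 55_000_000         # assumed fully connected parameter count
batch = 256                    # assumed mini-batch size
boundary_acts = 256 * 6 * 6    # assumed activation size at the conv/FC split
workers = 2
bytes_per_value = 4            # float32

# Data parallelism: each extra worker exchanges gradients for ALL parameters.
data_parallel_sync = (conv_params + fc_params) * bytes_per_value * (workers - 1)

# Hybrid parallelism: conv gradients + per-sample boundary activations only.
hybrid_sync = (conv_params * bytes_per_value
               + batch * boundary_acts * bytes_per_value) * (workers - 1)

ratio = data_parallel_sync / hybrid_sync   # how much less traffic per step
```

Under these assumed numbers the hybrid scheme moves several times less data per step, which is why, combined with alternating workers to keep each GPU busy during communication, the framework can outpace a plain data-parallel setup on a Gigabit network.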
Keywords/Search Tags:Image Annotation, Deep Neural Networks, Social Tags, Image Captioning, Parallel Computing