
Research On Image Content Analysis And Retrieval Methods For Large Scale Data

Posted on: 2019-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: P F Zhang
Full Text: PDF
GTID: 2428330545453702
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology in recent years, multimedia data has exploded. On the one hand, the total amount of data is huge and its coverage is wide, spanning virtually every industry. On the other hand, the data takes diverse forms, such as text, images, audio, and video, and the relationships between these modalities are complex and varied. How to effectively and efficiently analyze the content of such data, and how to store and retrieve it at this scale, is therefore becoming increasingly challenging and has attracted growing attention and study.

In terms of content analysis, deep learning, as a kind of representation learning, has received more and more attention. It not only yields richer and higher-quality feature representations, but also avoids manual feature engineering, greatly improving both the quality of the features and the efficiency of their extraction. As a result, deep learning has rapidly become a research hotspot and has been applied across many fields of machine learning, image classification being one of them. In recent years, many neural-network-based image classification methods have been proposed and have achieved good results on single-label tasks. In the real world, however, one image may carry rich content, such as multiple objects and scenes. To deal with this, many multi-label image classification methods have been proposed, but several problems still need to be considered. For example, some methods directly use the entire image as input and then predict the labels from features extracted by a neural network. Other methods assume that the objects in an image are independent of one another: they first extract regions that may contain objects, then extract features of these regions with a neural network, and finally predict the image labels from those features. However, these methods perform only a shallow content analysis of the images and do not take the co-occurrence dependencies between labels into account, so it is difficult for them to achieve satisfactory results on complex images. Considering this problem, we propose an effective deep learning method that exploits label co-occurrence dependencies in multi-label image classification. Specifically, in addition to applying a classic classification network to obtain the feature representation of an image, we use the labels to construct a label co-occurrence matrix and then build a new neural network that processes this matrix to capture the co-occurrence dependencies between labels. Finally, the two representations are merged to predict the labels of the image. Extensive experiments on two public benchmark datasets demonstrate that the proposed method obtains satisfying results and outperforms several state-of-the-art methods.

As for the storage and retrieval of data, many hashing-based methods have been proposed that map high-dimensional data to compact binary codes in a Hamming space while preserving the similarities of the original space. Thanks to their fast query speed and low storage cost, hashing methods have attracted considerable attention. Depending on whether the query items and the retrieved items come from the same modality, existing hashing methods can be roughly divided into unimodal and cross-modal hashing. Compared with unimodal hashing, which can only retrieve data within a single modality, cross-modal hashing better meets the demands of today's retrieval tasks, such as retrieving pictures or videos from web pages. Many cross-modal hashing methods have been proposed and have achieved good results, but there are still problems to solve. Essentially, given a similarity matrix, most of these methods tackle the resulting discrete optimization problem in two stages: they first relax the binary constraints and solve the relaxed problem, and then quantize the solution to obtain the binary codes. This scheme generates a large quantization error. Some discrete optimization methods have been proposed to tackle this; however, in these methods the generation of the binary codes is independent of the features in the original space, which makes them not robust to noise. Considering these problems, in this thesis we propose a novel supervised cross-modal hashing method, Semi-Relaxation Supervised Hashing (SRSH), which learns the hash functions and the binary codes simultaneously. To make the optimization tractable, it relaxes only part of the binary constraints, rather than all of them, by introducing an intermediate representation variable. In this way the quantization error is reduced, and the optimization problem can be solved by the iterative algorithm proposed in this thesis. Extensive experimental results on three benchmark datasets demonstrate that SRSH obtains competitive results and outperforms state-of-the-art unsupervised and supervised cross-modal hashing methods.
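To make the label co-occurrence idea concrete, the following is a minimal sketch of how such a matrix can be built from multi-hot training labels. The label names and toy data are hypothetical, and the row normalization to conditional probabilities is one common convention, not necessarily the exact construction used in the thesis.

```python
import numpy as np

# Toy multi-hot label matrix: 4 images x 3 labels
# (hypothetical labels, e.g. "person", "dog", "ball").
Y = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
])

# Raw co-occurrence counts: C[i, j] = number of training images
# that carry both label i and label j.
C = Y.T @ Y

# Row-normalize by each label's frequency to get conditional
# probabilities P(label j | label i), which encode dependencies.
P = C / C.diagonal()[:, None]
```

A network that consumes `P` alongside image features can then learn, for instance, that "ball" is much more likely when "dog" is already predicted.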
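The retrieval side rests on the fact that comparing binary codes in Hamming space is extremely cheap. Below is a generic sketch of Hamming-distance ranking over hand-written 8-bit codes; the codes themselves are made up for illustration and are not produced by any particular hashing method.

```python
import numpy as np

def hamming_dist(a, b):
    """Hamming distance between {0,1} code arrays (broadcasts over rows)."""
    return np.count_nonzero(a != b, axis=-1)

# Hypothetical 8-bit binary codes for a tiny database and one query.
db = np.array([
    [0, 1, 1, 0, 0, 1, 0, 1],
    [1, 1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 1, 0, 1, 0],
])
query = np.array([0, 1, 1, 0, 0, 1, 0, 1])

# Rank database items by Hamming distance to the query:
# the nearest codes come first.
dists = hamming_dist(db, query)
ranking = np.argsort(dists)
```

In a real system the codes are packed into machine words and the comparison is an XOR plus popcount, which is what gives hashing its speed and storage advantage over dense feature vectors.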
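The quantization error criticized above can also be illustrated directly. The sketch below mimics only the second stage of the generic relax-then-quantize scheme: a real-valued solution (random here, standing in for the output of a relaxed optimizer) is thresholded to ±1 codes, and the error is the distance between the two. This is not SRSH itself, only the baseline behavior SRSH is designed to reduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the real-valued solution of the relaxed problem
# (hypothetical values; in practice this comes from the optimizer).
H = rng.normal(size=(5, 8))

# Stage two of the two-stage scheme: threshold to binary codes.
B = np.sign(H)
B[B == 0] = 1  # break ties toward +1

# The quantization error the text refers to: how far the binary
# codes drift from the relaxed solution.
err = np.linalg.norm(B - H)
```

Because entries of `H` far from ±1 are snapped to ±1 regardless of magnitude, this error grows with the spread of the relaxed solution, which is why keeping part of the constraints binary during optimization can help.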
Keywords/Search Tags:Deep learning, Image classification, Hash learning, Multi-modal retrieval, Approximate nearest neighbor search