Research And Application Of Image Representation Based On Large-scale Datasets

Posted on: 2021-12-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Wang
GTID: 1488306302462274
Subject: Computer applications engineering
Abstract/Summary:
With the rapid development of artificial intelligence and computer vision, image processing and analysis have become an indispensable part of modern scientific research. In the era of big data, the spread of electronic products and the Internet means that large amounts of image data are generated almost every moment, so analyzing and processing this data is particularly important. The key to analyzing and understanding it is image feature extraction, which is also an indispensable part of computer vision tasks: the resulting image representation serves as the input for subsequent analysis and computation. This dissertation studies this key factor, extracting features from large-scale image data to obtain the final image representation used for image processing and analysis. Specifically, it takes image retrieval as an example and discusses how to obtain a more robust image representation for retrieval tasks. The research focuses on three issues that arise in image retrieval on large-scale datasets: 1) how to obtain image representations for unlabelled datasets, 2) how to obtain a lightweight image representation, and 3) how to learn image representations incrementally for product search applications. We explored each of these issues in turn and summarize our solutions below.

First, we aim to achieve effective image representation for image retrieval in an unsupervised manner. To this end, we propose a fully cross-dimensional weighting pooling method (FCroW). In particular, we use the proposed method to aggregate multi-scale features extracted by convolutional neural networks, taking into account multiple aspects of the visual features captured by the networks. Different weights can be assigned to the features extracted by different layers of the networks. To reduce the effort of parameter tuning, we propose an initial strategy that prunes the search space of the weights by means of constraint rules based on prior knowledge of the relations between the layers of the networks. Based on this, we propose weighted multi-layer feature fusion for similar image representations. Extensive experiments on four public real-world datasets demonstrate the effectiveness of the proposed FCroW method and the pruning strategy for image retrieval.

Second, the activated hidden units of convolutional neural networks, known as feature maps, dominate image representation, as they are compact and discriminative. For ultra-large datasets, however, high-dimensional feature maps in floating-point format not only incur high computational complexity but also occupy massive amounts of memory. To this end, we propose a new image representation that aggregates convolution kernels, in which the kernels that capture certain patterns are activated. The index numbers of the top-n convolution kernels are extracted directly as the image representation in discrete integer values, which rebuilds the relationship between convolution kernels and the image. Furthermore, we define a distance measure from the perspective of ordered sets to compute position-sensitive similarities between image representations. Extensive experiments on Oxford Buildings, Paris, Holidays, and other datasets show that the proposed method achieves competitive image retrieval performance at much lower computational cost, outperforming methods that use feature maps for image representation.

Third, with the development of image processing and computer vision technology, content-based product search has been widely applied in daily life, for example in online shopping, automatic checkout systems, and intelligent logistics. Given a query product image, existing product search systems mainly perform retrieval on pre-defined databases with fixed product categories. In real-world applications, however, we usually need to add new categories or update existing products in the product database. With existing product search methods, the image feature extraction and indexing models must be retrained on the whole updated dataset, which is expensive in both data annotation and training time. To this end, we propose a few-shot incremental product search framework with meta-learning, which needs very few annotated images and a reasonable amount of training time. In particular, our framework contains a multi-pooling based product semantic extractor that learns a discriminative representation for each product. Moreover, a meta-learning based feature adapter is designed to guarantee the robustness of the few-shot features. Furthermore, when new categories are added in batches during product search, we reconstruct the few-shot features with an incremental weight combiner to accommodate the incremental search task. Extensive experiments show that the proposed framework achieves excellent performance on new products while maintaining high search accuracy on the base categories as new product categories are gradually added.

In this dissertation, we first studied two different image representations, a feature fusion method, and a new distance measure. We then used part of this work to address the newly defined problem of image retrieval with few incremental samples. Finally, we deployed parts of the proposed methods and technologies in the Jingdong artificial intelligence checkout counter application.
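
To make the second contribution more concrete, the following is a minimal sketch of representing an image by the indices of its top-n activated convolution kernels and comparing two such representations in a position-sensitive way. It is not the dissertation's implementation: the abstract does not specify how kernel activation is scored or how the ordered-set distance is defined, so the function names, the use of the summed response per kernel as the activation score, and the rank-difference similarity are all assumptions made for illustration.

```python
import numpy as np


def top_n_kernel_indices(feature_maps: np.ndarray, n: int) -> np.ndarray:
    """Return the indices of the n most strongly activated kernels, strongest first.

    feature_maps has shape (C, H, W): one response map per convolution kernel.
    Scoring a kernel by the sum of its response map is an assumption of this sketch.
    """
    scores = feature_maps.reshape(feature_maps.shape[0], -1).sum(axis=1)
    # Stable sort on negated scores: descending order, ties go to the lower index.
    order = np.argsort(-scores, kind="stable")
    return order[:n]


def ordered_set_similarity(a, b) -> float:
    """Hypothetical position-sensitive similarity in [0, 1] between two index lists.

    A kernel index present in both lists contributes 1 / (1 + |rank_a - rank_b|),
    so a shared kernel counts fully only when it sits at the same rank in both.
    The total is divided by the length of the longer list.
    """
    rank_in_b = {int(k): r for r, k in enumerate(b)}
    total = 0.0
    for r, k in enumerate(a):
        k = int(k)
        if k in rank_in_b:
            total += 1.0 / (1.0 + abs(r - rank_in_b[k]))
    return total / max(len(a), len(b), 1)


# Usage: two random stand-ins for the feature maps of a convolutional layer.
rng = np.random.default_rng(0)
query_code = top_n_kernel_indices(rng.random((512, 7, 7)), n=8)
database_code = top_n_kernel_indices(rng.random((512, 7, 7)), n=8)
similarity = ordered_set_similarity(query_code, database_code)
```

Each image is stored as n small integers rather than a high-dimensional floating-point vector, which is the source of the memory and computation savings described above. Whether this particular similarity ranks images as well as the measure defined in the dissertation is not something this sketch can show.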
Keywords/Search Tags: Computer vision, Convolutional neural network, Content-based image retrieval, Multi-scale feature fusion, Few-shot learning, Incremental learning