Research And Application Of Multi-scale Representation And Regularization Method In Image Recognition

Posted on: 2015-02-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Wu
Full Text: PDF
GTID: 1268330422988743
Subject: Signal and Information Processing
Abstract/Summary:
With the spread of mobile devices and the continuing advance of the Internet, exchanging information through images has become increasingly common. Automatically understanding what a photograph conveys at the semantic level is therefore both a practical and a pressing need. Image recognition, which in essence concerns recognizing the objects in an image and the scene in which they reside, is an indispensable tool for this task. Object recognition generally ranges from recognizing individuals, through subordinate categories, to basic-level categories. Scene recognition (a.k.a. scene categorization) identifies the semantic label of an image (e.g., mountain or coast) from its content and thus provides valid context for tasks such as object recognition. To address these two kinds of pattern recognition problems, the dissertation first analyzes the status of the key modules of an image recognition system. It then emphasizes multi-scale information in feature design and regularization in method construction, and studies their application to illumination preprocessing for image recognition, facial gender recognition, object recognition, and scene recognition.

In view of the influence of illumination on image recognition, the first problem considered is correcting the color shift caused by illuminant change, i.e., achieving color constancy. In general, a single color constancy algorithm cannot consistently achieve good results on images with varied textures, because each algorithm rests on its own specific assumptions. A fusion method based on texture pyramid matching and regularized local regression (TPM RLR) is proposed accordingly. We first construct a texture pyramid based on multi-scale representation and use Weibull distribution parameters to extract the texture features of images. We then define a new image similarity measure to retrieve images similar to the test image.
Finally, to integrate data-driven and prior-knowledge-based methods, we apply regularized local regression to fuse the results of the single algorithms in an opponent color space, namely lαβ. Experimental results on two natural image datasets indicate that TPM RLR markedly improves illumination estimation, reducing the median angular error by at least 29% compared with the best-performing single algorithm, and that its correction results are superior to those of other combination methods under both subjective and objective criteria.

Focusing on a specific object, the face, we take facial gender recognition as the next topic. First, we propose a face descriptor based on multi-scale learned patterns (MSLP). The proposed feature learns multi-scale convolution templates with an algorithm such as ICA, PCA, or K-means, encodes the images according to the response order of these templates, and finally forms a more compact and discriminative histogram feature for facial representation. Motivated by the success of linear representation based classifiers (LinearRCs) in face recognition, and taking into account the different data distributions in face recognition and gender recognition, we systematically analyze the application of LinearRCs to facial gender recognition. Building on the idea of prototype generation, we propose a LinearRC using partial least squares (LRC PLS) and a group-based variant of it.
Extensive experiments on facial gender recognition show that MSLP features outperform previous hand-engineered features, that LRC PLS is more stable than other LinearRCs while requiring relatively less prediction time, and that its group variant further boosts performance.

Moreover, in light of the success of the prototype concept in gender recognition, the dissertation presents a LinearRC named the multi-scale query-expanded collaborative representation based classifier with class-specific prototypes (QCRC CP). It is designed from the viewpoint of dictionary learning to handle more complex object recognition, such as objects with multiple poses or views and more generic objects. We first expand a single query image into a query set by scaling, and then construct a query-dependent dictionary via canonical correlation analysis (CCA) between the query set and the samples of a specific class. By integrating prototype-selection-based and prototype-generation-based methods, the proposed dictionary learning method generates query-related, class-specific prototypes class by class, and the implicit query-dependent data locality discards outliers. Finally, multivariate collaborative representation based classification (CRC) with the newly constructed dictionary determines the identity according to the minimum normalized residual (MNR) rule. The experiments cover multi-pose face recognition, leaf species recognition, character recognition in natural scenes, and generic multi-view object recognition. The results demonstrate that QCRC CP achieves good results and that the proposed dictionary learning approach is superior to previous methods based on prototype selection or generation.
For instance, compared with other LinearRCs, QCRC CP improves recognition accuracy by over 10% on character recognition.

Last but not least, because a proper distance metric in scene recognition can effectively reflect the semantic distance between samples in a high-dimensional space, we propose a metric learning algorithm based on regularized linear discriminant analysis (RLDA) that learns the full parameter matrix of a Mahalanobis distance metric. It is well known that high-dimensional features with relatively few samples can lead to heavy computational cost and overfitting in conventional metric learning. A novel framework is therefore presented to resolve these problems. In our framework, the estimation of the parameter matrix is decomposed into that of a projection pool matrix and a non-negative diagonal selection matrix, which greatly reduces the number of parameters to be estimated. We first construct the projection vectors by running regularized LDA with different parameters. Based on side information, non-negative l2-norm regularized least squares is then used to select and weight the projection vectors on a constructed training dataset. This training dataset consists of the pairwise squared differences of the projected samples in the similar-pair and dissimilar-pair subsets. To keep the two subsets balanced, a simple but effective strategy based on K nearest neighbors (KNN) is provided. The proposed algorithm achieves higher recognition rates on two scene recognition datasets while remaining efficient, running several times, and in some cases dozens of times, faster than conventional metric learning methods.
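
The decomposition of the Mahalanobis matrix described in the metric learning paragraph can be illustrated with a minimal sketch. It is not the dissertation's implementation, and it makes several assumptions. The projection pool `P` is a random stand-in for the regularized LDA vectors. The regression targets (0 for same-class pairs, 1 for different-class pairs) are a guess at how side information might be encoded. The non-negative l2-regularized least squares problem is solved by plain projected gradient descent, a swapped-in solver. The helper names `squared_projected_differences` and `nonneg_ridge` are hypothetical, and the KNN-based balancing of the two pair subsets is omitted.

```python
import numpy as np


def squared_projected_differences(X, pairs, P):
    """Row r holds (p_k^T (x_i - x_j))^2 for pair r = (i, j) and each column p_k of P."""
    diffs = X[pairs[:, 0]] - X[pairs[:, 1]]   # shape (n_pairs, d)
    return (diffs @ P) ** 2                   # shape (n_pairs, K)


def nonneg_ridge(A, y, lam=1e-2, n_iter=2000):
    """Minimise ||A w - y||^2 + lam * ||w||^2 subject to w >= 0 (projected gradient)."""
    H = A.T @ A + lam * np.eye(A.shape[1])
    b = A.T @ y
    step = 1.0 / np.linalg.eigvalsh(H)[-1]    # reciprocal of the largest eigenvalue of H
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = np.maximum(0.0, w - step * (H @ w - b))  # gradient step, then clip at zero
    return w


# Toy data (assumption): 40 samples, 6 features, two classes of 20.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
labels = np.repeat([0, 1], 20)

# Stand-in for the projection pool; the abstract builds it from regularized LDA runs.
P = rng.normal(size=(6, 4))

# All sample pairs; targets are an assumption: 0 for same-class, 1 for different-class.
pairs = np.array([(i, j) for i in range(40) for j in range(i + 1, 40)])
y = (labels[pairs[:, 0]] != labels[pairs[:, 1]]).astype(float)

A = squared_projected_differences(X, pairs, P)
w = nonneg_ridge(A, y)       # non-negative weights, one per projection vector
M = P @ np.diag(w) @ P.T     # full Mahalanobis matrix, positive semidefinite

d = X[0] - X[1]
print(np.isclose(d @ M @ d, A[0] @ w))  # distance under M vs. weighted sum of squares
```

In this toy setting only 4 weights are estimated, rather than the 21 free entries of a full symmetric 6-by-6 matrix, which is the kind of parameter reduction the abstract refers to. Because the weights are non-negative, the resulting matrix is positive semidefinite by construction.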
Keywords/Search Tags: image recognition, multi-scale representation, regularization method, object recognition, scene recognition