Local Features Understanding: Models and Applications

Posted on: 2018-02-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X P Zhang
GTID: 1368330590455290
Subject: Communication and Information System
Abstract/Summary:
Image classification and detection are basic, core problems in computer vision. To understand natural images in depth, we need to capture the crucial information in them and describe it efficiently. However, because abundant annotations are lacking, training the corresponding detection models is by no means easy. Moreover, viewpoint variations, illumination variations, object deformation, and occlusion make object detection even more challenging. In addition, because of the semantic gap between low-level image representations and high-level visual concepts, feature representations are not optimal for a specific application. Hence, designing new kinds of features suited to specific applications is of vital importance for recognition and detection.

This dissertation focuses on local feature understanding and its related applications, covering local feature localization and representation, and improves on existing approaches. Here, local features refer to salient regions of interest, target objects, object parts, and so on. The dissertation proposes several strategies to capture local information and describe it effectively, and applies these algorithms to fine-grained recognition and to weakly supervised object localization and detection. The main contributions are as follows.

This dissertation proposes a new framework for part localization and description in fine-grained domains. For part localization, the less deformable parts are first detected with a template-based model and serve as a semantic prior for the object. The other parts are then obtained by geometric alignment of the foreground mask under this semantic prior. Incorporating the semantic prior into geometric alignment enables more accurate part localization. For description, we learn One-vs-All Features, which are simple and transplantable; the learned mid-level features are dimension-friendly and more robust to outlier instances. Because some subcategories are too similar to tell apart, we fuse them iteratively using the Neighbor Joining method and learn Fused One-vs-All Features (FOAF) based on the fused subcategories. Integrating these techniques produces a powerful framework for fine-grained visual categorization that outperforms existing methods by a considerable margin.

This dissertation develops a framework for fine-grained recognition that requires no object or part annotation at either the training or the testing stage. The method uses deep neural activations for both part localization and description, and makes two major contributions. First, a picking strategy selects distinctive neurons that respond significantly to specific parts. Based on these picked neurons, we carefully choose positive samples and train a set of discriminative detectors via a regularized multiple instance learning task. Second, we develop a simple but effective feature encoding method, which we call SWFV-CNN. SWFV-CNN packs local CNN descriptors via a spatially weighted combination of Fisher Vectors, which takes into account the importance of each descriptor for recognition. Integrating these schemes produces a powerful framework that shows notable performance improvements on several widely used fine-grained datasets.

This dissertation proposes to learn a set of detectors in a weakly supervised paradigm. The main contribution is an iterative optimization strategy for detector learning, which we formulate as a confidence loss sparse Multiple Instance Learning (cls-MIL) task. Unlike conventional MIL methods, which represent each positive image with a single instance and treat all images as equally important, cls-MIL represents each positive image as a sparse linear combination of its member instances and accounts for the diversity of the positive images, while avoiding drift away from the well-localized ones by assigning a confidence value to each positive image. The responses of the learned detectors form an effective mid-level image representation for recognition. Another interesting finding is that, unlike most previous methods, which treat image classification and object localization separately, the proposed approach integrates the two tasks into a single framework. Benefiting from the strong discriminative ability of the learned part detectors, the detector responses indicate the locations of the objects. Experiments on benchmark datasets demonstrate the superiority of the proposed representation.

This dissertation proposes an end-to-end network architecture for weakly supervised object detection. The proposed model aims to learn a semantic loss regression detection model with automatic object instance mining. Unlike previous works, which first mine object instances from weakly labeled images and then train detection models independently, we combine these two procedures into an integrated framework. The basic idea is to train an end-to-end network that simultaneously performs object instance mining and detection model training. To this end, we branch the network into two sibling substreams, one for image-level classification and the other for region-level detection. Since only image-level labels are available, the object instances used to train the region-level detection stream are mined online automatically, guided by the image-level classification stream. The two streams are trained jointly, which we formulate as a multi-task learning procedure. Experimental results demonstrate that the two streams benefit from each other.
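
As a rough illustration of the online instance mining described for the last contribution, the sketch below shows one way region scores from an image-level classification stream could select pseudo ground-truth instances for a region-level detection stream. The abstract does not specify the mining rule. The top-scoring-seed strategy, the IoU threshold of 0.5, and the names `iou` and `mine_instances` are assumptions made for illustration only, not the dissertation's actual method.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mine_instances(
    boxes: Sequence[Box],
    cls_scores: Sequence[Sequence[float]],
    image_labels: Sequence[int],
    iou_thresh: float = 0.5,  # assumed threshold, not taken from the source
) -> List[int]:
    """Assign each region a pseudo label: a class index, or -1 for background.

    cls_scores[r][c] is the classification-stream score of region r for class c.
    image_labels lists the classes known to be present in the image.
    """
    if not boxes:
        return []
    pseudo = [-1] * len(boxes)
    best_overlap = [0.0] * len(boxes)
    for c in image_labels:
        # Assumed rule: the top-scoring region for a present class is the seed.
        seed = max(range(len(boxes)), key=lambda r: cls_scores[r][c])
        for r, box in enumerate(boxes):
            overlap = iou(box, boxes[seed])
            # Regions overlapping the seed enough inherit its class label.
            if overlap >= iou_thresh and overlap > best_overlap[r]:
                pseudo[r] = c
                best_overlap[r] = overlap
    return pseudo


# Toy example: three region proposals, two classes, both present in the image.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
cls_scores = [[0.9, 0.1], [0.6, 0.2], [0.1, 0.3]]
print(mine_instances(boxes, cls_scores, image_labels=[0, 1]))  # [0, 0, 1]
```

In a jointly trained two-stream network, pseudo labels of this kind would be recomputed at every training step from the current classification scores and used as targets for the detection stream's loss, which is one way the two streams could inform each other.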
Keywords/Search Tags: Local Features, Fine-grained Object Recognition, Weakly Supervised Object Detection, Mid-level Feature Representation, Multiple Instance Learning