Font Size: a A A

Weakly Supervised Learning Cosegmentation And Localization For Images

Posted on:2019-02-17Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y RenFull Text:PDF
GTID:1368330572952250Subject:Circuits and Systems
Abstract/Summary:PDF Full Text Request
Weakly supervised image and video cosegmentation and localization,under the domain of computer vision,aims at exploiting common information embedded in images and videos with mere supervision,which relieves the reliability of labeled data and the burden of manual annotation.As an under-developed researching area,commonality analysis has being arousing increasing attention.Many brilliant works have been published by researchers.However,quite an amount of challengeable difficulties are still waiting to be conquered,such as the variation of illumination,the diversity of objects' scale and orientation,the occurrence of obstacles,the balance between efficiency and accuracy,to name just a few.Therefore,in this thesis,we mainly focus on three specific problems under the domain of commonality analysis:eye localization,image cosegmentation and video cosegmentation.The main contributions of this thesis can be briefly concluded as follows:1.As for eye localization,which refers to simultaneously localize the dedicate positions of two eyes from images with human faces,we intend to deal with the problem of precisely locating eyes in arbitrary rotation degrees from not only small eye regions and face images but also human portraits.A rotation invariant eye localization method is proposed through inheriting the characters from invariant local features and propagating them to the whole localization process.Following the classic three steps of object detection:representation,classification,detection and localization,our method firstly learns a feature codebook based on a small group of eye patterns,then generates heat maps of potential eye positions with the integration of a Sparse Representation Classifier and a pyramid-like detecting and locating strategy,and finally refines the results with the assistant of prior information.Outstanding rotation invariant performances are obtained by the proposed framework compared with other state-of-the-art eye localization methods.Likewise,our approach is adequately flexible to be implemented on a wide range of image regions from small eye areas to large portraits.2.As for image cosegmentation,which refer to delineate co-occurrent objects at pixellevel from a group of images only with the information of class-labels,we intend to realize the mutual learning between two main properties of coherent objects:saliency and similarity.Most of the existing image co-segmentation methods only pay emphasis on either of them.Our work is twofold.(a)On one hand,we proposed a mutual learning framework based on the combination of structured sparsity and discriminative learning for image cosegmentation(named as "OUR1").Weighted structured sparsity,with low-rank matrix decomposition,is constructed for the saliency detection of common objects.While,discriminative learning,based on logistic regression,is established for the correspondence evaluation.The mutual learning process is realized through the interaction between the weights of structured sparsity and the parameters of logistic regression.This framework is among the first to learn prior knowledge for structured sparsity rather than using experienced values,which is capable of generating object-oriented saliency hints for common objects.(b)On the other,considering the fully exploitation of geometrical information from image hierarchies,another treegraph cut like formulation is constructed based on the integration between tree-structured sparsity and tree-graph matching(named as OUR2).Compared with the aforementioned OUR1,the participation of tree-graph matching technique intensively explores the geometrical correspondences among coherent objects.Meanwhile,additional texture and consistency terms are involved into the treegraph cut framework to further refine the final cosegmentation results.Experimental experiments show both OUR1 and OUR2 process comparable perfor-mances on benchmark datasets,which mainly due to the mutual learning framework between saliency and similarity.OUR2,particularly,outperforms OUR1 through thor-ough discovery of geometrical information with the tree-graph cut framework.3.Further considering the efficiency and flexibility of image cosegmentation,a unified mutual learning framework is proposed(OUR3)based on tree-structured sparsity and tree-graph matching,which deeply exploit the potentials of hierarchical structures.We utilize Laplacian Matrix to realize the unification and optimize the formulation with Augmented Lagrange Multiplier and Smoothing Proximal Gradient.Meanwhile,two strategies,active node strategy and tree reconstruction strategy,are generated to improve the efficiency and accuracy of the algorithm.The former manages to automatically select key nodes based on object-oriented saliency,which dramatically shrinks the searching space.The latter adjusts the structures of image trees based on mutual learning results,which fully improves the accuracy of cosegmentation.Confirmed by the experimental results,the unified mutual learning framework delineates dedicate segments of common objects,which performs better than other state-of-the-arts on benchmark datasets.4.A deep-descriptor based video cosegmentation framework is proposed,which intend to exploit the reuse value of pre-trained deep networks.A pre-trained CRF-RNN model originally for image semantic segmentation is utilized to extract deep features for video cosegmentation,which inherits the discriminative representation ability of deep neural networks.Then a clustering cosegmentation method is implemented with deep descriptors,which computes the intra-frame,inter-frame and across-video correspondences of common objects spatially and temporally.Finally,the refinements of the pixel-level segmentation results are implemented through weighting the layers of the pre-trained CRF-RNN models.Experiments shows the potentials of combining video cosegmentation framework with deep learning.
Keywords/Search Tags:computer vision, weakly supervised, eye localization, image cosegmentation, video cosegmentation
PDF Full Text Request
Related items