
Saliency Detection And Weakly Supervised Learning For Intelligent Visual Information Processing

Posted on: 2019-10-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D W Zhang
Full Text: PDF
GTID: 1368330623453432
Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:
As one of the most critical aspects of Artificial Intelligence, visual information processing has developed rapidly over the past few decades. With the massive growth of available visual data, learning models that can handle such big data, such as deep learning, now play the core role in visual information processing and are the key to improving the performance of computer vision tasks. Typically, these technologies rely on manually annotated data to train learning models for specific tasks. Annotating such training data, however, costs a great deal of human effort and time and cannot keep pace with the demands of large-scale visual learning. To some extent, this heavy annotation requirement has hindered existing visual information processing systems from exploiting the large-scale visual data that are available. Consequently, intelligent visual information processing, which aims at autonomously understanding and analyzing the (semantic) contents of image and video data, is becoming one of the most important issues in the new generation of Artificial Intelligence research. Within this field, how to improve system autonomy, i.e., how to perform visual learning under minimal human supervision, is an important yet challenging problem that unfortunately remains under-solved today. To alleviate this problem, this thesis proposes novel saliency detection and weakly supervised visual learning paradigms that endow visual information processing systems with more intelligent and autonomous visual understanding. The designed paradigms can be integrated with existing deep learning models, making them convenient to apply in various computer vision tasks. Specifically, on the one hand, we explore ways to give the computer the human-like capability of selectively attending to the informative and salient portion of a visual stimulus, which is known as computational visual attention. On the other hand, we enable the machine to mine and learn useful patterns from weakly labelled visual data in a more autonomous way, which can hopefully resolve the contradiction between 'unlimited visual data' and 'limited human annotation' and thus dramatically improve the autonomy and intelligence of visual information processing. The main contributions are summarized as follows:

1. We make the earliest effort to train a powerful deep salient object detector without using any pixel-level human annotation. This is of great significance because it combines the advantages of supervised deep neural network (DNN)-based approaches (high performance) and traditional unsupervised approaches (high convenience). To this end, we reveal the insight of "supervision by fusion", i.e., generating reliable supervisory signals from the fusion of weak saliency models over iterative learning stages. Specifically, the obtained fusion map provides more reliable supervision, and the sample confidence weights derived from the fusion define a dynamic learning curriculum, as illustrated by the sketch following this paragraph. Through comprehensive experiments on four benchmark datasets, we demonstrate that this insight can be successfully implemented via a novel unsupervised learning framework based on two-stream fusion.
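Below is a minimal, hypothetical sketch of how fusion-based pseudo-supervision and a confidence-weighted curriculum could be organized; the averaging fusion, the agreement-based confidence heuristic, and all function names are illustrative assumptions, not the exact formulation used in the thesis.

import numpy as np

def fuse_weak_maps(weak_maps):
    """Average several weak saliency maps into one pseudo ground-truth map in [0, 1]."""
    fused = np.mean(weak_maps, axis=0)
    return (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)

def confidence_weight(weak_maps, fused):
    """Assign higher confidence when the weak models agree with the fused map."""
    errors = [np.mean(np.abs(m - fused)) for m in weak_maps]
    return float(np.exp(-np.mean(errors)))

def build_curriculum(weak_maps_per_image, keep_ratio=0.5):
    """Keep the most confident images first: one round of a simple curriculum."""
    scored = []
    for img_id, maps in weak_maps_per_image.items():
        fused = fuse_weak_maps(maps)
        scored.append((confidence_weight(maps, fused), img_id, fused))
    scored.sort(key=lambda t: t[0], reverse=True)
    keep = scored[: max(1, int(keep_ratio * len(scored)))]
    return [(img_id, fused, weight) for weight, img_id, fused in keep]

# Toy usage: three weak saliency maps for each of two 8x8 "images".
rng = np.random.default_rng(0)
weak = {i: [rng.random((8, 8)) for _ in range(3)] for i in range(2)}
for img_id, pseudo_gt, weight in build_curriculum(weak, keep_ratio=1.0):
    print(img_id, pseudo_gt.shape, round(weight, 3))

In an iterative setting along the lines described above, the selected (image, pseudo ground truth, weight) triples would train the deep salient object detector, whose refined predictions could then enter the next round of fusion.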
2. We advance the understanding of a meaningful yet understudied problem, event saliency discovery, by analyzing and detailing its main challenges. To explore this novel direction, the thesis proposes an unsupervised event saliency revealing framework. It first extracts features from multiple modalities to represent each shot in the given video collection. These shots are then clustered to build a cluster-level event saliency revealing framework, which exploits useful information cues (the intra-cluster prior, inter-cluster discriminability, and inter-cluster smoothness) through a concise optimization model. Compared with existing methods, our approach can highlight the intrinsic stimulus of unknown events in unconstrained internet video collections, and the salient moments it discovers improve the understanding of video content. Comprehensive experiments on three benchmarks have been conducted to demonstrate the effectiveness and efficiency of the entire framework as well as its key components. Notably, the proposed method achieves results comparable to or even better than those of existing supervised methods.

3. By clarifying a natural relationship between co-saliency detection and weakly supervised learning (WSL), we incorporate multiple-instance learning (MIL) to capture implicit metrics for co-saliency detection. In addition, we propose a novel self-paced learning (SPL) formulation that leverages two useful types of prior knowledge for co-saliency detection, namely sample diversity and spatial smoothness, during the learning procedure. By combining these two components, we establish a novel and general SP-MIL paradigm that integrates MIL and SPL into a unified model (a minimal sketch of the underlying self-paced weighting idea is given after this contribution list). The proposed SP-MIL model can gradually acquire faithful knowledge of co-saliency in a purely self-learning way. Experiments on benchmark datasets, together with multiple extended computer vision applications, demonstrate the superiority of the proposed framework over the state of the art.

4. We propose a novel WSL framework based on Bayesian principles for detecting objects in optical remote sensing images (RSIs), which greatly reduces the human labor needed to annotate training data while achieving performance comparable with that of fully supervised learning approaches. To obtain effective high-level feature representations, an unsupervised feature learning scheme is proposed, which uses a deep Boltzmann machine (DBM) to capture the structural and spatial patterns of geospatial objects. To demonstrate the effectiveness of our approach, we have established three optical RSI datasets, which contain RSIs with different spatial resolutions and various objects of interest. Finally, extensive evaluations on these datasets comprehensively analyze the components of the proposed framework and verify the effectiveness of the proposed methodology.
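To make the self-paced component of the SP-MIL paradigm (contribution 3) more concrete, here is a minimal, hypothetical sketch of a classical binary self-paced weighting scheme in which low-loss ("easy") samples are admitted first and a simple per-group cap stands in for the diversity prior; the cap, the schedule of the age parameter, and all names are illustrative assumptions rather than the exact SP-MIL formulation.

import numpy as np

def spl_weights(losses, groups, lam, per_group_cap=2):
    """Binary self-paced weights: admit low-loss samples, spread across groups."""
    weights = np.zeros_like(losses)
    admitted_per_group = {}
    for i in np.argsort(losses):                    # consider easiest samples first
        g = groups[i]
        if losses[i] < lam and admitted_per_group.get(g, 0) < per_group_cap:
            weights[i] = 1.0
            admitted_per_group[g] = admitted_per_group.get(g, 0) + 1
    return weights

# Toy usage: 8 region proposals from 3 images (groups), with made-up losses.
losses = np.array([0.1, 0.9, 0.2, 0.3, 0.8, 0.15, 0.4, 0.05])
groups = np.array([0, 0, 0, 1, 1, 2, 2, 2])
for lam in (0.2, 0.5, 1.0):                         # growing "model age"
    print(lam, spl_weights(losses, groups, lam))

As the age parameter lam grows across iterations, harder samples are gradually admitted, mirroring how the self-paced model is described as accumulating co-saliency knowledge from easy to hard examples in a self-learning way.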
Keywords/Search Tags:Saliency detection, weakly supervised learning, object detection, event saliency, remote sensing images, unsupervised learning, multiple-instance learning, self-paced learning, deep Boltzmann machine, deep neural network