Object detection, which aims to determine the location and size of specific objects in an image, is a fundamental problem in the field of computer vision. Its research results have been widely applied in video surveillance, intelligence-assisted driving, and human-computer interaction, which indicates the academic significance and application value of object detection. Although substantial progress has been achieved in object detection, the appearance of an object in real application scenarios is often affected by cluttered backgrounds, illumination changes, pose changes, partial occlusion, and other factors, which may degrade the performance of object detection algorithms. Spatial and semantic contextual relationships usually exist among image elements, and contextual information can help improve the accuracy of object detection. Thus, contextual information is widely used to mitigate the aforementioned factors. A Conditional Random Field (CRF) is a Probabilistic Graphical Model (PGM) for context modeling: it represents image elements by nodes and expresses the contextual relationships among image elements by constraints on those nodes. Its core is to construct an energy function (cost function) and jointly predict the object categories of all image elements through inference, thereby taking advantage of global image information when classifying image elements. In this thesis, we focus on object detection that utilizes contextual information through CRFs. The major contributions of this thesis are summarized as follows:

(1) Hough transform-based methods detect objects by casting votes for object centroids from object patches. It is difficult for a classifier to disambiguate object patches from the background, as an image patch carries only partial information about the object. Contextual information among image patches can help improve the accuracy of classification. To leverage this contextual information, we capture the contextual relationships on
image patches through a Conditional Random Field (CRF) with latent variables represented by Locality-constrained Linear Coding (LLC). The strength of the pairwise energy in the CRF is measured with a Gaussian kernel. In the training stage, we modulate the visual codebook by learning the CRF model iteratively and then learn its spatial occurrence distribution. In the test stage, the binary labels of image patches are jointly estimated by the CRF model. Image patches labeled as the object category cast weighted votes for possible object centroids in an image according to their LLC coefficients. Experimental results on the INRIA Pedestrian, TUD-Brussels, and Caltech Pedestrian datasets demonstrate the effectiveness of the proposed method compared with other Hough transform-based methods.

(2) In this work, we propose a pedestrian co-detection method that combines the strengths of Convolutional Neural Networks (CNNs) and Locality-constrained Linear Coding (LLC) in a unified Conditional Random Field (CRF) model. First, we obtain object candidates by using a Region Proposal Network (RPN). Second, we build a fully connected CRF that consists of unary potentials on individual object candidates and two types of pairwise potentials on pairs of object candidates. The unary potential is computed independently for each object candidate by the baseline method. The pairwise potentials consist of CNN- and LLC-representation-based potentials, which help capture the relationships among object candidates in test images. Finally, we jointly predict the category labels of all object candidates through mean field inference in the CRF. We evaluated the proposed method on the ETH, Caltech, and INRIA Pedestrian datasets. The experimental results demonstrate the effectiveness of the proposed method compared with the baseline method.

(3) In recent years, co-detecting objects by using contextual information across multiple images has attracted considerable attention. In this work, we introduce an
object co-detection method that exploits contextual information among multiple images through a higher-order Conditional Random Field (CRF). First, we obtain object candidates from each image of a test set by using a pre-trained detector and extract multiscale ROI features. Second, we feed the object candidates into a higher-order CRF that consists of unary potentials, pairwise potentials, and higher-order potentials. The unary potentials rely on the output of the baseline method. The pairwise potentials capture the contextual relationships on pairs of object candidates. The higher-order potentials express the object category co-occurrence costs in an image. Finally, we jointly predict the category labels of all object candidates through mean field inference in the CRF. Experimental results on the Caltech Pedestrian, PASCAL VOC 2007, PASCAL VOC 2012, and COCO datasets demonstrate the effectiveness of the proposed method compared with the baseline method.

(4) In this work, we propose a pedestrian detection scheme based on hierarchical contextual information, which improves pedestrian detection performance by capturing the contextual information between different levels of image elements (the object level and the pixel level). First, we adopt the baseline detector to obtain pedestrian candidates from each image. Second, we feed the pedestrian candidates from multiple images into a fully connected Conditional Random Field (CRF) that captures contextual information at the object level. The initial confidence scores of the pedestrian candidates are obtained simultaneously through inference in the CRF. Finally, we optimize the confidence scores of the object candidates at the pixel level in accordance with semantic segmentation cues based on the CRF. Experiments on several evaluation settings of the Caltech benchmark demonstrate that the proposed method achieves better detection results than the baseline method.
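A step shared by the CRF models above is jointly predicting candidate labels through mean field inference over unary and pairwise energies. The following is a minimal, generic sketch of that inference step only; the array shapes, the Potts-style pairwise term used in the usage note, and all names here are illustrative assumptions, not the actual potentials learned in this thesis.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field(unary, pairwise, n_iters=10):
    """Approximate marginals for N candidates over L labels via mean field.

    unary:    (N, L) unary energies (lower = more likely).
    pairwise: (N, N, L, L) pairwise energies; pairwise[i, j, l, m] is the
              cost of candidate i taking label l while candidate j takes
              label m. The diagonal (i == j) is ignored.
    Returns a (N, L) array Q of per-candidate label distributions.
    """
    N, L = unary.shape
    pw = pairwise.copy()
    pw[np.arange(N), np.arange(N)] = 0.0   # no self-interaction
    Q = softmax(-unary)                    # initialize from unary terms
    for _ in range(n_iters):
        # Message to candidate i, label l: sum_j sum_m Q_j(m) * pw[i,j,l,m]
        msg = np.einsum('jm,ijlm->il', Q, pw)
        Q = softmax(-(unary + msg))        # mean field update
    return Q
```

As a usage sketch with an attractive Potts pairwise term (same label costs 0, different labels cost 1): if candidates 0 and 2 strongly prefer one label while candidate 1 weakly prefers the other, the pairwise messages pull candidate 1 toward the consensus label, which is exactly the kind of context-driven re-scoring the co-detection methods above rely on.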