
Research On Scene Understanding Methods Based On Probabilistic Graphical Models

Posted on: 2014-09-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Mao
GTID: 1268330401967856
Subject: Signal and Information Processing
Abstract/Summary:
Scene understanding, an important basic problem and the ultimate goal of computer vision, has been widely applied in fields such as robot navigation, security, medical treatment and web search. Following the idea of “Divide and Conquer”, each branch of scene understanding, including object detection, image segmentation and scene classification, has achieved breakthroughs. However, holistic scene understanding is still far from being achieved. In recent years, following the idea of “Merge these subtasks”, researchers have proposed semantic segmentation and, later, joint object detection and semantic segmentation as routes toward this ultimate goal. In a sense, scene understanding can be formulated as semantic segmentation, from which other high-level semantic information can also be derived. Besides producing a semantic segmentation, joint object detection and semantic segmentation can localize each object and give the number of objects. However, current research results are still unsatisfactory. This dissertation addresses the research hotspots and difficulties of object detection, semantic segmentation, and joint object detection and semantic segmentation, and proposes solutions based on probabilistic graphical models to overcome the shortcomings of existing methods. The main contributions are as follows.

1. Building advanced conditional random field (CRF) models that accurately reflect the real constraints of a visual scene and thus improve semantic segmentation performance. Three models are proposed:

(1) A pairwise CRF model based on an enhanced texton map (model I). This model consists of a unary term and a pairwise term. The unary term is constructed by a JointBoost classifier, and the pairwise term encodes a smoothness constraint between adjacent pixels.
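A minimal sketch of the model I energy, assuming the common form E(x) = Σ_i ψ_i(x_i) + w Σ_(i,j) [x_i ≠ x_j] over a 4-connected pixel grid. The unary cost is taken as the negative log of per-pixel class probabilities (which the dissertation obtains from JointBoost; here they are simply given as input), and the pairwise term is a plain Potts penalty. The function names, the weight `w_pair`, and the use of iterated conditional modes (ICM) for inference are illustrative assumptions, not the dissertation's implementation; graph-cut methods such as alpha-expansion are the more usual choice for minimizing this kind of energy.

```python
import math
from typing import List

Grid = List[List[int]]
Costs = List[List[List[float]]]  # costs[row][col][label]


def unary_from_probs(probs: Costs) -> Costs:
    """psi_i(l) = -log P(l | features); the probabilities stand in for the
    per-pixel classifier output (assumed given, e.g. from a boosted classifier)."""
    return [[[-math.log(max(p, 1e-12)) for p in pix] for pix in row]
            for row in probs]


def pairwise_crf_energy(labels: Grid, unary: Costs, w_pair: float) -> float:
    """E(x) = sum_i psi_i(x_i) + w_pair * #{4-neighbour pairs with x_i != x_j}."""
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for r in range(h):
        for c in range(w):
            energy += unary[r][c][labels[r][c]]
            # Visit only right and down neighbours so each edge counts once.
            if c + 1 < w and labels[r][c] != labels[r][c + 1]:
                energy += w_pair
            if r + 1 < h and labels[r][c] != labels[r + 1][c]:
                energy += w_pair
    return energy


def icm(unary: Costs, w_pair: float, n_iters: int = 10) -> Grid:
    """Iterated conditional modes: greedily relabel each pixel to minimise its
    local energy given its neighbours; the total energy never increases."""
    h, w = len(unary), len(unary[0])
    n_labels = len(unary[0][0])
    # Start from the unary-only (classifier) labelling.
    labels = [[min(range(n_labels), key=lambda l: unary[r][c][l])
               for c in range(w)] for r in range(h)]
    for _ in range(n_iters):
        changed = False
        for r in range(h):
            for c in range(w):
                nbrs = [labels[rr][cc]
                        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < h and 0 <= cc < w]

                def local(l: int) -> float:
                    return unary[r][c][l] + w_pair * sum(n != l for n in nbrs)

                best = min(range(n_labels), key=local)
                # Only move on a strict improvement, so ties cannot cause cycling.
                if local(best) < local(labels[r][c]):
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels
```

With `w_pair = 0` the result is just the per-pixel argmin of the unary costs; in a 3×3 example where only the centre pixel slightly prefers the other class, `w_pair = 0.5` lets the smoothness term relabel that pixel to match its neighbours.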
The model is simple, which keeps the learning of its parameters simple as well. To better describe texture, LBP, SIFT and Color SIFT are used to enhance the original texton map; and to obtain more discriminative features, texton-layout filters are defined on the enhanced texton map and used as the weak classifiers of JointBoost, which introduces shape, location and context information. Experimental results show that the model achieves better semantic segmentation performance.

(2) A higher-order CRF model based on a global same-topic constraint (model II). To overcome the limitations of model I, a higher-order term encoding the global same-topic constraint is introduced, yielding a higher-order CRF model. First, normalized cuts segmentation is performed several times; second, segments sharing the same topic are found with a topic model; third, the higher-order term is defined on these same-topic segments; finally, the higher-order term is combined with model I to form the higher-order CRF model. This model not only considers the local texture constraint on pixel categories but also enforces category consistency within same-topic segments. Good semantic segmentation results are obtained in the experiments.

(3) A hierarchical CRF model that fuses both basic processing units, pixels and segments. This model consists of an observation data layer, a pixel layer and a segmentation layer.
The observation data layer is the original image. The pixel layer is model I built on pixels; it encodes the local texture constraint on pixel categories and the smoothness constraint between neighbouring pixels. The segmentation layer is model I built on segments; it encodes the constraint that features extracted from segments place on segment categories, the region consistency constraint, and the smoothness constraint between neighbouring segments. An energy term defined on each segment and the pixels within it fuses the two basic units, overcoming the drawback of using only one processing unit. Two methods are used, separately, to generate the segmentation layer: multiple segmentations and constrained parametric min-cuts. In addition, this dissertation presents a new first- and second-order pooling method that describes segment regions more stably and reliably.

2. An object detection method based on partial least squares (PLS) analysis. First, a multi-scale sliding-window search is performed, and high-dimensional feature descriptors are obtained by dense sampling. Second, PLS is used to extract a few latent components from the original high-dimensional features; these components span a low-dimensional feature space, and a quality ratio is used to determine the best number of them. Finally, mean shift with a Gaussian kernel performs non-maximum suppression, which removes overlapping bounding boxes and yields the final detections. Experimental results show that PLS reduces dimensionality better than PCA, giving a more discriminative low-dimensional feature representation, and that the method obtains better results than Dalal's algorithm.

3. A new higher-order CRF model for the problem of joint object detection and semantic segmentation.
Its basic idea is as follows: on the basis of model II, an object-detection higher-order energy term is defined, which introduces the results of an object detector into the energy function as an additional constraint. This constraint competes with the other constraints (local texture features, the smoothness prior between pixels, and the region consistency constraint) to jointly determine the category of each pixel. Two methods are proposed for generating the detection energy term: the first uses the detector's results directly; the second extracts both global shape features and local texture features from the bounding box, obtains a more robust representation of them through first- and second-order pooling, and then computes the detection energy term from the output of a logistic regression classifier. Experimental results show that the model accomplishes object detection and semantic segmentation simultaneously and outperforms many current semantic segmentation algorithms.
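How the extra constraints can enter a single energy is sketched below, assuming a robust P^n Potts form for the same-topic segment term of model II and a simple linear disagreement penalty for the detection term. Both forms, and all names and weights (`gamma_max`, `q`, `w_det`), are illustrative assumptions; the abstract does not give the dissertation's exact definitions. The model I (unary plus pairwise) part of the energy is passed in as a precomputed number so that the sketch stays self-contained.

```python
from typing import Dict, List, Sequence, Tuple

Labels = List[List[int]]
Pixel = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), half-open


def robust_pn_potts(labels: Labels, segment: Sequence[Pixel],
                    gamma_max: float, q: float) -> float:
    """Robust P^n Potts cost (assumed form of the same-topic segment term).

    Zero when every pixel in the segment shares one label; grows linearly
    with the number of pixels disagreeing with the dominant label and is
    truncated at gamma_max once q or more pixels disagree."""
    counts: Dict[int, int] = {}
    for r, c in segment:
        counts[labels[r][c]] = counts.get(labels[r][c], 0) + 1
    n_disagree = len(segment) - max(counts.values())
    return min(gamma_max, n_disagree * gamma_max / q)


def detection_term(labels: Labels, box: Box, cls: int,
                   score: float, w_det: float) -> float:
    """Hypothetical detection term: penalise the fraction of pixels inside a
    detected box that are NOT labelled with the detector's class, weighted
    by the detection score."""
    r0, c0, r1, c1 = box
    pixels = [(r, c) for r in range(r0, r1) for c in range(c0, c1)]
    agree = sum(1 for r, c in pixels if labels[r][c] == cls)
    return w_det * score * (1.0 - agree / len(pixels))


def joint_energy(labels: Labels, base_energy: float,
                 segments: Sequence[Sequence[Pixel]],
                 detections: Sequence[Tuple[Box, int, float]],
                 gamma_max: float, q: float, w_det: float) -> float:
    """E(x) = E_I(x) + sum over segments psi_c(x_c) + sum over detections psi_d(x_d).

    base_energy is the unary + pairwise energy of `labels`, computed elsewhere."""
    energy = base_energy
    energy += sum(robust_pn_potts(labels, seg, gamma_max, q) for seg in segments)
    energy += sum(detection_term(labels, box, cls, s, w_det)
                  for box, cls, s in detections)
    return energy
```

In this sketch a detection only lowers the energy of labellings that agree with it inside its box, so it competes with the segment and smoothness terms rather than overriding them, which mirrors the "competing constraints" idea described above.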
Keywords/Search Tags: Scene Understanding, Semantic Segmentation, Object Detection, Joint Object Detection and Semantic Segmentation, Conditional Random Field Model