Font Size: a A A

Scene Semantic Parsing Using High-Order CRFs And Sparse Dictionary Learning

Posted on:2017-04-19Degree:MasterType:Thesis
Country:ChinaCandidate:G B XuFull Text:PDF
GTID:2348330491451591Subject:Signal and Information Processing
Abstract/Summary:PDF Full Text Request
Image semantic labelling as one of the fundmental problems of scene understanding,which has been one of hot research topics in the fields of computer vision. And, with the important value of academic research and engineering application, it can be extensively applied in the exploration search, automatic navigation, security, health care and other fields. This paper presents a scene semantic labelling framework with high-order conditional random fields(CRFs) and sparse dictionary learning. First, the superpixel region of the given scene can be obtained with the multiscale and hierarchical oversegmentation algorithm fused by depth information.Then, the second-order CRFs model with the fusion of the multimodal features of the perceiving regional cues, is presented to implement the image scene label based on the bottom-up and regional representation of the given scenes. Final, the high-order CRFs model via discriminative sparse dictionary learning, which is represented by the sparse dictionary on the visual feaures under the constraint of the statistcal prior knowledge from different classes, is exploited to implement the scene semantic label constrained with the top-down discriminative categorization cost of the given scenes.To resolve current fast unsupervised segmentation problem of being hard to produce superpixel regions with consistent edges for the complex scenes, a gpb-ucm-based hierarchical oversegmentation method for the given RGB-D scene is proposed by multiscale combinatorial grouping of the multi-source perception cues. First, to generate the regional segment map under different scales, we adopt the gpb-ucm segmentation approach to implement multiscale and hierarchical oversegment the given scene using the perception cues such as color, texture and its depth. Then, the projection transform is applied to align the regional segment maps with different scales. Finally, the edge weights of the aligned and oversegmented regional maps with different scales are fused and integrated to obtain the final segmentation map of the given scene. The experiments on NYU Depth v2 dataset shows that, the depth-based oversegmentation scheme with multiscale fusion can improve the performance of the regional boundaries of the supperpixel regions with respect to the seperated objects in the given scene, which can provide the compact and reliable structural representation of the given scenes for the semantic labelling framework with the probabilistic graphical model on basis of the assumed nodes using the obtained superpixel regions.In view of the poor representation ability of local visual features with single modality and being hard to capture effective dependency between contextual semantic labels in traditional semantic labelling scheme a bottom-up image semantic labelling method based on second-order CRFs model with multi-modal feature fusion is proposed to construct the context relationship in the given scenes. First, the superpixel region oversegmented by the given scene are taken as nodes in CRFs model,the normalized color-driven kernel description appearance features and normalized depth-driven HHA geometric features are extracted for every given superpixel region respectively. Then we concatenate both the normalized multimodal visual feature vectors in each superpixel region node and exploit the support vector machine(SVM) classfier to obtain the discriminative score for each semantic label in every superpixel as the unary potential of the assumed node in the CRFs model, while adopting the differences of LUV color spaces between the neighboring superpixel regions as the pairwise potential of the predefined edge in the CRFs model to bottom-up build the probabilistic graph model with regional level representation of the given scene.Next, we learn the predefined second-order CRFs parameters with Block Coordinate Frank-Wolfe(BCFW) optimization algorithm by minimizing the error between groudtruth annotation map and the predicted label map inferenced by the proposed CRFs model in the framework of structured support vector machine(SSVM).Final, if given the second-order CRFs model and input test image, the result of semantic labeling can be predicted by using graph-cut-based model inference method in the framework of maximizing the posterior probability(MAP).The experiments on NYU Depth V2 dataset show that the given CRFs-based labeling framework with depth cue and multi-modal feature fusion can produce final semantic label map with stronger visual expression and higher accuracy.In order to address the issue in which cannot capture long-range superpixel region interactions and object-level interdependence information from local context information in the second-order CRFs during labeling the given scene, a top-down semantic labelling scheme based on high-order conditional random fields(CRFs) guided by discriminative sparse dictionary learning is proposed. First, according each semantic category,visual features are extracted from corresponding superpixel region respectively which are sparse coding by sparse dictionary learning method to initial corresponding specific category sparse dictionary.The semantic labeling model based on top-down high-order CRFs is built by introduction of the high-order constraint cost of the sparse code histogram from every semantic class. Next, we learn the sparse dictionary with gradient descent method by controlling CRFs parameters unchanged,while learning the CRFs parameters with optimization method Block Coordinate Frank-Wolfe(BCFW) algorithm by minimizing the error between groudtruth annotation map and the predicted label map inferenced by the proposed CRFs model by controlling the sparse dictionary unchanged in the framework of structured support vector machine(SSVM)..Final, if given the high-order CRFs model in which high-order potential and unary potential according to statistics in superpixel region for each category are unified and input test image, the result of semantic labeling can be predicted by using graph-cut-based model inference method in the framework of maximizing the posterior probability(MAP).The experiments on the NYU Depth v2 and GRAZ02 datasets show that the model is more discriminational compared with other high-order models, the corresponding labeling framework can improve the accuracy of the target object labels.
Keywords/Search Tags:Semantic Labelling, Gpb-Ucm, Kernel Descriptors, Depth Informations, CRFs, Dictionary Leaning
PDF Full Text Request
Related items