Visual Media Semantic Segmentation Based On Location And Shape Modeling

Posted on: 2015-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
Full Text: PDF
GTID: 2348330491962767
Subject: Computer application technology
Abstract/Summary:
Semantic segmentation of visual media is a key problem in the pattern recognition community. This thesis explores the semantic segmentation problem for both scene images and 3D meshes. Theories and methods are presented from the perspectives of data representation, feature extraction, models, and evaluation methods.

We first explore the impact of modeling location and shape on image semantic segmentation, which aims to segment the image and label each pixel with its semantic class. Existing methods for scene image parsing can be divided into two classes: strongly-supervised methods and weakly-supervised methods. The former use full pixel-level labels, which carry rich context information, and can therefore model spatial layout, object shape, and region interaction well. However, these methods need fully labeled datasets, which are a heavy burden for human annotators. The latter need only datasets with image-level labels, which are easy to obtain. However, explicitly modeling the relationships between textual labels and visual image regions remains a challenge, so only simple location relationships can be modeled by these methods.

This thesis proposes two models for scene parsing, one for each setting. The first is the Weakly-Supervised Coherent Latent Topic Model (WCLTM). WCLTM builds on a representation of multiple over-segmented regions, which avoids the instability introduced by relying on a single segmentation algorithm. WCLTM combines the LDA framework with Markov modeling between topics and an annotation reasoning module. The Markov constraints help preserve consistency between spatially nearby topics, and the annotation reasoning module allows the model to be trained with image-level labels only. A variational EM algorithm for training is also provided. The second model, Multiscale Sum-Product Networks (MSPN), is a strongly-supervised model that takes multiscale unary potentials as inputs and models the spatial layout of image content in a hierarchical manner. The multiscale unary potentials can handle the semantic ambiguity problem. With a properly designed structure, MSPN can characterize the interaction of regions in a fine-to-coarse manner and model object shape as well. Finally, a refinement scheme based on over-segmented regions is proposed to improve the parsing results of MSPN. Experiments on the MSRC, SIFTFLOW, and UIUC Events datasets demonstrate the power of these two models.

We further explore the impact of modeling shape on 3D mesh semantic segmentation. We present an automatic framework that achieves segmentation in two stages: hierarchical spectral analysis and isoline-based boundary detection. During the first stage, a single segmentation field is defined to capture the concave-sensitive shape information of the sub-eigenvectors of a concavity-aware Laplacian. Specifically, we define weights by evaluating the utility of the sub-eigenvectors for identifying segmentation boundaries, and compute the single segmentation field by combining these sub-eigenvectors in an optimized way. The segmentation field models 3D shape well and facilitates the subsequent segmentation. During the second stage, we sample a number of isolines from the single segmentation field. We then propose a divide-merge algorithm that groups the isolines and selects one from each group as a final boundary. On the PSB dataset, the proposed method outperforms other state-of-the-art non-learning algorithms, which demonstrates its effectiveness.
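As a rough illustration of the sum-product network idea that MSPN builds on, the sketch below evaluates a tiny generic network over two binary variables. It is not the MSPN architecture from the thesis: the structure, the weights, and the helper names (`leaf`, `product`, `weighted_sum`, `bernoulli`) are all assumptions made up for this example. It only shows how sum nodes mix alternatives and product nodes combine disjoint parts, and how leaving a variable unobserved marginalizes it out.

```python
# Toy sum-product network (SPN) over two binary variables, X0 and X1.
# Structure and weights are invented for illustration; this is not MSPN.

def leaf(var, value):
    """Indicator leaf: 1.0 if `var` is unobserved or equals `value`, else 0.0."""
    def evaluate(evidence):
        observed = evidence.get(var)
        return 1.0 if observed is None or observed == value else 0.0
    return evaluate

def product(*children):
    """Product node: multiplies children defined over disjoint variables."""
    def evaluate(evidence):
        result = 1.0
        for child in children:
            result *= child(evidence)
        return result
    return evaluate

def weighted_sum(weighted_children):
    """Sum node: weighted mixture of children over the same variables."""
    def evaluate(evidence):
        return sum(w * child(evidence) for w, child in weighted_children)
    return evaluate

def bernoulli(var, p_one):
    """Univariate distribution over a binary variable, built from two leaves."""
    return weighted_sum([(p_one, leaf(var, 1)), (1.0 - p_one, leaf(var, 0))])

# Root: mixture of two product components (hypothetical weights).
root = weighted_sum([
    (0.6, product(bernoulli(0, 0.9), bernoulli(1, 0.8))),
    (0.4, product(bernoulli(0, 0.2), bernoulli(1, 0.1))),
])

p_joint = root({0: 1, 1: 1})   # P(X0=1, X1=1)
p_marginal = root({0: 1})      # P(X0=1), with X1 marginalized out
p_total = root({})             # all variables marginalized; should be 1

print(p_joint, p_marginal, p_total)
```

With these made-up weights the joint query evaluates to about 0.44 and the marginal to about 0.62, each obtained with a single bottom-up pass through the network.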
Keywords/Search Tags:semantic segmentation, location modeling, shape modeling, topic model, sum-product network, spectral analysis