Study On Image Semantic Segmentation

Posted on: 2015-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhang
Full Text: PDF
GTID: 2308330464959714
Subject: Computer application technology
Abstract/Summary:
With the growth of the Internet and the development of multimedia technology, the volume of multimedia data, typified by web images, has been growing explosively for years. Massive data brings challenges in many areas, such as storage, retrieval, and management. Major search engines index multimedia data by its text descriptions, which are not always relevant to the content and therefore lead to poor performance. Moreover, at web scale, many images have no text description at all. How to organize and manage massive data automatically and effectively, and how to understand its semantic meaning effectively and efficiently, have therefore become pressing questions.

In this thesis, we focus on two crucial tasks in large-scale image retrieval and understanding: automatic image annotation and image semantic segmentation. We examine the critical problems of these tasks and propose novel approaches.

Automatic image annotation aims to train computers on human-labeled images so that they can attach semantic labels to an unlabeled image, in the form of a list of the objects that appear in it. Compared with content-based image retrieval, which estimates the similarity among images from extracted features, retrieval on semantic labels is much faster. The proliferation of web images greatly increases the availability of images with human-added labels. It is therefore of great importance to develop automatic image annotation techniques for large-scale image retrieval and understanding.

Because image annotation models rely on image-level features and semantics, their main limitation is that they can neither obtain nor provide the location of each semantic label in the image, which leads to inaccurate predictions.
Semantic segmentation, which aims to provide the semantic category of every region and pixel, has recently attracted great research interest.

To better learn the semantics of images and superpixels, we propose a fully supervised semantic segmentation approach using multiple graphs with block-diagonal constraints. Each feature describes images from a different view and has its own pros and cons, and it is difficult to determine how important each feature is to each category. The proposed fully supervised approach constrains the affinity matrix of each feature to be block diagonal, an essential property for semantic segmentation, and maintains consistency among the multiple heterogeneous feature spaces as well as the semantic space, in order to learn multi-view superpixel affinities. We formulate the problem as a convex optimization problem that can be solved efficiently. Label prediction is then performed with the learned multi-view affinity graph.

However, traditional approaches rely heavily on training data manually labeled at every pixel, which is quite limited. It is natural to look for methods that overcome this shortage by using millions of images with only image-level labels, a setting known as weak supervision. The central difficulty in this scenario is that image-level labels are much coarser cues, which are hard to incorporate effectively into the segmentation model.

We propose a novel weakly supervised semantic segmentation approach that evaluates models rather than training them, by exploiting characteristics of the data distribution in a category-specific subspace. For a given category and a given classification model, we first learn the basis superpixels of the subspace spanned by this model, then evaluate the parameters by using the errors of negative and positive superpixels reconstructed from these bases as scores, and finally choose the model with the best score as the classifier for this category.
To accelerate the proposed approach and avoid the expense of random sampling, we design an Iterative Merging Update (IMU) algorithm based on Gaussian Mixture Models, which fits the distribution of scores given the parameters and indicates the probable region of the optimal parameters for the classification model.

Experimental results on real-world datasets show that the proposed approach outperforms existing methods. Despite being trained in a weakly supervised setting, the proposed weakly supervised approach is competitive with state-of-the-art fully supervised methods.
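
The reconstruction-error scoring mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the abstract gives no formulas, so the least-squares reconstruction, the score (mean negative error minus mean positive error), and every name below (`reconstruction_errors`, `model_score`, the toy bases and features) are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def reconstruction_errors(basis, feats):
    """Residual norm of each row of `feats` after least-squares
    reconstruction from the rows of `basis`.

    basis: (k, d) array of basis vectors (stand-ins for basis superpixels).
    feats: (n, d) array of superpixel feature vectors.
    """
    # Solve basis.T @ coeffs ~= feats.T for the coefficients, shape (k, n).
    coeffs, *_ = np.linalg.lstsq(basis.T, feats.T, rcond=None)
    recon = (basis.T @ coeffs).T  # back to shape (n, d)
    return np.linalg.norm(feats - recon, axis=1)


def model_score(basis, positives, negatives):
    """Hypothetical score: large when positives are reconstructed well
    (small error) and negatives are reconstructed poorly (large error)."""
    pos_err = reconstruction_errors(basis, positives).mean()
    neg_err = reconstruction_errors(basis, negatives).mean()
    return neg_err - pos_err


# Toy data: two candidate "models", each represented only by its basis.
basis_xy = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
basis_xz = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
positives = np.array([[1.0, 2.0, 0.0], [3.0, -1.0, 0.0]])  # lie in the x-y plane
negatives = np.array([[0.0, 0.0, 2.0], [1.0, 1.0, 1.0]])   # leave the x-y plane

candidates = {"xy": basis_xy, "xz": basis_xz}
scores = {name: model_score(b, positives, negatives) for name, b in candidates.items()}
# Keep the candidate with the best (highest) score as the category classifier.
best = max(scores, key=scores.get)
```

In this toy setup the candidate whose subspace contains the positive features wins the selection. How the thesis actually learns the basis superpixels, and how the IMU algorithm narrows the parameter search, is not specified in the abstract and is not modeled here.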
Keywords/Search Tags: Computer Vision, Machine Learning, Semantic Segmentation, Image Annotation, Model Evaluation