Research On Perception-Oriented Image Scene And Emotion Categorization

Posted on: 2012-09-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Y Liu
Full Text: PDF
GTID: 1118330335499419
Subject: Computer software and theory
Abstract/Summary:
With the development of multimedia technology and computer networks, the number of available images is growing explosively. This growth creates a problem for users: they cannot find what they really need in the huge amount of available data. Image scene and emotion categorization technologies are therefore urgently required. Users usually focus on the semantic meaning conveyed by image content when classifying images. Because of the limits of image understanding, it is difficult to infer image semantics directly from visual features. From the perspective of cognitive psychology, our work narrows the well-known semantic gap between low-level visual features and high-level semantic concepts with a hierarchical structure, following the route "Bag of Visual Words → Semantic Topic Model → Emotional Mapping Function". The first part of this dissertation focuses on scene categorization of natural scene images, in particular on how to model the Bag-of-Semantic-Visual-Words from an image and how to effectively integrate contextual information to generate semantic topics. The second part focuses on emotion categorization of natural scene images, in particular on how to establish an effective machine learning model for emotion categorization based on visual cognitive theory.

For scene categorization based on visual words, the dissertation starts from the observation that the performance of Bag-of-Visual-Words methods depends fundamentally on the visual words, and presents a novel learning framework for designing discriminative semantic visual words. The visual words are generated from semantic similarity rather than from manual labels, which differs from the traditional approach of learning visual words from appearance similarity.
Specifically, Gaussian Mixture Modeling (GMM) is first applied to adapt the visual features of patches to a semantic interpretation, using the scene class label as a bridge. The Information Bottleneck (IB) algorithm is then introduced to cluster the patches into "semantic visual words" according to these semantic interpretations. Once the semantic visual words are obtained, their frequencies of occurrence in a given image form a histogram, which is then used for scene categorization with a Support Vector Machine (SVM) classifier.

For scene categorization based on the semantic topic model, the dissertation presents a novel scene categorization algorithm that integrates semantic contextual information to resolve the synonymy and polysemy problems of visual words. The algorithm combines the appearance similarity of image patch features with contextual semantic information to generate the topics. Specifically, the contextual semantic constraints are the semantic co-occurrence probabilities of image patches, obtained by probabilistic Latent Semantic Analysis (pLSA) rather than by manual labeling. Furthermore, we introduce the pseudo-likelihood of labeling to combine feature appearance similarity with contextual semantic information and so provide a more accurate semantic topic representation. Once the semantic topics are obtained, the frequencies of occurrence of the topics in a given image form a histogram, which is then used for scene categorization with the SVM classifier.

For emotion categorization of natural scene images, the dissertation presents an emotion categorization method using an Affective-probabilistic Latent Semantic Analysis model based on visual cognitive theory. Traditional emotion categorization algorithms treat the task as a general machine learning problem.
In contrast, the proposed approach builds on the fact that visual information at the emotion level is aggregated according to a set of rules. Specifically, each image is modeled as a matrix whose elements record the correlations between pairs of visual words. We then discover the emotion topics using a novel Affective-probabilistic Latent Semantic Analysis (Affective-pLSA) model, which is an extension of the pLSA model. Because a natural scene image evokes multiple emotional feelings, emotion categorization is carried out using the multi-label K-nearest-neighbour (ML-KNN) approach based on the emotion topics. Although the pLSA model is widely used for image classification tasks, none of these applications attempts to discover topics based on the composition of visual words, which is a more suitable and comprehensive basis for emotion categorization.
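
The two image representations mentioned in the abstract, the visual-word frequency histogram and the matrix over pairs of visual words, can be illustrated with a minimal sketch. This is not the dissertation's implementation. The function names, the toy vocabulary of three words, and the patch assignments are hypothetical. The abstract does not define how the pairwise "correlation" is computed, so plain co-occurrence counts within one image are assumed here as a stand-in.

```python
from collections import Counter
from itertools import combinations


def word_histogram(word_ids, vocab_size):
    """Relative frequency of each visual word among the patches of one image."""
    counts = Counter(word_ids)
    total = len(word_ids)
    return [counts.get(w, 0) / total if total else 0.0 for w in range(vocab_size)]


def pairwise_matrix(word_ids, vocab_size):
    """Symmetric matrix counting co-occurrences of distinct visual words.

    ASSUMPTION: the abstract only says that the elements record correlations
    of pairwise visual words; raw co-occurrence counts are an illustrative
    choice, not the measure used in the dissertation.
    """
    matrix = [[0] * vocab_size for _ in range(vocab_size)]
    for a, b in combinations(word_ids, 2):
        if a != b:
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


# Hypothetical image: five patches, each already assigned to one of 3 visual words.
patches = [0, 2, 2, 1, 2]
hist = word_histogram(patches, 3)    # feature vector for a classifier such as an SVM
pairs = pairwise_matrix(patches, 3)  # input to a topic model over word pairs
```

In the pipeline described above, a histogram like `hist` would be the feature vector passed to the SVM for scene categorization, while a matrix like `pairs` would be the per-image input from which emotion topics are discovered.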
Keywords/Search Tags: Scene Categorization, Emotion Categorization, Information Bottleneck, Markov Random Fields, probabilistic Latent Semantic Analysis