
Research On Middle Semantic Representation Based Image Scene Classification

Posted on: 2012-09-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W J Jie
Full Text: PDF
GTID: 1118330335451307
Subject: Computer Science and Technology
Abstract/Summary:
With the development of multimedia technology and computer networks, content-based image retrieval (CBIR) systems have become increasingly important for organizing, indexing, and retrieving massive image collections in many application domains, and CBIR has emerged as a hot research topic in recent years. Scene classification arises against this background. It automatically annotates images with labels drawn from a given set of semantic categories, which provides effective higher-level contextual information for image understanding tasks such as object recognition. The key point lies in how to train a computer to understand the semantic content of scenes from the perspective of human cognition, and to recognize the similarities and differences among scenes of different categories.

Based on the middle-level representation of scenes, our work focuses on how to extract effective visual information from scene images and narrow the well-known semantic gap between low-level visual features and high-level semantic concepts. This dissertation achieves the following research results:

First, we propose a framework of multiple class-specific visual dictionaries for scene categorization, in which the class-specific visual dictionaries are constructed using mutual information as the feature selection method. According to the contribution of each visual word to classification, the universal visual dictionary is tailored to form a class-specific codebook for each category. An image is then characterized by a set of combined histograms, each generated by concatenating the traditional histogram based on the universal codebook with the class-specific histogram based on a class-specific codebook. In addition, we propose a practical adaptive weighting method that introduces competition between the traditional histogram and the class-specific histogram. The proposed method provides more effective information for overcoming the similarity between images of different categories and improves categorization performance.

Second, we propose a novel and practical algorithm for scene categorization called the Multi-Scale Multi-Level pLSA model (MSML-pLSA). It consists of two parts: a multi-scale part, in which the image is decomposed into different scales and diverse visual details are extracted from the layers at different scales to construct the multi-scale histogram; and a multi-level part, in which the representations corresponding to different numbers of topics are linearly concatenated to form the multi-level histogram. The model is designed to represent a scene at varying visual granularity and semantic granularity. MSML-pLSA creates a more complete representation of the scene because it jointly includes fine and coarse visual detail, and a comparative study shows the superiority of the proposed method.

Third, we present a scene categorization approach that learns contextual information in an unsupervised manner to extend the 'bag of visual words' model to a 'bag of contextual visual words' model. A contextual visual word simultaneously represents the local property of the region of interest (ROI) and its contextual property (from the coarser scale and neighboring regions). By considering the contextual information of the ROI, the contextual visual word gives a richer representation of the scene image, which reduces ambiguities and errors.

Fourth, we study the relationship between the number of interest points and the accuracy of scene classification. It is commonly believed that more interest points yield higher accuracy, but little work has been done to verify this. To validate this belief, extensive experiments based on the bag-of-words method are carried out. In particular, three different SIFT descriptors and four feature selection methods are adopted to vary the number of interest points. Experimental results show that the number of interest points can significantly affect classification accuracy.
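
The class-specific dictionary idea described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation. It assumes that each image is already encoded as a vector of visual-word counts over a universal codebook, that mutual information is computed between binary word presence and a one-vs-rest class indicator, and that the two histograms are mixed with a fixed `weight` in place of the adaptive weighting scheme, whose details the abstract does not give. All function and parameter names are hypothetical.

```python
import numpy as np

def mutual_information(present, is_class):
    """Mutual information (in nats) between two boolean arrays."""
    mi = 0.0
    for w in (False, True):
        for c in (False, True):
            p_wc = np.mean((present == w) & (is_class == c))
            p_w = np.mean(present == w)
            p_c = np.mean(is_class == c)
            if p_wc > 0:  # zero-probability cells contribute nothing
                mi += p_wc * np.log(p_wc / (p_w * p_c))
    return mi

def class_specific_codebooks(counts, labels, k):
    """For each class, keep the indices of the k highest-MI visual words.

    counts: (n_images, n_words) array of visual-word counts.
    labels: length n_images array of class labels.
    """
    counts = np.asarray(counts)
    labels = np.asarray(labels)
    present = counts > 0  # assumption: word presence, not frequency
    codebooks = {}
    for c in np.unique(labels):
        is_class = labels == c  # one-vs-rest indicator (assumption)
        scores = np.array([mutual_information(present[:, j], is_class)
                           for j in range(counts.shape[1])])
        top_k = np.argsort(scores)[::-1][:k]
        codebooks[c.item()] = np.sort(top_k)
    return codebooks

def combined_histogram(hist, word_idx, weight=0.5):
    """Concatenate the universal histogram with a class-specific one.

    weight is a fixed stand-in for the adaptive weighting (assumption).
    """
    hist = np.asarray(hist, dtype=float)
    universal = hist / max(hist.sum(), 1.0)
    specific = hist[word_idx]
    specific = specific / max(specific.sum(), 1.0)
    return np.concatenate([(1.0 - weight) * universal, weight * specific])

# Toy data: words 0, 1, 2 each occur in one class only; word 3 is everywhere.
counts = np.array([[2, 0, 0, 1], [1, 0, 0, 1],
                   [0, 3, 0, 1], [0, 1, 0, 1],
                   [0, 0, 2, 1], [0, 0, 1, 1]])
labels = np.array([0, 0, 1, 1, 2, 2])
codebooks = class_specific_codebooks(counts, labels, k=1)
combined = combined_histogram(counts[0], codebooks[0])
```

In this sketch an image gets one combined histogram per category, so a classifier for category `c` would use the vector built from `codebooks[c]`.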
Keywords/Search Tags: Scene Classification, Bag of Words Model, Feature Selection, Scale Invariant Feature Transform, Probabilistic Latent Semantic Analysis Model, Class-specific Visual Dictionary, Contextual Information