Research On Content-based Scene And Object Category Recognition

Posted on: 2012-03-06 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: F X Lu
GTID: 1118330362958360 | Subject: Communication and Information System
Abstract/Summary:
With the rapid development and spread of the Internet, a vast amount of information in every form, including images and video, is stored and distributed over the web. It is therefore increasingly important to classify and retrieve images easily and quickly. However, scene and object category recognition is one of the most challenging problems in computer vision because of variations in illumination, scale, rotation, viewpoint and pose. In addition, ambiguities in visual annotation must be carefully considered when formulating the problem. In this thesis, we focus on recognizing image categories quickly and accurately.

First, we design a practical scene and object category recognition system and propose a multiple-feature-channel-based image representation. Applicable to diverse image classification tasks, the system extracts several feature "channels", each of which computes visual-word histograms over the whole image or its subimages using the Bag-of-Words (BoW) model and organizes them in a Spatial Pyramid (SP) to incorporate position information. Feature channels differ mainly in which feature detector/descriptor combination the BoW model uses, so different channels strike different trade-offs between discriminative power and invariance. The proposed representation thus provides a unified framework for effectively organizing many commonly used feature detectors and descriptors. Support Vector Machines (SVMs) are then used to estimate, from each channel individually, the posterior probabilities that an unknown image belongs to each possible category. These intermediate results are finally combined by logical or statistical reasoning to predict the label of the unknown image. The system substantially reduces time complexity and is well suited to large-scale image databases.
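To make one feature "channel" concrete, here is a minimal Python/NumPy sketch of a BoW histogram organized in a spatial pyramid. It assumes that descriptors and their (x, y) keypoint positions have already been produced by some detector/descriptor pair and that a codebook of visual words was learned beforehand (for example by k-means). The function names, the nearest-codeword assignment and the 3-level pyramid (1x1, 2x2 and 4x4 grids) are illustrative assumptions, not the thesis's implementation, and the per-level weighting used in some spatial pyramid formulations is omitted.

```python
import numpy as np

def assign_visual_words(descriptors, codebook):
    """Hypothetical helper: index of the nearest codeword (squared Euclidean) per descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    return d2.argmin(axis=1)

def spatial_pyramid_histogram(words, xy, width, height, vocab_size, levels=2):
    """Hypothetical helper: concatenate BoW histograms over 1x1, 2x2, ..., 2^L x 2^L grids."""
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Grid cell of each keypoint; clipping keeps border points inside the last cell.
        cx = np.minimum((xy[:, 0] * cells / width).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells / height).astype(int), cells - 1)
        for j in range(cells):
            for i in range(cells):
                in_cell = (cx == i) & (cy == j)
                feats.append(np.bincount(words[in_cell], minlength=vocab_size))
    vec = np.concatenate(feats).astype(float)
    total = vec.sum()
    return vec / total if total > 0 else vec  # L1-normalize the whole pyramid vector

# Toy usage: random stand-ins for 128-D descriptors and an 8-word codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 128))
desc = rng.normal(size=(50, 128))
xy = rng.uniform(0, [320, 240], size=(50, 2))
words = assign_visual_words(desc, codebook)
h = spatial_pyramid_histogram(words, xy, 320, 240, vocab_size=8, levels=2)
print(h.shape)  # (168,) = 8 words x (1 + 4 + 16) cells
```

Running several such channels, each with a different detector/descriptor pair, would yield one vector per channel, each of which can be fed to its own SVM.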
Experimental results on several benchmark image datasets show that the method based on the proposed multiple-feature-channel-based representation achieves higher recognition rates than most state-of-the-art methods.

Second, we investigate the problem of feature combination and propose four rules, collectively called x-max: max-max, sum-max, prod-max and classifier-max. X-max combines the intermediate results of the individual feature channels in different ways. Specifically, for max-max, sum-max and prod-max, the score of a novel image for a given category is the maximum, the arithmetic mean and the geometric mean, respectively, of the scores obtained from the individual feature channels. For classifier-max, by contrast, the score of a novel image for a given category is determined by second-level classifiers. Compared with Multiple Kernel Learning (MKL) and Linear Programming Boosting (LPBoost), the most commonly used feature combination strategies for image classification, x-max has three advantages: (a) it is more robust, places less processing load on each combination node, and is suited to parallel processing owing to its inherently distributed combination architecture; (b) it generalizes well to new features: when a new feature channel is added, only the classifiers related to that channel need to be trained; (c) it is orders of magnitude faster than MKL and LPBoost. X-max is therefore well suited to practical applications. Experimental results on five benchmark image datasets show that combining multiple feature channels with x-max yields recognition rates higher than those of MKL and slightly below those of LPBoost, at much lower time complexity.

Finally, we propose an image classification method based on the Pyramid Histogram Of TOpics (PHOTO) image representation and an AdaBoost classifier. Probabilistic Latent Semantic Analysis (pLSA) has been used for years to discover the topics of text documents.
We incorporate position information into pLSA through a Spatial Pyramid (SP). The image-specific mixing proportions for the individual cells of the pyramid are computed with the EM algorithm and concatenated into a "long" vector that represents the image. A variant of AdaBoost is then used to recognize scene and object categories. In effect, pLSA nonlinearly reduces the dimensionality of the PHOW (Pyramid Histogram Of Words) representation and extracts the semantic concepts in an image. The proposed PHOTO representation achieves satisfactory results on several benchmark scene and object datasets and is especially well suited to recognizing scene categories.
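The PHOTO construction can be sketched as pLSA "folding-in": with the word-topic distributions P(w|z) held fixed (assumed to have been learned on training images beforehand), EM estimates each pyramid cell's topic mixture P(z|d), and the per-cell mixtures are concatenated. This is a minimal NumPy sketch under those assumptions; the function names, the iteration count and the uniform initialization are hypothetical choices rather than the thesis's exact procedure, and the AdaBoost stage is not shown.

```python
import numpy as np

def fold_in_topic_mixture(counts, p_w_given_z, n_iter=50, eps=1e-12):
    """Hypothetical helper: EM estimate of P(z|d) for one histogram, P(w|z) held fixed."""
    n_topics = p_w_given_z.shape[1]
    p_z_given_d = np.full(n_topics, 1.0 / n_topics)  # uniform initialization
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) proportional to P(w|z) P(z|d), shape (K, Z)
        joint = p_w_given_z * p_z_given_d[None, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: P(z|d) proportional to sum_w n(d,w) P(z|d,w)
        p_z_given_d = counts @ resp
        total = p_z_given_d.sum()
        if total == 0:  # empty cell: fall back to a uniform mixture
            return np.full(n_topics, 1.0 / n_topics)
        p_z_given_d = p_z_given_d / total
    return p_z_given_d

def photo_vector(cell_histograms, p_w_given_z, n_iter=50):
    """Hypothetical helper: concatenate per-cell topic mixtures into one long vector."""
    return np.concatenate(
        [fold_in_topic_mixture(h, p_w_given_z, n_iter) for h in cell_histograms]
    )

# Toy usage: 8 visual words, 3 topics, 21 pyramid cells (1 + 4 + 16).
rng = np.random.default_rng(0)
p_w_given_z = rng.dirichlet(np.ones(8), size=3).T  # (8, 3); each column sums to 1
cells = rng.integers(0, 5, size=(21, 8))          # per-cell visual-word counts
v = photo_vector(cells, p_w_given_z)
print(v.shape)  # (63,)
```

With 3 topics per cell instead of 8 visual-word counts, the 21-cell vector shrinks from 168 to 63 entries, a small-scale illustration of the dimensionality reduction described in the abstract.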
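Returning to the x-max rules of the second contribution, the three fixed rules can be sketched in a few lines. The sketch assumes that each channel's SVM has already produced one score per category, arranged as an array of shape (n_channels, n_categories); on this reading, the first operator fuses the scores across channels and the final argmax picks the category. The function name and rule strings are hypothetical, and classifier-max is omitted because it requires a trained second-level classifier.

```python
import numpy as np

def x_max_predict(channel_scores, rule="sum"):
    """Hypothetical helper: fuse (n_channels, n_categories) scores, then take the argmax."""
    s = np.asarray(channel_scores, dtype=float)
    if rule == "max":     # max-max: highest score any channel gives each category
        fused = s.max(axis=0)
    elif rule == "sum":   # sum-max: arithmetic mean across channels
        fused = s.mean(axis=0)
    elif rule == "prod":  # prod-max: geometric mean across channels, computed in log space
        fused = np.exp(np.log(np.clip(s, 1e-12, None)).mean(axis=0))
    else:
        raise ValueError(f"unknown rule: {rule}")
    # Final "max": the category with the highest fused score wins.
    return int(fused.argmax())

# Toy scores from 3 channels over 3 categories.
scores = [[0.60, 0.30, 0.10],
          [0.20, 0.50, 0.30],
          [0.25, 0.45, 0.30]]
print([x_max_predict(scores, r) for r in ("max", "sum", "prod")])  # [0, 1, 1]
```

In this toy example max-max follows the single confident channel (category 0), whereas sum-max and prod-max follow the agreement of the other two channels (category 1). Because each rule touches only the already-computed channel scores, adding a channel means training only that channel's classifier, which matches advantage (b) claimed for x-max.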
Keywords/Search Tags: Scene Recognition, Object Category Recognition, Multiple-Feature-Channel-Based Representation, Feature Combination, Bag-of-Words, Probabilistic Latent Semantic Analysis, Pyramid Histogram of Words, Pyramid Histogram of Topics