Research On Features And Models In Image Classification And Recognition

Posted on: 2009-08-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D F Han
Full Text: PDF
GTID: 1118360245963358
Subject: Computer application technology
Abstract/Summary:
In image classification and recognition, image features and learning methods are used to categorize images. Although this task is usually easy and natural for human beings, it is difficult for computers. Despite the great advances in image classification techniques over the past few years, it remains one of the fundamental challenges in computer vision. Although many different technologies exist for solving this problem, the dominant paradigm for an image classification system is to first specify an appropriate statistical model and then learn it from a collection of training example images of the category. In this thesis, we address image classification and recognition problems from features to models.

We focus on two main types of work for the image classification and recognition task: bag-of-word models and part-based models. A common approach to classifying images is to treat them as a collection of image features, describing only their appearance and ignoring their spatial relations. Similar models have been used successfully in the text community for analyzing documents and are known as "bag-of-word" methods, since each document is represented by a distribution over a fixed vocabulary. These models are generative models. For discriminative models, SVM is widely used to obtain classifiers; such methods also differ in the amount of supervision that is required. They use the bag-of-word scheme to form the features and an SVM to train the model. These methods have shown good performance in comparative evaluations against several state-of-the-art recognition methods on some object databases.
It can be seen from the analysis above that the main advantages of bag-of-word methods are: (1) they are simple but effective; (2) because they use local features, they are invariant to rotation, scale, translation and affine transformations; (3) they are based on a statistical view; (4) most of them are dense models, so they can handle partial occlusion. However, spatial information is missing from these methods.

Besides the methods inspired by bag-of-word, part-based methods are widely studied by researchers. These methods aim to describe the image or object as a sparse set of parts. Such schemes model both the relative positions of the parts and their appearance, giving a sparse representation that captures the essence of the object. Part-based methods can therefore capture both spatial relations and appearance information. However, they have the following main disadvantages: (1) the models are formed from few parts (commonly fewer than 10); (2) they are sparse models, so once some parts are missing the whole model is invalid; (3) most of them are supervised learning methods, so the tedious manual training process is not suitable for large-scale image classification and recognition systems; (4) they use explicit object models, which cannot always reflect the real and main relations.

It can be seen from the above analysis that local feature methods are better than global features, especially in resisting rotation, scale, translation and affine transforms. We have analyzed local feature methods from detection to description and matching. Based on this research, we propose to use recall, precision and three matching strategies as the criteria for analyzing local features. The matching methods are: (1) nearest neighbour matching; (2) similarity matching; (3) ratio matching.
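The three matching strategies and the precision/recall criteria can be illustrated with a minimal sketch. The abstract does not define the strategies, so the definitions below are assumptions based on their common usage in descriptor evaluation: nearest neighbour keeps the closest candidate, similarity matching keeps every pair under a distance threshold, and ratio matching keeps the nearest neighbour only when it is clearly closer than the second nearest. All function names, the Euclidean distance, and the threshold values are hypothetical choices for illustration, not the thesis's implementation.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def nearest_neighbour_matches(d):
    """Match each query descriptor to its single closest candidate."""
    return [(i, int(j)) for i, j in enumerate(d.argmin(axis=1))]

def similarity_matches(d, max_distance):
    """Match every pair whose distance is below a fixed threshold (assumed value)."""
    rows, cols = np.nonzero(d < max_distance)
    return list(zip(rows.tolist(), cols.tolist()))

def ratio_matches(d, max_ratio=0.8):
    """Keep the nearest neighbour only if it beats the runner-up by the given ratio."""
    order = np.argsort(d, axis=1)
    matches = []
    for i in range(d.shape[0]):
        best, second = order[i, 0], order[i, 1]
        if d[i, best] < max_ratio * d[i, second]:
            matches.append((i, int(best)))
    return matches

def precision_recall(matches, ground_truth):
    """Precision and recall of a match list against a set of known correct pairs."""
    found = set(matches)
    correct = len(found & ground_truth)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

In a sketch like this, similarity matching can return several candidates per descriptor (typically higher recall, lower precision), while ratio matching discards ambiguous matches (typically higher precision, lower recall).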
Additionally, we construct an experimental platform to test the properties of different local features. This work gives theoretical and practical guidance on how to choose local features in image classification and recognition tasks.

A multi-scale fast interest point detector based on the Haar integral image and the Fast point detector is proposed. We call it the Haar-Fast detector. Gaussians are optimal for scale-space analysis. In practice, however, the Gaussian needs to be discretized and cropped, and even with Gaussian filters aliasing still occurs as soon as the resulting images are sub-sampled. Also, the property that no new structures can appear while going to lower resolutions may have been proven in the 1D case, but is known not to apply in the relevant 2D case. Hence, the importance of the Gaussian seems to have been somewhat overrated in this regard. We therefore replace the original Gaussian filter with a simpler alternative: the Haar integral image, which is fast to compute and can be viewed as a substitute for the Gaussian filter. Inspired by the Fast method, we propose a multi-scale Fast algorithm called Haar-Fast. The Haar-Fast detector is faster than some standard detectors.

Based on the detector, a new signal processing tool, bidimensional empirical mode decomposition (BEMD), is used to describe the local regions. The main contribution of this algorithm is introducing BEMD and the Hilbert-Huang transform to local descriptors. The local regions are decomposed by BEMD, yielding several intrinsic mode functions (IMFs) and a residual part. Hilbert spectral analysis is then conducted to describe the local regions. BEMD is locally adaptive and suitable for the analysis of non-linear or non-stationary signals. Because the local regions we process always lie in non-stationary regions, BEMD and the Hilbert-Huang transform are well suited to describing them.
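The reason an integral image can stand in for Gaussian smoothing at multiple scales is that any box sum costs four lookups regardless of box size. The following is a minimal sketch of that primitive only, assuming a grayscale image stored as a 2D NumPy array; the function names are hypothetical, and this is not the Haar-Fast detector itself, whose details the abstract does not give.

```python
import numpy as np

def integral_image(img):
    """Summed-area table padded with a zero top row and zero left column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def box_filter(img, size):
    """Mean over every size x size window (valid positions only)."""
    ii = integral_image(img)
    sums = (ii[size:, size:] - ii[:-size, size:]
            - ii[size:, :-size] + ii[:-size, :-size])
    return sums / (size * size)
```

Calling `box_filter` with increasing `size` values gives progressively coarser smoothed images at the same per-pixel cost, which is the property a multi-scale detector can exploit instead of building a sub-sampled Gaussian pyramid.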
At the same time, this BEMD-based analysis shares the good features of wavelet and Fourier analysis. Studies show that it can provide much better temporal and frequency resolution than wavelet and Fourier analysis. Experiments on standard data sets show that the Haar-Fast detector is faster than the compared methods with similar repeatability, and that the proposed descriptors give better results under image illumination changes and geometric transforms. Additionally, the proposed algorithm is a new application of the Hilbert-Huang transform.

On the model level, a statistical model called latent local spatial relations (LLSR) is presented as a novel learning model that uses spatial and statistical information for semantic image classification. The model is inspired by probabilistic Latent Semantic Analysis (PLSA) for text mining. In text analysis, PLSA is used to discover topics in a corpus using the bag-of-word document representation. In LLSR, we treat image categories as topics, so an image containing instances of multiple categories can be modeled as a mixture of topics. More significantly, LLSR introduces spatial relation information as a factor, which is not present in PLSA. LLSR is invariant to rotation, scale, translation and affine transformations and can handle partial occlusion. Using the Dirichlet process and a variational EM learning algorithm, LLSR is developed into an image classification algorithm. LLSR uses an unsupervised process that captures both spatial relations and statistical information simultaneously. The main idea of our method is that the relations between regions have a strong effect on the semantic classification results. We describe the spatial relations using a latent relationship, which leads to latent local spatial relations.
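For reference, the standard PLSA model that LLSR takes as its starting point expresses the probability of a visual word $w$ in an image (document) $d$ as a mixture over latent topics $z$. The symbols here follow the usual PLSA notation and are not taken from the thesis; how LLSR adds the spatial relation factor is not specified in this abstract, so it is not shown.

```latex
P(w \mid d) = \sum_{z} P(w \mid z)\, P(z \mid d)
```

Here $P(z \mid d)$ is the per-image topic mixture and $P(w \mid z)$ is the per-topic distribution over the visual vocabulary.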
Because it is difficult for previous methods to capture both appearance and spatial information exactly and simultaneously, the main idea is to design a scheme that combines spatial relations with image statistical information. Consequently, our method shares the good features of bag-of-word and part-based methods. Our contributions are threefold. First, the method captures statistical information as bag-of-word methods do, and its learning algorithm is unsupervised, which avoids manual intervention. Second, we use an affine-invariant local region method to construct the representation of the local relations, which gives invariance to rotation, scale, translation and affine transformations. Third, it is a dense model, which means our method can resist partial occlusion.

One can get a better understanding of the model by going through the generative process for creating an image in a specific category. We begin by drawing a probability vector that determines which topic to select when generating each local region of the image. To create each local region in the image, we first choose a particular topic from the mixture of possible topics. For example, selecting a "face" topic will favour local regions that occur more frequently in faces (e.g. eye regions). For each local region, we then determine which visual words occur in it. We repeat the process of drawing the topics, local regions and the occurrences of visual words in each local region many times, eventually forming an entire bag of regions that constitutes a face image. The corresponding analysis and experiments show the feasibility and validity of the proposed algorithm.

For the learning framework, supervised learning methods need more labeled samples and incur a costly manual training process. However, it is often easy to obtain unlabeled images in computer vision.
We present a semi-supervised learning framework to learn local spatial relations, as a novel model combining spatial and statistical information for semantic image classification and recognition. We show how to use both unlabeled and labeled images to train a discriminative classifier. Using labeled and unlabeled data together is often referred to as semi-supervised learning in the machine learning community. In particular, we show how semi-supervised learning can be used to learn classifiers with fewer training images than the corresponding supervised framework. Our method is straightforward but efficient. We present a semi-supervised learning method based on a random sampling strategy, Random Semi-Supervised Sampling (RSSS). The goal of this algorithm is to use unlabeled images efficiently to train a discriminative classifier. The main point is the sampling method, which gradually improves the classifiers and updates the training sets. RSSS is an iterative algorithm with two steps: (1) graph-based semi-supervised classification; (2) random semi-supervised sampling. The model is trained using SVM, and the algorithm alternates between the steps until convergence.

Besides the semi-supervised learning framework, we present a different perspective on the spatial relations problem. We propose local spatial histogram features to construct the representation of the local relations. Although histogram-based features show excellent performance, they do not contain spatial information. The local spatial histogram combines local region histograms with local spatial information. The main idea of our features is that the relations between regions have a strong effect on the semantic classification results. We describe the spatial relations using the local spatial histogram. We design discriminative features that combine spatial relations with image statistical information.
Consequently, our method shares the good features of both unsupervised and supervised learning methods. The proposed method can be regarded as a discriminative model. The proposed algorithm uses a semi-supervised learning process that captures both spatial relations and statistical information simultaneously. Experiments on some standard data sets show that the proposed method can use both labeled and unlabeled images for training models and performs well on semantic image classification and recognition problems.

In conclusion, our research results enrich the approaches to image classification and recognition technology and its applications, and have theoretical and practical importance. This thesis provides useful methods and approaches for the research and development of image classification and recognition technology and its applications.
Keywords/Search Tags: Image Classification and Recognition, Graph Model, Local Region Detectors, Local Region Descriptors, Spatial Relations, Expectation Maximization, Supervised Learning, Semi-Supervised Learning, Unsupervised Learning, Topic Model, Spectral Clustering