
Local Feature Modeling For Visual Object Categorization

Posted on: 2013-12-22; Degree: Doctor; Type: Dissertation
Country: China; Candidate: H X Wang; Full Text: PDF
GTID: 1228330395967380; Subject: Computer application technology
Abstract/Summary:
Visual object categorization is a challenging key technology for applications such as image retrieval, massive video data, visual perception substitution, autonomous robots, and interactive games. It is widely needed in practice and is valuable for both applications and research. Its goals are to detect objects in images and to determine their categories. This dissertation studies local feature modeling for visual object categorization with the bag-of-visual-words method, covering linear coding of local features, pooling, and spatial matching schemes, and proposes several improved methods. The main work and contributions are as follows:
(1) Locality-constrained principal component linear coding (LPLC) is proposed. Linear correlation analysis between each local feature and its K-nearest-neighbor visual words, together with significance testing of locality-constrained linear coding, shows that the fundamental cause of nonsignificant weight coefficients is multicollinearity among the K-nearest-neighbor visual words. LPLC is proposed to address this. Experiments on the Caltech-4 dataset show that the proposed method improves classification accuracy.
(2) Locality-constrained linear coding based on the principal components of the visual vocabulary is proposed. LPLC resolves the multicollinearity but increases the coding time. Instead of determining the principal components of the K-nearest-neighbor visual words of every local feature, the method determines only the principal components of the visual vocabulary, and their number is chosen according to the cumulative contribution ratio. Experimental results show that linear coding time is reduced by 1/3 with the proposed method. Considering coding time and classification results together, the method performs best when the cumulative contribution ratio is 85% and the number of principal components of the visual vocabulary is 20. Under each pooling method, both proposed linear coding methods improve classification accuracy and achieve similar accuracy, indicating that locality-constrained linear coding based on the principal components of the visual vocabulary reduces the time overhead while retaining the advantages of LPLC.
(3) The spatial pyramid matching (SPM) approach, which is based on approximate global geometric correspondence, disregards invariance to translation, scale, and rotation of visual objects in images. A novel spatial matching method based on a shape description model of the visual vocabulary is proposed. In this method, a spatial geometric model relative to the geometric center of each visual word is constructed to guarantee translation invariance. Log-polar spatial pyramid matching is introduced: the log-polar radius and the polar angle are subdivided proportionally, and a consistent orientation is assigned to each visual word, to achieve scale and rotation invariance. Experiments on the Caltech-4 dataset and our own dataset show that the proposed method improves classification accuracy, especially on the dataset containing images with obvious translation, scaling, and rotation changes, and that it is more robust, as indicated by its smaller variance.
(4) Building on the study of locality-constrained linear coding, the pooling of local feature codes is discussed and power normalization is introduced into local feature modeling. Experiments on the Caltech-4 dataset show that power normalization reduces the sparsity of the local feature model vector, and that this sparsity can significantly affect classification results.
(5) The proposed local feature modeling methods are integrated. The integrated method is evaluated in Matlab on the Caltech-101 and Pascal VOC 2007 datasets. On Caltech-101, the average classification accuracy (ACA) of the integrated method is higher than that of the other five methods. On Pascal VOC 2007, the average precision (AP) of the integrated method is similar to that of the best system in the Pascal VOC 2007 challenge. Finally, a prototype visual object categorization system based on a client/server (C/S) architecture is implemented.
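The log-polar matching idea in item (3) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `log_polar_histogram`, the bin counts, the normalization of radii by their mean, and the assumption that a reference orientation is supplied by the caller are all illustrative choices made here. The sketch bins the occurrences of one visual word by log radius and angle relative to their geometric center.

```python
import math

def log_polar_histogram(points, orientation=0.0, n_radial=3, n_angular=4):
    """Count one visual word's occurrences in log-polar bins.

    points      -- (x, y) positions where the visual word occurs
    orientation -- reference angle in radians (assumed to be supplied by
                   the caller; how it is estimated is outside this sketch)
    """
    # Measuring offsets from the geometric center removes translation.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    offsets = [(x - cx, y - cy) for x, y in points]
    radii = [math.hypot(dx, dy) for dx, dy in offsets]
    mean_r = sum(radii) / len(radii)

    hist = [[0] * n_angular for _ in range(n_radial)]
    for (dx, dy), r in zip(offsets, radii):
        if r == 0.0:
            r_bin = 0  # a point at the center goes to the innermost ring
        else:
            # Dividing by the mean radius removes scale (assumption of this
            # sketch); rings double in width, i.e. they are uniform in log2.
            r_bin = math.floor(math.log2(r / mean_r)) + n_radial - 1
            r_bin = min(max(r_bin, 0), n_radial - 1)
        # Subtracting the reference orientation removes rotation.
        theta = (math.atan2(dy, dx) - orientation) % (2 * math.pi)
        a_bin = int(theta / (2 * math.pi / n_angular)) % n_angular
        hist[r_bin][a_bin] += 1
    return hist

# Four occurrences of a visual word, then the same layout rotated by
# 0.7 rad, scaled by 3, and shifted by (10, -5).
points = [(4.0, 1.0), (-1.0, 2.0), (-2.0, -2.0), (-1.0, -1.0)]
phi, scale = 0.7, 3.0
moved = [(scale * (x * math.cos(phi) - y * math.sin(phi)) + 10.0,
          scale * (x * math.sin(phi) + y * math.cos(phi)) - 5.0)
         for x, y in points]

original_hist = log_polar_histogram(points)
moved_hist = log_polar_histogram(moved, orientation=phi)
```

With the rotation angle passed as the reference orientation, both calls produce the same histogram, which is the invariance property the abstract describes.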
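The effect of power normalization described in item (4) can be shown on a toy pooled vector. This is a sketch under assumptions: the exponent 0.5 and the use of Hoyer's measure to quantify sparsity are choices made here for illustration, and the abstract does not state which exponent or sparsity measure the dissertation uses.

```python
import math

def power_normalize(vec, alpha=0.5):
    """Signed power normalization: sign(z) * |z|**alpha per component."""
    return [math.copysign(abs(z) ** alpha, z) for z in vec]

def hoyer_sparsity(vec):
    """Hoyer's measure: 1 for a one-hot vector, 0 for a constant vector.

    Used here only as one possible way to quantify sparsity (assumption).
    """
    n = len(vec)
    l1 = sum(abs(z) for z in vec)
    l2 = math.sqrt(sum(z * z for z in vec))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)

# Hypothetical pooled code vector dominated by a single component.
pooled = [0.81, 0.01, 0.04, 0.0]
normalized = power_normalize(pooled)

sparsity_before = hoyer_sparsity(pooled)
sparsity_after = hoyer_sparsity(normalized)
```

Small components are raised relative to the dominant one, so the normalized vector is less peaked and `sparsity_after` is lower than `sparsity_before`. Exact zeros stay zero.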
Keywords/Search Tags: local feature linear coding, spatial matching scheme, pooling, local feature modeling, visual word, visual object categorization