Visual Context Analysis Based On Local Features And Its Applications

Posted on: 2012-03-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W G Zhou
Full Text: PDF
GTID: 1118330335462514
Subject: Signal and Information Processing

Abstract/Summary:
With the rapid development of computer and multimedia techniques and the popularity of digital devices and Web applications, the last decade has witnessed explosive growth in the multimedia data available on the Web. Managing such web-scale, diverse data, especially image data, is a significant and very challenging task. Visual context analysis, which focuses on the intrinsic relationships among images and visual features, offers an effective means of addressing this issue. Recently, with the introduction of local visual features, many researchers in the computer vision and multimedia communities have turned their attention to visual context analysis based on local features.

Although some advances have been made in visual context analysis, many issues remain to be addressed because of the semantic gap between low-level visual features and high-level semantic concepts. In this dissertation, we study different kinds of visual context according to the application, and apply them to image re-ranking, canonical image selection, large-scale partial-duplicate image search, and license plate detection, respectively. The contributions of this dissertation can be summarized as follows.

Firstly, we propose a new latent visual context learning scheme to address irrelevance and redundancy, two common problems in text-query-based image search. Within this framework, on the one hand, we explore the latent semantic topics among images and visual words. On the other hand, we construct visual link graphs for visual words and for images, respectively. With graph analysis techniques, we discover the importance of visual words and images. The initial ranking of the text-query-based search results can also be fused with image importance for the final image re-ranking. Further, we propose a weighted set coverage algorithm for canonical image selection.

Secondly, we propose several coding schemes for geometric visual context representation, including spatial coding, ring coding, geometric square coding, and geometric fan coding, which can be applied to fast geometric verification in large-scale partial-duplicate image retrieval. In the traditional Bag-of-Visual-Words model, the local matches obtained from feature quantization usually include geometrically inconsistent ones, which degrade retrieval precision. In our work, we use the state-of-the-art SIFT features (Lowe, 2004) for image representation, and propose several coding strategies to effectively describe the relative geometric positions of visual words. Thanks to the invariance properties of SIFT features, our coding maps can achieve translation invariance, scale invariance, and/or rotation invariance. Based on the geometric context coding, we propose an efficient geometric verification scheme to discover globally geometrically inconsistent matches. When only the geometrically consistent matches are kept, image similarity is defined more accurately, which benefits image retrieval accuracy. To further improve retrieval performance, several enhancements are proposed, including affine transformation estimation and query expansion.

Thirdly, we propose a novel scheme of principal visual word discovery for automatic license plate detection. To address the drawbacks of traditional edge-map-based methods, we propose to generate principal visual words from the perspective of the geometric visual context of local features. A set of principal visual words with rich geometric context is trained for each license plate character. Then, given a test image, the location of a potential license plate can be accurately estimated by matching local features to the principal visual words. The discovered principal visual words are both distinctive and descriptive. More importantly, they are tied to a specific high-level semantic context, i.e., plate characters, which is a significant contribution from the perspective of bridging the "semantic gap".

To summarize, in this dissertation, based on local visual features, we explore and mine diverse visual context from novel and distinctive perspectives for several applications in multimedia processing. Comprehensive experiments on large-scale real datasets show the superiority of the proposed algorithms over state-of-the-art approaches.
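
The geometric verification idea described in the second contribution can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names `relative_position_maps` and `verify_matches` are hypothetical, and the coding is assumed to be a simple pair of binary maps that record, for every pair of matched features, whether one lies to the right of and above the other. This simplified form tolerates translation and uniform scaling only; it does not handle rotation. Matches whose relative positions disagree most between the two images are removed one at a time until the remaining set is consistent.

```python
def relative_position_maps(points):
    """Build binary maps of pairwise relative positions.

    xmap[i][j] is 1 when point j is at or to the right of point i,
    ymap[i][j] is 1 when point j is at or above point i.
    """
    n = len(points)
    xmap = [[int(points[j][0] >= points[i][0]) for j in range(n)] for i in range(n)]
    ymap = [[int(points[j][1] >= points[i][1]) for j in range(n)] for i in range(n)]
    return xmap, ymap


def verify_matches(query_pts, target_pts):
    """Return the indices of matches that are mutually consistent.

    query_pts[k] and target_pts[k] are the (x, y) positions of the k-th
    matched feature in the query image and the candidate image.
    """
    keep = list(range(len(query_pts)))
    while len(keep) > 1:
        qx, qy = relative_position_maps([query_pts[k] for k in keep])
        tx, ty = relative_position_maps([target_pts[k] for k in keep])
        n = len(keep)
        # XOR flags pairs whose relative order differs between the two images;
        # each row sum counts how many other matches a given match conflicts with.
        scores = [
            sum((qx[i][j] ^ tx[i][j]) + (qy[i][j] ^ ty[i][j]) for j in range(n))
            for i in range(n)
        ]
        worst = max(range(n), key=scores.__getitem__)
        if scores[worst] == 0:
            break  # all remaining matches agree on relative positions
        del keep[worst]  # drop the most inconsistent match and re-check
    return keep


# Four matches related by scaling and translation, plus one false match (index 4).
query = [(0, 0), (1, 2), (2, 1), (3, 3), (4, 4)]
target = [(10, 10), (12, 14), (14, 12), (16, 16), (9, 9)]
consistent = verify_matches(query, target)
```

In this toy example the fifth match lies to the upper right of every other feature in the query but to the lower left in the candidate image, so it accumulates the highest inconsistency score and is the only match removed. The number of surviving matches could then serve as an image similarity score, in the spirit of the verification step described above.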
Keywords/Search Tags: local visual features, visual context, image re-ranking, canonical image selection, latent visual context learning, geometric verification, geometric coding, partial-duplicate image search, license plate detection