Research On Algorithms To Reassign Labels To Regions

Posted on: 2012-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z Teng
Full Text: PDF
GTID: 2178330335497431
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of the Internet, searchable image data has become extremely diverse in visual and semantic content, is spread across geographically disparate locations, and is growing quickly in volume. Faced with such massive image databases, accurate and efficient image retrieval is becoming increasingly practical and important. Effective image retrieval technology can help people find entertainment on the Internet and improve their quality of life.

Text retrieval now performs well. However, the image retrieval technologies developed by well-known companies such as Google, Baidu and Flickr still fall far short of what ordinary users need. Most current research focuses on retrieving images by means of the visual semantic features they contain. First, several kinds of low-level visual features are extracted from the images, for example color, texture, shape and key points. Then these features are analyzed through image similarity evaluation, or through pattern recognition and machine learning, in an attempt to obtain high-level semantic information from the images. Finally, relevant retrieval results are presented, in combination with other algorithms.

Content-based image retrieval has been studied for decades and many different methods have been proposed, yet performance is still unsatisfactory. One main reason is the sensory gap: the visual features extracted from images still cannot sufficiently express their contents. The other is the semantic gap: there is not yet a universally accepted algorithmic means of interpreting images. Hence, it is not surprising to see continued effort in this direction, either on how to better mine and construct the visual semantic information contained in images or on how to make more reasonable use of the relationships between features.

In this thesis, we study algorithms that reassign labels to regions, with the ultimate purpose of improving the accuracy and efficiency of content-based image retrieval. We propose an EM-based unsupervised algorithm that automatically reassigns manually annotated labels from the image level to their corresponding local semantic regions. First, we extract SIFT feature points by dense sampling from all images in the dataset and use the "Bag of Words" model to build a visual codebook by K-means clustering of all extracted SIFT feature points. Then an EM iterative process is constructed to estimate the likelihood of each visual word present in an image for each of that image's labels. Finally, we choose the visual words with the highest confidence to derive the most probable regions for the labels in each image. Experiments on the MSRC dataset show encouraging performance of the proposed algorithm on the problems of unsupervised automatic annotation, appearance diversity and multiple labels, provided that sufficient samples are available. Future work will focus on enriching the variety of visual features and on the ways these features are constructed. The thesis ends with conclusions and an outlook on the future of content-based image retrieval.
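
The EM step summarized above can be illustrated with a minimal sketch. It is not the thesis's actual model, which the abstract does not specify. It assumes a simple mixture in which every visual-word occurrence in an image is generated by one of that image's annotated labels, with a uniform prior over those labels. The function name `em_label_words`, the array layout, and the toy data are all hypothetical. In practice the count matrix would come from quantizing dense SIFT descriptors against the K-means codebook.

```python
import numpy as np

def em_label_words(counts, label_mask, n_iter=50, eps=1e-12):
    """Hypothetical EM sketch (assumed model, not the thesis's exact one).

    counts:     (D, W) array, visual-word counts per image.
    label_mask: (D, L) array, 1 where a label is annotated on an image.
    Returns:
      p_w_given_l: (L, W) estimated word distribution for each label.
      resp:        (D, L, W) posterior that word w in image d belongs to label l.
    """
    counts = np.asarray(counts, dtype=float)
    mask = np.asarray(label_mask, dtype=float)
    n_words = counts.shape[1]
    n_labels = mask.shape[1]

    # Start from uniform word distributions; label co-occurrence across
    # images is what lets the labels separate during the iterations.
    p_w_given_l = np.full((n_labels, n_words), 1.0 / n_words)

    for _ in range(n_iter):
        # E-step: posterior over the image's own labels for every word.
        joint = mask[:, :, None] * p_w_given_l[None, :, :]       # (D, L, W)
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate p(w | l) from expected word counts.
        expected = (counts[:, None, :] * resp).sum(axis=0) + eps  # (L, W)
        p_w_given_l = expected / expected.sum(axis=1, keepdims=True)

    # Final posteriors under the converged word distributions.
    joint = mask[:, :, None] * p_w_given_l[None, :, :]
    resp = joint / (joint.sum(axis=1, keepdims=True) + eps)
    return p_w_given_l, resp

# Toy data: 3 images, 3 visual words, 3 labels (0=grass, 1=cow, 2=sky).
toy_counts = np.array([[5, 5, 0],    # image 0 contains words 0 and 1
                       [5, 0, 5],    # image 1 contains words 0 and 2
                       [0, 5, 5]])   # image 2 contains words 1 and 2
toy_labels = np.array([[1, 1, 0],    # image 0: grass, cow
                       [1, 0, 1],    # image 1: grass, sky
                       [0, 1, 1]])   # image 2: cow, sky

p_w_given_l, resp = em_label_words(toy_counts, toy_labels)
# Most probable label for each visual word in image 0.
print(resp[0].argmax(axis=0))
```

Taking the highest-posterior label for each visual word, and mapping the words back to the patch locations where they were sampled, would give a rough region for each label. Note that two labels that always appear together on the same images cannot be separated by this simplified model.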
Keywords/Search Tags: Image Processing, Content-Based Image Retrieval, SIFT, Bag of Words, Expectation Maximization