Research On Feature Representation And Index Method In Image Retrieval

Posted on: 2012-10-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y D Cao
Full Text: PDF
GTID: 1118330371460292
Subject: Signal and Information Processing
Abstract/Summary:
With the development of the Internet and digital technology, multimedia resources, including images, have become scattered across network nodes, so organizing and retrieving multimedia data effectively is an important problem. This thesis presents research on image feature representation and on indexing methods for high-dimensional data.

Image retrieval comprises two types: text-based retrieval and content-based retrieval. Early text-based image retrieval systems search the Internet for similar images using keywords entered by users. The text-based method requires that images be labeled with text beforehand. Labeling images is laborious, and the accuracy of the labels affects retrieval quality. Content-based image retrieval (CBIR), which indexes and retrieves images directly from extracted visual features, has therefore become important.

The thesis first gives a detailed study of visual feature representation, covering both global and local features. Local features can capture small differences between images and are robust to clutter and deformation. Among local features, SIFT (Scale-Invariant Feature Transform) and MSER (Maximally Stable Extremal Regions) are widely used in image retrieval. The thesis analyzes the detectors and descriptors of SIFT and MSER in detail and then designs a fused feature representation (the "fusional feature") based on SIFT, MSER and moment invariants. Because it combines three local features, the fusional feature is more discriminative and robust than any single local feature. A two-level, coarse-to-fine matching strategy is designed for the fusional feature, and it is effective and efficient.

The other focus of the research is the indexing of image data. An image feature is generally a high-dimensional vector comprising tens to hundreds of elements. 
A good indexing method for high-dimensional data can speed up retrieval. However, traditional indexing methods such as the R-tree perform very poorly on high-dimensional data because of the "curse of dimensionality"; they can even perform worse than linear search (also called sequential or exhaustive search). LSH (locality-sensitive hashing) is popular for indexing high-dimensional data because of its good performance. The LSH scheme indexes image data from a different perspective. The data are projected into a special feature space. Then k hash functions are selected at random from a family of hash functions, and two or more data points that have the same values under all k hash functions are treated as near neighbors and placed in the same bucket. A query point is hashed into buckets in the same way, and with constant probability its near neighbors can be found by linearly scanning the bucket that contains it. The design of the hash function depends on the similarity measure between images; four types of hash functions are introduced in Chapter 2.

Two methods of constructing LSH functions using weakly supervised learning are proposed. The first selects better hash functions, using some similar sample pairs, from the Euclidean LSH of M. Datar; the second generates hash functions directly from similar sample pairs. The latter method can eliminate the randomness of the hash functions.

A further method of constructing LSH functions is based on the distribution of the image data: it selects the projection axes without using labeled samples, and these axes are orthogonal. In experiments, this method reduced space complexity effectively.

The three proposed methods of improving LSH are practical. 
Although many images can be downloaded from the Internet, their label information is often neither accurate nor specific. Uploaded images often carry no label information at all, and labeling a large number of images is expensive or unrealistic. The data-dependent and weakly supervised LSH methods either use unlabeled data directly or combine a small quantity of labeled data with a large quantity of unlabeled data. Image retrieval over a large database demands more storage space, but memory capacity is limited. Retrieval is faster if indexing and searching are performed entirely in memory.

The experimental results show that the improved LSH schemes, whose performance is close to that of linear search, effectively reduce memory usage.
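
The baseline LSH procedure described in the abstract (hash each point with k randomly drawn functions, bucket points whose k values all agree, then linearly scan the query's bucket) can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes the commonly used Euclidean hash h(v) = floor((a . v + b) / w), with a drawn from a Gaussian and b uniform on [0, w]. The class name `EuclideanLSH` and the parameters `k`, `num_tables`, `w` and `seed` are hypothetical choices made for illustration, and the use of several tables is a standard addition rather than something the abstract states.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Toy index: num_tables hash tables, each keyed by k concatenated hash values."""

    def __init__(self, dim, k=4, num_tables=8, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w  # bucket width (assumed value, not from the thesis)
        # One (a, b) pair per hash function: a is Gaussian, b is uniform on [0, w].
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def _keys(self, v):
        # h(v) = floor((a . v + b) / w); a bucket key is the tuple of k such values.
        return [
            tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
                  for a, b in table_funcs)
            for table_funcs in self.funcs
        ]

    def add(self, v):
        """Store a point in one bucket per table and return its integer id."""
        idx = len(self.points)
        self.points.append(list(v))
        for table, key in zip(self.tables, self._keys(v)):
            table[key].append(idx)
        return idx

    def query(self, q):
        """Return the id of the nearest candidate sharing a bucket with q, or None."""
        candidates = set()
        for table, key in zip(self.tables, self._keys(q)):
            candidates.update(table.get(key, ()))
        if not candidates:
            return None
        # Linear scan over the candidates only, not over the whole data set.
        return min(candidates, key=lambda i: math.dist(self.points[i], q))


if __name__ == "__main__":
    rng = random.Random(1)
    index = EuclideanLSH(dim=16)
    data = [[rng.uniform(0, 10) for _ in range(16)] for _ in range(200)]
    for v in data:
        index.add(v)
    print(index.query(data[42]))
```

Larger `k` makes buckets more selective (fewer false candidates), while more tables raise the chance that a true near neighbor shares at least one bucket with the query. Each table stores every point once, so memory grows with `num_tables`. The data-dependent and weakly supervised variants in the thesis would replace the random choice of `a` and are not reproduced here.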
Keywords/Search Tags: image retrieval, high-dimensional data index, LSH, similarity measure, fusional feature