Font Size: a A A

Research On Hierarchical Indexing Technique For Large Image Database

Posted on:2007-10-08Degree:DoctorType:Dissertation
Country:ChinaCandidate:L HeFull Text:PDF
GTID:1118360215470489Subject:Control Science and Engineering
Abstract/Summary:PDF Full Text Request
It is crucial to construct a proper indexing scheme to improve the performance of content-based retrieval of a large image database. Traditional indexing methods always fail in dealing with high-dimensional data because of the so-called"curse of dimensionality". To solve this problem, aiming at the application in content-based retrieval of large image databases, surrounding the"high dimensionality"of image feature, and begging with the subspace of the high-dimensional space, the thesis studies the indexing of high- dimensional data.Based on relative researches, the thesis clarifies the main issues lying in high- dimensional indexing at first, then it puts forward a framework for the construction of the indexing scheme. Aiming at the similarity measurement, clustering and dimension reduction in the framework, the thesis makes specific researches respectively, and proposes a high-dimensional indexing scheme for content-based retrieval of a large image database. The main contributions of this thesis are as follows:Propose a similarity measurement method based on subspace for high-dimensional data. Traditional similarity measurements always measure the similarity in the whole space of data sets. Under the situation of high-dimensionality, the noisy attributes will make significant influence on the measurement results if still using those methods to compute the similarity. To solve this problem, the thesis proposes a new measurement method based on subspace measuring. Using our new method, when computing the similarity between two individuals, we can avoid the influence of noisy data on the measurement as much as possible to obtain more exact results.Put forward a density-based subspace clustering algorithm. It is hard to cluster high-dimensional data using traditional clustering algorithm because of the sparsity of data. Aiming at this problem, the thesis puts forward the concept of"dimension maximal subspace clustering". Based on this concept, it proposes a density-based subspace clustering algorithm. With the combination of density-based clustering and subspace clustering, the algorithm not only makes full use of the merits of density-based clustering to discover clusters with arbitary shapes, but also obtains the proper tradeoff between the size of clusters and the dimensionality of the subspace which the clusters belong to. The results from this clustering algorithm can provide support more perfectly for the index construction.Propose an indexing structure and a corresponding similarity search algorithm based on subspace clustering. The density-based subspace clustering can discover clusters with arbitary shapes, so it can't represent a cluster clearly to use a center of the cluster only. Aiming at this problem, based on the idea of representative points, the thesis uses multiple representative points to represent a cluster, and puts forward the corresponding method to choose the proper representatives. What's more, it is important to match the sample with the clusters during the similarity search, so the thesis devices the matching method between them. Additionally, to overcome the influence coming from the different subspace that different clusters belong to, the thesis puts forward a coefficient to regulate the distance computation between a sample and a cluster, thus can improve the performance of similarity search much more.Develop a dimension reduction indexing scheme. Traditional dimension reduction methods always implement dimension reduction on the whole data set and obtain a uniform result for all of the data individuals. Considering the indexing in content-based image retrieval, those dimension reduction methods will inevitably affect the performance of similarity search because of too much information loss. Aiming at this problem, the thesis generalizes the concept of intrinsic dimension of data sets to that of single data individuals, and develops a dimension reduction indexing scheme for the 72-dimensional HSV color feature. Compared with subspace clustering, dimension reduction aims at reducing the computation time to construct the high-dimensional indexing.In a word, the thesis makes researches on indexing algorithms in content-based image retrieval. Since the feature vector of an image is always high-dimensional, and the indexing construction in CBIR is faced with the influence from the"curse of dimensionality", it is hard to measure the similarity between high-dimensional feature vectors and to construct an indexing structure in the whole feature space. This thesis makes researches on similarity measurement and clustering of high-dimensional data respectively at an angle of the subspace, and studies dimension reduction from the point of the intrinsic dimension of data individuals. These researches are useful for the overcoming of the"curse of dimensionality", and can provide feasible schemes to construct a high-dimensional indexing structure to support the content-based retrieval in large image database. Those investigations are valuable for relative researches in both theory and practice.
Keywords/Search Tags:Content-based image retrieval, Indexing, Subspace, Curse of dimensionality, Similarity measurement, Clustering, Dimension reduction
PDF Full Text Request
Related items