On Image Retrieval Based On Unsupervised Component Analysis

Posted on: 2008-02-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G X Sun
Full Text: PDF
GTID: 1118360242473646
Subject: Communication and Information System
Abstract/Summary:
With expanding computing power and electronic storage capacity, the amount of digital content in the form of images and video has increased exponentially in recent years. In particular, the World Wide Web has seen growing use of digital images and video, which form the basis of entertainment, educational and commercial applications. The efficient management of this rapidly expanding visual information has become an urgent problem. It is widely recognized that a more effective and intuitive way is needed to represent and retrieve visual information based on properties that are inherent in the images themselves. This need was the driving force behind the emergence of Content-Based Image Retrieval (CBIR) techniques.
Image retrieval means searching in digital image data. Every image data set is different and has its own characteristics. CBIR uses the visual contents of an image, such as color, texture, shape and spatial layout, to represent and index the image. The technique automatically extracts the visual contents of the images in the database and describes them by multi-dimensional feature vectors. The feature vectors of the images in the database form a feature database. The system converts the query example images or sketches provided by users into its internal feature-vector representation, calculates the similarities between the feature vectors of the query example and those of the images in the database, and performs retrieval with the aid of an indexing scheme. The indexing scheme should provide an efficient way to search the image database.
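The query-by-example pipeline described above (extract feature vectors, build a feature database, rank by similarity) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the dissertation: the feature is a coarse RGB color histogram, the similarity is the cosine measure, the function names and toy images are hypothetical, and a plain linear scan stands in for the indexing scheme.

```python
import math

def color_histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` levels; return a normalized histogram."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_feature_database(images):
    """Map each image name to its feature vector (the 'feature database')."""
    return {name: color_histogram(px) for name, px in images.items()}

def retrieve(query_pixels, feature_db, top_k=3):
    """Rank database images by similarity to the query (linear scan, no index)."""
    q = color_histogram(query_pixels)
    scored = [(cosine_similarity(q, f), name) for name, f in feature_db.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical toy images, each given as a flat list of (R, G, B) pixels.
images = {
    "red_sunset": [(250, 30, 20)] * 90 + [(20, 20, 20)] * 10,
    "green_field": [(30, 220, 40)] * 100,
    "blue_sea": [(10, 40, 230)] * 80 + [(240, 240, 240)] * 20,
}
feature_db = build_feature_database(images)
query = [(240, 40, 30)] * 95 + [(10, 10, 10)] * 5   # a mostly red query image
ranking = retrieve(query, feature_db)
print(ranking)
```

In a real system the linear scan would be replaced by an index structure, and the histogram by whatever feature the application calls for.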
Through the concerted efforts of two major research areas, databases and computer vision, content-based image retrieval has been a very active research field since the 1990s.
Independent component analysis (ICA) is a new signal processing and data analysis method, a general-purpose statistical and computational technique for decomposing a complex dataset into independent sub-parts. It is an interdisciplinary subject that draws on statistics, signal processing, neural networks, information theory, artificial intelligence, engineering and related fields. In the ICA model, the data variables are assumed to be linear or nonlinear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed to be non-Gaussian and mutually independent, and they are called the independent components, sources or factors of the observed data. The goal of ICA is to recover the independent sources given only the sensor observations.
Both ICA and Principal Component Analysis (PCA) generally represent a multivariate data set in a linear coordinate system, and both belong to the family of unsupervised, adaptive component analysis methods. Unlike PCA, ICA not only decorrelates the signals (second-order statistics) but also reduces higher-order statistical dependencies, attempting to make the signals as independent as possible. In other words, ICA is a way of finding a linear, not necessarily orthogonal, coordinate system in any multivariate data, and it can be seen as an extension of PCA. Activity around ICA has grown rapidly in the last few years. ICA has been applied in many areas, ranging from speech signal processing, biomedical and medical signal processing, image de-noising, image retrieval and telecommunications to stock prediction.
A feature is defined as a descriptive parameter that is extracted from an image. Features may be used to interpret image content, or as a measure of similarity in an image database.
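The difference between PCA and ICA described above can be made concrete with a small two-dimensional Python sketch. It is an illustration under stated assumptions, not the algorithm used in the dissertation: the sources are synthetic uniform noise, the mixing coefficients are arbitrary, and the ICA step is a textbook kurtosis-maximizing rotation search applied after PCA whitening. All function names are hypothetical.

```python
import math
import random

def center(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def corr(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    u, v = center(u), center(v)
    num = sum(a * b for a, b in zip(u, v))
    return num / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def rotate(z1, z2, phi):
    cs, sn = math.cos(phi), math.sin(phi)
    return ([cs * u + sn * v for u, v in zip(z1, z2)],
            [-sn * u + cs * v for u, v in zip(z1, z2)])

def whiten(x1, x2):
    """PCA step: rotate centered data onto its principal axes, scale to unit variance."""
    n = len(x1)
    a = sum(u * u for u in x1) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    c = sum(v * v for v in x2) / n
    theta = 0.5 * math.atan2(2 * b, a - c)   # principal-axis angle of [[a, b], [b, c]]
    p1, p2 = rotate(x1, x2, theta)
    s1 = math.sqrt(sum(u * u for u in p1) / n)
    s2 = math.sqrt(sum(v * v for v in p2) / n)
    return [u / s1 for u in p1], [v / s2 for v in p2]

def excess_kurtosis(z):
    n = len(z)
    m2 = sum(v * v for v in z) / n
    m4 = sum(v ** 4 for v in z) / n
    return m4 / (m2 * m2) - 3.0

def ica_2d(x1, x2, steps=90):
    """ICA step (assumed method): after whitening, pick the rotation in [0, pi/2)
    that maximizes the summed absolute excess kurtosis, a fourth-order statistic."""
    z1, z2 = whiten(center(x1), center(x2))
    angles = [i * (math.pi / 2) / steps for i in range(steps)]
    best = max(angles, key=lambda phi: sum(abs(excess_kurtosis(y))
                                           for y in rotate(z1, z2, phi)))
    return rotate(z1, z2, best)

random.seed(0)
n = 3000
s1 = [random.uniform(-1, 1) for _ in range(n)]   # independent non-Gaussian sources
s2 = [random.uniform(-1, 1) for _ in range(n)]
x1 = [1.0 * a + 0.6 * b for a, b in zip(s1, s2)]  # non-orthogonal linear mixing
x2 = [0.4 * a + 1.0 * b for a, b in zip(s1, s2)]

z1, z2 = whiten(center(x1), center(x2))   # PCA only: decorrelated, still mixed
y1, y2 = ica_2d(x1, x2)                   # PCA followed by the ICA rotation

pca_match = max(abs(corr(z1, s1)), abs(corr(z2, s1)))
ica_match = max(abs(corr(y1, s1)), abs(corr(y2, s1)))
print(f"best |corr| with source 1: PCA {pca_match:.2f}, ICA {ica_match:.2f}")
```

The whitened signals are uncorrelated but each is still a blend of both sources, while the extra rotation chosen from higher-order statistics aligns the outputs with the sources up to order and sign, which is the ambiguity inherent in ICA.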
Feature extraction is one of the elementary problems in the area of image retrieval. It is the key to retrieval problems based not only on domain-specific and general visual contents but also on semantic contents. Content-based image retrieval is a highly challenging aspect of image analysis. The task is hard because of the limited understanding of the relations between content descriptions and the basic image features extracted. A good visual content descriptive feature should be invariant to the accidental variance introduced by the imaging process (such as variation in the illuminant of the scene). It should be pointed out that there are no "best features" for all image domains. It is a matter of creating a good "solution" using multiple features for a specific application. Recently, research has shown that increased storage capacity can be offered by utilizing the statistical characteristics of images in the analysis of digital content. Low-level visual features can be extracted directly from images with unsupervised component analysis: ICA, which is based on high-order statistical dependencies between variables, can extract local features, direction features and bandwidth features, and can improve image classification.
In this thesis, after a brief introduction to the development history, current research status and practical applications of CBIR and ICA, exploratory research is carried out on several key techniques of CBIR, including low-level feature extraction, similarity measurement and relevance feedback. The topic is a current research focus of image processing and information retrieval, so the research has both theoretical and practical value. The fundamental theory of ICA is introduced, including the mathematical definition of ICA, the assumptions made about ICA problems, and the mathematical theory and methods commonly used in ICA. Then, several algorithms and applications of ICA are investigated in depth.
ICA, which can compute the prevailing attributes of a data set, is used as the major mathematical tool in the dissertation. ICA is applied to transform data spaces to offer increased storage capacity. Several kinds of ICA-based visual features are extracted, including domain-specific visual features, general visual features and semantic features with watermarking annotation. Appropriate feature fusion is used in image retrieval, and several image retrieval schemes based on ICA are proposed. The main contributions of this thesis are as follows:
1. The data obtained by scanning a face image constitute a high-dimensional signal space. However, the intrinsic dimensionality of the face space is much smaller, despite variations in expression, pose, lighting and so on. Faces are believed to be clustered on low-dimensional manifolds. In this thesis, a novel kind of content feature, extracted by independent component analysis (ICA), is proposed and used for face image retrieval. A novel content-based face image retrieval scheme is built on this feature, the principle independent content feature (PICF). In this scheme, the PICF, which is based on higher-order statistics, is first extracted in the reduced space for effective representation of face images. The PICF carries locally salient feature information and is closer to edgelets or fiducial spatial details of images. The PICF method is better suited to cases in which the training images include some tilt and rotation in addition to the frontal position, and simulations verify its feasibility.
2. A PICF-based similarity measurement is proposed for the face image retrieval stage to ensure computational efficiency and retrieval accuracy. Enhanced retrieval performance is achieved using the PICF-based similarity measurement after two stages of retrieval. Experimental results show that the PICF scheme performs better than the popular PCA method or the ICA method with the usual measurement.
3. 
To ensure computational efficiency and retrieval accuracy, an algorithm is used in the PICF-based face image retrieval stage that eliminates the ambiguity in the order of the independent features and selects those independent features according to their discriminatory power from a combined pool of PCA and ICA features. The simulation demonstrates the feasibility of the proposed method, and the average precision can reach 95.14% per thousand experiments.
4. A new image retrieval method is proposed that combines the color and texture features of partitions in a classification image and uses principal component analysis (PCA) and independent component analysis (ICA). Experiments show that the proposed features provide an unsupervised grouping of the data that is well aligned with human perception, are amenable to color content description, and are invariant to the accidental variance introduced by the imaging process. Comparative experiments show that the HSV color space is the most effective of the three color spaces. In the classification image subspace, the color features based on the proposed approach are invariant to scale and rotation and are a good complement to the directional texture feature. The method can improve the efficiency of image retrieval based on these hybrid features.
5. The goal of medical information systems has been defined as delivering the needed information at the right time and the right place to the right persons, in order to improve the quality and efficiency of care processes. Some valuable structured information, such as the age, sex and profession of the patient, can be stored with the actual images, and well-annotated atlases of medical images exist that contain objective knowledge. The combination of text with visual content features therefore has the most practical value in medical image retrieval. A novel method that combines watermarking annotation with the independent content feature (ICF) is proposed for medical image retrieval.
The ICFs for representing medical images employ local content representation information from training images, operate in a reduced-dimensionality space, and are extracted by independent component analysis (ICA). Digital watermarking is the practice of imperceptibly altering the data to embed information about it, here the patients' structured information, for medical image retrieval; it can ensure the integrity of the medical image and safeguard the privacy rights of patients in digital case records. The simulation is performed on X-ray stomatic imaging, and experimental results show that the scheme has good retrieval performance and that the watermarking algorithm used in the scheme is robust to JPEG compression.
The research history of image retrieval spans no more than several decades, and ICA began to be applied to this field only a few years ago. Although it can provide new solutions and novel ideas for the development of image retrieval, many issues remain that are worth discussing and researching. In this dissertation, ICA is used as the major mathematical tool. Statistical analysis of image pixels is combined with feature-based analysis and extraction to obtain several kinds of ICA-based visual content features for image retrieval. Domain-specific visual content features based on ICA are successfully extracted for face image retrieval. In the classification image subspace, fusing the general color content features based on unsupervised component analysis with texture features achieves enhanced performance in color image retrieval. Content features based on ICA, combined with semantic content from watermarking annotation, are extracted for medical image retrieval. This method can ensure the integrity of the medical image and safeguard the privacy rights of patients in digital case records while maintaining retrieval accuracy.
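The idea of carrying a patient's structured information inside the image itself can be illustrated with a very small Python sketch. The abstract does not specify the watermarking algorithm, so this example swaps in the simplest possible scheme, least-significant-bit (LSB) embedding, purely as an assumption. Unlike the algorithm reported above, LSB embedding is fragile and would not survive JPEG compression. The function names, the 16-bit length header and the sample annotation are all hypothetical.

```python
def embed_annotation(pixels, text):
    """Hide `text` (UTF-8) in the least significant bits of 8-bit pixel values.

    Assumed scheme: a 16-bit big-endian length header followed by the payload,
    one bit per pixel. Each pixel value changes by at most 1.
    """
    payload = text.encode("utf-8")
    data = len(payload).to_bytes(2, "big") + payload
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for this annotation")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit   # clear the LSB, then set it to the bit
    return marked

def extract_annotation(pixels):
    """Read the length header, then the payload, back out of the LSBs."""
    def read_bytes(start_bit, count):
        out = bytearray()
        for k in range(count):
            byte = 0
            for i in range(8):
                byte = (byte << 1) | (pixels[start_bit + 8 * k + i] & 1)
            out.append(byte)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 2), "big")
    return read_bytes(16, length).decode("utf-8")

# Hypothetical 64x64 grayscale image stored as a flat list of 8-bit values.
image = [(37 * i) % 256 for i in range(64 * 64)]
note = "age=54;sex=F"                      # made-up structured annotation
marked = embed_annotation(image, note)
recovered = extract_annotation(marked)
max_change = max(abs(a - b) for a, b in zip(image, marked))
print(recovered, max_change)
```

A watermark meant to survive compression would typically be embedded in a transform domain rather than in raw pixel bits; the sketch only shows how an annotation can travel with the image while altering it imperceptibly.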
In the last part of the thesis, the unresolved problems in image retrieval research and the priorities for further research are summarized.
Keywords/Search Tags: Image Retrieval, Feature Extraction, Color, Texture, Digital Watermarking, X-Ray Imaging, Robustness, Principal Component Analysis, Independent Component Analysis, Similarity Measurement, Average Precision, Average Recall