With the prevalence of digital imaging and storage equipment, more and more images are available on the Internet. Compared with text, visual images are more vivid and easier to understand, and digital images are now widely used in business, education, science, and technology. How to design efficient and effective image retrieval technologies has therefore become an important research direction in academia. A key solution to this problem is automatic image annotation. However, most automatic image annotation approaches have been studied under limited conditions, e.g. designed only for small-scale artificial image databases, without considering the real-world image annotation problem. As a result, when existing methods are applied in practice they encounter many problems, such as low annotation performance, poor user experience, and an inability to handle a large number of semantic concepts. It is therefore very important both to extend current methods to real-world situations and to develop new real-world methods that solve the problems of existing ones.

Additionally, we design an image retrieval demo system based on the proposed image annotation approaches. We also study some other key problems of image retrieval, such as image representation and image ranking. The main contributions of this dissertation are as follows:

1. Proposed an automatic image annotation method based on a large-scale distance metric learning algorithm. First, we propose a discriminative distance metric learning (DDML) algorithm, which improves KNN-based image annotation methods. Then we propose an aggregated distance metric learning (ADML) method, which can train DDML in a parallel or an online fashion and can therefore handle large-scale problems. The experimental results show that the proposed method improves both the effectiveness and the efficiency of image annotation.

2. Proposed a large-scale support vector machine algorithm (ASVM) to annotate images automatically. Instead of learning from the entire data set, our method divides the training set into subsets. A series of sub-models is then learned from these subsets by SVM, followed by a simple global aggregation. ASVM greatly improves the scalability of the original SVM solvers, and millions of samples can be trained in a short time.

3. Proposed a bipartite graph reinforcement model (BGRM) for web image annotation. The key to web image annotation is how to use the textual information that accompanies web images to help tag them. The proposed model extracts the surrounding text and other textual information of an image as candidate annotations. These candidates are then extended with more potentially relevant annotations by searching and mining a large-scale image database. All candidates are modeled as a bipartite graph, and a reinforcement algorithm is run on the graph to re-rank them. Only the candidates with the highest ranking scores are kept as the final annotations. The experimental results show that BGRM greatly improves annotation performance.

4. Proposed a statistical-model-based approach (SRIA) for real-world image annotation. SRIA can leverage a large-scale training data set to annotate both personal and web images efficiently in a unified framework. The experimental results show that SRIA not only improves annotation performance but also speeds up the annotation process.

5. Proposed a cross-language image annotation framework. The framework can use large-scale multilingual web image data as its training set and provide multilingual annotations according to the native languages of its users. Following the idea that "two languages are more informative than one", we propose a multilingual annotation fusion algorithm (MAF) for ranking and translating candidate annotations. The experimental results show the good performance of the framework.

6. Proposed an optimization-based image annotation refinement algorithm (OptTag). Based on the proposed algorithm, we provide a unified image annotation framework. OptTag performs non-parametric image annotation refinement with a 0-1 integer optimization model that uses the prior and joint local probabilities. The model can be solved efficiently as a semi-definite optimization problem. Additionally, it determines the final tags directly, whereas many previous approaches rely on predefined thresholds to decide which words are unrelated. The experiments demonstrate the effectiveness of OptTag.

7. Proposed an image representation based on a spatial visual topic model, and a static image ranking called SocialRank that reveals the importance and quality of images. By incorporating the proposed image annotation methods, a real-time image retrieval demo system is built on a large-scale image database.

In a word, research on real-world automatic image annotation helps us understand the deep relations between images and concepts and helps achieve a unified representation model of visual information. It is of great significance not only to multimedia research but also to large-scale learning theory.
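The divide, train, and aggregate idea behind ASVM (contribution 2) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not specify the SVM solver or the aggregation rule, so the sketch assumes a linear SVM trained by plain subgradient descent on the hinge loss and assumes that "simple global aggregation" means averaging the sub-model weights. All function names and the toy data are hypothetical.

```python
import random


def train_linear_svm(samples, labels, epochs=20, lr=0.01, lam=0.01):
    """Train a linear SVM (hinge loss + L2 penalty) by subgradient descent.

    Assumption: this simple solver stands in for whatever SVM solver is
    used on each subset; labels must be +1 or -1.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink the weights (gradient of the L2 penalty).
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Sample violates the margin: step along the hinge subgradient.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def split_into_subsets(samples, labels, n_subsets, seed=0):
    """Shuffle the training set and deal it into n_subsets disjoint parts."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    subsets = []
    for k in range(n_subsets):
        idx = order[k::n_subsets]
        subsets.append(([samples[i] for i in idx], [labels[i] for i in idx]))
    return subsets


def aggregate(models):
    """Assumed aggregation rule: average the sub-model weights and biases."""
    n = len(models)
    dim = len(models[0][0])
    w = [sum(m[0][d] for m in models) / n for d in range(dim)]
    b = sum(m[1] for m in models) / n
    return w, b


def predict(model, x):
    """Return +1 or -1 according to the sign of the decision function."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def make_toy_data(n_per_class=200, seed=0):
    """Hypothetical 2-D data: one Gaussian cluster per class."""
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n_per_class):
        samples.append((rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)))
        labels.append(1)
        samples.append((rng.gauss(-2.0, 0.5), rng.gauss(-2.0, 0.5)))
        labels.append(-1)
    return samples, labels


if __name__ == "__main__":
    samples, labels = make_toy_data()
    subsets = split_into_subsets(samples, labels, n_subsets=4)
    # Each sub-model sees only its own subset, never the full training set.
    models = [train_linear_svm(xs, ys) for xs, ys in subsets]
    model = aggregate(models)
    print(predict(model, (2.0, 2.0)), predict(model, (-2.0, -2.0)))
```

Because each sub-model in this sketch is trained on its own subset without reference to the others, the training loop over subsets could be distributed across workers; only the small weight vectors need to be collected for aggregation.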