
Research On Image Classification And Annotation Based On Semantic Analysis And Fusion

Posted on: 2016-07-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X R Wang
GTID: 1228330467493264
Subject: Computer Science and Technology
Abstract/Summary:
How to organize, index, and manage the huge number of images on the Internet is a very active topic in multimedia research, and image classification and annotation technologies are key to solving this problem. Research in this area has broad practical applications and significant academic and industrial value. Image classification and annotation technologies have made progress on a few specific data sets. However, evaluation results and real-world practice show that their performance drops significantly on the huge volume of Internet data, and they still fall short of what broad industrial adoption requires. Several critical problems in this area deserve further in-depth research:

- the semantic gap in image understanding, which is the most critical;
- how to design classification and annotation systems for the large amount of data on the Internet;
- how to design personalized, intelligent query systems for end users.

This dissertation addresses these problems through image classification and annotation based on semantic fusion. Its major contributions and innovations are as follows:

(1) Existing image segmentation algorithms lack general applicability, and the semantics of their segmentation results may be incomplete. To address these issues, a new cluster ensemble-based image segmentation algorithm (CE-IS) is proposed and evaluated. CE-IS uses cluster ensemble technology to fuse intermediate segmentation results obtained from different features. Because it combines the strengths of several features, it achieves more stable performance across different categories of images.
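The abstract does not state which consensus function CE-IS uses to fuse its intermediate segmentations. The Python sketch below illustrates the general cluster ensemble idea with one common choice, a co-association (evidence accumulation) consensus. Two elements are joined when more than a threshold fraction of the base segmentations place them in the same region. The sketch assumes every base segmentation labels the same list of superpixels. The function name `fuse_segmentations` and the default threshold are hypothetical.

```python
from typing import Dict, List


def fuse_segmentations(labelings: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Co-association consensus over several base segmentations (hypothetical helper).

    Each labeling assigns a region id to every element (e.g. a superpixel),
    and all labelings cover the same elements in the same order. Two elements
    are linked when the fraction of labelings that put them in the same region
    exceeds `threshold`; connected groups of linked elements become the
    consensus regions.
    """
    n = len(labelings[0])
    m = len(labelings)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Count the base segmentations that agree i and j belong together.
            votes = sum(1 for lab in labelings if lab[i] == lab[j])
            if votes / m > threshold:
                parent[find(i)] = find(j)

    # Relabel consensus components as 0, 1, 2, ... in order of first appearance.
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

As an example, fuse three base labelings: [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1] (one element mislabeled) and [0, 0, 0, 1, 1, 2] (over-segmented). The result is [0, 0, 0, 1, 1, 1], because every pair inside each consensus region is supported by at least two of the three inputs. The pairwise loop is quadratic in the number of elements. That cost is tolerable at superpixel granularity but not at pixel granularity.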
Within each sub-segmentation of CE-IS, the PageRank algorithm is enhanced to exploit the semantic similarity and spatial relationships among regions. Traditional segmentation methods rely on visual similarity. Compared with them, this reduces the impact of the semantic gap and separates semantic objects from the background more accurately.

The results are as follows:

- On the Weizmann segmentation data set, compared with the Gpb, Mean Shift, and N-cut algorithms, CE-IS improves the average F-measure by 15%, 31%, and 37%, respectively. It also reduces the average number of fragments by 32%, 51%, and 44%, respectively.
- On the BSDS500 data set, CE-IS increases the average F-measure by 8% over Gpb and reduces the average number of fragments by 26%.
- On the Weizmann horse data set and the MSRC mixed data set, compared with the Spatial-LTM algorithm, CE-IS increases the average F-measure by 29% and 19%, respectively.

These results show that the algorithm performs better across broad categories of images and achieves better segmentation quality in general.

(2) A new object detection algorithm based on the semantic expression of exemplar objects (EOD) is proposed. It strengthens the cohesion among objects of the same class and the discrimination between objects of different classes. This improves the effectiveness of object class modeling while reducing the computational cost of the modeling process. During object class modeling, EOD introduces a multi-feature tree whose leaf nodes are the exemplar objects. Each exemplar object then serves as the coordinate origin of an "exemplar-distance" feature space. In that space, a linear SVM (LSVM) is trained as the similarity classifier for each exemplar object.
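The abstract describes the exemplar-distance space only at a high level. This sketch assumes one plausible reading: each exemplar acts as the origin, and a candidate object is represented by its per-channel distances to that exemplar. There is one channel per feature type, such as color or texture, so the exemplar itself maps to the zero vector. The Python sketch below builds that representation and fits a linear similarity classifier on it. For brevity it uses a perceptron in place of the LSVM; a margin-maximizing SVM solver would be the faithful choice. All function names, the Euclidean per-channel distance, and the channel layout are hypothetical.

```python
import math
from typing import List, Sequence

Channel = Sequence[float]  # one feature channel, e.g. a color histogram (assumed layout)


def exemplar_distance(sample: List[Channel], exemplar: List[Channel]) -> List[float]:
    """Represent `sample` by its Euclidean distance to `exemplar` in each channel.

    The exemplar itself maps to the origin of this space.
    """
    return [math.dist(s, e) for s, e in zip(sample, exemplar)]


def train_similarity_classifier(points: List[List[float]], labels: List[int],
                                max_epochs: int = 1000) -> List[float]:
    """Perceptron stand-in for the per-exemplar LSVM (hypothetical helper).

    Labels are +1 (similar to the exemplar) or -1 (not similar); the last
    weight is a bias. Training stops after the first epoch without mistakes,
    which the perceptron reaches when the data are linearly separable.
    """
    w = [0.0] * (len(points[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            xb = x + [1.0]  # constant feature for the bias term
            if y * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:
            break
    return w


def similarity_score(w: List[float], d: List[float]) -> float:
    """Positive score means the sample is judged similar to the exemplar."""
    return sum(wi * di for wi, di in zip(w, d + [1.0]))
```

Consider an exemplar whose two channels are [0.0, 0.0] and [1.0, 1.0]. Training on two nearby samples and two distant ones yields a weight vector with negative distance weights and a positive bias, so small distances to the exemplar score as similar.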
In EOD, each per-exemplar similarity classifier is regarded as a weak semantic expression. Together, the weak semantic expressions reflect both the feature diversity within a class and the differences between classes, and so they form the semantic expression of the object class.

During detection, EOD avoids exhaustively scanning each image by using detection windows generated by the cluster ensemble-based segmentation algorithm described above. That segmentation method fully exploits the structural information of images, so the windows it generates are class-independent. There is therefore no need to generate windows separately for each class, which dramatically reduces the number of detection windows. In addition, EOD introduces a bottom-up segment-merging strategy and uses the intermediate merging results as multi-scale detection windows to improve the coverage of target objects.

The results are as follows:

- In the quality of the generated object locations, EOD improves the MABO measure by 4% and 7% over the DPM and SS-fast algorithms, respectively.
- EOD generates only about 0.06% and 56% as many detection windows as DPM and SS-fast, respectively.
- In object detection performance, EOD improves average precision by 42% and 18% over DPM and SS-fast, respectively.
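The bottom-up merging strategy can be illustrated with a greedy hierarchical grouping in the spirit of selective search; the abstract does not give the exact merging criterion used by EOD. The Python sketch below assumes each initial segment has a bounding box, a mean feature vector, a pixel count, and a set of adjacent segments. It repeatedly merges the adjacent pair with the closest mean features. It keeps the bounding box of every region it ever forms, so the windows cover objects at several scales. The function name and the similarity measure are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def merge_windows(boxes: List[Box], feats: List[List[float]], sizes: List[int],
                  adjacency: Set[Tuple[int, int]]) -> List[Box]:
    """Greedy bottom-up merging; every region ever formed yields a window (hypothetical helper)."""
    regions: Dict[int, Tuple[Box, List[float], int]] = {
        i: (boxes[i], feats[i], sizes[i]) for i in range(len(boxes))}
    neighbors: Dict[int, Set[int]] = {i: set() for i in regions}
    for a, b in adjacency:
        neighbors[a].add(b)
        neighbors[b].add(a)
    windows = list(boxes)  # the initial segments are the finest-scale windows
    next_id = len(boxes)
    while True:
        pairs = [(a, b) for a in neighbors for b in neighbors[a] if a < b]
        if not pairs:
            break
        # Most similar adjacent pair = smallest distance between mean features.
        a, b = min(pairs, key=lambda p: math.dist(regions[p[0]][1], regions[p[1]][1]))
        (ba, fa, sa), (bb, fb, sb) = regions.pop(a), regions.pop(b)
        box = (min(ba[0], bb[0]), min(ba[1], bb[1]), max(ba[2], bb[2]), max(ba[3], bb[3]))
        feat = [(sa * x + sb * y) / (sa + sb) for x, y in zip(fa, fb)]  # size-weighted mean
        regions[next_id] = (box, feat, sa + sb)
        # The merged region inherits the neighbors of both parts.
        merged_nbrs = (neighbors.pop(a) | neighbors.pop(b)) - {a, b}
        for n in merged_nbrs:
            neighbors[n] -= {a, b}
            neighbors[n].add(next_id)
        neighbors[next_id] = merged_nbrs
        windows.append(box)
        next_id += 1
    return windows
```

Consider three side-by-side 10×10 segments whose mean features are 0.0, 0.1 and 1.0. The sketch first merges the two similar left segments, then merges the result with the right one. This yields five windows: the three originals, (0, 0, 20, 10) and (0, 0, 30, 10).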
The experimental results demonstrate that the EOD object detection algorithm not only dramatically reduces the number of detection windows but also significantly increases object detection accuracy.

(3) Existing image classification algorithms cannot effectively extract and express the scene semantics of images. To address this issue, a new image classification algorithm based on object bank semantic expression (EODB-N-gram) is proposed and evaluated. Its key idea is to extract the scene semantics of images using the EODB object bank, which consists of object detectors and an N-gram model, and then to train the classifier on these scene semantics.

The modeling proceeds in three steps:

- During image semantic modeling, the algorithm uses the object detectors to extract the visual content of images.
- Based on this visual content, the N-gram model is built from the visual co-occurrence relationships among objects and from visual grammar rules.
- The scene semantics of an image are then obtained by modeling the spatial layout relationships between the N-grams and the objects in the image.

A classifier trained on these scene semantics then performs semantic-based scene classification.

The results are as follows:

- On the Scene-15 data set, EODB-N-gram increases average accuracy by 16%, 10%, and 9% compared with the KSPM, OB, and WSR-EC algorithms, respectively.
- On the MIT Indoor data set, it improves average accuracy by 36% and 32% compared with OB and WSR-EC, respectively.
- On the Caltech-256 data set, it improves average accuracy by 41% and 31% over OB and WSR-EC, respectively.

These results show that EODB-N-gram effectively exploits the scene semantics of images and significantly improves classification quality.
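The abstract describes how EODB-N-gram forms its N-grams only qualitatively. The Python sketch below assumes detections are (label, bounding box) pairs. It builds a bag of object unigrams and spatial bigrams such as "sea below sky", where each relation comes from the relative position of box centers. Such a histogram could serve as the scene-level feature on which a classifier is trained. The sketch does not model the visual grammar rules mentioned above. The relation vocabulary and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

BoxF = Tuple[float, float, float, float]  # (x0, y0, x1, y1), image y axis points down
Detection = Tuple[str, BoxF]              # (object label, bounding box)


def relation(a_box: BoxF, b_box: BoxF) -> str:
    """Coarse spatial relation of box a with respect to box b (hypothetical vocabulary)."""
    ax, ay = (a_box[0] + a_box[2]) / 2, (a_box[1] + a_box[3]) / 2
    bx, by = (b_box[0] + b_box[2]) / 2, (b_box[1] + b_box[3]) / 2
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):
        return "left-of" if dx < 0 else "right-of"
    return "above" if dy < 0 else "below"


def scene_ngrams(detections: List[Detection]) -> Counter:
    """Bag of object unigrams plus spatial bigrams of the form 'a relation b'."""
    grams = Counter(label for label, _ in detections)
    # Sort by label so each unordered pair yields one canonical token.
    for (la, ba), (lb, bb) in combinations(sorted(detections), 2):
        grams[f"{la} {relation(ba, bb)} {lb}"] += 1
    return grams
```

Consider detections of a sky, a sea and a boat at the lower left. The result contains the unigrams sky, sea and boat, plus the bigrams "boat left-of sea", "boat below sky" and "sea below sky".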
The EODB-N-gram algorithm thus effectively mitigates the semantic gap problem that affects existing algorithms.

(4) A new high-level image semantic annotation algorithm based on hot Internet topics (HLIA) is proposed and evaluated. It addresses two problems: dynamically updating the training set for image annotation, and modeling high-level semantics. The algorithm consists of two independent sub-tasks: dynamically updating the training set based on hot Internet topics, and annotating images with a search-based algorithm.

In the training-set update sub-task, the algorithm models the abstract semantics of images. It uses the similarity among images, the co-occurrence relationships among topics, and the mapping between images and topics. Complex graph clustering then extracts the hot topics that represent the high-level semantics of images, and the related images are added to the original training set.

The annotation sub-task handles a query image in three steps:

- Visually similar candidates are first retrieved from the training set.
- Unrelated candidates are then filtered out using hypergraph and spectral clustering algorithms.
- Finally, the annotation words are derived from the remaining candidates.

The results are as follows:

- On the NUS-WIDE data set, HLIA improves average accuracy by 25% and 58% over the SBIA and LTA annotation algorithms, respectively.
- On a data set of 20 groups of food security hot issues, HLIA improves average accuracy by 22% and 52% over SBIA and LTA, respectively.

These results show that the dynamic update mechanism can refresh the semantic coverage of the training set in real time.
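The search-based annotation sub-task can be sketched as three steps: retrieve, filter, vote. In the Python sketch below, retrieval uses cosine similarity on visual feature vectors, and tags are voted with similarity weights. The sketch does not reproduce HLIA's hypergraph and spectral clustering filter. A simple stand-in keeps only candidates whose similarity is within a fixed ratio of the best match. Feature vectors are assumed to be nonnegative. `annotate`, `keep_ratio` and the other parameters are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def annotate(query: Sequence[float], training: List[Tuple[List[float], List[str]]],
             k: int = 5, keep_ratio: float = 0.8, n_tags: int = 3) -> List[str]:
    """Search-based annotation sketch (hypothetical helper)."""
    # 1. Retrieve the k visually most similar training images.
    scored = sorted(((cosine(query, f), tags) for f, tags in training),
                    key=lambda s: s[0], reverse=True)[:k]
    # 2. Drop unrelated candidates (simplified stand-in for the hypergraph/spectral step).
    best = scored[0][0]
    kept = [(s, tags) for s, tags in scored if s >= keep_ratio * best]
    # 3. Similarity-weighted tag voting; ties are broken alphabetically.
    votes: Dict[str, float] = defaultdict(float)
    for s, tags in kept:
        for t in tags:
            votes[t] += s
    return sorted(votes, key=lambda t: (-votes[t], t))[:n_tags]
```

Consider a query [1.0, 0.0] and three training images tagged {beach, sea}, {sea, boat} and {city, night}. The third image is filtered out as unrelated. The top two tags returned are sea, which is supported by two neighbors, and beach.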
Keeping the semantics of the training set synchronized with hot Internet topics is critical and can greatly improve annotation quality on very large data sets. An annotation experiment was also run in a simulated Internet environment. As the ratio of training set to testing set goes from 1:0.6 to 1:5.5 and then 1:55, the average annotation accuracy of HLIA is about 60.2%, 56.8%, and 36%, respectively. This shows that HLIA delivers more stable annotation performance in an Internet environment.

(5) The mechanism for switching between text semantics and image semantics is studied. Based on it, an image auto-generation system driven by text semantics, the Text to Image (TTI) system, is proposed, designed, and implemented. It consists of three key components: understanding text semantics, switching between text semantics and image semantics, and arranging the image semantics in a spatial layout.

Given a text article, TTI generates an image template from its text semantics. The template constrains the target visual objects and their spatial layout, and TTI then finds matching images in the data set according to the template. In effect, a TTI template summarizes the visual characteristics of images that share similar text semantics. It also represents the visual content and scene semantics of this image set.

In the experiments, templates generated by TTI are used for image classification and annotation:

- In image annotation, the method based on TTI template matching improves average accuracy by 37% over the Synset algorithm.
- In image classification, the TTI template-matching method improves average accuracy by about 15% and 2% over the OB and SaOC algorithms, respectively.
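The abstract does not say how TTI parses text or encodes templates. The Python sketch below is a toy illustration under explicit assumptions. It turns a sentence into a template of required objects plus (subject, relation, object) layout constraints, using two small hypothetical vocabularies. It then scores an image's detections against the template by averaging object coverage and the fraction of satisfied constraints. Real text understanding would need far more than keyword matching. Every vocabulary entry and function name here is hypothetical.

```python
import re
from typing import Dict, Optional, Set, Tuple

# Hypothetical vocabularies; a real system would need much richer text understanding.
VISUAL_OBJECTS = {"cat", "dog", "sofa", "table", "cup", "tree", "car"}
SPATIAL_WORDS = {"on": "above", "above": "above", "over": "above",
                 "under": "below", "below": "below", "beneath": "below"}

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), image y axis points down
Template = Tuple[Set[str], Set[Tuple[str, str, str]]]


def text_to_template(text: str) -> Template:
    """Extract required objects and (subject, relation, object) constraints."""
    objects: Set[str] = set()
    relations: Set[Tuple[str, str, str]] = set()
    last_obj: Optional[str] = None
    pending: Optional[str] = None
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in SPATIAL_WORDS and last_obj:
            pending = SPATIAL_WORDS[tok]  # relation waiting for its second object
        elif tok in VISUAL_OBJECTS:
            if pending:
                relations.add((last_obj, pending, tok))
            objects.add(tok)
            last_obj, pending = tok, None
    return objects, relations


def holds(rel: str, a: Box, b: Box) -> bool:
    """Check a relation between the centers of boxes a and b."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return {"above": ay < by, "below": ay > by,
            "left-of": ax < bx, "right-of": ax > bx}[rel]


def match_score(template: Template, detections: Dict[str, Box]) -> float:
    """Average of object coverage and satisfied layout constraints, in [0, 1]."""
    objects, relations = template
    if not objects:
        return 0.0
    coverage = len(objects & set(detections)) / len(objects)
    if not relations:
        return coverage
    ok = sum(1 for a, r, b in relations
             if a in detections and b in detections and holds(r, detections[a], detections[b]))
    return 0.5 * coverage + 0.5 * ok / len(relations)
```

Take the sentence "A cat sleeps on the sofa under a tree." Its template requires cat, sofa and tree, with the constraints (cat, above, sofa) and (sofa, below, tree). An image whose detections satisfy both constraints scores 1.0. An image containing only a dog and a sofa scores about 0.17. Ranking images by this score corresponds to the template-matching retrieval step.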
In terms of training complexity, classification and annotation based on TTI template matching are much easier to implement because they require no training set. This removes the dependence of traditional classification and annotation methods on the quality and quantity of training data. The results also demonstrate that TTI can accurately express text semantics in images through constraints on visual content and scene semantics. TTI offers a new way to address the semantic gap by fusing the semantic spaces of text and images.
Keywords/Search Tags: semantic analysis, semantic fusion, clustering ensemble, visual grammar, image expression, hot topics