Research Of Large-Scale Web Image Annotation And Interpretation

Posted on: 2011-05-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Y Xia
Full Text: PDF
GTID: 1118330332978545
Subject: Computer Science and Technology
Abstract/Summary:
As a practical and effective route to large-scale web image retrieval, automatic web image annotation and understanding have become active topics in both academic and industrial research. This dissertation focuses on four research issues: mining the relevance between visual features and surrounding text, image interpretation, large-scale data clustering, and deep learning of image features.

To address these issues, this dissertation proposes a data-driven framework for automatic web image annotation and understanding, Automatic Web Image Annotation and Interpretation (AWIAI). To annotate images with suitable words, AWIAI first calculates the visibility of words in the surrounding text to build the "image-word" matrix, then extends the initial annotation result through latent visual and semantic analysis, and finally obtains the annotation words by unsupervised learning of the visual correlation and co-occurrence of annotation words.

Current image annotation approaches describe image semantics with only a few discrete words, because they neglect the statement-level syntactic correlation among the annotated words. As a result, they are unable to render natural-language interpretations of images such as "pandas eat bamboo". To solve this problem, this dissertation proposes "Image Interpretation". The basic idea of image interpretation is to discover the statement-level syntactic correlation among annotated words and to produce interpretation results in natural language.

The AWIAI framework is a data-driven image processing pipeline, and such a pipeline often faces the problem of clustering large-scale data. This dissertation presents two clustering approaches for large-scale data with a dense similarity matrix. Partition Affinity Propagation (PAP) first passes messages within subsets of the data and then merges all of the data together. PAP can effectively reduce the number of clustering iterations. Landmark Affinity Propagation (LAP) first passes messages among the landmark data points and then clusters the remaining data. LAP is a global approximation method that speeds up clustering of large-scale data.

Recent advances in neuroscience indicate that the human brain perceives the outside world through a hierarchical learning process. Motivated by this research, a hybrid model-based and data-driven architecture (DMD) is proposed in AWIAI to improve image annotation by learning discriminant features. DMD first uses a deep learning pipeline to progressively learn visual features from simple to complex. DMD then integrates the deep model-based learning pipeline with the data-driven learning pipeline. After discriminant image representations are obtained from both pipelines through sparse regularization in an unsupervised way, a supervised learning algorithm is applied to predict the objects in images.
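
As background for the two clustering approaches named above, the following is a minimal sketch of standard Affinity Propagation message passing on a dense similarity matrix (the baseline algorithm of Frey and Dueck), not the dissertation's PAP or LAP variants, whose details the abstract does not give. The function name `affinity_propagation`, the damping value, the iteration count, the toy 1-D data, and the choice of the median similarity as the preference are all assumptions made for illustration.

```python
import numpy as np

def affinity_propagation(S, damping=0.8, max_iter=300):
    """Standard affinity propagation on a dense n x n similarity matrix S.

    The diagonal of S holds the "preference" of each point to be an exemplar.
    Returns (exemplar indices, exemplar index assigned to each point).
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # Responsibility update: r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[rows, idx]
        AS[rows, idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1 - damping) * R_new

        # Availability update:
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag  # self-availability is not clipped
        A = damping * A + (1 - damping) * A_new

    # A point is an exemplar when its self-responsibility plus
    # self-availability is positive.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars  # each exemplar represents itself
    return exemplars, labels

# Toy data (assumed for illustration): two well-separated groups of 1-D points.
points = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
S = -(points[:, None] - points[None, :]) ** 2  # negative squared distance
off_diag = ~np.eye(len(points), dtype=bool)
np.fill_diagonal(S, np.median(S[off_diag]))    # shared preference for all points
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)
```

Every iteration of this baseline updates all n x n messages, which is the cost that motivates passing messages only within subsets or among landmarks when the similarity matrix is dense and large.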
Keywords/Search Tags: Automatic Image Annotation, Image Interpretation, Word Visibility, Data Clustering, Deep Learning, Data-Driven