Research On Image Semantic Annotation And Caption Based On Deep Learning

Posted on: 2018-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Zheng
Full Text: PDF
GTID: 2348330518956591
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information science and technology, the volume of media data is growing quickly, driven mainly by the popularity of digital devices and advances in storage technology. Faced with large amounts of unlabeled text, audio, image, and video data, how to manage this data and make it usable is a problem that needs to be solved. Image semantic annotation can label images effectively, which not only helps people manage large collections of unlabeled images but also lets machines understand images more intelligently, so it is a significant research topic.

Image understanding builds on image processing and analysis and combines computer vision, natural language processing, and related theories to analyze and understand image content and report it back to people as textual semantic information. Achieving image understanding therefore requires not only image semantic annotation but also image description. Image annotation takes the image as its object and semantic information as its carrier, and studies the objects in an image and the relationships between them. Natural language processing techniques are then used to analyze the generated annotations and combine them into natural-language descriptive sentences; this task is called image description. In recent years image description has attracted great interest in the research community and, like image annotation, has broad application prospects.

The main line of this thesis is image semantic annotation: images in multimedia data are the object of study, and image description is treated as an application extension. The research follows the sequence of feature extraction and representation, semantic mapping model construction, and semantic analysis and understanding. It focuses on object recognition and semantic analysis in image annotation, including feature learning, multi-label classification, semantic relevance analysis, and word and sentence sequence generation. On this basis, the main contributions of the thesis are:

1. To reduce the semantic gap between data of different modalities, a hybrid multi-label image annotation architecture, CNN-ECC, is proposed, based on a deep convolutional neural network (CNN) and an ensemble of classifier chains (ECC). The framework consists of two stages: generative feature learning and discriminative semantic learning. In the first stage, an improved convolutional neural network learns high-level visual features that fuse multiple instances of the image. In the second stage, the ensemble of classifier chains is trained on the acquired visual features using the images' semantic label sets. The ensemble of classifier chains not only learns the semantic information contained in the visual features but also exploits the relationships between semantic labels, so that the generated labels are more strongly associated and redundant labels are avoided. Finally, the trained model is used to annotate unseen images automatically.

2. To compose the annotation words generated for an image into natural-language sentences, a method based on a convolutional neural network (CNN) and double Long Short-Term Memory (DLSTM) cells is proposed, called the CNN-DLSTM image caption system. The framework consists of two parts: a visual model and a language model. First, the visual model learns the visual concepts in the image content and generates the image's key semantic words. Second, the language model learns vocabulary and grammar from human-written description sequences and combines the visual concepts with the corresponding grammar to generate a language description, completing the image description task. To make the generated sentences read more naturally, CNN-DLSTM adds a final confidence model that evaluates the quality of candidate descriptions and selectively outputs the description sentence with the higher score.

Image content is complex and abstract, and its semantic concepts are often ambiguous. This thesis therefore improves key tasks of image annotation, such as feature learning and semantic learning, so that images can be annotated automatically and both annotation and description performance are improved.
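The classifier-chain idea behind CNN-ECC can be illustrated with a minimal sketch. The abstract does not give the thesis's implementation, so everything below is an assumption made for illustration: the two-dimensional feature vectors stand in for CNN features, the base classifier is a plain logistic regression trained by stochastic gradient descent, each chain uses a random label order, and the chains vote by simple majority. The bagging step often used when training such ensembles is omitted, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=500, lr=0.5):
    """Fit a binary logistic regression with per-sample gradient steps."""
    w = [0.0] * (len(X[0]) + 1)  # the last weight is the bias
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            g = p - yi  # gradient of the log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w


def predict_bit(w, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if sigmoid(z) >= 0.5 else 0


def train_chain(X, Y, order):
    """One chain: each label's classifier also sees the true values
    of the labels that come earlier in `order`."""
    chain = []
    for pos, label in enumerate(order):
        earlier = order[:pos]
        X_aug = [x + [float(y[k]) for k in earlier] for x, y in zip(X, Y)]
        chain.append(train_logistic(X_aug, [y[label] for y in Y]))
    return chain


def predict_chain(chain, order, x):
    """Predict labels in chain order, feeding each prediction forward."""
    feats = list(x)
    out = {}
    for w, label in zip(chain, order):
        bit = predict_bit(w, feats)
        out[label] = bit
        feats.append(float(bit))
    return [out[k] for k in range(len(order))]


def train_ecc(X, Y, n_chains=5, seed=0):
    """Ensemble of chains, each trained with a different random label order."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_chains):
        order = list(range(len(Y[0])))
        rng.shuffle(order)
        ensemble.append((order, train_chain(X, Y, order)))
    return ensemble


def predict_ecc(ensemble, x, threshold=0.5):
    """Majority vote over the chains, label by label."""
    votes = [0] * len(ensemble[0][0])
    for order, chain in ensemble:
        for k, bit in enumerate(predict_chain(chain, order, x)):
            votes[k] += bit
    return [1 if v / len(ensemble) >= threshold else 0 for v in votes]


# Toy data: label 2 is present only when labels 0 and 1 are both present,
# which gives the chains a label dependency to exploit.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
Y = [[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
ensemble = train_ecc(X, Y)
predictions = [predict_ecc(ensemble, x) for x in X]
```

Because later classifiers in a chain receive earlier label decisions as extra inputs, correlated labels can reinforce or suppress one another, which is the property the abstract credits for stronger label association. Averaging over several label orders reduces the sensitivity to any single ordering.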
Keywords/Search Tags: image annotation, convolutional neural networks, semantic learning, recurrent neural networks, double long short-term memory, image caption