
Generate Text Description From Content-Based Annotated Image

Posted on: 2013-06-20  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhu  Full Text: PDF
GTID: 2248330374482552  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of digital cameras and the Internet, and with the spread of mobile phones, tablet computers, and similar devices, images, video, and other hypermedia are pouring into people's lives in ever greater volume. How to organize and retrieve such large-scale image data is an important problem in both academia and industry.

This thesis studies the use of descriptive keywords and longer natural-language text to describe static images. This is one strategy for addressing the semantic-gap problem in image understanding and image retrieval; the ultimate goal is that, given an image, we can describe it with keywords, sentences, or even text fragments. Translating complex image representations into simple language descriptions has a long history, and there is still much room for development. In this thesis, graph-based segmentation of static landscape images and keyword annotation of the resulting regions form the foundation; describing the richer semantic content of an image with sentences or longer text is a meaningful extension, because sentences and text carry more information than keywords or tags alone and thus further enrich image annotation. Therefore, this thesis proposes a simple method to automatically generate sentences describing the relative positions of pairs of objects in a landscape image. The sentences are concise, containing only the main constituents: subject, predicate, and object, where the predicate is a preposition or prepositional phrase describing the relative position.

The main contributions of this thesis are as follows. First, we propose a statistical generative model for annotation-based sentence generation from image content, that is, generating sentences from an annotated picture. The image is segmented into regions using graph-based segmentation, and features are computed over each region. Given a training set of annotated images and a filtered sentence set, we parse the image to obtain position information, use machine learning to estimate the probabilities of combinations of labels and prepositions, and thereby obtain a data-to-text mapping. A standard semantic representation expresses the image content, and sentences are generated from this XML report with sentence templates. Second, we improve this model: by optimizing the intermediate "bridge" representation and adding a retrieval component, the positioning method is improved, and the sentence-generation stage is improved by introducing the concepts of main area and minor area. We also treat the segmented sub-images and the feature parameters they carry as implicit conditions, which yields a more specific understanding of image content.

The experiments show that our method extracts parameters from the annotated sub-images rather than from the original picture alone, and that it performs well on landscape pictures, whereas most text-generation work focuses on action detection. The thesis then improves the method further so that sentences are generated from the original image while the segmented sub-images are treated as implicit information.

In general, we have obtained preliminary results. The next step is to form and optimize longer and more complex sentences so as to achieve true "composition from a picture". Migrating this method to other image-recognition tasks is another goal for future work. We hope that image-to-sentence generation can be applied to image retrieval, ranking, and classification, so that its true value is realized.
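To make the pipeline concrete, the following is a minimal Python sketch of the template-filling step described above: two annotated regions (a main area and a minor area) are combined with learned label-preposition probabilities and a simple geometric prior to fill a subject-predicate-object template. The region labels, the probability table, and the geometric prior are illustrative assumptions, not the thesis's actual model or code.

```python
# Hypothetical sketch of subject-preposition-object sentence generation
# from an annotated, segmented landscape image. All values are illustrative.

from dataclasses import dataclass

@dataclass
class Region:
    label: str    # keyword annotation, e.g. "sky", "lake"
    cx: float     # centroid x, normalized to [0, 1]
    cy: float     # centroid y, normalized to [0, 1]
    area: float   # fraction of the image covered; used to pick the main area

# Assumed probabilities P(preposition | subject label, object label),
# e.g. estimated by counting label-preposition co-occurrences in a
# filtered training-sentence set.
PREP_PROB = {
    ("sky", "mountain"): {"above": 0.8, "behind": 0.2},
    ("lake", "mountain"): {"below": 0.6, "in front of": 0.4},
}

def spatial_prior(subj: Region, obj: Region) -> dict:
    """Coarse preposition prior from region geometry alone."""
    if subj.cy < obj.cy - 0.1:
        return {"above": 1.0}
    if subj.cy > obj.cy + 0.1:
        return {"below": 1.0}
    return {"beside": 1.0}

def choose_preposition(subj: Region, obj: Region) -> str:
    """Combine learned label-pair probabilities with the geometric prior."""
    learned = PREP_PROB.get((subj.label, obj.label), {})
    prior = spatial_prior(subj, obj)
    scores = {p: learned.get(p, 0.05) * prior.get(p, 0.05)
              for p in set(learned) | set(prior)}
    return max(scores, key=scores.get)

def generate_sentence(regions: list[Region]) -> str:
    """Pick the two largest regions (main and minor area) and fill the
    subject-predicate-object template."""
    subj, obj = sorted(regions, key=lambda r: r.area, reverse=True)[:2]
    prep = choose_preposition(subj, obj)
    return f"The {subj.label} is {prep} the {obj.label}."

if __name__ == "__main__":
    regions = [Region("sky", 0.5, 0.2, 0.45),
               Region("mountain", 0.5, 0.6, 0.35),
               Region("lake", 0.5, 0.85, 0.20)]
    print(generate_sentence(regions))  # e.g. "The sky is above the mountain."
```

In the sketch, the hard-coded table stands in for the probabilities learned by machine learning, and the f-string stands in for the sentence template applied to the XML semantic representation.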
Keywords/Search Tags: Text Generation, Image Annotation, Machine Learning, Cross-Media Retrieval