Study On Multi-Topic Based Image Caption

Posted on: 2020-09-28  Degree: Master  Type: Thesis
Country: China  Candidate: H M Zhang
GTID: 2428330578450935  Subject: Computer software and theory
Abstract/Summary:
Image captioning is a comprehensive problem that combines computer vision (CV) and natural language processing (NLP); it can be understood simply as translating an input image into a description of its content. The task is challenging for machines and can be divided into three sub-tasks: (1) identifying the target objects in the image; (2) finding the relationships between those objects; and (3) expressing the content of the image in natural language. Understanding the relationships between objects and describing them in natural language are the main difficulties of image captioning.

The task has a wide range of applications. For example, when a user takes a photo, caption generation can attach suitable text to it, which makes the photo easy to retrieve and saves the user the time of labeling it by hand. It can also help visually impaired people understand image content.

Existing image captioning methods can be roughly divided into three categories, of which neural-network-based methods are the most accurate and the most valuable. These methods generally adopt an encoder-decoder structure. When the decoder generates a word sequence from the intermediate encoding, it usually considers only the word distribution of the training text and assumes that this distribution is the same under every topic. Because the influence of the topic on word choice is ignored, the decoder fits a general-purpose word distribution, even though word distributions under different topics often differ markedly. The first problem addressed in this thesis is therefore how to combine the topic of an image with its visual features to obtain a more accurate textual description.

The second problem is generating a complete summary of the captions of images under the same topic. Summarization techniques are mainly divided into extractive and abstractive approaches. The representative extractive algorithm is TextRank, but it selects sentences by considering only the similarity between them, ignoring the diversity of the resulting summary and the completeness of its information; as a result, the extracted sentences often come from the same group of similar sentences. How to use sentence groups to obtain a more accurate and complete summary is the second major problem addressed in this thesis.

To address these problems, this thesis proposes a topic-based image captioning method, TIC (Topic based Image Caption), and a group-based multi-image summary generation method, GIC (Group based Image Caption). The main contributions are:
(1) A topic-based image captioning method, TIC, built on a multi-topic neural network structure. The network consists of two parts: the traditional NIC model and a topic-based probabilistic model of image descriptions. The model combines the topic of an image with its visual features and is trained independently to obtain a more accurate textual description.
(2) A group-based multi-image summary generation method, GIC. First, TextRank ranks by importance the captions generated for images under the same topic. Next, a similarity threshold is set: two sentences whose similarity exceeds the threshold are assigned to the same group. When sentences are extracted for the summary, each group contributes in proportion to its size; if 40% of the original sentences are to be extracted, the number taken from a group is its sentence count multiplied by 40% (e.g., a group of 5 sentences contributes 2). Finally, the extracted sentences are sorted to form the summary, ensuring its fluency and readability.
(3) Extensive experiments were carried out on several datasets, including MSCOCO, Flickr8k and Flickr30k. The results show that, compared with traditional image captioning methods, the proposed TIC method is well suited to image captioning, and the proposed group-based multi-image summary generation method improves the corresponding evaluation metrics.
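The abstract does not give TIC's formula for combining the NIC decoder with the topic model. As a minimal sketch, assuming the topic model supplies a posterior p(k | I) over topics for an image and a word distribution p(w | k) for each topic, one simple way to let the topic reshape the decoder's per-step word distribution is a linear mixture. The function name `topic_aware_next_word`, the mixing weight `lam` and the toy distributions are hypothetical; the thesis's network may combine the two parts differently.

```python
from typing import Dict, List

Dist = Dict[str, float]  # word -> probability


def topic_aware_next_word(
    p_nic: Dist,                   # p(w | h_t) from an NIC-style CNN+LSTM decoder (assumed given)
    topic_posterior: List[float],  # p(k | image), e.g. from a topic classifier (hypothetical)
    topic_word: List[Dist],        # p(w | k) per topic, e.g. estimated from captions (hypothetical)
    lam: float = 0.5,              # hypothetical mixing weight between the two parts
) -> Dist:
    """Return p(w) = lam * p_nic(w) + (1 - lam) * sum_k p(k | I) * p(w | k)."""
    if len(topic_posterior) != len(topic_word):
        raise ValueError("need one word distribution per topic")
    vocab = set(p_nic)
    for dist in topic_word:
        vocab |= set(dist)
    mixed: Dist = {}
    for w in vocab:
        # Topic-conditioned probability of w, weighted by how likely each topic is.
        topic_part = sum(pk * dist.get(w, 0.0)
                         for pk, dist in zip(topic_posterior, topic_word))
        mixed[w] = lam * p_nic.get(w, 0.0) + (1 - lam) * topic_part
    return mixed
```

If every input distribution sums to 1, so does the mixture. In a toy case where the generic decoder slightly prefers "ball" but the image has a strong posterior on a topic that favors "bat", the mixture shifts the most likely next word to "bat", which is the kind of topic-dependent word choice the abstract argues a plain NIC decoder misses.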
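The GIC steps in contribution (2) can be sketched as follows, under several stated assumptions: sentence similarity is Jaccard word overlap (the thesis does not name its measure, and the original TextRank uses a different overlap formula); ranking is TextRank-style weighted PageRank over the sentence-similarity graph; "similar above the threshold" is made transitive with union-find; each group's quota is its size times the ratio, rounded up; and the final "sort" restores the captions' original order. The function names and the default threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def tokens(sentence: str) -> Set[str]:
    # Lower-cased word set with surrounding punctuation stripped (simple assumed tokenizer).
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return {w for w in words if w}


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Word-overlap similarity in [0, 1]; an assumed measure, not taken from the thesis.
    return len(a & b) / len(a | b) if a and b else 0.0


def textrank(sentences: Sequence[str], d: float = 0.85, iters: int = 50) -> List[float]:
    # TextRank-style scoring: weighted PageRank over the sentence-similarity graph.
    toks = [tokens(s) for s in sentences]
    n = len(sentences)
    w = [[jaccard(toks[i], toks[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [
            (1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                              for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def group_by_threshold(sentences: Sequence[str], threshold: float) -> List[List[int]]:
    # Pairs whose similarity reaches the threshold share a group (union-find, assumed transitive).
    toks = [tokens(s) for s in sentences]
    parent = list(range(len(sentences)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if jaccard(toks[i], toks[j]) >= threshold:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(sentences)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def gic_summary(sentences: Sequence[str], ratio: float = 0.4,
                threshold: float = 0.35) -> List[str]:
    scores = textrank(sentences)
    chosen: List[int] = []
    for group in group_by_threshold(sentences, threshold):
        quota = math.ceil(len(group) * ratio)  # group size x ratio, rounded up (assumed)
        ranked = sorted(group, key=lambda i: scores[i], reverse=True)
        chosen.extend(ranked[:quota])  # most important sentences of this group
    # Keep the chosen sentences in their original order (assumed ordering rule).
    return [sentences[i] for i in sorted(chosen)]
```

With `ratio=0.4`, a group of three near-duplicate captions contributes two sentences and a group of two contributes one, so every group is represented. This per-group quota is what distinguishes the approach from plain TextRank, which would take the top-scored sentences globally and could draw them all from one dense group.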
Keywords/Search Tags: Long short-term memory network, convolutional neural network, image caption, topic model