
Research On Chinese Text Description Of The Images Based On Deep Learning

Posted on: 2020-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: S W Lv
GTID: 2428330575991199
Subject: Electronic and communication engineering
Abstract/Summary:
The number of images on the Internet has grown rapidly in the social-media era. As a result, image text description, which lets a machine understand image content accurately and report it to the user in natural language, has become an important research topic in artificial intelligence. It has applications in image retrieval, autonomous driving, intelligent service robots, early childhood education, virtual reality, and other areas. Image text description is a comprehensive task that combines computer vision with natural language processing. The goal is for the machine to generate a Chinese sentence that describes the main subject of the image, the scene, the relationships between objects, and the objects' attributes and activities. The main problems are therefore how to describe image content accurately and in detail, and how to generate sentences that match human reading habits.

In recent years, methods that describe images in English have made breakthroughs. Because of the particular characteristics of Chinese and the scarcity of Chinese datasets, however, progress on Chinese image description has been more limited. Existing methods can generate Chinese descriptions, but the generated sentences often lack coherence and readability, and sometimes describe the image content incorrectly. To address these problems, this thesis carries out the following research.

This thesis applies deep learning to Chinese image description and proposes a new Chinese description model, IRRU. IRRU combines a deep convolutional neural network (DCNN) with a two-layer gated recurrent unit (GRU) network: the DCNN encodes RGB images, and the GRU network decodes the encoded features into Chinese sentences. The model is trained on the AICC Chinese image description dataset. The first step is feature extraction from the image dataset: parameters pre-trained on the ImageNet image classification dataset are transferred to an Inception_ResNet_V2 network, which is then used to extract features from the images. For the caption set, a neural network language model is used to build the word-embedding matrix of the vocabulary words, and a fully connected layer maps the extracted image features into the word-embedding space so that image and word features have the same dimension. The language generation model is a two-layer GRU network, trained on the image features and word-embedding features to obtain the final Chinese image description model.

The model is evaluated on the evaluation set released by AICC. IRRU is compared with the English image description model NIC and with an extended model built on NIC, using the objective language-model metrics perplexity, BLEU, and ROUGE-L. The experimental results show that the proposed model can describe image content in Chinese and that its generated sentences are of higher quality than those of the other two models.
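The abstract describes a fully connected layer that projects image features into the word-embedding space, followed by a two-layer GRU language model. It does not give the layer sizes, the exact GRU variant, or how the image feature enters the decoder. The pure-Python sketch below therefore assumes a NIC-style setup, where the projected image feature is fed once as the first input before `<start>`. It uses a standard GRU cell without biases and random, untrained weights. The class and method names (`TwoLayerGRUDecoder`, `greedy_decode`) are hypothetical, and the sketch is not IRRU's actual implementation.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-t)) for t in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU cell; biases omitted for brevity."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.Wz, self.Uz = rand_matrix(hidden_dim, input_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wr, self.Ur = rand_matrix(hidden_dim, input_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wn, self.Un = rand_matrix(hidden_dim, input_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)

    def step(self, x, h):
        z = sigmoid(add(matvec(self.Wz, x), matvec(self.Uz, h)))  # update gate
        r = sigmoid(add(matvec(self.Wr, x), matvec(self.Ur, h)))  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(t) for t in add(matvec(self.Wn, x), matvec(self.Un, rh))]  # candidate state
        # New state interpolates between the previous state and the candidate.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


class TwoLayerGRUDecoder:
    """Hypothetical decoder: FC image projection, two stacked GRU layers, softmax output."""

    def __init__(self, feat_dim, embed_dim, hidden_dim, vocab_size, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.W_img = rand_matrix(embed_dim, feat_dim, rng)  # FC layer: image feature -> embedding space
        self.embedding = rand_matrix(vocab_size, embed_dim, rng)  # one row per vocabulary word
        self.layer1 = GRUCell(embed_dim, hidden_dim, rng)
        self.layer2 = GRUCell(hidden_dim, hidden_dim, rng)
        self.W_out = rand_matrix(vocab_size, hidden_dim, rng)  # hidden state -> vocabulary logits

    def step(self, x, state):
        h1, h2 = state
        h1 = self.layer1.step(x, h1)  # first GRU layer reads the input vector
        h2 = self.layer2.step(h1, h2)  # second GRU layer reads the first layer's output
        return softmax(matvec(self.W_out, h2)), (h1, h2)

    def greedy_decode(self, image_feature, start_id, end_id, max_len=16):
        zeros = [0.0] * self.hidden_dim
        # NIC-style conditioning (assumed): feed the projected image feature once, before <start>.
        _, state = self.step(matvec(self.W_img, image_feature), (zeros, zeros))
        token, output = start_id, []
        for _ in range(max_len):
            probs, state = self.step(self.embedding[token], state)
            token = max(range(len(probs)), key=probs.__getitem__)  # greedy choice
            if token == end_id:
                break
            output.append(token)
        return output


# 1536 matches the pooled output size of Inception-ResNet-v2; other sizes are arbitrary.
decoder = TwoLayerGRUDecoder(feat_dim=1536, embed_dim=8, hidden_dim=8, vocab_size=20)
rng = random.Random(1)
feature = [rng.random() for _ in range(1536)]
print(decoder.greedy_decode(feature, start_id=1, end_id=2))  # untrained, so the ids are arbitrary
```

Because the projection gives the image feature the same length as a word embedding, the first GRU layer can take either one as input. This is the practical reason for unifying the dimensions.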
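The evaluation compares models by perplexity, BLEU, and ROUGE-L. The minimal sketch below shows what each metric measures, under several assumptions: tokens are single Chinese characters, BLEU is computed per sentence with uniform weights up to 4-grams and no smoothing, and ROUGE-L uses β = 1.2 as an assumed recall weight. Published scores are normally computed at corpus level with a standard evaluation toolkit, so numbers from this sketch are not directly comparable with the thesis results.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_bleu(candidate, references, max_n=4):
    """Sentence-level BLEU: clipped n-gram precisions, uniform weights, brevity penalty, no smoothing."""
    log_p = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        total = sum(cand.values())
        if total == 0:
            return 0.0
        max_ref = Counter()  # highest count of each n-gram in any single reference
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_p += math.log(clipped / total) / max_n
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]  # closest reference length
    bp = 1.0 if c > r else math.exp(1.0 - r / c)  # penalise candidates shorter than the reference
    return bp * math.exp(log_p)


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming, one row at a time)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-measure from the LCS; beta > 1 weights recall more than precision."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(candidate), lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


def perplexity(token_probs):
    """exp of the mean negative log-probability the model gave each reference token."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


ref = list("一个男人在街上走")
cand = list("一个男人在走")
print(sentence_bleu(cand, [ref]), rouge_l(cand, ref), perplexity([0.5, 0.25, 0.5]))
```

Lower perplexity means the language model assigns higher probability to the reference captions. Higher BLEU and ROUGE-L mean the generated sentence shares more n-grams or a longer in-order subsequence with the references.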
Keywords/Search Tags:Chinese text description, deep learning, deep convolutional neural network, gated recurrent unit