
Research On Image And Text Matching Method Based On Deep Learning

Posted on: 2019-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Yu
Full Text: PDF
GTID: 2428330596466403
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, the amount of image and text data has grown dramatically. These data are difficult for computers to understand and use. To better understand, find, and manage them, we propose methods to process these data properly, using deep learning to determine whether an image and a text are semantically similar. The main research work and contributions of this thesis are as follows:

1. We design an evaluation model to identify the main object of an image. Existing image detection and recognition methods cannot accurately identify the main object, so we design an evaluation model that identifies main objects in images with complex backgrounds. The model uses the selective search algorithm to extract candidate regions from the image and a convolutional neural network to recognize multiple objects. After the objects are extracted, the model computes a score for each one, and the highest-scoring object is taken as the main object; the decision is based on the size of the candidate region and the rarity of the object's label. Experiments show that this evaluation model identifies main objects effectively.

2. We propose an image semantic extraction model that fuses the image's main object with scene knowledge. The image semantics produced by general semantic extraction models are of low quality. To produce high-quality image semantics, we fuse the main object with prior information about the scene and propose an MS-Net model to generate the image's description. We embed the main object and the scene prior into the process of generating the image's feature vector. Experiments show that the semantics extracted by MS-Net outperform other methods on the BLEU, METEOR, and CIDEr metrics.

3. We design a method for computing the similarity between images and texts. At present, image-text similarity is computed by first extracting the main semantics of the image and the text and then measuring the similarity between them, which introduces a large error. To address this, we compute image-text similarity from two different perspectives: we use the WordNet tree to expand the semantic information of a sentence when computing text similarity, and we derive the sentence's context information with a recurrent neural network model. We built our own data set for image-text similarity and verify that both methods achieve higher precision and recall than other methods.

In summary, we analyze the key issues of image and text matching and provide corresponding solutions, and experiments show that our approach performs well.
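The main-object scoring step of contribution 1 can be sketched as follows. This is a minimal illustration, not the thesis's actual formula: it assumes candidate regions have already been produced by selective search and labeled by a CNN with a confidence value, and the weights combining region size and label rarity (0.6 and 0.4) are hypothetical choices.

```python
# Sketch of main-object selection: each candidate region (from selective
# search) carries a CNN label and confidence; the score combines the
# region's relative area with the rarity of its label. Weights are
# illustrative assumptions, not the thesis's actual parameters.

def score_object(region, rarity):
    """Combine CNN confidence, relative region area, and label rarity."""
    x1, y1, x2, y2 = region["box"]
    area_ratio = (x2 - x1) * (y2 - y1) / (region["img_w"] * region["img_h"])
    return region["conf"] * (0.6 * area_ratio + 0.4 * rarity)

def main_object(regions, rarity_of):
    """Return the candidate region with the highest combined score."""
    return max(regions, key=lambda r: score_object(r, rarity_of[r["label"]]))

# Toy candidates: a large, confidently detected dog and a small ball.
candidates = [
    {"box": (10, 10, 200, 180), "conf": 0.92, "label": "dog",
     "img_w": 320, "img_h": 240},
    {"box": (0, 0, 60, 50), "conf": 0.88, "label": "ball",
     "img_w": 320, "img_h": 240},
]
rarity = {"dog": 0.3, "ball": 0.7}  # higher = rarer label in the corpus

print(main_object(candidates, rarity)["label"])  # → dog
```

Here the dog's large relative area outweighs the ball's rarer label; in practice the weighting would be tuned on the evaluation data.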
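The text-similarity idea of contribution 3 — expanding a sentence's words before comparing sentences — can be illustrated like this. The thesis expands words using the WordNet tree; here a toy synonym table stands in for WordNet, and all names and values are assumptions for illustration only.

```python
# Sketch of similarity via semantic expansion: expand each sentence's
# word set with synonyms, then compare the expanded sets with Jaccard
# overlap. The SYNONYMS table is a toy stand-in for the WordNet tree.

SYNONYMS = {
    "dog": {"canine", "puppy"},
    "run": {"jog", "sprint"},
    "fast": {"quick", "rapid"},
}

def expand(sentence):
    """Return the sentence's word set enlarged with known synonyms."""
    words = set(sentence.lower().split())
    for w in list(words):
        words |= SYNONYMS.get(w, set())
    return words

def text_similarity(s1, s2):
    """Jaccard overlap of the two expanded word sets."""
    a, b = expand(s1), expand(s2)
    return len(a & b) / len(a | b)

# Without expansion these two sentences share no words at all;
# after expansion "quick" and "canine" match.
print(text_similarity("the dog can run fast", "a quick canine"))
```

The example shows why expansion helps: the two sentences have zero word overlap, yet the expanded sets intersect, so the similarity is nonzero.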
Keywords/Search Tags: Convolutional neural network, Language model, Image semantic representation, Matching image and text