Design And Implementation Of Image Natural Language Description Based On Feature Fusion

Posted on: 2020-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: B W Li
GTID: 2428330572473570
Subject: Computer technology
Abstract/Summary:
Image object recognition and detection technology is now widely used in transportation, sports, medical treatment and other fields. Building on this technology, generating natural language descriptions from visual data has become a key direction for further research in computer vision. Describing an image in natural language requires accurately recognizing not only the objects in the image but also the relationships between them, so that new descriptions can be produced for new scenes as a human would. Existing models are not accurate enough and perform worse than human description: they generate simple sentences that cannot describe the rich information in an image accurately and completely. This thesis proposes an image description algorithm based on feature fusion, divided into three parts. First, to encode the image and text features, the encoding model extracts multi-dimensional image features, including gray-level features, texture features and scale-invariant key point features. A deep convolutional neural network is used as the image encoder to extract deep-level features, and a self-organizing feature map is used as the text encoder, clustering the text to obtain text annotations. Second, to fuse image and text features that lie in different feature spaces, Canonical Correlation Analysis is used to compute the correlation between the feature matrices of the two spaces. The features are mapped into a common space and fused so that they express the original information more efficiently and accurately. Third, to decode the fused features and generate sentences as sequences, a bidirectional long short-term memory (LSTM) neural network is used as the feature decoder. The bidirectional structure addresses the sequence prediction problem, and features are decoded according to context information to generate a detailed and accurate natural language description. Finally, this thesis develops an image description system that is applied to real scenes, where it plays a significant role in production.
Keywords/Search Tags: Natural Language Description, Feature Encoding, Feature Fusion, Feature Decoding
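
The feature fusion step described in the abstract (mapping image and text features into a common space with Canonical Correlation Analysis, then fusing them) can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function name `cca_fuse`, the regularization term `reg`, the choice to fuse by concatenation, and the synthetic feature matrices are all assumptions made for illustration.

```python
import numpy as np


def cca_fuse(X, Y, k, reg=1e-6):
    """Hypothetical helper: project two feature views into a shared
    k-dimensional space with CCA and concatenate the projections.

    X: (n_samples, dx) image features, Y: (n_samples, dy) text features.
    Returns the fused features (n_samples, 2k) and the top-k canonical
    correlations.
    """
    # Center each view.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Regularized covariance and cross-covariance matrices.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)

    # Singular values of the whitened cross-covariance are the
    # canonical correlations, sorted in descending order.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    Wx = Kx @ U[:, :k]
    Wy = Ky @ Vt.T[:, :k]

    # Fusion by concatenation is an assumption; summing the two
    # projections is another common choice.
    fused = np.hstack([Xc @ Wx, Yc @ Wy])
    return fused, s[:k]


# Synthetic stand-ins for image and text features that share a latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
image_feats = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
text_feats = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))

fused, corr = cca_fuse(image_feats, text_feats, k=2)
print(fused.shape, corr)
```

In a pipeline like the one the abstract outlines, the fused matrix would then be passed to the sequence decoder; how the thesis actually combines the projected features is not stated in the abstract.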